W4A : Accessibility of the web

This is the last of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several presentations looked at how accessible the web is.

Web Accessibility Snapshot

In 2006, an audit was performed by Nomensa for the United Nations. They reviewed 100 popular websites for conformance to accessibility guidelines.

The results weren’t positive: 97% of sites didn’t meet WCAG level 1.

Obviously, conformance to guidelines doesn’t mean a site is accessible, but it’s an important factor: necessary, if not sufficient. Conformance can’t prove that a website is accessible, but there are some guidelines that, if not followed, we can be certain will break accessibility. So they are at least a useful starting point.

However, 2006 is a long time ago now, and the Internet has changed a lot since then. One project, from colleagues of mine at IBM, is creating a more up-to-date picture of the state of the web. They analysed a thousand of the most popular websites (according to Alexa) as well as a random sample of a thousand other sites.

(Interestingly, they found no statistically significant difference in conformance between the most popular websites and the randomly selected ones.)

Their intention is to perform this regularly, creating a Web Accessibility Snapshot, with regular updates on the status of accessibility of the web. It looks like it could become a valuable source of information.

Assessing accessibility

There was a lot of discussion about how to assess accessibility.

One paper argued that there is an over-reliance on automated tools, and a lack of awareness of the negative effects of this. They performed a manual review of websites, comparing their results with the output from six popular tools. Their results showed how few accessibility problems automated tools discover.

Accurately assessing a website against accessibility guidelines doesn’t necessarily mean that you can prove a site is accessible or easy to use.

Some research presented suggests guidelines only cover a little over half of problems encountered by users. Usability studies suggest some websites that don’t meet guidelines may be easier to use than websites that do, as users may have effective coping strategies for (technically) non-compliant sites. This suggests we need a better way of assessing accessibility.

A better approach might be to observe users interacting with a website and assess it based on their experiences. One tool presented, WebTactics, showed an automated approach to assessing accessibility by observing a user and identifying the behaviours they employ.

Another paper detailed how to add accessibility monitoring to a live website by adding JavaScript that captures mouse clicks and button presses, evaluates them client-side, and submits them to a server for processing. Instead of requiring the user to perform predefined, and perhaps artificial, tasks, they hope to be able to discover tasks implicitly – that common tasks will emerge from the low-level actions that they collect.
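
As a rough illustration of the idea (this is not the authors’ implementation, and the event format here is made up), the server-side part could surface candidate tasks simply by counting repeated sequences in the stream of low-level events it receives:

from collections import Counter

# Each event is (element, action), roughly as it might arrive from the
# monitoring JavaScript on the page -- the format here is assumed.
events = [
    ("searchBox", "click"), ("searchBox", "keypress"), ("searchBtn", "click"),
    ("result1", "click"),
    ("searchBox", "click"), ("searchBox", "keypress"), ("searchBtn", "click"),
]

def ngrams(seq, n):
    # yield every run of n consecutive events
    for i in range(len(seq) - n + 1):
        yield tuple(seq[i:i + n])

# Frequently repeated 3-event sequences are candidate "tasks" that emerge
# implicitly from the low-level actions collected.
for sequence, count in Counter(ngrams(events, 3)).most_common(3):
    print(count, sequence)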

Accessibility training

Given that most websites have some sort of accessibility problems, there was some talk about how this could be improved.

One project presented showed training that has been developed to raise awareness of how people with disabilities access the web, and the implications of the accessibility guidelines. It’s a practical course including hands-on assignments, and looks like it could be the sort of thing that could help web developers make a real difference.

Social Accessibility

Another project is using crowd-sourcing to improve web sites that already exist. Social Accessibility, another IBM project, enables volunteers to make web pages more accessible to the visually impaired.

It provides a mechanism for accessibility problems to be gathered directly from visually impaired users. Volunteers are then notified, and can respond using a tool that allows them to externally modify web pages to make them more accessible. It lets them publish metadata associated with the original web page. This can be applied to the web page for all visually impaired users who visit it in future using this tool, so that many users can benefit from the improvement.
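
As a sketch of the general idea (not the project’s actual metadata format), volunteer-contributed fixes could be stored as selector-to-description pairs and applied to the page before it reaches a screen reader:

from bs4 import BeautifulSoup   # assumes the beautifulsoup4 package

# Hypothetical volunteer-contributed metadata: CSS selector -> alternative text
fixes = {
    "img#logo": "ACME Corporation logo",
    "img.chart": "Bar chart of monthly sales, highest in December",
}

def apply_fixes(html, fixes):
    # add the suggested alt text to any matching image that doesn't have it
    soup = BeautifulSoup(html, "html.parser")
    for selector, alt_text in fixes.items():
        element = soup.select_one(selector)
        if element is not None and not element.get("alt"):
            element["alt"] = alt_text
    return str(soup)

page = '<html><body><img id="logo" src="logo.png"></body></html>'
print(apply_fixes(page, fixes))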

cloud4all

Finally, a project called cloud4all is developing a roaming profile that stores your preferences in a way that multiple services can access. The focus is on accessibility – a user can store their accessibility needs in one place, and then interfaces can use this to adapt for them.
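
As a very rough sketch of the idea (the real project defines its own profile format), a roaming profile might just be a set of preferences that any interface can look up and act on:

# Hypothetical roaming accessibility profile -- not cloud4all's actual schema.
profile = {
    "font-size": "x-large",
    "high-contrast": True,
    "text-to-speech": True,
}

def adapt_interface(prefs):
    # decide what settings an interface should apply for this user
    return {
        "font_px": 24 if prefs.get("font-size") == "x-large" else 16,
        "theme": "high-contrast" if prefs.get("high-contrast") else "default",
        "speak_text": bool(prefs.get("text-to-speech")),
    }

print(adapt_interface(profile))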


Dyslexia at W4A

This is the third of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

There were a few sessions presenting work done to improve understanding of how to better support people with dyslexia.

One interesting study investigated the effect of font size and line spacing on the readability of Wikipedia articles.

This was assessed in a variety of ways, some of which were based on the reader’s opinions, while others were based on measurements made of the reader during reading and of their understanding of the content after. The underlying question (can we make Wikipedia easier to read for dyslexics?) was compelling. It was also interesting to see this performed not on abstract passages of text, but in the context of using an actual website.

Accessibility isn’t just about the presentation but also the content itself. Another study looked at strategies for simplifying text that could make web pages more readable for dyslexic readers.

It compared the effectiveness of two strategies. The first was providing synonyms on demand – giving the reader a way to request an alternative for any word. The second was providing synonyms automatically – with complex words automatically substituted for simpler equivalents. Again, this was assessed in several ways: reading speed, the reader’s comprehension, the reader’s opinion of how easy the text was, the effort it took (e.g. interpreted from facial expressions), fixation duration measured using eye tracking, and so on.
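
A toy version of the automatic strategy gives a feel for it: substitute words from a list of simpler synonyms (the word list here is made up for illustration; the study’s own resources would be far richer):

import re

# Hypothetical mapping from complex words to simpler synonyms.
SIMPLER = {
    "utilise": "use",
    "commence": "start",
    "approximately": "about",
}

def simplify(text, synonyms=SIMPLER):
    # replace each complex word with its simpler synonym, roughly keeping case
    def replace(match):
        word = match.group(0)
        simple = synonyms.get(word.lower(), word)
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", replace, text)

print(simplify("Commence reading at approximately nine, and utilise the notes."))
# Start reading at about nine, and use the notes.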

On a more practical note, there were also tools presented that are being created to help support people with dyslexia.

Firefixia is a Firefox toolbar extension being created by colleagues of mine at IBM. It provides options for users to customise the web page they are looking at, offering modifications that have been demonstrated to make pages easier for dyslexic users to read.

Dyseggxia is an impressive looking iPad game that aims to support children with dyslexia through fun word games.


W4A : Future of screen readers

This is the second of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several of the projects that I saw showed glimpses of a possible future for screen readers.

I’ve written about screen readers before, and some of the challenges with using them.

Interactive SIGHT

One project interpreted pictures of charts or graphs and created a textual summary of the information shown in them.

I’m still amazed at this. It takes a picture of a graph, not the original raw data, and generates sensible summaries of what it shows.

For example, given this image:

It can generate:

This graphic is about United States. The graphic shows that United States at 35 thousand dollars is the third highest with respect to the dollar value of gross domestic product per capita 2001 among the countries listed. Luxembourg at 44.2 thousand dollars is the highest

or

The dollar value of gross domestic product per capita 2001 is 25 thousand dollars for Britain, which has the lowest dollar value of product per capita 2001. United States has 1.4 times more product per capita 2001 than Britain. The difference between the dollar value of gross domestic product per capita 2001 for United States and that for Britain is 10 thousand dollars.

The original version was able to process bar graphs, and was presented at W4A in 2010. What I saw was an extension that added support for line graphs.

Their focus is on the sort of graphics found in newspapers and magazines – informational, rather than scientific graphs. They want to be able to generate a high level summary, rather than a list of plot points that require the user to build a mental model in order to interpret.

For example:

The image shows a line graph. The line graph presents the number of Walmmart’s sales of leather jackets. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1997 to 1999 followed by a falling trend through 2006. The first segment is the rising trend. The rising trend is steep. The rising trend has a starting value of 1890. The rising trend has an ending value of 36840. The second segment is the falling trend. The falling trend has a starting value of 36840. The falling trend has an ending value of 12606.

The image shows a line graph. The line graph presents the number of people who started smoking under the age of 18 in the US. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1962 to 1966 followed by a falling trend through 1980. The first segment is the rising trend. The rising trend is steep. The second segment is the falling trend.

It’s able to interpret an image and recognise trends, recognise how noisy or smooth it is, recognise if the trend changes, and more. Impressive.
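
A much-simplified sketch of the summary-generation step, assuming the plot points have already been extracted from the image (which is the genuinely hard part of their work):

def describe_trend(points):
    # split the (x, y) series wherever the direction of change flips
    segments, start = [], 0
    for i in range(1, len(points) - 1):
        before = points[i][1] - points[i - 1][1]
        after = points[i + 1][1] - points[i][1]
        if (before > 0) != (after > 0):
            segments.append(points[start:i + 1])
            start = i
    segments.append(points[start:])

    sentences = ["The line graph shows a trend that changes."
                 if len(segments) > 1 else "The line graph shows a single trend."]
    for seg in segments:
        direction = "rising" if seg[-1][1] > seg[0][1] else "falling"
        sentences.append("There is a %s trend from %s to %s, starting at %s and ending at %s."
                         % (direction, seg[0][0], seg[-1][0], seg[0][1], seg[-1][1]))
    return " ".join(sentences)

# Hypothetical (year, value) points extracted from a graph image.
points = [(1997, 1890), (1998, 20000), (1999, 36840), (2002, 25000), (2006, 12606)]
print(describe_trend(points))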

Interpreting data in tables

Another project demonstrated restructuring data tables in web pages to make them easier to explore with a screenreader.

They have an interesting approach of analysing an HTML table and reorganising it to make it more accessible, abstracting out complex sections into a series of menus.

For example, given a table such as this:

it can produce navigable menus such as this:

Even quite complex tables, with row and column spans, which would otherwise be quite difficult to interpret if read row-by-row by a screen reader, are made much more accessible.
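
As a much-simplified sketch of the general idea (not their algorithm, and ignoring row and column spans), a table’s data rows could be regrouped under their first-column value so a screen reader user can drill down menu by menu instead of reading row by row:

from bs4 import BeautifulSoup   # assumes the beautifulsoup4 package

html = """
<table>
  <tr><th>Region</th><th>Quarter</th><th>Sales</th></tr>
  <tr><td>Europe</td><td>Q1</td><td>100</td></tr>
  <tr><td>Europe</td><td>Q2</td><td>120</td></tr>
  <tr><td>Asia</td><td>Q1</td><td>90</td></tr>
</table>
"""

def table_to_menus(table_html):
    # group data rows by their first cell into a simple nested menu structure
    soup = BeautifulSoup(table_html, "html.parser")
    rows = soup.find_all("tr")
    headers = [cell.get_text() for cell in rows[0].find_all(["th", "td"])]
    menus = {}
    for row in rows[1:]:
        cells = [cell.get_text() for cell in row.find_all("td")]
        menus.setdefault(cells[0], []).append(dict(zip(headers[1:], cells[1:])))
    return menus

print(table_to_menus(html))
# {'Europe': [{'Quarter': 'Q1', 'Sales': '100'}, {'Quarter': 'Q2', 'Sales': '120'}],
#  'Asia': [{'Quarter': 'Q1', 'Sales': '90'}]}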

Capti web player

Another technology I saw demonstrated was the Capti web player.

Tools such as instapaper and read it later have shown that we can take most web pages and extract the body text of the article on the page.

This capability should be ideal for visually impaired users, but the tools themselves are still quite difficult to use and integrate poorly with assistive technologies. Someone described them as obviously “designed by sighted people for sighted people”.

Capti combines this capability with an accessible media player, making it easy to navigate through an article, move through a list of articles, and so on. To a sighted user like me, it looked like they’ve mashed together instapaper with an audiobook-style media player. I often listen to podcasts while I go running, and am a heavy user of pocket and Safari’s reading list, so this looks ideal for me.

Multiple simultaneous audio streams

Finally, one fascinating project looked at how to make it quicker to scan large amounts of content with a screenreader to find a specific piece of information. I’ve written before that relying on a screenreader (which creates a sequential audio representation of the information on the page, starting at the beginning and going through the contents) can be tremendously time-consuming, and that it results in visually impaired users taking considerably more time to find information on the web.

This project investigated whether this could be improved by using multiple simultaneous sound sources.

It sounds mad, but they’re starting from observations such as the cocktail party effect – that in a noisy room with several conversations going on, we’re able to pick out a specific conversation that we want to listen for. Or that a student not paying attention in a lecture will hear if a lecturer says something like “this will be on the exam”.

They’re looking at a variety of approaches, such as separating the channels directionally, so one audio stream will sound like it’s coming from the left, while another is in front. Or having different voices, such as different genders, for the different streams. It’s an intriguing idea, and I’d love to see if it could be useful.


Web technologies I saw at W4A

WWW2013

Last month I went to the International World Wide Web Conference for w4a. I saw a lot of cool web technologies and accessibility projects while I was there, so thought I would share links to some of the more interesting bits.

There are too many to put in a single post, so I’ll write a few posts to cover them all.

Subtitles

Subtitles and transcripts came up a few times. One study presented looked at online video, comparing single-line subtitle captions overlaid on the video with multi-line off-screen transcripts adjacent to it.

It examined which is more effective from a variety of perspectives, including readability, reader enjoyment, the effect on understanding and so on. In summary, it found that overlaid captions are generally better, although transcripts are better for content which is more technical.

Real-time transcription from a stenographer at W4A

We had subtitles for all the talks and presentations. Impressively, a separate screen projected a live transcription of the speaker, which allowed deaf attendees to follow what was being said. For talks given in Portuguese, the English subtitles allowed non-Portuguese speakers like me to understand.

They did this by having live stenographers listening to an audio feed from the talks. This is apparently expensive, as stenography is a specialist skill, and it needs to be scheduled in advance. It’s perhaps only practical for larger conferences.

Legion Scribe

This was the motivation for one of the more impressive projects that I saw presented: Legion Scribe, which crowd-sources real-time captioning so that you don’t need an expert stenographer.

Instead, a real-time audio stream is chopped up into short bits, and divided amongst a number of people using Mechanical Turk. Each worker has to type the short phrase fragment they are given. The fragments overlap, so captions that each worker types can be stitched back together to form captions for the whole original audio stream.
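
A crude sketch of the stitching step: because each worker’s fragment overlaps the next, consecutive fragments can be merged by finding the longest suffix of one that matches a prefix of the next (the real system also has to handle timing, disagreement between workers, and far messier input):

def merge_pair(left, right):
    # join two word lists, removing the longest overlap between them
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return left + right

def stitch(fragments):
    # stitch overlapping caption fragments from several workers into one stream
    words = fragments[0].split()
    for fragment in fragments[1:]:
        words = merge_pair(words, fragment.split())
    return " ".join(words)

# Hypothetical overlapping fragments typed by three different workers.
fragments = [
    "the next speaker will talk about",
    "will talk about crowd sourced captioning",
    "crowd sourced captioning for live events",
]
print(stitch(fragments))
# the next speaker will talk about crowd sourced captioning for live events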

All of this is done quickly enough to make the captions appear more or less in real-time.

Seriously impressive.

And they’re getting reasonable levels of coverage and accuracy. The system has been designed so that workers don’t need to be experts in the domain they’re transcribing, as they’re only asked to type in a few words at a time, not whole passages. With enough people it works: with at least seven workers, it approaches the coverage you can get from a professional stenographer.

Assuming that Mechanical Turk can provide a plentiful supply of workers, this would not only be cheaper than a stenographer, it would also let you start captioning at a moment’s notice, rather than needing to arrange for a stenographer in advance.

Map Reduce in the browser

Speaking of crowd-sourcing, the idea of splitting up a large computing task between a large number of volunteer computers isn’t new. SETI@home is perhaps the best known, while World Community Grid is a recent example from IBM.

But these need users to install custom client software to receive the task, perform it and submit the results.

One project showed how this could be done in web browsers. A large computing task is divided up into map reduce jobs, which are made available through a website. Each web browser that visits the website becomes a map reduce worker, running their task in the background using web workers. As long as the user remains on the site, their browser can continue to contribute to the overall task in the background, without the user having had to install custom client software.

It’s an elegant idea. Not all sites would be well suited to it, but there are plenty of web sites that I keep open all day (e.g. GMail, Remember The Milk, Google Calendar, etc.) so I think the idea has potential.

Migrating browser sessions

An interesting project I saw showed how the state of a browser app could be migrated from one browser to another – potentially a different browser running on a different machine, or even a different platform.

This is more than just the client-server session, which you could migrate by transferring cookies. They’re transferring the entire state of dynamic AJAX-y pages: what bits are open, enabled, and so on, for any arbitrary web app.

Essentially, they started by wanting to be able to serialize the contents of window, so that it could be transferred to another browser and used to restore the session there.

That wouldn’t be enough by itself. window doesn’t have access to local variables in functions, to most event listeners (such as those added with addEventListener), to the contents of some HTML5 elements like canvas, or to events scheduled with setTimeout or setInterval.

Serializing window gets you the current state of the DOM, which is a good start, but not sufficient to transfer the state of most web apps.

A prototype system called Imagen shows how this could be done. Looking at how they’ve implemented it, they’ve had to resort to using a proxy server which intercepts JavaScript going to the browser and instruments it with enough additional calls to let them access all of the stuff that wouldn’t normally be in scope. This is enough for them to be able to serialize the entire state of the page.

I can see a lot of uses for this, such as in testing, debugging or service scenarios, as well as just the convenience of being able to resume work in progress as you move between devices.

Inferring constraints on REST API query parameters

Many web services have constraints and dependencies on their query parameters. For example: “this option is always required”, “that parameter is optional”, or “you have to specify at least one of this or that”. The twitter API docs, for instance, explain how you have to specify a user_id or screen_name when requesting a user timeline.

One project I saw was an attempt to automatically infer these rules and dependencies through a combination of natural language processing to recognise them in API documentation, and automated source code analysis of sample code provided for web services. It combines these into an estimated model of the constraints in the REST APIs, which are then verified by submitting requests to the API.
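
The verification step might look something like this sketch, which probes an endpoint with and without a candidate parameter to see whether dropping it makes the request fail (the URL and parameters here are placeholders, and the real system is far more thorough):

import requests   # assumes the requests package

def appears_required(url, params, candidate):
    # a parameter looks required if the full request succeeds
    # but the same request without the candidate parameter fails
    full = requests.get(url, params=params)
    reduced = {k: v for k, v in params.items() if k != candidate}
    partial = requests.get(url, params=reduced)
    return full.ok and not partial.ok

# Hypothetical endpoint and parameters, for illustration only.
params = {"screen_name": "example", "count": "5"}
print(appears_required("https://api.example.com/statuses/user_timeline.json",
                       params, "screen_name"))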

They demonstrated it on APIs like twitter, flickr, last.fm, and amazon, and it was surprisingly effective.

duolingo

Finally, there was a keynote talk on Wednesday by the founder of duolingo.

reCAPTCHA is particularly interesting because it uses a task that people need to do anyway (verifying that they’re human) to crowd-source a task that needs to be done (digitising the text of old books that automated OCR cannot read).

Duolingo is similar. It takes a task that people need to do, which is to learn a new language, and uses that effort to translate texts into different languages.

It’s better explained by their demo video.

It’s been around for a little while, but I’d not come across it before. Since getting back from www, I’ve been trying it out. Even Grace has been using it to improve her French and seems to be getting on really well with it.

What else?

There were a lot of other cool projects and technologies that I saw, so I’ll follow this up with another post or two to share some more links.


Everybody Technology

This afternoon I went to Everybody Technology, an event to discuss the need for technology to be inclusive and made in a way that is “so smart, so simple and so powerful it works for everybody”.

A highlight of the afternoon was Stephen Hawking – perhaps one of the best examples of the power of technology to enable someone to reach their potential. He also supported the event by lending his voice to a promotional video which explains the idea better than I can.


“Who is Technology Made For?” (YouTube)

There were several speakers. I won’t do them justice, but I did jot a few notes…

Panel discussion with Rupert Goodwins (ZDNet UK) & Damon Rose (BBC)

They talked of the stigma of using “special” equipment created especially for the blind. There were examples where even when technology or tools exist that can help, people don’t always want to use them. Maybe because they feel embarrassed, or they don’t want to be different, or even that they’re struggling with feeling forced to join a group of people they don’t feel a part of.

They discussed how it was more acceptable to use technologies when they are “standard” and how some felt more comfortable using technology that doesn’t single them out as being different.

Someone noted how people can be embarrassed wearing a hearing aid to help them hear, whilst few people would be embarrassed to wear glasses to help them see. Why are some assistive technologies more culturally acceptable than others?

There was a lot of mention of iDevices and appreciation of assistive technology being delivered as iPhone apps. To everyone else, it’s an iPhone and doesn’t stand out as being different. In addition, the fact that it’s mass-manufactured has meant that an expensive collection of advanced sensors and processing capability can be made affordable. An equivalent device produced purely as an assistive technology would be prohibitively expensive. The iPhone sparked a smartphone revolution that made this technology affordable in a way that it wasn’t before.

There was also discussion about how the app culture removed barriers between potential users and developers. Affordable sensors and technology made widely available, combined with a low-cost delivery mechanism for software innovations, make possible innovations in assistive technology that would have been impossible a few years ago.

Presentation on accessible architecture by Paul Kalkhoven

This looked at parallels between buildings and software. Accessibility has become accepted as important in architecture: you can’t put up a new building without considering it. This isn’t yet true of technology.

He talked of the conflicting interests of design and utility. When designing a building, you want it to be unique and different, but you also want it to be obvious: anyone looking for a toilet or a fire exit should understand the layout immediately. The same applies to technology: we want to make something new and exciting, but there is an expectation that it should be usable without a manual. It needs to be accessible.

One observation I hadn’t really recognised: transport buildings lead the way for accessible architecture, often abiding by a common, albeit unwritten, set of standards.

He challenged us to consider what technologists could learn from their experience.

Presentation on talking TVs by Mark Vasey (Panasonic)

Voice guidance is included as standard in most new Panasonic TVs, offering text-to-speech guidance for complex TV menus.

Perhaps more interesting was how they made it happen. He talked about challenges such as the cost of development, licensing and royalties for a feature they include “for free”. There were challenges in marketing to a minority, without wanting to classify it as a specialist product, and without making sighted users think that they were paying for a feature they didn’t need or want.

Similar to the discussion of the iPhone’s impact, he explained how the only way they could do this and make it affordable was to make it standard. Making a specialist TV with accessibility features for the visually impaired would not have been affordable. Spreading the cost across their entire product line is what made it possible.


“Introduction to Voice Guidance on Panasonic talking TVs” (YouTube)

Presentation on Threedom Phone – Antony Ribot (Ribot)

Antony gave a thought-provoking presentation about their project to make the world’s simplest smartphone.

The smartphone revolution has been great for many, but isn’t suitable for everyone. For some, the controls are too small, or too fiddly, or just too complicated. What if we made a smartphone that had only three buttons? Could we provide the essential functions that people need on a device with three large, easy to press, easy to understand, buttons?

He had an example with him and made a convincing case that there is a need for a device like this, in a market where devices are racing to get more complicated.

Everybody Technology: rlsb.org.uk/everybody

A year ago, I wrote about RLSB’s event which brought together a handful of representatives from tech companies, consumer-facing businesses, Universities, and charities for the blind. We talked about a vision of a Conversational Internet.

A year later, RLSB brought together a couple of hundred people to talk about projects that had happened since – both by them, such as the Conversational Internet prototype that I presented, and by others, such as Panasonic’s collaboration with RNIB to produce Voice Guidance.

They talked about what comes next, establishing a new group to bring together technologists and designers with people who understand disabilities, to make real their vision where everyone is taken into consideration.

If you think this is something you can help with, either as a developer, a designer, or someone who understands a disability, then why not join them?


Conversational Internet

tl;dr

We’ve built a prototype to show how we could interact with the Internet using a command-driven approach.

  • A screen reader, but one that uses machine learning and natural language processing, in order to better understand both what the user wants to do, and what the web page says.
  • One that can offer a conversational interface instead of just reading out everything on the page.

It’s a proof-of-concept, but it’s an exciting idea with a lot of potential and we’ve got a demo that shows it in action.

The problem : screen readers today

I’ve written about this before but here is a recap.

Visually impaired people can interact with the web using screen readers. These read out every element on a page.

The user has to make a mental model of the structure of the page as it’s read out, and keep this in their head as they arrow-key around the page.

For example, on a news site’s front page, once the screen reader has read out the page, you have to remember if the story you want is the fifth or sixth story in the list so you can tab the right number of times to get to it.

Imagine an automated telephone menu:
“for blah-blah-blah, press 1, for blather-blather-blather, press 2, for something-or-other, press 3 … for something-else-vague, press 9 …”

Imagine this menu was so long it took 15 minutes or more to read.

Imagine none of the options are an exact match for what you want. But by the time you get to the end, you can’t remember whether the closest match was the third or fourth, or fiftieth option.

The vision : a Conversational Internet

Software could be smarter.

If it understood more about the web page, it could describe it at a higher, task-oriented level. It could read out the relevant bits, instead of everything.

If it understood more about what the user wants to do, the user could just say that, instead of working out the manual navigation steps themselves.

The vision is software that can interpret web pages and offer a conversational interface to web browsing.
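
To make the idea more concrete, here is a toy sketch of the sort of thing such software might do: extract the headlines from a page and answer a spoken-style request, rather than reading everything out. It’s an illustration of the idea, not our prototype:

from bs4 import BeautifulSoup   # assumes the beautifulsoup4 package

page = """
<html><body>
  <h2><a href="/politics">Budget announced</a></h2>
  <h2><a href="/sport">Local team wins cup</a></h2>
  <h2><a href="/weather">Storms expected this weekend</a></h2>
</body></html>
"""

def answer(request, html):
    # answer a simple request by matching it against the page's headlines
    soup = BeautifulSoup(html, "html.parser")
    headlines = [h.get_text(strip=True) for h in soup.find_all("h2")]
    if "headlines" in request.lower():
        return "The headlines are: " + "; ".join(headlines)
    # otherwise pick the headline sharing the most words with the request
    words = set(request.lower().split())
    best = max(headlines, key=lambda h: len(words & set(h.lower().split())))
    return "Opening the story: " + best

print(answer("read me the headlines", page))
print(answer("tell me about the weather this weekend", page))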

Continue reading

Smile!

This is my mood (as identified from my facial expressions) over time while watching Never Mind the Buzzcocks.

The green areas are times where I looked happy.

This shows my mood while playing XBox Live. Badly.

The red areas are times where I looked cross.

I smile more while watching comedies than when getting shot in the head. Shocker, eh?

A couple of years ago, I played with the idea of capturing my TV viewing habits and making some visualisations from them. This is a sort of return to that idea.

A webcam lives on the top of our TV, mainly for skype calls. I was thinking that when watching TV, we’re often more or less looking at the webcam. What could it capture?

What about keeping track of how much I smile while watching a comedy, as a way of measuring which comedies I find funnier?

This suggests that, overall, I might’ve found Mock the Week funnier. But this shows my facial expressions while watching Mock the Week.

It seems that, unlike with Buzzcocks, I really enjoyed the beginning, then perhaps got a bit less enthusiastic as it went on.

What about The Daily Show with Jon Stewart?

I think the two neutral bits are breaks for adverts.

Or classifying facial expressions by mood and looking for the dominant mood while watching something more serious on TV?

This shows my facial expressions while catching a bit of Newsnight.

On the whole, my expression remained reasonably neutral whilst watching the news, but you can see where I visibly reacted to a few of the news items.

Or looking to see how I react to playing different games on the XBox?

This shows my facial expressions while playing Modern Warfare 3 last night.

Mostly “sad”, as I kept getting shot in the head. With occasional moments where something made me smile or laugh, presumably when something went well.

Compare that with what I looked like while playing Blur (a car racing game).

It seems that I looked a little more aggressive while driving than running around getting shot. For last night, at any rate.

Not just about watching TV

I’m using face recognition to tell my expressions apart from those of other people in the room. This means there is also a bunch of stuff I could look into around how my expressions change based on who else is in the room, and on their expressions.

For example, looking at how much of the time I spend smiling when I’m the only one in the room, compared with when one or both of my kids are in the room.

To be fair, this isn’t a scientific comparison. There are lots of factors here – for example, when the girls are in the room, I’ll probably be doing a different activity (such as playing a game with them or reading a story) to what I would be doing when by myself (typically doing some work on my laptop, or reading). This could be showing how much I smile based on which activity I’m doing. But I thought it was a cute result, anyway.

Limitations

This isn’t sophisticated stuff.

The webcam is an old, cheap one that only has a maximum resolution of 640×480, and I’m sat at the other end of the room to it. I can’t capture fine facial detail here.

I’m not doing anything complicated with video feeds. I’m just sampling by taking photos at regular intervals. You could reasonably argue that the funniest joke in the world isn’t going to get me to sustain a broad smile for over a minute, so there is a lot being missed here.

And my y-axis is a little suspect. I’m using the percentage level of confidence that the classifier had in identifying the mood. I’m doing this on the assumption that the more confident the classifier was, the stronger or more pronounced my facial expression probably was.

Regardless of all of this, I think the idea is kind of interesting.

How does it work?

The media server under the TV runs Ubuntu, so I had a lot of options. My language-of-choice for quick hacks is Python, so I used pygame to capture stills from the webcam.

For the complicated facial stuff, I’m using web services from face.com.

They have a REST API that you upload a photo to, and it returns a blob of JSON with information about the faces detected in the photo. This includes a guess at the gender, a description of mood from the facial expression, whether the face is smiling, and even an estimated age (often not complimentary!).

I used a Python client library from github to build the requests, so getting this working took no time at all.

There is also a face recognition REST API: you can train the system to recognise certain faces. I didn’t write any code for this, as it only needs doing once, so I did it using the API sandbox on the face.com website. I gave it a dozen or so photos with my face in, which seemed to be more than enough for the system to be able to tell me apart from someone else in the room.

My monitoring code puts what it measures about me in one log, and what it measures about anyone else in a second “guest log”.

This is the result of one evening’s playing, so I’ve not really finished with this. I think there is more to do with it, but for what it’s worth, this is what I’ve come up with so far.

The script

####################################################
#  IMPORTS
####################################################

# imports for capturing a frame from the webcam
import pygame.camera
import pygame.image

# import for detecting faces in the photo
import face_client

# import for storing data
from pysqlite2 import dbapi2 as sqlite

# miscellaneous imports
from time import strftime, localtime, sleep
import os
import sys

####################################################
# CONSTANTS
####################################################

DB_FILE_PATH="/home/dale/dev/audiencemonitor/data/log.db"
FACE_COM_APIKEY="MY_API_KEY_HERE"
FACE_COM_APISECRET="MY_API_SECRET_HERE"
DALELANE_FACETAG="dalelane@dale.lane"
POLL_FREQUENCY_SECONDS=3

class AudienceMonitor():

    #
    # prepare the database where we store the results
    #
    def initialiseDB(self):
        self.connection = sqlite.connect(DB_FILE_PATH, detect_types=sqlite.PARSE_DECLTYPES|sqlite.PARSE_COLNAMES)
        cursor = self.connection.cursor()

        cursor.execute('SELECT name FROM sqlite_master WHERE type="table" AND NAME="facelog" ORDER BY name')
        if not cursor.fetchone():
            cursor.execute('CREATE TABLE facelog(ts timestamp unique default current_timestamp, isSmiling boolean, smilingConfidence int, mood text, moodConfidence int)')

        cursor.execute('SELECT name FROM sqlite_master WHERE type="table" AND NAME="guestlog" ORDER BY name')
        if not cursor.fetchone():
            cursor.execute('CREATE TABLE guestlog(ts timestamp unique default current_timestamp, isSmiling boolean, smilingConfidence int, mood text, moodConfidence int, agemin int, ageminConfidence int, agemax int, agemaxConfidence int, ageest int, ageestConfidence int, gender text, genderConfidence int)')

        self.connection.commit()

    #
    # initialise the camera
    #
    def prepareCamera(self):
        # prepare the webcam
        pygame.camera.init()
        self.camera = pygame.camera.Camera(pygame.camera.list_cameras()[0], (900, 675))
        self.camera.start()

    #
    # take a single frame and store in the path provided
    #
    def captureFrame(self, filepath):
        # save the picture
        image = self.camera.get_image()
        pygame.image.save(image, filepath)

    #
    # gets a string representing the current time to the nearest second
    #
    def getTimestampString(self):
        return strftime("%Y%m%d%H%M%S", localtime())

    #
    # get attribute from face detection response
    #
    def getFaceDetectionAttributeValue(self, face, attribute):
        value = None
        if attribute in face['attributes']:
            value = face['attributes'][attribute]['value']
        return value

    #
    # get confidence from face detection response
    #
    def getFaceDetectionAttributeConfidence(self, face, attribute):
        confidence = None
        if attribute in face['attributes']:
            confidence = face['attributes'][attribute]['confidence']
        return confidence

    #
    # detects faces in the photo at the specified path, and returns info
    #
    def faceDetection(self, photopath):
        client = face_client.FaceClient(FACE_COM_APIKEY, FACE_COM_APISECRET)
        response = client.faces_recognize(DALELANE_FACETAG, file_name=photopath)
        faces = response['photos'][0]['tags']
        for face in faces:
            userid = ""
            faceuseridinfo = face['uids']
            if len(faceuseridinfo) > 0:
                userid = faceuseridinfo[0]['uid']
            if userid == DALELANE_FACETAG:
                smiling = self.getFaceDetectionAttributeValue(face, "smiling")
                smilingConfidence = self.getFaceDetectionAttributeConfidence(face, "smiling")
                mood = self.getFaceDetectionAttributeValue(face, "mood")
                moodConfidence = self.getFaceDetectionAttributeConfidence(face, "mood")
                self.storeResults(smiling, smilingConfidence, mood, moodConfidence)
            else:
                smiling = self.getFaceDetectionAttributeValue(face, "smiling")
                smilingConfidence = self.getFaceDetectionAttributeConfidence(face, "smiling")
                mood = self.getFaceDetectionAttributeValue(face, "mood")
                moodConfidence = self.getFaceDetectionAttributeConfidence(face, "mood")
                agemin = self.getFaceDetectionAttributeValue(face, "age_min")
                ageminConfidence = self.getFaceDetectionAttributeConfidence(face, "age_min")
                agemax = self.getFaceDetectionAttributeValue(face, "age_max")
                agemaxConfidence = self.getFaceDetectionAttributeConfidence(face, "age_max")
                ageest = self.getFaceDetectionAttributeValue(face, "age_est")
                ageestConfidence = self.getFaceDetectionAttributeConfidence(face, "age_est")
                gender = self.getFaceDetectionAttributeValue(face, "gender")
                genderConfidence = self.getFaceDetectionAttributeConfidence(face, "gender")
                # if the face wasn't recognisable, it might've been me after all, so ignore
                if "tid" in face and face['recognizable'] == True:
                    self.storeGuestResults(smiling, smilingConfidence, mood, moodConfidence, agemin, ageminConfidence, agemax, agemaxConfidence, ageest, ageestConfidence, gender, genderConfidence)
                    print face['tid']

    #
    # stores face results in the DB
    #
    def storeGuestResults(self, smiling, smilingConfidence, mood, moodConfidence, agemin, ageminConfidence, agemax, agemaxConfidence, ageest, ageestConfidence, gender, genderConfidence):
        cursor = self.connection.cursor()
        cursor.execute('INSERT INTO guestlog(isSmiling, smilingConfidence, mood, moodConfidence, agemin, ageminConfidence, agemax, agemaxConfidence, ageest, ageestConfidence, gender, genderConfidence) values(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)',
                        (smiling, smilingConfidence, mood, moodConfidence, agemin, ageminConfidence, agemax, agemaxConfidence, ageest, ageestConfidence, gender, genderConfidence))
        self.connection.commit()

    #
    # stores face results in the DB
    #
    def storeResults(self, smiling, smilingConfidence, mood, moodConfidence):
        cursor = self.connection.cursor()
        cursor.execute('INSERT INTO facelog(isSmiling, smilingConfidence, mood, moodConfidence) values(?, ?, ?, ?)',
                        (smiling, smilingConfidence, mood, moodConfidence))
        self.connection.commit()

monitor = AudienceMonitor()
monitor.initialiseDB()
monitor.prepareCamera()
while True:
    photopath = "data/photo" + monitor.getTimestampString() + ".bmp"
    monitor.captureFrame(photopath)
    try:
        faceresults = monitor.faceDetection(photopath)
    except:
        print "Unexpected error:", sys.exc_info()[0]
    os.remove(photopath)
    sleep(POLL_FREQUENCY_SECONDS)
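
And, for what it’s worth, a sketch of how the charts above could be drawn back out of that log afterwards (assuming matplotlib is installed; the query matches the facelog schema created by the script):

# plot the logged mood data (a sketch; assumes matplotlib is installed)
import sqlite3
import matplotlib.pyplot as plt

connection = sqlite3.connect("/home/dale/dev/audiencemonitor/data/log.db")
cursor = connection.cursor()
cursor.execute("SELECT isSmiling, smilingConfidence FROM facelog ORDER BY ts")

values = []
for is_smiling, confidence in cursor.fetchall():
    # use the classifier's confidence as the strength of the expression,
    # plotted below the axis when I wasn't smiling
    values.append(confidence if is_smiling in (1, True, "true") else -confidence)

plt.plot(values)
plt.xticks([])
plt.ylabel("smiling confidence")
plt.title("mood over the evening")
plt.show()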

Augmented reality for Hursley mobiles

On Wednesday, Chris Book was kind enough to invite me to join the mobile developer panel at openMIC 3: the third Mobile Innovation Camp.

The theme for the day was location and augmented reality.

A particular highlight was a talk by Paul Golding on Augmented Reality & Augmented Virtuality, covering a variety of topics such as the state of Virtual Worlds today, and the potential of mobile augmented reality apps to move us from a “Thumb Culture” to a camera-led “Third Eye culture”.

A number of mobile augmented reality platforms were discussed, such as Nokia’s MARA research project, the QR-based Insqribe, the real-world / virtual-world mobile mashup platform junaio, and the ‘world browser’ Wikitude.

Another platform that got several mentions, including a developer’s crash course in the afternoon from Richard Spence, was Layar.

I had a quiet afternoon in the office this afternoon, so I thought I’d give the Layar API a quick try for myself.

Continue reading