W4A : Future of screen readers

This is the second of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several of the projects that I saw showed glimpses of a possible future for screen readers.

I’ve written about screen readers before, and some of the challenges with using them.

Interactive SIGHT

One project interpreted pictures of charts or graphs and created a textual summary of the information shown in them.

I’m still amazed at this. It takes a picture of a graph, not the original raw data, and generates sensible summaries of what it shows.

For example, given this image:

It can generate:

This graphic is about United States. The graphic shows that United States at 35 thousand dollars is the third highest with respect to the dollar value of gross domestic product per capita 2001 among the countries listed. Luxembourg at 44.2 thousand dollars is the highest

or

The dollar value of gross domestic product per capita 2001 is 25 thousand dollars for Britain, which has the lowest dollar value of product per capita 2001. United States has 1.4 times more product per capita 2001 than Britain. The difference between the dollar value of gross domestic product per capita 2001 for United States and that for Britain is 10 thousand dollars.

The original version was able to process bar graphs, and was presented to W4A in 2010. What I saw was an extension that added support for line graphs.

Their focus is on the sort of graphics found in newspapers and magazines – informational, rather than scientific graphs. They want to be able to generate a high level summary, rather than a list of plot points that require the user to build a mental model in order to interpret.

For example:

The image shows a line graph. The line graph presents the number of Walmmart’s sales of leather jackets. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1997 to 1999 followed by a falling trend through 2006. The first segment is the rising trend. The rising trend is steep. The rising trend has a starting value of 1890. The rising trend has an ending value of 36840. The second segment is the falling trend. The falling trend has a starting value of 36840. The falling trend has an ending value of 12606.

The image shows a line graph. The line graph presents the number of people who started smoking under the age of 18 in the US. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1962 to 1966 followed by a falling trend through 1980. The first segment is the rising trend. The rising trend is steep. The second segment is the falling trend.

It’s able to interpret an image and recognise trends, recognise how noisy or smooth it is, recognise if the trend changes, and more. Impressive.

Interpreting data in tables

Another project demonstrated restructuring data tables in web pages to make them easier to explore with a screenreader.

They have an interesting approach of analysing an HTML table and reorganising it to make it more accessible, abstracting out complex sections into a series of menus.

For example, given a table such as this:

it can produce navigable menus such as this:

Even quite complex tables, with row and column spans, which would otherwise be quite difficult to interpret if read row-by-row by a screenreader, is made much more accessible.

Capti web player

Another technology I saw demonstrated was the Capti web player.

Tools such as instapaper and read it later have showed that we can take most web pages and extract the body text for the article on the page.

This capability should be ideal for visually impaired users, but the tools themselves are still quite difficult to use and integrate poorly with assistive technologies. Someone described them as obviously “designed by sighted people for sighted people”.

Capti combines this capability with an accessible media player making it easy to navigate through an article, move through a list of articles, and so on. To a sighted user like me, it looked like they’ve mashed together instapaper with an audiobook-type media player. I often listen to podcasts while I go running, and am a heavy user of pocket and Safari’s reading list. So this looks ideal for me.

Multiple simultaneous audio streams

Finally, one fascinating project looked at how to make it quicker to scan large amounts of content with a screenreader to find a specific piece of information. I’ve written before that relying on a screenreader (which creates a sequential audio representation of the information on the page, starting at the beginning and going through the contents) can be tremendously time-consuming, and that it results in visually impaired users taking considerably more time to find information on the web.

This project investigated whether this could be improved by using multiple simultaneous sound sources.

It sounds mad, but they’re starting from observations such as the cocktail party effect – that in a noisy room with several conversations going on, we’re able to pick out a specific conversation that we want to listen for. Or that a student not paying attention in a lecture will hear if a lecturer says something like “this will be on the exam”.

They’re looking at a variety of approaches, such as separating the channels directionally, so one audio stream will sound like it’s coming from the left, while another is in front. Or having different voices, such as different genders, for the different streams. It’s an intriguing idea, and I’d love to see if it could be useful.