Speech to Text

Apologies to the tl;dr brigade, this is going to be a long one... 

For a number of years I've been quietly working away with IBM Research on our speech to text programme. That is, working with a set of algorithms that ultimately produce a system capable of listening to human speech and transcribing it into text. The concept is simple: train a system for speech to text, speech goes in, text comes out. However, the process and algorithms to do this are extremely complicated from just about every angle you look at it: computationally, mathematically, operationally, and in terms of evaluation, time and cost. This is a completely separate topic and area of research from the similar sounding text to speech systems that take text (such as this blog) and read it aloud in a computerised voice.

Whenever I talk to people about it they always appear fascinated and want to know more. The same questions often come up. I'm going to address some of these here in a generic way, leaving out those that I'm unable to talk about. I should also point out that I'm by no means a speech expert or linguist, but I have developed enough of an understanding to be dangerous in the subject matter, and one that (I hope) allows me to explain things in a way that people not familiar with the field can understand. I'm deliberately not linking out to the various research topics that come into play during this post as the list would become lengthy very quickly and this isn't a formal paper after all; Internet searches are your friend if you want to know more.

I didn't know IBM did that?
OK so not strictly a question but the answer is yes, we do. We happen to be pretty good at it as well. However, we typically use a company called Nuance as our preferred partner.

People have often heard of IBM's former product in this area, ViaVoice, which was available for desktop PCs until the early 2000s. This sort of technology allowed a single user to speak to their computer for various purposes and required the user to spend some time training the software before it would understand their particular voice. Today's speech software has progressed beyond this to systems that don't require any training by the user before they use it. Current systems are trained in advance in order to attempt to understand any voice.

What's required?
Assuming you have the appropriate software and the hardware required to run it on, you need three more things to build a speech to text system: audio, transcripts and a phonetic dictionary of pronunciations. This sounds quite simple, but when you dig under the covers a little you realise it's much more complicated (not to mention expensive) and the devil is very much in the detail.

On the audio side you'll need a set of speech recordings. If you want to evaluate your system after it has been trained then a small sample of these should be kept to one side and not used during the training process. This set of audio used for evaluation is usually termed the held out set. It's considered cheating to evaluate the system using audio that was included in the training process: since the system has already “heard” that audio, it has a higher chance of accurately reproducing it later. Setting aside the held out set leaves you with two sets of audio files: the held out set itself and the majority of the audio that remains, which is called the training set.
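To make that split concrete, here's a minimal sketch of how you might divide a directory of recordings into a training set and a held out set. The directory layout and the 90/10 split are just assumptions for illustration.

```python
# Minimal sketch: split a corpus of .wav files into a training set and a
# held out set. The directory name and the 10% held-out fraction are
# illustrative assumptions only.
import random
from pathlib import Path

wavs = sorted(Path("corpus/audio").glob("*.wav"))  # assumed layout
random.seed(42)                                    # repeatable split
random.shuffle(wavs)

cut = int(len(wavs) * 0.9)                         # keep ~10% back for evaluation
training_set = wavs[:cut]
held_out_set = wavs[cut:]

print(f"{len(training_set)} files for training, {len(held_out_set)} held out")
```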

The audio can be in any format your training software is compatible with, but WAV files are commonly used. The quality of the audio, both in terms of the digital quality (e.g. sample rate) and the quality of the speaker(s) and the equipment used for the recordings, will have a direct bearing on the resulting accuracy of the system being trained. Simply put, the better the quality of the input, the more accurate the output will be. This leads to another bunch of questions, such as (but not limited to) “What quality is optimal?”, “What should I get the speakers to say?” and “How should I capture the recordings?”, all of which are research topics in their own right and for which there is no one-size-fits-all answer.

Capturing the audio is one half of the battle. The next piece in the puzzle is obtaining well transcribed textual copies of that audio. The transcripts should consist of the text representing what was said in the audio as well as some sort of indication of when during the audio a speaker starts speaking and when they stop. This is usually done on a sentence-by-sentence basis, or per utterance as each chunk is known. These transcripts may have a certain amount of subjectivity associated with them, in terms of where the sentence boundaries are and potentially exactly what was said if the audio wasn't clear or slang terms were used. They can be formatted in a variety of different ways and there are various standard formats for this purpose, from an XML DTD through to CSV.
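There's no single layout everybody uses, but as a sketch, a simple CSV transcript might record one utterance per row with its start and end times. The file name, column names and example utterances below are assumptions, not a standard.

```python
# Sketch of a simple utterance-level transcript format: one row per utterance
# with start/end times in seconds. File name, columns and text are invented.
import csv

rows = [
    {"audio": "interview_01.wav", "start": 0.00, "end": 4.20,
     "text": "good morning and welcome to the programme"},
    {"audio": "interview_01.wav", "start": 4.20, "end": 7.85,
     "text": "today we're talking about speech recognition"},
]

with open("interview_01.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["audio", "start", "end", "text"])
    writer.writeheader()
    writer.writerows(rows)
```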

If it has not already become clear, creating the transcription files can be quite a skilled and time consuming job. A typical industry expectation is that it takes approximately 10 man-hours for a skilled transcriber to produce a well formatted transcript of 1 hour of audio. This, plus the cost of collecting the audio in the first place, is one of the factors making speech to text a long, hard and expensive process, particularly when you consider that most current commercial speech systems are trained on at least 2000+ hours of audio, with the minimum recommended amount being somewhere in the region of 500+ hours. At that 10:1 ratio, a 2000 hour corpus implies roughly 20,000 hours of transcription effort alone.

Finally, a phonetic dictionary must either be obtained or produced that contains at least one pronunciation variant for each word said across the entire corpus of audio input. Even for a minimal system this will run into tens of thousands of words. There are, of course, phonetic dictionaries already available, such as the Oxford English Dictionary, which gives a pronunciation for each word it lists. However, such a dictionary would only be appropriate for one regional accent or dialect without variation. Hence, producing the dictionary can also be a long and skilled manual task.
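As a rough illustration, a pronunciation dictionary is essentially a mapping from each word to one or more phoneme sequences. The phone symbols below are loosely ARPAbet-style and purely illustrative.

```python
# Toy pronunciation dictionary: each word maps to one or more pronunciation
# variants, written as phoneme sequences. Symbols are illustrative only.
lexicon = {
    "tomato": [["t", "ah", "m", "aa", "t", "ow"],   # one regional variant
               ["t", "ah", "m", "ey", "t", "ow"]],  # another variant
    "speech": [["s", "p", "iy", "ch"]],
    "text":   [["t", "eh", "k", "s", "t"]],
}

for word, variants in lexicon.items():
    for pron in variants:
        print(word, " ".join(pron))
```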

What does the software do?
The simple answer is that it takes the audio and transcript files and passes them through a set of really rather complicated mathematical algorithms to produce a model that is particular to the input received. This is the training process. Once the system has been trained, the model it generates can be used to take speech input and produce text output. This is the decoding process. The training process requires lots of data and is computationally expensive, but the model it produces is very small and computationally much less expensive to run. Today's models are typically able to perform real-time (or faster) speech to text conversion on a single core of a modern CPU. It is the model, and the software surrounding the model, that is the piece exposed to users of the system.

Various different steps are used during the training process to iterate through the different modelling techniques across the entire set of training audio provided to the trainer. When the process first starts the software knows nothing of the audio; there are no clever bootstrapping techniques used to kick-start the system in a certain direction or pre-load it in any way. This allows the software to be entirely generic and work for all sorts of different languages and qualities of material. Starting in this way is known as a flat start, or context independent training. The software simply chops the audio up into regular segments to start with and then performs several iterations in which these boundaries are shifted slightly to match the boundaries of the speech in the audio more closely.
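To give a feel for the flat start, here is a toy sketch of the very first step only: give every unit in an utterance's transcript an equal slice of the audio. The iterative boundary refinement that follows needs the evolving acoustic model and isn't shown; the example utterance is invented.

```python
# Toy flat-start segmentation: give every unit in the transcript an equal
# slice of the utterance to begin with. Real training then iteratively
# nudges these boundaries using the evolving acoustic model (not shown).
def flat_start(duration_secs, units):
    """Return (unit, start, end) triples with uniform segment lengths."""
    step = duration_secs / len(units)
    return [(u, i * step, (i + 1) * step) for i, u in enumerate(units)]

# Illustrative utterance: 2.4 seconds of audio, word-level units.
for unit, start, end in flat_start(2.4, ["speech", "goes", "in"]):
    print(f"{unit:8s} {start:5.2f} -> {end:5.2f}")
```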

The next phase is context dependent training. This phase starts to make the model a little more specific and tailored to the input being given to the trainer. The pronunciation dictionary is used to refine the model to produce an initial system that could be used to decode speech into text in its own right at this early stage. Typically, context dependent training, while an iterative process in itself, can also be run multiple times in order to hone the model still further.

Another optimisation that can be made to the model after context dependent training is to apply vocal tract length normalisation. This works on the theory that the acoustic character of human speech correlates with the pitch of the voice, and the pitch of the voice correlates with the vocal tract length of the speaker. Put simply, it's a theory that says men have low voices and women have high voices, and if we normalise the waveform for all voices in the training material to have the same pitch (i.e. the same effective vocal tract length) then recognition improves. To do this, an estimation of the vocal tract length must first be made for each speaker in the training data so that a normalisation factor can be applied to that material and the model updated to reflect the change.
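As a very rough sketch of the frequency-warping idea behind this, you can stretch or squash the frequency axis of a speaker's spectrum by a per-speaker warp factor and re-interpolate. Estimating the warp factor properly (typically by searching for the value the current model likes best) is not shown; the factor and the toy spectrum below are made up.

```python
# Very rough sketch of the warping idea behind vocal tract length
# normalisation: warp the frequency axis of a spectrum by a per-speaker
# factor, then resample it back onto the original axis. The warp factor is
# made up; in practice it is estimated per speaker against the model.
import numpy as np

def warp_spectrum(spectrum, freqs, alpha):
    """Linearly warp the frequency axis by alpha and re-interpolate."""
    return np.interp(freqs, freqs * alpha, spectrum)

freqs = np.linspace(0, 8000, 257)                    # bin centre frequencies (Hz)
spectrum = np.exp(-((freqs - 1000) / 300) ** 2)      # toy spectrum, one peak at 1 kHz
warped = warp_spectrum(spectrum, freqs, alpha=1.1)   # assumed warp factor

print("original peak at", freqs[spectrum.argmax()], "Hz")
print("warped peak at", freqs[warped.argmax()], "Hz")  # shifted up by roughly alpha
```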

The model can be thought of as a tree, although it's actually a large multi-dimensional matrix. By reducing the number of dimensions in the matrix, and applying various other mathematical operations to reduce the search space, the model can be further improved in terms of accuracy, speed and size. This is generally done after vocal tract length normalisation has taken place.
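The exact transforms used are beyond this post, but as a generic illustration of the dimensionality-reduction idea, here is a tiny PCA-style projection using numpy. Real trainers use more specialised transforms tied to the model itself; the random features and the choice of 20 dimensions are assumptions.

```python
# Tiny sketch of the general idea of dimensionality reduction: project
# feature vectors onto the directions that carry most of the variation.
# This is a generic PCA-style projection for illustration only.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 39))        # e.g. 39-dimensional acoustic features

centred = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projection = vt[:20].T                        # keep the top 20 directions

reduced = centred @ projection
print(features.shape, "->", reduced.shape)    # (1000, 39) -> (1000, 20)
```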

Another tweak that can be made to improve the model is to apply what we call discriminative training. For this step, all of the training material is decoded using the current best model produced by the previous step. This produces a set of text files. These text files can then be compared with those produced by the human transcribers and given to the system as training material. The comparison can be used to identify where the model is going wrong, and those improvements are applied to the model. It's a step that can probably best be summarised as learning from its mistakes. Clever!
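As a sketch of the comparison step, here is a toy alignment of a decoded hypothesis against the human reference that collects the places where the model got a word wrong. The two example sentences are invented, and real systems do this at a much finer level of detail.

```python
# Toy sketch of the comparison step: align a decoded hypothesis against the
# human reference transcript and collect the substitutions, i.e. the places
# where the model got a word wrong. The example sentences are invented.
from difflib import SequenceMatcher

reference  = "the cat sat on the mat".split()
hypothesis = "the cat sat on a mat".split()

errors = []
for tag, i1, i2, j1, j2 in SequenceMatcher(None, reference, hypothesis).get_opcodes():
    if tag == "replace":
        errors.append((reference[i1:i2], hypothesis[j1:j2]))

print(errors)   # [(['the'], ['a'])] -- feedback a discriminative step could use
```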

Finally, once the model has been completed it can be used with a decoder that knows how to understand that model to produce text given an audio input. In reality, decoders tend to operate on two different models: the audio model, whose creation has just been roughly explained, and a language model. The language model is simply a description of how language is used in the specific context of the training material. It would, for example, attempt to provide insight into which words typically follow which other words, via the use of what natural language processing experts call n-grams. Obtaining information to produce the language model is much easier and does not necessarily have to come entirely from the transcripts used during the training process. Any text data that is considered representative of the speech being decoded could be useful. For example, in an application targeted at decoding BBC news readers, articles from the BBC News web site would likely prove a useful addition to the language model.
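To give a flavour of what an n-gram is, here's a minimal bigram count over an invented two-sentence corpus. A real language model would be built from vastly more text, with higher-order n-grams and proper smoothing.

```python
# Minimal bigram sketch: count which words follow which other words in some
# representative text. Real language models use far more data, higher-order
# n-grams and smoothing; the toy corpus here is invented.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

# Probability of the next word given the previous one, e.g. P(cat | the)
prev = "the"
total = sum(bigrams[prev].values())
for w, c in bigrams[prev].most_common():
    print(f"P({w} | {prev}) = {c}/{total}")
```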

How accurate is it?
This is probably the most common question about these systems and one of the most complex to answer. As with most things in the world of high technology it's not simple, so the answer is the infamous “it depends”. The short answer is that in ideal circumstances the software can perform at near human levels of accuracy, which equates to accuracy levels in excess of 90%. Pretty good, you'd think. It has been shown that human performance is somewhere in excess of 90% and is almost never 100%. The test for this is quite simple: you get two (or more) people to independently transcribe some speech and compare the results from each transcriber; almost always there will be a disagreement about some part of the speech (if there's enough speech, that is).
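For the curious, accuracy in this field is usually reported as word error rate: the number of word substitutions, insertions and deletions between the system's output and a reference transcript, divided by the length of the reference. A minimal sketch, with invented sentences:

```python
# Minimal word error rate sketch: the edit distance (substitutions,
# insertions, deletions) between hypothesis and reference, divided by the
# number of reference words. Example sentences are invented.
def word_error_rate(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference and first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # ~0.17
```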

It's not often that ideal circumstances are present or can even realistically be achieved. Ideal would be transcribing a speaker with a similar voice and accent to those trained into the model, speaking at the right speed (not too fast and not too slow) into a directional microphone that doesn't do any fancy noise cancellation, etc. What people are generally interested in is the real-world situation, something along the lines of “if I speak to my phone, will it understand me?”. This sort of real-world environment often includes background noise and a very wide variety of speakers, potentially speaking into a non-optimal recording device. Even then the answer is complicated. We're talking about free, conversational speech in this blog post, and there's a huge difference between recognising any and all words versus recognising a small set of command and control words for when you want your phone to perform a specific action. In conclusion then, we can only really speak about the art of the possible and what has been achieved before. If you want to know about accuracy for your particular situation, your particular voice and your particular device, you'd have to test it!

What words can it understand? What about slang?
The range of understanding of a speech to text system is dependent on the training material. At present, state of the art systems are based on dictionaries of words and don't generally attempt to recognise new words for which no entry in the dictionary has been found (although these types of systems are available separately and could be combined into a speech to text solution if necessary). So the number and range of words understood by a speech to text system is currently (and I'm generalising here) a function of the number and range of words used in the training material. It doesn't really matter what these words are, whether they're conversational and slang terms or proper dictionary terms; so long as the system was trained on them, it should be able to recognise them again during a decode.
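As a trivial illustration of that point, here's a sketch that derives the known words from a couple of invented training transcripts and flags anything out of vocabulary in a new sentence.

```python
# Tiny sketch: the words a system can recognise are the words seen in its
# training transcripts; anything else is "out of vocabulary". The sample
# transcripts and test sentence are invented.
training_transcripts = [
    "speech goes in and text comes out",
    "innit though that is well good",        # slang is fine if it was trained on
]

vocabulary = {w for line in training_transcripts for w in line.split()}

test = "speech recognition is well good innit"
oov = [w for w in test.split() if w not in vocabulary]
print("out-of-vocabulary words:", oov)       # ['recognition']
```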

Updates and Maintenance
For the more discerning reader, you'll have realised by now that there's a fundamental flaw in the plan laid out thus far. Language changes over time: people use new words, and the meaning of words changes within the language we use. Text-speak is one of the new kids on the block in this area. It would be extremely cumbersome to have to train an entire new model each time you wished to update your previous one to include some set of new language capability. Fortunately, the models produced can be modified and updated with these changes without the need to go back to a standing start and train from scratch all over again. It's possible to take your existing model, built from the set of data you had available at a particular point in time, and use it to bootstrap the creation of a new model enhanced with the new material you've gathered since training the first one. Of course, you'll want to test and compare both models to check that you have in fact enhanced performance as you were expecting. This type of maintenance and update will be required for any and all of these types of systems, as they're currently designed, as the structure and usage of our languages evolve.

Conclusion
OK, so this was never a blog post designed to draw a conclusion, but I wanted to wrap up by saying that this is an area of technology that is still very much in active research and development, and has been for at least 40-50 years! There's a really interesting observation I've seen in the field: if you ask a range of people involved in this topic “when will speech to text become a reality?”, the answer generally comes out at “in ten years' time”. This question has been asked consistently over time and the answer has remained the same. It seems, then, that either this is a really hard nut to crack or that our expectations of such a system move on over time. Either way, it seems there will always be something new just around the corner to advance us to the next stage of speech technologies.

Going Back to University



A couple of weeks ago I had the enormous pleasure of returning to Exeter University, where I studied for my degree more years ago than seems possible.  Getting involved with the uni again is something I've long wanted to do, in an attempt to give something back to the institution to which I owe so much: it's where I got good qualifications and, not least, where I met my wife too!  Early in my career I don't think I would have been particularly useful for this, since I was closer to the university than to my working life in age, mentality and a bunch of other factors I'm sure.  Getting a bit older, though, makes me feel readier to provide something tangibly useful to both the university and the current students.  Having been there recently with work, I hope it's a relationship I can start to build up.

I should probably steer clear of saying exactly why we were there, but there was a small team from work, some of whom I knew well, such as @madieq and @andysc, and one or two I hadn't come across before.  Our job was to work with some academic staff for a couple of days, so it was a bit of a departure from my normal work with corporate customers.  It was fantastic to see the university from the other side of the fence (i.e. not being a student), to hear about some of the things going on there, and to find it every bit as vibrant and ambitious as the one I left in 2000. Of course, there was the obligatory wining and dining in the evening, which just made the experience all the more pleasurable.

I really hope to be able to talk a lot more about things we're doing with the university in the future.  Until then, I'm looking forward to going back a little more often and potentially imparting some words (of wisdom?) to some students too.

Monkigras 2013: Scaling craft

The work of William Morris, my GCSE history teacher said, was a bit of a moral dilemma. Morris was a British designer born during the Industrial Revolution. British (and then world) industry was moving rapidly towards mass production, replacing traditional, cottage-industry production processes with more efficient, and therefore more profitable, machines. One casualty of this move to mass production was decoration and quality, which lost out to function and quantity. Morris reacted against this by designing and producing decorations like wallpaper and textiles using the traditional craft techniques of skilled craftspeople. My history teacher’s point was that although Morris, a passionate socialist, was able to create high quality goods by using smaller-scale production methods, only wealthy people could afford to buy his designs, which was hardly equality in action. On the other hand, the skills of craftspeople were being retained, quality goods were being produced, and the craftspeople were getting paid for the quality of their work.

My pretty, handcrafted latte

Monkigras 2013, in London last week, took on this theme of ‘scaling craft’ in the context of beer, coffee, and software. All parts of this trinity can benefit hugely from a focus on quality over quantity. Before I went to Monkigras, I wasn’t really sure what to expect from a tech event advertised as having a lot of beer. It did have a lot of beer (and coffee) available, but if you didn’t want it you could avoid it (several people I talked to said they didn’t usually drink beer). And no one seemed to get ridiculously drunk. And there were a lot of very cool talks.

The beer was also a fun analogy for software development. Despite pubs in the UK closing hand over fist at the moment, microbreweries are on the rise. Microbrewing is about producing beer in small quantities on a commercial basis so that quality can be maintained whilst remaining viable as a business. One of the things we learnt from a brewer at Monkigras is that the taste of water varies according to where it comes from. Water is a major component of beer, so if the taste of your water supply changes, the taste of your beer changes. To maintain the quality of the beer you brew, you must work within the natural resources available to you and not over-expand. Similarly, quality comes from skilled and knowledgeable people who need to be paid for their skill. If you take on cheaper staff and train them less so that you can make more profit, you will end up with a poorer quality product. You get the idea.

Handcrafting a wooden spoon.

This principle applies to all areas of craft. Whether it’s producing quality coffee, a quality wooden spoon, quality conference food, or organising a quality conference, you have to focus on quality and ensure that if you scale what you do, so that it’s more readily available to more people, you don’t sacrifice quality at the same time. And, importantly, you need to know when to stop. Bigger doesn’t necessarily mean better.

Software is misleadingly easy to produce. Unlike making physical objects, producing software carries very little initial cost; you can make copies and distribute them to customers over the Internet at very little expense. Initially, at least, it’s all in the skill of the craftspeople and their ability to identify their target users and market. If they can’t make what people will buy, they will go out of business very quickly. As software development companies get larger, the people who make the software become further removed from the selling of that software to their customers. So they become more focused on what they are close to, the technology, and not on who will use it.

Phil Gilbert on IBM Design Thinking

Phil Gilbert, IBM’s new General Manager of Design, comes from a 30-year career in startups, most recently Lombardi, where design was core to their culture. IBM has a portfolio of 3000 software products so, when Lombardi was acquired by IBM, Phil set about simplifying the IBM Business Process Management portfolio of products, reducing 21 different products to just four and kicking off a cultural change to bring design and thinking about users to the centre of product development. Whilst praising IBM’s history of design and a recent server product design award, he also acknowledged at Monkigras: “We are rethinking everything at IBM. Our portfolio is a mess today and we need to get better”. Changing a culture like IBM’s isn’t easy but I’ve seen and experienced a big difference already. Phil’s challenge is to scale the high-quality user-focused design values of a startup to a century-old global corporation.

One of the things that struck me most at Monkigras, and appealed to me most as a social scientist, was the focus on the human side. Despite it being a developer conference, I remember seeing only one slide that contained code. The overriding theme was about people and culture, not technology; how to maintain quality by maintaining a culture that respects its craftspeople and how to retain both even if the organisation gets bigger, even if that naturally limits how much the organisation can grow. Personal analogy was also a big thing…

Laser-scanned model of the engine

Cyndi Mitchell from Logspace talked about her family’s hog farm and working within the available resources. Shanley Kane from Basho used Dante’s spheres to describe best product management practices. Steve Citron-Pousty from Red Hat used his background as an ecologist to manage communities and ‘developer ecosystems’ (don’t just call it an ecosystem; treat it like one). Diane Mueller from ActiveState talked about her 20%-time project to build a crowdsourced database of totem poles and the challenges of understanding what gets people to want to contribute to such projects. Elco Jacobs talked about his BrewPi project, automatically managing the temperature of his homebrewing fridge using a Raspberry Pi based controller, and how he has open-sourced it to build a community and kick-start it as a potential small business. Rafe Colburn from Etsy more directly made the link between craft and software engineering in his slides.

3D printer making a spoon

I don’t know much about William Morris so I don’t know which presentations he would have enjoyed or disagreed with. Morris was a preservationist and started the Society for the Protection of Ancient Buildings to ensure that old buildings get repaired and not restored to an arbitrary point in the past. So maybe he would have found laser-scanning and 3D printing interesting. Chris Thorpe is a model train geek and likes to hand-make his own models of real-life objects. He too is interested in alternatives to mass manufacturing and has started to look at how to make model kits. He uses a laser to scan the objects and a 3D printer to prototype the models. He can then send the model to a commercial company who can make it into kits for him to sell. He has recently used his laser-scanning technique to scan a rediscovered old Welsh railway engine to preserve it, virtually at least, in the state in which it was found.

I had a great time with lots of cool and fun people. Well done to @monkchips for scaling a conference to just the right level of intimacy and buzz. The last thing I saw before I left was the craftsman making a wooden spoon pitted in competition against the 3D printer making a plastic spoon.

You can find many of the slide presentations and more about the conference on Lanyrd.


developerWorks Days Zurich 2012

This week I had a day out of the office to go to Zurich to talk at this year's IBM developerWorks Days. I had two sessions back to back in the mobile stream: the first an introduction to Android development, and the second on MQTT.

The slots were only 35 minutes long (well, 45 minutes, but we had to leave 5 minutes at each end to let people move round) so there was a limit to how much detail I could go into. With this in mind I decided the best way to give people an introduction to Android development in that amount of time was to quickly walk through writing a reasonably simple application. The application had to be at least somewhat practical, but also very simple, so after a little bit of thinking I settled on an app to download the latest image from the web comic XKCD. There are a number of apps on Google Play that already do this (and a lot better) but it does show a little Activity GUI design. I got through about 95% of the app live on stage and only had to copy and paste the body of the onPostExecute method, which clears the progress dialog and updates the image, in the last minute to get it to the point where I could run it in the emulator.

Here are the slides for this session

And here is the Eclipse project for the Application I created live on stage:
http://www.hardill.me.uk/XKCD-demo-android-app.zip

The MQTT pitch was a little easier to set up, there is loads of great content on MQTT.org to use as a source and of course I remembered to include the section on the MQTT enabled mouse traps and twittering ferries from Andy Stanford-Clark.

Here are the slides for the MQTT session:

For the demo I used the JavaScript d3 topic tree viewer I blogged about last week and my Raspberry Pi running a Mosquitto broker, along with a little script publishing the core temperature, load and uptime values. The broker was also bridged to my home broker to show the feed from my weather centre and some other sensors.
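For a flavour of what such a script looks like, here is a minimal sketch that publishes those values to a local Mosquitto broker using the paho-mqtt client. The topic names and publish interval are assumptions, not necessarily what the demo actually used.

```python
# Minimal sketch of a Raspberry Pi stats publisher for a local Mosquitto
# broker, using the paho-mqtt client (1.x style API). Topic names and the
# 30 second interval are assumptions for illustration.
import os
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883, 60)   # broker running on the Pi itself
client.loop_start()

while True:
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        core_temp = int(f.read()) / 1000.0          # millidegrees C -> degrees C
    load_1min = os.getloadavg()[0]
    with open("/proc/uptime") as f:
        uptime_secs = float(f.read().split()[0])

    client.publish("pi/core_temp", core_temp)
    client.publish("pi/load", load_1min)
    client.publish("pi/uptime", uptime_secs)
    time.sleep(30)
```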

Recent hacktivity

This time of year seems to be hacking season and over the last few days I’ve been along to two hackdays!

Friday was IBM’s internal Social Business Hackday. There was some MQTT hacking, a z/OS hack, hacks with Lotus Connections, hacks that could be the future of Lotus Connections, and I was attempting to hack a workaround for a Jazz work item. And that was just at the Hursley local event! We were able to link up with a few other labs, but over two days there were IBMers hacking around the globe. There are going to be a lot of amazing projects to choose from when it comes to voting.

(There are a few more photos from HackDay X, and previous hackdays, on the IBM hackday group on flickr.)

For round two, today was the soutHACKton hack day. By the time I arrived the soldering and drilling had already begun!! Unfortunately I wasn’t able to stay long, so I’m hoping there’ll be more of these in the future. I did just about have time to try out an idea I had to hack an old doorbell to sense people using the door knocker. A while ago I had accidentally created a touch sensor with a 555 timer while attempting to build another circuit. So my cunning plan was to deliberately create a 555 touch switch and connect it to the bolt on the inside of the front door. Unfortunately the best I could manage today was a two-wire touch sensor, which isn’t going to work. At least not without leaving a wire hanging out of the letter box with some instructions attached! Unless someone who knows more about electronics can suggest a plan B, I may just resort to a boring doorbell button instead!!


Conversational Internet

tl;dr

We’ve built a prototype to show how we could interact with the Internet using a command-driven approach.

  • A screen reader, but one that uses machine learning and natural language processing, in order to better understand both what the user wants to do, and what the web page says.
  • One that can offer a conversational interface instead of just reading out everything on the page.

It’s a proof-of-concept, but it’s an exciting idea with a lot of potential and we’ve got a demo that shows it in action.

The problem : screen readers today

I’ve written about this before but here is a recap.

Visually impaired people can interact with the web using screen readers. These read out every element on a page.

The user has to make a mental model of the structure of the page as it’s read out, and keep this in their head as they arrow-key around the page.

For example, on a news site’s front page, once the screen reader has read out the page, you have to remember if the story you want is the fifth or sixth story in the list so you can tab the right number of times to get to it.

Imagine an automated telephone menu:
“for blah-blah-blah, press 1, for blather-blather-blather, press 2, for something-or-other, press 3 … for something-else-vague, press 9 …”

Imagine this menu was so long it took 15 minutes or more to read.

Imagine none of the options are an exact match for what you want. But by the time you get to the end, you can’t remember whether the closest match was the third or fourth, or fiftieth option.

The vision : a Conversational Internet

Software could be smarter.

If it understood more about the web page, it could describe it at a higher, task-oriented level. It could read out the relevant bits, instead of everything.

If it understood more about what the user wants to do, the user could just say that, instead of working out the manual navigation steps themselves.

The vision is software that can interpret web pages and offer a conversational interface to web browsing.
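To make that a little more concrete, here's a deliberately simple sketch (nothing like the real prototype) that pulls the links out of a page and picks the one whose text best matches what the user asked for. The HTML snippet and the spoken request are invented, and the matching is crude word overlap rather than real natural language processing.

```python
# Deliberately simple sketch: extract the links from a page and pick the one
# whose text best matches the user's request, using plain word overlap.
# The HTML snippet and the request are invented for illustration.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href, self._text = dict(attrs).get("href"), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((" ".join(self._text).strip(), self._href))
            self._href = None

page = """
<ul>
  <li><a href="/news/weather">Storms expected across the south coast</a></li>
  <li><a href="/news/politics">Budget debate continues in parliament</a></li>
</ul>
"""

request = "read me the story about the storms on the coast"

collector = LinkCollector()
collector.feed(page)

wanted = set(request.lower().split())
best = max(collector.links,
           key=lambda link: len(wanted & set(link[0].lower().split())))
print("Best match:", best)   # the weather story
```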


Failing to Invent

We IBM employees are encouraged, indeed incented, to be innovative and to invent. This is particularly pertinent for people like myself working on the leading edge of the latest technologies. I work in IBM Emerging Technologies, which is all about taking the latest available technology to our customers. We do this in a number of different ways, but that's a blog post in itself. Innovation is often confused with, or used interchangeably with, invention, but they are different: invention for IBM means patents, patenting and the patent process. That is, if I come up with something inventive I'm very much encouraged to protect that idea using patents, and there are processes and help available to allow me to do that.


This comic strip really sums up what can often happen when you investigate protecting one of your ideas with a patent. It struck me recently, while out to dinner with friends, that there's nothing wrong with failing to invent, which is what the cartoon above says Leibniz did. It's the innovation that's important here, and it was unlucky for Leibniz that he wasn't seen to be inventing. It can be quite difficult to think of something sufficiently new that it is patent-worthy, and this often happens to me and those I work with while we try to protect our own ideas.

The example I was drawing upon on this occasion was an idea I was discussing at work with some colleagues about a certain usage of your mobile phone [I'm being intentionally vague here]. After thinking it all through we came to the realisation that while the idea was good and the solution innovative, all the technology was already known, available and assembled in the way we were proposing, just used somewhere completely different.

So, failing to invent is no bad thing. We tried, and on this particular occasion decided we could innovate but not invent. Next time things could be the other way around, but by these definitions we shouldn't be afraid to innovate at the price of invention anyway.

BBC looking at mind control

Katia Moskvitch from the BBC has just published a nice article on using the mind to control technology.

As part of the article, as well as trying out the Emotiv headset*, she interviewed Ed Jellard and Kevin Brown from the IBM ETS team based in Hursley.

* This is the same headset used for the Bang Goes The Theory Taxi racing.

Blue Fusion at Hursley, 2009

One of the first Hursley-related things I wrote about here on the eightbar blog back in 2006 was how much I enjoy helping with our annual schools event for National Science and Engineering Week in the UK – Blue Fusion (the event website has gone AWOL at the moment but here’s a link to the press release).

This year was no exception. This is now the fifth year that I've been a volunteer. Unfortunately I only had room in my schedule to spend one day helping this time around, so I chose to host a school for the day rather than spending all day on a single activity (that way, I got to see all of the different things we had on offer).

So, yesterday I had the pleasure of hosting six intelligent and polite students from Malvern St James School and their teachers – they had travelled a fair distance to come to the event, but despite the early start I think they did really well.

I won’t go into too much detail and spoil the fun for people who might read this but have not yet taken part in this week’s event, but I think we had some great activities on offer. I twittered our way through a few of them. My own personal favourite was a remote surgery activity. You can’t see much in this image (it was a dark room) but the students basically had a “body” inside a box with some remote cameras to guide their hands around and had to identify organs and remove foreign objects.


There was also some interesting application of visual technology / tangible interfaces – a genetics exercise using LEGO bricks and a camera which identified gene strands, and an energy planning exercise which used Reactivision-style markers to identify where power stations had been placed on a map (sort of similar to what we built in SLorpedo at Hackday a couple of years ago). We also had some logic puzzles to solve, built a, err… “typhoon-proof” (ahem) tower, simulated a computer processor, and commanded a colony of ants in a battle for survival against the other school teams.

Once again, I thought this was a great event – just amazing creativity on show from the folks at Hursley in coming up with such engaging exercises. I hope the students had as much fun as I did!