The spoken word has more power than C.S. Lewis ever imagined

Professor Roger Woods, School of Electronics, Electrical Engineering and Computer Science

Mobile phone keypads could soon become as antiquated as the typewriter thanks to an EPSRC-funded project designed to let people interact with their devices using natural, continuous speech.

The technology is so sophisticated that no currently available application is believed to compete directly with it.

Comprising computer scientists and electronic hardware designers, the research team from the School of Electronics, Electrical Engineering and Computer Science has developed a basic demonstrator to prove their innovative concept works. They have plans to set up a spinout company to exploit its significant potential, possibly by partnering with a speech recognition vendor.

Project leader Roger Woods is an internationally recognised expert in the fields of programmable system-on-chip and high-level digital signal processing design techniques. Working in collaboration with companies such as DTS, Selex and Xilinx, he has been responsible for a number of technology firsts in these areas. He is also CEO of CapnaDSP, a university spinout company he co-founded in 2008 to develop complex chip design tools.

Roger explains that while current generation phones can respond to basic voice commands, they are simply not sophisticated enough to cope with normal everyday speech.

"The conventional approach to natural language speech processing is to try to predict what someone is saying by creating a 'network' of probable words. This is an extremely large computational task that is normally done through servers.

"To do that on mobile devices requires a connection to WiFi or 3G networks and this consumes a lot of power. Together with transmission delays and connection reliability issues, this drastically limits interactivity with users.

"Our approach represents a significant departure from accepted wisdom because we carry out a considerable amount of pre-processing, thereby avoiding the hassle of creating the 'network' online. All of the computation is carried out in a novel, completely self-contained processor meaning that no network connectivity is required.

"To our knowledge, this is the first time this approach has been used for low-power, large complexity speech recognition.

"We believe it is possible this technology could be embedded in millions of smartphones by 2013. Beyond that, its use could grow exponentially as it is introduced into other types of mobile devices such as tablet computers, satellite navigation systems and health care monitoring products.

"The original project idea came from two engineers working at ECIT, a £40m institute set up within the School to stimulate commercially-viable research ideas and then nurture them through to commercial development.

"EPSRC funding was vital to the project becoming a reality as it enabled us to put 16 people years' worth of effort into the programme. That represents a substantial resource commitment by any standards," says Roger.
