Speaking your language

Voice-activated computer systems are finally being taken up as firms realise the potential savings.

Fans of the science fiction film 2001: A Space Odyssey will recognise the following conversation between astronaut Dave Bowman and the HAL 9000 computer.

"Open the pod bay doors, HAL," says Bowman.
"I'm sorry, Dave, I'm afraid I can't do that," comes the reply.

Sci-fi writer Arthur C Clarke managed to make many correct predictions, such as telecoms satellites. Sadly, although 2001 has been and gone, computers that can hold a rational conversation have yet to materialise.

The concept of a person talking to a computer is not new. IBM was experimenting with voice recognition as early as 1961 and had a system that could recognise 16 basic commands. But, despite progress over the past four decades, we are still not able to have a 'real' conversation with a computer.

However, voice recognition technology (VRT) has come a long way. It is now possible for spoken commands to be recognised by computers, and the technology is beginning to be adopted.

According to David Neilson, sales director at TeleWare, a provider of intelligent telephony solutions in the UK, there are two main areas where voice recognition is being used: customer fulfilment and purchasing.

"The sectors benefiting from this technology are travel, financial services and retail," he explained. "VRT provides customers with easier access to information and allows contact centre agents to devote their time to more complex transactions.

"Companies can also implement voice recognition for voice-activated directories, which allow internal and external callers to be automatically routed to the correct person, without having to speak to the switchboard operator."

False starts

Historically, the use of VRT applications has been limited to simple voice commands. Piers Mummery, managing director at Call Sciences, said: "The many false starts in applying VRT products successfully have led to a great deal of end-user suspicion and negativity that needs to be overcome.

"Products that were heralded as intelligent and easy to use were anything but, and the public's patience threshold with services that used toneless, computerised speech and failed to recognise even the most basic commands was quickly reached.

"This has set the industry back in the past. Products have been inflexible, limited, unreliable and have not enhanced the customer experience."

According to Mummery, developments over the past few years have concentrated on engines that are able to understand natural language commands. "Systems are now more intuitive, and users are able to skip, override and interrupt. This makes the process more user friendly and improves levels of customer satisfaction," he said.

Nick Applegarth, managing director for Europe, Middle East and Africa at speech recognition and voice authentication firm Nuance, believes that VRT is becoming more people friendly.

"The Nuance 8 speech recognition server can perform functions such as distinguishing the agenda of a caller, determining if someone is getting angry or getting stuck in the system and putting them through to an operator," he claimed.

"Technology that understands phrases and expressions, and is able to train VRT systems to recognise these phrases, is important. Intuitive systems such as this will undoubtedly have the biggest impact on VRT."

Semantic limitations

The problem is that spoken language is very complex, and computers often struggle to understand human speech fully.

William Gray, managing director of Macfarlane Telesystems, a UK-based developer and supplier of computer telephony systems and services for interactive voice response applications in the local government sector, suggested that some of the latest developments in technology will make significant improvements.

"The key is the ability to achieve very high levels of speech recognition and very large vocabularies; up to one million words for any given utterance," he said.

"Coupled with this is the move to understand concepts and 'open grammars', where speakers have more scope to say things in a variety of ways and still be understood by the system."

One of the fundamental problems facing the widespread adoption of voice-driven systems is the sheer size, or memory footprint, of the software systems that enable them.

Although developers are now working on making the software more efficient and compact, so that it can be deployed in a much wider range of applications and devices, there is still a significant overhead associated with many systems.

If voice recognition and voice command are to be integrated into consumer devices, cars, mobile phones and other areas where a voice-activated system can be used, there needs to be an alternative to software-driven systems.

The neural net

Axeon, a designer of microprocessors, has found a possible answer with its VindAX technology. Dave Gorshkov, the company's vice president of sales, said: "VindAX is in essence a neural network in the form of a microchip.

"A neural network is a system that loosely mimics the way in which our brains work. By connecting each part of a system to the other, data can be accessed in a less linear fashion.

"Neural networks have for many years promised much, but the transition from large, unwieldy software systems has proved difficult. VindAX is a neural network on a chip which offers all the features of a software-driven system, but is small enough to fit inside any handheld device."

According to Gorshkov, his firm is focusing on the automotive industry, which has been trying to bring voice commands into cars for some time.

Axeon is working with a number of companies, including Lotus Engineering, to bring voice control and other systems into our everyday lives.

"Our technology is powerful enough to allow a user to not only 'teach' the system to recognise particular words and phrases but to recognise individual accents and vocal inflections," explained Gorshkov.

"Such features are important for speech systems to be widely adopted, as they can be used to address issues such as security and to recognise particular users, or allow devices to be 'regionalised' by language or dialect, allowing manufacturers to address markets globally rather than locally."

Because the technology is maturing rapidly, many people believe that VRT has massive growth potential.

Big business

Dr Kenton Sanmogan, head of consulting and solutions at The Speech Recognition Company, predicts a boom time ahead.

"Now that we have advanced to the point where sophisticated and useful speech-based interfaces can be deployed, the possibilities for speech-based technology are substantial," he said.

"Voice portals, voice commerce and enterprise applications represent substantial new markets that are only beginning to appear, particularly in Europe."

Sanmogan pointed to research in IDC's Worldwide Telephony Speech Processing Software Market Forecast and Analysis 2001-2005 suggesting that the worldwide market for telephony speech processing software could grow to over $3.5bn by 2005.

"Automatic speech recognition, one of three core speech technologies, will account for 96.2 per cent of the total 2005 revenue," he said. "The conversion of text into speech will account for 3.1 per cent of total revenue in 2005, while voice recognition will account for only 0.7 per cent."

There are an estimated 10,500 call centres in the UK, including about 1,300 large sites which employ more than 400,000 people.

In western Europe as a whole, the percentage of the total workforce employed in call centres is about 1.3 per cent.

UK call centre staff turnover is high and, according to consultants Blue Sky, costs UK businesses £1.1bn per year.

"There are many signs which indicate that the market is accelerating rapidly in both the US and the UK," said Sanmogan. " The Speech Recognition Company is aware of at least six major betting companies which are either already undertaking telephony speech projects or have plans to do so within the next 12 months.

"In addition, several major banking and insurance companies are experimenting with the technology or have projects underway. Most important, telephony-based speech recognition is providing firms with opportunities to reduce costs while improving customer service."

CASE STUDY: Datapulse

A large London-based law practice has increased the productivity of its telephone operators by 40 per cent using a state-of-the-art voice-activated directory system from Datapulse, part of Mettoni Group.

Edwards Duthie Solicitors uses a centralised switchboard to serve 240 staff in eight London offices. The firm was created by the merger of two practices, E Edwards Son & Noice and Duthie Hart & Duthie, which took effect in January 2001.

After the merger, the firm's five operators struggled to cope with call volumes of between 35,000 and 40,000 per month. Lost calls, the euphemism for calls that never get answered, reached 19 per cent, well above the industry norm.

Richard Roebuck, head of IT at Edwards Duthie, traced the problem to the volume of internal calls handled by the operators.

"External calls were getting through, so clients weren't left hanging on at the end of a phone, but we found that as much as 30 per cent of the traffic going through the switchboard was from one internal extension to another," he explained.

Operators were wasting time looking up extension numbers in paper directories, so a way had to be found to boost the efficiency of delivering internal calls without hiring extra operators.
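To put those percentages into monthly call numbers, the following sketch applies them to the quoted volumes. It assumes that the 19 per cent lost-call rate and the 30 per cent internal share are both measured against total switchboard traffic, which the case study does not state explicitly. The 30 per cent figure is an upper bound ("as much as").

```python
# Figures quoted in the case study.
MONTHLY_CALLS = (35_000, 40_000)  # low and high monthly volumes
OPERATORS = 5
LOST_PCT = 19                     # lost calls, per cent
INTERNAL_PCT_MAX = 30             # "as much as" 30 per cent internal

def monthly_breakdown(total_calls):
    """Split one month's switchboard traffic using the quoted percentages.

    Assumption: both percentages apply to total traffic.
    """
    return {
        "lost": total_calls * LOST_PCT // 100,
        "internal_max": total_calls * INTERNAL_PCT_MAX // 100,
        "per_operator": total_calls // OPERATORS,
    }

for total in MONTHLY_CALLS:
    print(total, monthly_breakdown(total))
```

On these assumptions, between 6,650 and 7,600 calls a month went unanswered, and as many as 10,500 to 12,000 were internal, against a load of 7,000 to 8,000 calls per operator.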

The solution had three parts: screen-based operator consoles that did away with the need to scour directories; an administration system to co-ordinate updates to directory information; and the Liaison voice-activated directory system from LocusDialog, one of the world's leading speech recognition specialists.

Simply by speaking a name, callers reach the desired individual or department without operator intervention.
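The routing step can be pictured with a short sketch. It is not a description of how the Liaison product works internally, which the case study does not cover. It assumes a speech engine has already turned the caller's words into text, and it uses fuzzy string matching from Python's standard library as a stand-in for recognition confidence. The names, extension numbers and the 0.8 cut-off are all invented for illustration.

```python
import difflib

# Hypothetical directory: names and extensions are invented.
DIRECTORY = {
    "alice morgan": "2101",
    "brian patel": "2102",
    "conveyancing": "2200",
}
OPERATOR = "0"  # hypothetical fallback extension for the switchboard

def route_call(transcript, directory=DIRECTORY, cutoff=0.8):
    """Return the extension for the closest directory name.

    Falls back to the operator when nothing matches closely enough.
    """
    spoken = transcript.strip().lower()
    matches = difflib.get_close_matches(
        spoken, list(directory), n=1, cutoff=cutoff
    )
    return directory[matches[0]] if matches else OPERATOR

print(route_call("Alice Morgan"))
print(route_call("brian patell"))  # slightly misheard name
print(route_call("zebra"))         # no plausible match
```

Falling back to the operator when no name matches closely keeps a misheard caller from being routed to the wrong person, at the cost of some calls still reaching the switchboard.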

Internal call demand fell by 25 per cent immediately after the change, freeing operators to concentrate on more productive tasks. Roebuck said that only exceptional circumstances, such as another merger, would require an increase in operator numbers.

The new system has had a dramatic effect. Clients who already do business with the company can now reach existing contacts via the voice-activated directory without having to go through an operator or a menu-based routing system.

Support for Dialled Number Identification Service (DNIS) and Caller Line Identification (CLI) also makes it possible to set up automatic personal greetings and take advantage of advanced routing options.

CONTACTS

Axeon (01224) 338 383
www.axeon.com

Call Sciences (08707) 121 212
www.callsciences.co.uk

Datapulse (0870) 442 4421
www.datapulse.com

Macfarlane Telesystems (020) 7314 1314
www.macfar.co.uk

Nuance (020) 7386 1400
www.nuance.com

TeleWare (01908) 251 444
www.teleware.co.uk

The Speech Recognition Company (020) 7471 0100
www.src.co.uk