Voice recognition: a sound investment

After years of empty promises, the voice recognition market is finally maturing and there may soon be a surge in demand, writes Nick Booth.

In terms of improving security and productivity, nothing can hold a candle to voice recognition technology. We could all chuck away our keyboards, for a start.

That alone would make the technology pay for itself within days. Better still, most corporate IT departments wouldn't have to spend half the week dealing with forgotten passwords, because our voices are the most effective biometric security device there is.

So that's more time and money saved. But best of all, every call you made could be patched right through. No more dimwit operators. No more 'Press one for sales, two for marketing, three if your patience has run out.'

It is a fantastic idea. That is probably why inventors rushed to market with their first products, arguably 15 years too early, and promptly ruined the technology's reputation. Now voice recognition conjures up memories of the disastrous early attempts to turn speech into text.

It is tough to make sales in this environment. This is a pity, because the technology actually works now, if you apply it correctly.

"Voice recognition is the technology of the future and it always will be," joked Simon Clark, managing director of Katapult-IT, which promotes new technology, but not voice recognition. "Seriously, though. It will arrive, but you have to limit its application."

This is a view that would be endorsed by some of the companies left in the lurch by the collapse of Lernout & Hauspie. To date, then, voice recognition has promised much but achieved little.

Still, you cannot create anything worthwhile until you have made a few useless systems. It could be that the technology is reliable enough now, in the right circumstances, to become a risk-free investment.

Certainly, Lloyds TSB thinks so. The bank has flouted conventional wisdom and invested in a voice-automated banking system that allows customers to phone up, talk to a computer and, if their voice is verified, be told their bank account details.

While banks like First Direct are emphasising their reliance on humans, and in their TV adverts Vic Reeves mocks the idea of using machines to access our accounts, Lloyds TSB has gone the other way.

Lloyds's gamble is that voice recognition systems are cheaper to maintain than human-operated call centres, and that they are more efficient at offering customer service.

In TV advertising parlance, Lloyds TSB says it's hot, but First Direct says not. Peter Littlewood, senior manager of interactive voice recognition (IVR) at Lloyds TSB, said: "One of the big misconceptions about high-value customers is that they want human interaction. That's rubbish. Service is about consistency, which is what voice brings to the party."

Lloyds is also confident that IVR will bring about an immediate cost saving that would pay off the price of investment.

"Businesses are run by accountants these days, and you have to make a cast-iron case for return on investment," explained Andy Dennanhy, manager of Nortel Networks' IVR solutions division, whose efforts helped persuade the board that voice systems would bring about an immediate return.

So is this a watershed for Nortel's voice recognition partners? The crucial battle for sales of IVR technology has always been in the area of customer service.

So can we expect the partners of Nortel, Cisco, Kingston In Media and Telephonetics, to name but a few key vendors, to start clinching deals with any company that has a call centre?

Not just yet. It will take a lot of time to undo the damage this technology's reputation has endured thanks to years of empty promises.

"The early days of voice recognition have not helped its perception," admitted Ian Hyde, sales and marketing manager at reseller Philips Business Communications.

"People tend to associate it with all those disastrous clunky desktop systems that never worked the way they were supposed to."

Having said that, Hyde has been in talks with some of the new directory services providers with a view to selling voice recognition systems that will hook up to directory databases.

In future, perhaps, instead of dialling 192 and asking for '10 Downing Street', only to be asked 'What town is it in?', we can ask a machine for the number.

And, if Telephonetics (Hyde's technology partner) is to be believed, the machine will be a lot more helpful.

Analysts have identified this as an opportunity. "The directories market would be a viable target for IVR sales, because all these new directory companies like Yell, Telegate and even BT's Scoot will need to cut costs to compete. And one of the best ways of doing that will be to cut out the human operators," explained Evan Kirchheimer, a telecoms analyst at Datamonitor.

As with all IVR sales, however, you will have your work cut out to convince the end user that there is a return on investment. One vendor has changed its business model completely to entice people into buying voice recognition systems.

Instead of asking customers to buy the kit, install it and integrate it, Totem offers a managed service that can recognise voice commands, then turn them into commands that an IT system would understand.

The reverse process then works in passing information back out to the customer. In other words, Totem offers a managed voice interface. The Speech Recognition Company also uses this model.
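As a rough illustration of what such a managed voice interface does, the sketch below turns a recognised phrase into a command an IT system could act on, then turns the result back into a sentence to be spoken to the caller. It is a minimal sketch, not Totem's implementation: the phrase patterns, command names and reply wording are all hypothetical, and it assumes the speech engine has already converted the audio into text.

```python
import re
from typing import Optional

# Hypothetical phrase patterns; a real service would rely on the
# speech engine's own grammar rather than regular expressions.
PATTERNS = {
    "get_balance": re.compile(r"^(what is|tell me) my balance$"),
    "transfer_to_agent": re.compile(r"^(speak to|talk to) (an? )?(agent|operator)$"),
}


def to_command(recognised_text: str) -> Optional[str]:
    """Map recognised speech (already text) to a command name, or None."""
    # Lower-case and collapse repeated whitespace before matching.
    text = " ".join(recognised_text.lower().split())
    for command, pattern in PATTERNS.items():
        if pattern.match(text):
            return command
    return None


def to_spoken_reply(command: Optional[str], data: dict) -> str:
    """The reverse step: turn the IT system's answer into a sentence to speak."""
    if command == "get_balance":
        return f"Your balance is {data['balance']} pounds."
    if command == "transfer_to_agent":
        return "Putting you through now."
    return "Sorry, I did not understand. Please try again."
```

Anything outside the small set of known phrases is rejected rather than guessed at, which is how a limited set of commands keeps the interface predictable.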

"People are very wary of spending any money these days, doubly so on unproven technology like speech recognition. So we've given them the option to pay for what they use. That way they can gently introduce themselves to this way of working and it gives them time to adjust," said Phil Adams, Totem's sales and marketing manager.

He suggested that the arrangement gives resellers the chance to sell the technology without having to make an investment of time or money in equipment or training. They get an up-front commission and ongoing commission as long as the contract is in place.

"We have to make it really easy for companies today or they'll be too scared," warned Adams.

The good thing is that the technology suppliers will ultimately be able to charge more for rental agreements. The user, on the other hand, gets value from being free of a binding commitment to a technology project.

Voice recognition is a much more sensible way of disseminating information to customers than using touch-tone phone systems. For a start, more than two-thirds of the phones in the UK are unable to emit 'touch tones', because they predate the digital era.

Voice commands can be recognised. And the nuances of voice are so varied as to give each human voice a distinguishable ID, which surely makes this system the cheapest and most efficient biometric security device on the market.

But the poverty of our phones will also place restrictions on voice recognition usage. "The quality of some phones is such that speech recognition will never work. The phone that callers use is out of the control of the corporate implementing this type of solution," said Clark.

He added that it is not practical to allow people to phone and expect a machine to understand any command they issue. For now, voice recognition systems have to limit the number of possible options they will consider.

"There are too many regional accents in Britain, and too many variations on the same meaning," explained Michael Nolan, an Australian who manages V Commerce's European region.

In the short journey from Coventry to Derby, for instance, you encounter more variation of accents than if you crossed Australia.

But voice recognition systems can learn to cope with regional accents. The real problem is vocabulary. What the baker calls a 'batch' in Coventry, they call a 'bun' in the next town and a 'cob' in Derby. This can make naming items to be sold somewhat difficult.
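One common way round the vocabulary problem is to map every regional word to a single catalogue name after the speech has been recognised. The sketch below assumes that approach; the three bread-roll words come from the example above, while the catalogue name and function name are hypothetical.

```python
from typing import Optional

# Regional words for the same item, mapped to one hypothetical catalogue name.
REGIONAL_SYNONYMS = {
    "batch": "bread roll",  # Coventry
    "bun": "bread roll",    # the next town
    "cob": "bread roll",    # Derby
}


def canonical_item(recognised_word: str) -> Optional[str]:
    """Return the catalogue name for a recognised word, or None if it is unknown."""
    return REGIONAL_SYNONYMS.get(recognised_word.strip().lower())
```

A caller who asks for a 'batch' and one who asks for a 'cob' are then both offered the same product, and an unknown word can trigger a prompt to try again.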

But Paul Welham, director of sales and marketing at Telephonetics, is having none of these criticisms of voice recognition. "The technology is far more robust these days. It has matured to the point that it is extremely reliable," he stated.

As proof, he invites you to phone the Telephonetics switchboard number (see below) and ask for Mr Welham, or Paul Welham, or the sales director.

If you do get through, Welham will doubtless enthuse about the opportunities being created in the public sector, in words similar to these: "The government has set targets for standards of service, which is great news for us and our partners.

"Under the regime they want to impose in the name of e-government, all calls have to be answered within a certain number of rings and there has to be one point of contact and a limited amount of internal transfers. The most obvious way of achieving this is through voice recognition."

He predicted that in five years' time, 80 per cent of companies with their own switchboards will have a system like this in place.

Voice recognition systems, whether they are installed by Telephonetics' reseller partners (such as Philips) or rented as a service from Totem, will run in parallel with traditional systems and gradually take on more of the burden of work.

Even installing a new system is relatively easy. The Telephonetics system is a box with software installed that sits next to a traditional PBX. With a digital connection, it is possible to complete the installation within 25 minutes.

Another driver for speech recognition is the growth of the mobile user market. NOP research among the 18 to 35 age group shows that they perceive speech recognition as a much 'cooler' interface than the internet.

The web is slow and boring, apparently. The fact that 83 per cent of mobile users indicated that they prefer speech will give a massive boost to the speech recognition market.

The next phase is to enable all outgoing calls to be accessed via speech recognition. Again, there is a strong link with the mobile workforce.

Imagine you are driving with one hand on the wheel down a crowded M6 in driving rain. The other juggles a mobile phone because you need to call one of your company's suppliers. It is very dangerous, for one thing.

Rather than fumble around trying to find the right number in your contacts (the voice recognition in mobiles rarely works), wouldn't it be easier and safer to phone your own company switchboard, state the name of the contact you need, and wait for the call to be put through? You would never have to take your hands off the steering wheel.

"Think of the time that would be saved if you didn't have to search around for a number every time a supplier needed calling," explained Welham. "European legislation on the use of mobiles will encourage resellers to make more sales to companies with mobile workforces."

All technology evolves along classic stages, according to Mark Erwich, ScanSoft's regional manager for Europe. There is the early enthusiasm and hype, the early adoption, followed by a period of mass disillusion, and succeeded by a period of consolidation, the consequence of which is mature technology and mass adoption.

It is arguable, given the collapse of Lernout & Hauspie, and ScanSoft snapping up its assets at a bargain price, that we are now in the middle to late stages of consolidation.

Prepare for a surge in demand for voice recognition systems, said Erwich.

"It's ready now and we are looking for partners with a focus on telephony applications in big enterprises," he declared. "There are plenty of options. You can take a system that has been completely built, or build your own from different components."

There is a handful of vendors that provide the main platform on which voice recognition applications can be built, but they tend to regard other comms vendors, such as Cisco, as original equipment manufacturers.

Cisco or Nortel, for example, might take Nuance's technology and integrate and customise it, then pass it off as their own applications residing on their equipment.

This is fine for Nuance, as it is a channel for business, but obviously Cisco, Nortel and the rest do not have the same imperative to market voice recognition technology as strongly as Nuance.

"We need to do more to raise awareness of this market among the resellers of the big switch manufacturers," said Matt Keown, channel marketing manager at Nuance. "The challenge for us is getting the message across that speech recognition is a big opportunity."

SCANSOFT MAKES SOME NOISE

On 7 October ScanSoft agreed to acquire Royal Philips Electronics' Speech Processing Telephony and Speech Processing Voice Control units. But what does it mean for the industry?

"No longer will the two major US telephony speech engine suppliers, Nuance and SpeechWorks, be able to look down at ScanSoft as a speech recognition upstart," said Chris Hart, chief executive of the Speech Recognition Company.

"The acquisition of Philips' speech processing units propels ScanSoft into new markets. Nuance and SpeechWorks have a serious competitor on their hands."

ScanSoft bought the speech assets of Lernout & Hauspie, including RealSpeak Text-To-Speech and the Dragon NaturallySpeaking speech dictation product line.

The Dragon product competes with IBM's ViaVoice in the desktop speech recognition market, which is aimed at text creation, but the text-to-speech market has been less positive for ScanSoft. Its RealSpeak product was overshadowed by Rhetorical's rVoice.

Nuance's Vocalizer and SpeechWorks' Speechify have continued to show well on the back of automatic speech recognition (ASR) engine sales by those companies.

The acquisition of Philips' ASR technology may enable ScanSoft to compete, especially in the enterprise and carrier telephony markets.

But ScanSoft still lacks speaker verification, which allows callers to be authenticated by the unique characteristics of their voice. Analysts have indicated that its next move will be another acquisition.

The facts:

SECURITY Q & A

Why is voice recognition so secure?
There are 50 different muscles used to produce a vocal response. We each use them with different intensity. Then there are attributes such as accent and resonance.

This is why the FBI is able to verify with some certainty that, for example, the voice of Osama Bin Laden is being broadcast on Al Jazeera, and not that of an impersonator.

Couldn't a talented mimic hack into my Lloyds TSB bank account?
No, according to Bob Morgen, European sales director at Nuance. It's been tested under research conditions, and no hacking attempt has ever been successful.

What if I worked in Texas for six months, picked up the accent and then tried to access my bank account? Would the system see me as an impostor?
No. The quality of your voice that is measured is shaped by factors such as your voice box, tongue, throat and even the environment in which you learned to speak and the people who taught you to speak. The accent will not make a difference.

CONTACTS:

Telephonetics (01442) 242 242
www.telephonetics.co.uk

Nuance (01483) 243 863
www.nuance.com

ScanSoft (00) 31 620 413 512
www.scansoft.com

Nortel (01628) 432 000
www.nortel.com

Totem (08000) 199 199
www.totemcommunications.com

Lloyds TSB (0117) 905 2130
www.lloydstsb.co.uk

Philips Business Communications (01223) 468 000
www.sopho.philips.co.uk

Katapult-IT (01932) 227 982
www.katapult-it.co.uk

The Speech Recognition Company (020) 7471 0100
www.src.co.uk