VOICE - His master's voice

The true potential of speech recognition software has yet to be realised. 2001 director Stanley Kubrick didn't imagine the half of it.

Mention the year 2001 and what springs to mind? Screaming primates dancing agitatedly around lumps of inert silicon? Perhaps, but then the relationship between PC users and their machines is already well documented.

Assuming that you are thinking of Stanley Kubrick's renowned space odyssey, then even more memorable as characterisations go is Hal, the mutinous mainframe with a voice of its own.

In depicting an era of computers able to talk and understand vocal commands, scientist Arthur C Clarke - whose notions inspired the film - was spot on. So much so that, right on cue, 2001 is being cited by one leading speech recognition outfit, Lernout & Hauspie, as the year when global revenue from the technology will top the $3.5 billion mark.

The irony is that while IBM might have been working on speech recognition systems for at least 30 years - in fact, for as long as Kubrick's film has been in existence - it is only in the past few years that reality has caught up with theory. From just a few thousand applications sold per annum back in 1996, when users could only make themselves understood by dictating in a Dalek-like fashion, today continuous speech programs are selling in their hundreds of thousands.

Much of this is down to shrink-wrapped programs such as IBM's ViaVoice and Dragon's NaturallySpeaking being piled high and sold cheap. Also accelerating public take-up of the technology is the bundling of speech recognition programs with PCs, with companies such as Time and Fujitsu throwing in free copies of ViaVoice with their computers, as does IBM with its Aptiva range.

With an estimated 60 per cent share of the UK speech recognition market, ViaVoice is also included in the gold edition of the Lotus SmartSuite office set, enabling voice commands to initiate the various word processing, spreadsheet and database functions.

But while all this is good news for IBM and ISVs at the forefront of the technology, the pickings for traditional PC dealers are meagre. Off-the-shelf speech recognition packages might help persuade Joe Public to opt for an upgrade machine, but the margins on the software are on a par with any other commodity package and nothing to get overly excited about.

But tapping into the corporate market for speech recognition systems, where the value-add is in integration, is another matter. Moreover, the latest text-to-speech-to-text systems look poised to spawn a whole set of industries. These range from internet service providers that want to provide instant translation services, to resellers specialising in niche areas such as medicine or law whose customers expect the systems to be au fait with their industry-specific lexicons.

But what can we expect speech recognition to be capable of in a few years' time, and where should dealers be focusing their efforts? A glimpse of where the technology is heading was recently given by Joe Lernout, co-founder of the eponymous Belgian company. Founded in 1987, and now employing 1,800 staff and notching up revenue of more than $1 billion, Lernout & Hauspie already provides a wide range of products including customised corporate systems, core speech technologies marketed to OEMs, user and retail applications in both mainstream and niche markets, and sundry translation and linguistic tools.

But the main thrust is in R&D - licensing the technologies to third parties and letting them do what they will. All of which allows others - dealers included - to get a piece of the action.

One such case is Great Malvern specialist Speech Machines, whose managing director Henry Hyde-Thomson has just clinched a deal with mobile phone operator Cellnet, meaning subscribers can dictate a message down the line and have it transmitted instantly as an email, fax or letter.

Hyde-Thomson, whose previous career was in software development, echoes the view that dealers hoping to find a purely desktop niche will be disappointed. 'Speech recognition software for the PC is at a pretty low price level now,' he says, 'so much so that it is often given away. We know people who have gone out of business because they couldn't make any money.'

By way of contrast, Speech Machines provides bespoke services to the corporate sector where the clients want the software to interact, for instance, with their own databases or a sales order entry system. Although the company licenses some of its core recognition technology from Lernout & Hauspie - paying for specialist dictionaries and intellectual property rights - it also develops its own programs in conjunction with the speech research unit of the Defence Evaluation and Research Agency (Dera).

In Speech Machines' case, the objective is not selling products but providing ongoing services and support. Hyde-Thomson adds: 'We are interested in dealers whose customers have a speech recognition requirement and we run a Var reimbursement system, based on first-year revenue.'

Meanwhile, perhaps more pertinent for dealers working in the NT environment, Microsoft has taken a seven per cent stake in Lernout & Hauspie, marking its turf in speech recognition with a $60 million investment. Intel has similarly agreed to plough $30 million into the company, at the same time setting up a joint venture to make products for the burgeoning e-commerce market and for such uses as telephone information services, car navigation systems and handheld devices.

One practical upshot of the Microsoft-Lernout marriage is that the command set in the software giant's forthcoming Office 2000 suite, for instance, can be addressed verbally. This allows users to copy a field from a spreadsheet and paste it in a Word document without resorting to the keyboard. Although it's a capability already intrinsic to the gold edition of Lotus SmartSuite, the big difference is that Microsoft commands about 90 per cent of the desktop PC market. As Lernout readily admits: 'We like the partnership with Microsoft very much - it will put us on the map.'

While Lernout & Hauspie has only a fraction of the shrink-wrapped voice recognition market, the investment from Microsoft and Intel has given it laudable ambitions to catch up. Key to this is overcoming the chore for users of acclimatising such systems to their accents. It usually takes between 30 minutes and an hour before PCs are vaguely able to discern individual speech - assuming there is no background noise and that the machine has, ideally, at least 64MB of memory, if not 128MB.

After that and the reading of some 400 different sentences, a PC should, so the theory goes, be able to discern anyone's distinct, dulcet tones.

But, Lernout claims, the latest version of the company's Voice Express product reduces this tiresome initiation ceremony to just five minutes - and with it, the prospect of further expanding the speech recognition market.

Input, however, is only half the story. As digitisation negates the boundaries between the various multimedia formats, so it becomes possible for speech to be converted into emails, for emails to become voicemails, for voicemails to become faxes, and so forth. Throw Lernout's instant machine translation services into the fray - from Mandarin to Arabic, from English to German - and it's easy to see the plethora of opportunities opening up for any dealer with corporate connections and a reasonable understanding of NT or integration generally.

If a customer is a law company that wants its voicemails converted into Chinese translations and then instantly faxed to its Hong Kong associate, complete with appropriate legalese, it's no problem. What about a TV company that wants to pull out the actors' dialogue from its latest soap, irrespective of background music, and then turn it into text subtitles, perhaps in both English and Spanish?

According to Lernout, all this and more is possible: 'It's just a matter of licensing the technology and deciding how to use it. Our strategy is to license our technology to any company that wants to dial-up enable their applications so that there is a natural language interface for any user, no matter what their country of residence is.'

While Lernout's speech compression technologies take care of most of the heavy translation work, the company's tools also allow licensees to write their own grammar and reasoning so specialised commands can be evolved.

Another example might be using Sun Microsystems' Jini interface so that, again in theory, a German executive on a visit to the UK could phone through to his PC in Berlin and command it to switch on his washing machine.

The next stage in the technology, Lernout says, is to erode the artificial barriers between accents: in other words, a voicemail in German sent to the UK is read back with a synthesised English accent and pronunciation.

In a few years, he predicts, Lernout & Hauspie will have products on the market that can take a piece of dictation from a Brit and not only convert it into Japanese speech, but phrase it so that the original British accent is included in the translation.

Products that can undertake real-time translation from one European language to another are already being licensed to third parties by the company.

These products can discern the difference between proper nouns and common nouns, or the ambiguities of syntax and context.

One area of opportunity that Lernout sees is in the machine translation of technical manuals. 'Technical manuals, by definition, are meant to be written in a non-ambiguous way and so lend themselves more readily to instant machine translation,' he argues.

Lernout also sees speech recognition as a mechanism for both improving and expanding the internet: 'Ninety per cent of what is written on the Net is in English,' he says. 'But if you can market products and services to every country in the world using native languages, the business potential is obvious. Not only that, but respondents can ask questions in their own language and you're able to give instant replies.'

Similarly, Lernout adds, internet searches can be vastly improved, with the search string pulling out corresponding terms in other languages and translating them on the fly.
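The idea of a search string pulling in corresponding terms from other languages can be pictured as simple query expansion. The Python sketch below is a toy illustration only, not Lernout & Hauspie's technology: the glossary is hand-made and hypothetical, standing in for a real machine translation service, and the function name is invented for the example.

```python
# Hypothetical hand-made glossary standing in for a translation service.
GLOSSARY = {
    "speech": {"de": "Sprache", "fr": "parole"},
    "recognition": {"de": "Erkennung", "fr": "reconnaissance"},
}


def expand_query(query, languages=("de", "fr")):
    """Return the original search terms plus any known translations."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        for lang in languages:
            translated = GLOSSARY.get(word, {}).get(lang)
            # Skip words the glossary does not cover, and avoid duplicates.
            if translated and translated not in terms:
                terms.append(translated)
    return terms


if __name__ == "__main__":
    print(expand_query("speech recognition"))
```

A search engine could then look for any of the expanded terms rather than the English words alone; words missing from the glossary simply pass through unchanged.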

Before long, he predicts, most computers will come equipped not just with microphones for speech recognition, but cameras too, so a PC can recognise its owner verbally and visually, and instinctively know when it's being spoken to or whether the remarks are being addressed elsewhere.

Cameras will also help to identify which area of a document the user is looking at, so that when a command such as 'Paste' is issued, the computer knows where to insert the piece of text or graphic in question. 'In two or three years' time, the office we have all heard or dreamed about will be a reality,' says Lernout. 'In fact, the technology already exists.'

Similarly, he adds, smart cards will come equipped with processor memories able, in conjunction with ATMs, to verify their owners by sight or sound.

'There are certain areas of your face that don't change even as you grow old,' Lernout explains, 'for instance, the distance between the exterior and interior corners of your eyes.'

But while Lernout & Hauspie is blazing a trail on the technology front, the company is not without its critics. One is Colin Howman, managing director of independent consultancy Speech Recognition, which targets the legal, surveying and medical sectors. His main complaint is that the company provides inadequate corporate support - a charge which Lernout disputes.

'Lernout might have a good retail presence in the sense you can usually find its PC products in Dixons or PC World, but when it comes to providing support for firms like us targeting corporates, we find it abysmal,' he says.

But Howman does concede Lernout's point that markets are opening up quicker than they can be addressed: 'For instance, at the moment you have a number of Vars out there which have aligned themselves to IBM, Lernout or Dragon, but there doesn't seem to be anyone that does integration across the board.'

As another example, he cites the case of stockbroker Charles Schwab, which is using speech recognition software from US specialist Nuance to allow customers to dial in and ask the latest share price of a particular company, without the need for human intervention. The software even allows callers to interrupt their own query - perhaps by explaining that what they really wanted was the midday price, not the latest.

'The Schwab system might only be built around a vocabulary of, say, 8,000 key words and company names, but that is all that's needed. It's also a good example of how the technology can be applied to specialist industries,' Howman adds.
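Howman's point about a restricted vocabulary can be sketched in a few lines. The Python below is an illustration only, not the method used by Nuance or Schwab: it assumes a recogniser has already produced a rough text guess, and snaps that guess to the nearest entry in a small list of company names (invented here) using the standard library's difflib module.

```python
import difflib

# Hypothetical closed vocabulary; a real deployment might hold thousands
# of key words and company names.
COMPANY_NAMES = ["microsoft", "intel", "lotus", "fujitsu", "philips"]


def snap_to_vocabulary(heard, vocabulary=COMPANY_NAMES, cutoff=0.6):
    """Return the closest vocabulary entry to a rough guess, or None."""
    matches = difflib.get_close_matches(
        heard.lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    print(snap_to_vocabulary("mikrosoft"))
    print(snap_to_vocabulary("zzzz"))
```

Because every answer must come from the list, a slightly misheard name can still be resolved, while a guess that resembles nothing in the vocabulary is rejected rather than forced into a match.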

Although Nuance uses IBM's core speech engine in its system, the real benefit is that it can be used over the telephone and is speaker-independent. Speech recognition is only just starting to take off in the corporate sector, but dealers shouldn't think that, because the technology is getting easier to use, it's any easier to implement.

'You have to invest heavily in programming and integration teams, and really understand the technology,' advises Howman. 'Otherwise you'll just drown.'

FIRST MAN, THEN MACHINE

Computer-based voice recognition systems still have a long way to go before they compare with the processing capabilities of humans. On average, it is thought the human brain contains 30 billion neurons, each blessed with 10,000 connections in a layer of tissue little more than 2mm thick - so meaningful associations can be made at lightning speed.

While the phrase 'Let us pray' is self-evident to most humans, to a mere PC it might equally be interpreted as 'Let us spray' without software capable of understanding the context. But for PCs to understand context, they need more processing power and, despite the advances of speech recognition systems, human brains are still believed to offer a million (10^6) times more processing power than 350MHz PCs.
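The 'pray' versus 'spray' confusion can be illustrated with a toy example. The Python sketch below assumes a recogniser has produced two sound-alike candidates, and uses word-pair counts to pick the one that better fits the surrounding words. The counts and function names are invented for illustration and do not come from any real corpus or product.

```python
# Toy illustration of using context to choose between sound-alike phrases.
# The counts below are invented for this example; a real system would
# estimate them from a large body of text.
BIGRAM_COUNTS = {
    ("let", "us"): 50,
    ("us", "pray"): 20,
    ("us", "spray"): 1,
    ("the", "spray"): 15,
    ("crop", "spray"): 10,
}


def context_score(words):
    """Multiply (count + 1) for each adjacent word pair.

    Adding one means an unseen pair lowers the score without zeroing it.
    """
    score = 1
    for pair in zip(words, words[1:]):
        score *= BIGRAM_COUNTS.get(pair, 0) + 1
    return score


def pick_by_context(candidates):
    """Return the candidate phrase whose word pairs are most familiar."""
    return max(
        candidates,
        key=lambda phrase: context_score(phrase.lower().split()),
    )


if __name__ == "__main__":
    print(pick_by_context(["Let us pray", "Let us spray"]))
```

With these made-up counts, 'us pray' is far more familiar than 'us spray', so the first candidate wins; after 'crop', the preference flips. Scoring context this way across a large vocabulary is part of what demands the extra processing power.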

That said, it's estimated that in just 15 years from now computers will be so powerful that they will have surpassed humans in terms of their ability to grasp context and communicate.

IT'S GOOD TO TALK

Most speech recognition systems tend to be built around Windows NT, perhaps using the operating system as a front-end to back-end Unix or mainframe databases. But while Lernout & Hauspie arguably has an inside track on forthcoming releases of Windows software due to Microsoft's seven per cent stake in the company, it's an arrangement that doesn't work to the exclusion of all other ISVs.

IBM, for instance, also licenses third party use of its speech technology, including tools that support SAPI - Microsoft's speech recognition interface - so ISVs can integrate their own products with Windows applications while still using IBM's core engine. Similarly, products from Dragon and Dutch electronics giant Philips have been successfully integrated with Word, thanks to the widespread availability of SAPI tools.