SPEECH - Now you're talking

Speech recognition is finally coming of age as big-name companies start to put their clout behind promoting the technology, but will it actually live up to expectations?

Voice recognition has finally crossed over from the science fiction bookshelves onto the shelves of software retailers, ISVs and integrators.

It has been a very long haul, but industry observers predict human voice interface technology is set to be extremely big - one day soon.

IT users at last appear willing to pay for the ability to automate tasks that have traditionally taken extra time and people.

In 1997, the global market for speech technology was worth less than half a billion dollars, according to research company Voice Information Associates. By 2001, that figure looks set to climb to $5 billion, and by 2003 the market is projected to be worth almost $8 billion, a compound annual growth rate (CAGR) of 57 per cent over the six years from 1997.
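As a rough check on that growth rate, the short Python sketch below computes a CAGR. The article gives only approximate endpoints, so the inputs here are assumed round figures, not Voice Information Associates' own data: $0.5 billion for 1997, and either $8 billion or $7.5 billion for 2003.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed round figures in $bn: the article only says "less than half a
# billion" for 1997 and "almost $8 billion" for 2003.
for start, end in [(0.5, 8.0), (0.5, 7.5)]:
    rate = cagr(start, end, 2003 - 1997)
    print(f"${start}bn -> ${end}bn over 6 years: {rate:.0%} a year")
```

With these assumptions the rate comes out at about 59 per cent for $8 billion and 57 per cent for $7.5 billion. The quoted figure is therefore consistent with a 2003 estimate a little under $8 billion.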

One of the first commercially available speech products in the UK was Apricot's Portable PC, launched back in 1984. It incorporated a Dragon speech recognition system that, with training and superhuman user persistence, could just about manage to execute DOS commands most of the time.

Fourteen years on, speech is still being positioned as a fledgling technology, according to David Bradshaw, analyst at market research firm Ovum. 'The ability of speech recognition software to deal with continuous speech was the breakthrough that will allow the market to take off in the next few years,' he says.

Bradshaw adds: 'As the processing power which is needed to handle speech becomes available in smaller and cheaper packages, the applications for this technology will explode. Almost anything you can do with a computer can lend itself to some element of voice control, especially where telecommunications are involved.'

Gaston Bastiaens, chief executive of speech technology manufacturer Lernout & Hauspie (L&H), also believes speech recognition has many applications.

He says: 'We use computers in our cars, at our desks, in the factory, talking on the telephone, at school and when we play - and at all these times speech can be used as the interface.'

In an allied field, translation services are set to grow from $3 billion in 1997 to $5.2 billion in 2001 and to $6.9 billion by 2003, according to Voice Information Associates. Globalisation of trade and the internet will open up a big market for computers used to translate both spoken words and text.

But IT pundits all agree that the keyboard and mouse will not disappear in the foreseeable future. They point out that the invention of the mouse did not make keyboards obsolete. Bastiaens says: 'Speech recognition may change the value of keyboard skills but will not replace them.'

Clearly, one of the developments that has made speech recognition possible is the growth in processor power available to analyse an audio stream and then attempt to make sense of it.

The power of the current generation of processors, coupled with the low memory prices that have resulted from a glut in the memory market, makes software capable of rendering continuous speech into text a viable application for just about anyone who can afford to own a PC.

Speech software resellers should find that doors are beginning to open for them in the lucrative desktop upgrade market. 'Older speech recognition systems just didn't have the horsepower they needed,' says Patrick Bligh, a representative for speech technology Var TalkWrite, which resells Dragon Systems' speech technology products.

'Not only is today's software better, but the platforms that it runs on are far more capable at the price the professional work-station market is willing to pay,' says Bligh.

But he warns: 'Legacy PCs can still be a problem. A user who buys an off-the-shelf package and installs it on a 486 machine with 16MB RAM will not have a very good experience.'

Craig Barrett, president and chief executive of Intel, is equally enthusiastic about the technology. 'We are excited by the possibilities the speech recognition applications have for business and the quality of life in general,' he says.

'The foreseeable doubling of processor power every 18 months between now and 2011 means there will be plenty of opportunity to create a world where our computing devices will listen to and act on our spoken words.' And, of course, to shift Intel chips by the boatload.
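Taken at face value, Barrett's 18-month doubling compounds quickly. The following minimal Python sketch works through the arithmetic. It makes two simplifying assumptions: that "now" means 1998 (fourteen years after the 1984 Apricot launch), and that the doubling period stays perfectly steady.

```python
def growth_factor(start_year: int, end_year: int, doubling_months: float = 18.0) -> float:
    """Relative processing power after steady doubling every doubling_months."""
    months = (end_year - start_year) * 12
    return 2 ** (months / doubling_months)


# Assumption: "now" is 1998, the year implied by "fourteen years on" from 1984.
factor = growth_factor(1998, 2011)
print(f"1998 -> 2011: roughly {factor:.0f} times the processing power")
```

Thirteen years is about 8.7 doubling periods, which works out at roughly a 400-fold increase in processing power.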

Bastiaens is confident that speech recognition technologies will eventually become universal, especially now that some of the big guns in computing have given it their seal of approval.

He believes: 'One day it will be part of the operating system. Every PC or handheld computing device will be able to take dictation and be controlled without a keyboard or mouse.'

However, the words 'operating system' raise the spectre of Microsoft's involvement in a way that must make small technology providers such as Dragon and L&H uncomfortable, as it did for Quarterdeck, Stac and numerous others. And just because Microsoft has taken an equity interest in L&H doesn't mean L&H will get a free ride.

Microsoft has a double-digit share in SCO and everyone knows what it thinks about Unix.

Nevertheless, Unix is still a leading deployment platform for speech technology, especially in telephony applications. But the place where speech recognition is likely to make its biggest impact is the desktop market.

Microsoft is having enough problems getting its current product range out of the door (remember the 1997 ship dates for Windows 98 and NT 5?), so the threat to speech integrators looks minimal in the short term.

And even when speech technology does arrive in the operating system, some of today's players think the positive effects will outweigh the negative.

Bastiaens says: 'Having speech in the operating system makes the market much more aware of the technology.' But he adds: 'I'd be surprised if Microsoft shipped a professional grade product with every box, so we will take a hit when millions of users discover that it doesn't work as well as the TV adverts claim.'

Bligh is equally sanguine: 'Overall, Windows with speech technology will catalyse the market even if Microsoft ships its own speech engine or one of the ones that we currently do not support.'

Today's developers should pay attention to the Speech API (SAPI) that Microsoft is developing with L&H and others. An understanding of SAPI, which is now reaching adolescence, will help prevent expensive mistakes later.

'At the moment, the industry is working towards modularisation and standard interfaces. That is a powerful force for creating more and better speech applications,' says Bastiaens.

All the leading-edge speech products can handle continuous speech but require a process of enrolment where the computer learns to listen to you and you, in turn, learn to talk to the computer.

Once this process is completed (it can take anything from a couple of minutes to a number of hours, depending on the extent of the vocabulary required), state-of-the-art products achieve better than 90 per cent accuracy. According to Bastiaens, the biggest growth is expected in document generation, and products will be revised and expanded frequently.

Dictation is today's leading application and so far, the sales focus has been on a variety of professions. L&H has been working with Vars that specialise in medical dictation.

Bastiaens says: 'It is incredibly useful for busy doctors to be able to recite their reports and findings and have them turn into text. This used to be done by tape recorders and staff who then transcribed the dictation, but time and budget constraints opened the door for our speech applications.'

L&H offers Clinical Reporter and Voice Xpress for Medicine now and will shortly be shipping additional healthcare-oriented products.

While L&H is scoring big wins with the medics, TalkWrite has the legal community in its sights. 'We sell packaged software off our Website, but our real business is in providing turnkey systems to professionals - mostly lawyers but doctors, architects and others are also finding that it is worth taking the time to learn to use speech recognition tools properly,' says Bligh.

'The real opportunity isn't in shifting boxes - we don't want any part of the high street volume business. We provide complete software and hardware sales, training and support as our primary line of business.'

L&H also has off-the-shelf products for the legal and law enforcement markets. The latest crop of speech technology products is impressive. The recently released L&H Voice Xpress, available in several editions, now supports UK English and US English as well as several other languages, with more on the way.

L&H's iTranslator, which made its debut at the Internet World trade show, is an illustration of how speech technology is spreading. The iTranslator family of products and services translates text documents and Websites online and allows users to search, summarise and organise information in other languages on the Web.

According to Bastiaens, the product and its accompanying service will make up the industry's broadest array of fully customisable machine translation solutions.

iTranslator is intended for consumer, government and professional users.

Initially, it will support 15 language pairs, covering languages including French, German, Spanish, Portuguese, Chinese, Dutch, Italian and Japanese. L&H is planning to develop an additional 20 languages, including Hindi and Farsi, as well as Slavic and Scandinavian languages. Development of these is expected to begin in the next few months, and the first should be delivered in 1999.

IBM's ViaVoice comes in a number of packages, from the cheap and cheerful to the thoroughly professional. IBM is especially targeting the application developer. September saw the release of the ViaVoice SDK, which includes VVTextBox, a control similar to the Visual Basic TextBox, plus VVUIClient, an ActiveX control, and other speech recognition and voice output controls.

The IBM products conform to SAPI and can be used with numerous C++, Java and Visual Basic development environments. IBM supports its products via the Web, seminars and various discussion forums, and is positioning the SDK as everything users need to start integrating speech technologies into their products.

Dragon's NaturallySpeaking comes in three tiers according to the demands of the user. TalkWrite went with Dragon Systems products because, according to Bligh, 'they were the best in the market and they have maintained that lead'. TalkWrite became a Premier Reseller by taking the required training courses and making a sales commitment to Dragon.

The key feature of a package, says Bligh, is the way it makes corrections. 'Misunderstandings are inevitable but the better the package and the better trained the user, the smaller the proportion of mistakes in any document,' he explains.

'All packages will need manual correction, but if it takes more work to make corrections than to just type the words in manually, nothing is gained by using the technology.'
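Bligh's rule of thumb can be turned into a simple break-even calculation. The Python sketch below is a hypothetical model, not a vendor formula. The Workload class and every figure in it are assumptions for illustration, apart from the 90 per cent accuracy quoted earlier for enrolled users. In this model, dictation pays off only while speaking time plus correction time stays below typing time.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical model of producing one document by typing or by dictation."""
    words: int               # length of the document
    typing_wpm: float        # user's typing speed, words per minute
    dictation_wpm: float     # speaking speed while dictating, words per minute
    accuracy: float          # fraction of words the recogniser gets right
    seconds_per_fix: float   # time to find and correct one misrecognised word

    def errors(self) -> float:
        return self.words * (1 - self.accuracy)

    def typing_minutes(self) -> float:
        return self.words / self.typing_wpm

    def dictation_minutes(self) -> float:
        speaking = self.words / self.dictation_wpm
        correcting = self.errors() * self.seconds_per_fix / 60
        return speaking + correcting

    def break_even_seconds_per_fix(self) -> float:
        """Correction time per error above which dictation stops saving time.

        A negative result means dictation is slower than typing even with no errors.
        """
        if self.errors() == 0:
            return float("inf")  # perfect recognition: corrections never cost anything
        saved_minutes = self.typing_minutes() - self.words / self.dictation_wpm
        return saved_minutes * 60 / self.errors()


# Assumed figures: 1,000 words, 40 wpm typing, 100 wpm dictation, 90% accuracy.
for fix in (5, 10):
    w = Workload(words=1000, typing_wpm=40, dictation_wpm=100, accuracy=0.90, seconds_per_fix=fix)
    print(f"{fix}s per fix: dictation {w.dictation_minutes():.1f} min "
          f"vs typing {w.typing_minutes():.1f} min")
print(f"break-even: {w.break_even_seconds_per_fix():.1f}s per fix")
```

With these assumed figures, dictation wins at 5 seconds per correction (about 18 minutes against 25 for typing) but loses at 10 seconds (about 27 minutes). Break-even falls at roughly 9 seconds per correction, which is why the correction workflow matters as much as raw accuracy.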

As the market grows, it will provide the capital and incentive for further R&D. Advances in other computing technologies that make mips and megabytes cheap also set the stage for the rapid development of recognition, translation and control applications.

Anything that touches telephony is a prime target for speech, because the user already has a microphone and a speaker that can be connected remotely to big enough chunks of processing power.

A handful of different voice packages are available, but the way forward for ISVs and integrators will be by forming alliances with the core technology providers and gaining experience in putting recognition and translation capabilities into their products for their existing customers.

When the day comes that the Microsoft/L&H SAPI is an operating system service instead of a bolt-on package, the well-rounded ISV will be ready.

It's hard to predict which of the three top companies will win, because the field is moving so fast. Dragon Systems may be ahead on the dictation front, but the others are catching up quickly. L&H looks likely to maintain its lead in machine translation. And you can never discount the long blue arm of IBM.

If the market projections are sound, there's no reason why all three shouldn't have a place in the sun when speech technology really hits the big time.