It has long been a dream of computer scientists to devise a way for users to interact with their machines simply by speaking to them.
But what was once just wishful thinking is gradually becoming reality.
Earlier this year, IBM announced the shipment of its speech recognition product Voice Type 3 for Windows 95, which it claims increases the speed and accuracy of the technology. In October, the company announced a developers' toolkit that lets programmers integrate speech with their applications.
Developers wishing to distribute a run-time version of the product with their applications must obtain a licence agreement from IBM.
IBM has been experimenting with voice recognition systems in its laboratories for over 20 years. But it is not alone. Philips, the Dutch electronics company, has also been working in the field, as have a number of smaller companies such as Dragon and Kurzweil. Research into speech recognition is also being carried out at universities and other academic institutions in the US and UK.
Although Voice Type is a standalone software product, IBM plans to incorporate it in the next release of OS/2, codenamed Merlin. The standalone product costs £555 and existing users can upgrade to the new version for £79.
A number of factors have inhibited the adoption of speech recognition technology. One of the major drawbacks has been that the software had to be trained to an individual voice. But IBM claims it has overcome this problem and has made Voice Type speaker independent.
By speaker independent, IBM means that users do not have to train the computer to adapt to their individual voice. People can begin talking to the computer immediately, right out of the box, says the company.
Words can be dictated directly into Word, WordPerfect or Lotus Notes, where they will appear on screen.
In the past, users were also required to speak unnaturally, leaving a pause between each word. But IBM has made some improvements to the technology. Users can now correct any mistakes at the end of the document rather than as they go along, which was previously the case and is still the practice with other systems such as Dragon Dictate. Although IBM says editing a document for mistakes at the end of a speech session is an advantage, other vendors disagree.
Dragon Dictate requires the user to correct mistakes as they go along.
A representative of Dragon Dictate reseller Law Computer Services says: 'The feedback from our customers is that with the IBM system they do not see anything coming up on the screen and they find this off-putting. They prefer to see the words appearing on the screen and correct them as they go along.'
In the US, IBM has introduced a number of packages for specific sectors.
The medical and legal professions are seen as prime targets for Voice Type. IBM US has released Medspeak/Radiology for Windows NT, which has a vocabulary of 25,000 words and, the company claims, an average accuracy rate in beta test of 95 per cent.
In the past, radiologists have dictated patient information on to a tape for later transcription to paper. IBM claims that the system can use context to distinguish between words that sound alike or are spelt alike, such as too, to and two, or colon the punctuation mark and colon the medical term. IBM has produced specialist dictionaries for the legal and medical professions and is now examining other areas.
Apple, which prides itself on being one of the most innovative companies in the industry, has launched a speech recognition system, but not for the English language. According to an Apple representative, the company has decided to concentrate on languages, such as Chinese and Japanese, which are hard to input using a keyboard because they have a far larger number of characters than European languages.
Kurzweil has launched a cut-down version of its software. The original product, Kurzweil Voice, retails at £495; the cut-down version, Voice Pad Pro - which in essence is a wordprocessing package - is being sold for £79.
Dustin Brooks, sales and marketing manager at Talking Technology, Kurzweil's UK distributor, says demand for voice recognition products is growing steadily. 'There is huge demand,' he says. 'It used to be just disabled people that required the products, but there is growing demand from the medical and legal professions.'
Brooks admits that there are still some drawbacks to the system, such as the need to delineate clearly between words, which he describes as 'talking like a civilised Dalek'. But he argues that users soon get used to this unnatural method of speaking.
Philips has built its products based on the company's experience with dictation systems. The company claims that it has a true continuous speech product available, but it is not a cheap option. Philips aims to sell its software on the back of specific applications, but the cost of the speech recognition products will add a further £2,000 per user to the cost of the application software and the hardware.
'We are not producing a shrinkwrapped product, unlike other vendors,' says Graham Young, business development manager at the company's speech recognition division. Philips has gone for a client/server approach where the software resides on the server rather than on an individual desktop machine.
Despite IBM's claims that its latest products are speaker independent, some individual voice training has to be undertaken. Duncan Ross, speech business manager for IBM, admits there are some differences in accents that the system needs to be able to recognise. He uses the example of the word 'bath', whose pronunciation varies according to whether the speaker is from the North or the South.
According to Ross, one of the reasons why the radiology package that IBM released in the US was not launched in the UK is that it was developed for US accents and terminology. Traditionally, a speech recognition user goes through a process known as enrolment, which means training a machine to recognise the user's voice structure. Ross claims the new IBM system reduces the enrolment process to a minimum. 'Eighty-five per cent of people using the system have been achieving 95 per cent accuracy and never needed enrolment,' he says.
Ross highlights the fundamental design differences between products which produce the words on the screen as they are spoken and Voice Type, which does not.
'The whole point is that you do not need to watch the screen,' he maintains.
He argues that having to watch the screen and correct mistakes as the speaker goes along is less productive than having the system play back the words after a document has been completed and then making any corrections that are necessary.
Most speech recognition products are aimed at professional business people who are averse to typing their own documents but still need to communicate in writing. The legal and medical professions, together with insurance, financial institutions and the property business, are the typical targets for the software vendors.
Ross argues that there are considerable savings to be made by cutting back on expensive specialist typing staff, such as legal and medical secretaries.
There is also a burgeoning home market where a cut-down version of most of the suppliers' software is available.
Robin Bloor, chairman of consultancy Bloor Research, has experimented with the IBM system. 'The first thing you need to remember is that it takes about 100Mb of disk space,' he says.
'Second, if you can type fast, the system is not much use to you. But if you are someone who pecks at a keyboard then you are going to go with it. One of the drawbacks is that the voice has not been integrated with the user interface, but I am sure that will come.'
Overall, Bloor is impressed with Voice Type. 'I am sure that Dragon and the other systems are equally good,' he says. What he finds most irritating about the system is having to wear a headset for the microphone. 'There are a lot of applications out there which could use voice systems. Disabled people, such as the blind, will find it useful and there are some situations, such as the factory floor, where having a keyboard is not such a good idea,' he says.
'One of our consultants has already discovered that if he is going to or from a meeting in his car and needs to make notes then the voice system comes in very useful.'
Although no longer an embryonic technology, speech recognition is still scarcely out of its nappies and is at the crawling stage rather than being a toddler. There is no doubt that one of the things that has driven the technology forward is the increasing processing power of the PC.
There are strong rumours that Microsoft, like IBM, is working on projects to incorporate speech recognition into future versions of its operating systems. If and when that happens, third-party suppliers like Dragon, Philips and Kurzweil, which have done so much to pioneer the technology, could find themselves without a market.
But there are still many obstacles to be overcome before true continuous speech recognition becomes a reality at a price that is affordable to the average user.
Experiments with alternative systems of inputting data to the PC have been around for many years.
Apple and other companies have attempted to introduce pen-based systems where a stylus rather than the keyboard is the method of data entry. But, as with speech recognition, pen-based systems have been bedevilled by problems, notably their failure to recognise joined-up handwriting.
Both speech recognition systems and pen-based computing will eventually be a reality, but it is still too early to write off the keyboard as the main way of entering data into a computer.