Pushing ahead in the decades-long effort to get computers to understand human speech, Google researchers have added sophisticated voice recognition technology to the company’s search software for the Apple iPhone.
Users of the free application, which Apple is expected to make available as soon as Friday through its iTunes store, can place the phone to their ear and ask virtually any question, like “Where’s the nearest Starbucks?” or “How tall is Mount Everest?” The sound is converted to a digital file and sent to Google’s servers, which try to determine the words spoken and pass them along to the Google search engine.
The search results, which may be displayed in just seconds on a fast wireless network, will at times include local information, taking advantage of iPhone features that let it determine its location.
The ability to recognize just about any phrase from any person has long been the supreme goal of artificial intelligence researchers looking for ways to make man-machine interactions more natural. Systems that can do this have recently started making their way into commercial products.
Both Yahoo and Microsoft already offer voice services for cellphones. The Microsoft Tellme service returns information in specific categories like directions, maps and movies. Yahoo’s oneSearch with Voice is more flexible but does not appear to be as accurate as Google’s offering. The Google system is far from perfect, and it sometimes produces transcribed queries that read as gibberish. Google executives declined to estimate how often the service gets it right, but they said they believed it was easily accurate enough to be useful to people who wanted to avoid tapping out their queries on the iPhone’s touch-screen keyboard.
The service can be used to get restaurant recommendations and driving directions, look up contacts in the iPhone’s address book or just settle arguments in bars. The query “What is the best pizza restaurant in Noe Valley?” returns a list of three restaurants in that San Francisco neighborhood, each with starred reviews from Google users and links to click for phone numbers and directions.
Raj Reddy, an artificial intelligence researcher at Carnegie Mellon University who has done pioneering work in voice recognition, said Google’s advantage in this field was the ability to store and analyze vast amounts of data. “Whatever they introduce now, it will greatly increase in accuracy in three or six months,” he said.
“It’s important to understand that machine recognition will never be perfect,” Mr. Reddy added. “The question is, How close can they come to human performance?”

For Google, the technology is critical to its next assault on the world of advertising. Google executives said location-based queries would make it possible to charge higher rates for advertisements from nearby businesses, for example, although the company is not selling such ads now.
As with other Google products, the service is free to consumers, and the company plans eventually to make it available for phones other than the iPhone.
“We are dramatically increasing value to the advertiser through location and voice,” said Vic Gundotra, a former Microsoft executive who now heads Google’s mobile businesses.
Google is by no means the only company working toward more advanced speech recognition capabilities. So-called voice response technology is now routinely used in telephone answering systems and in other consumer services and products. These systems, however, often have trouble with the complexities of free-form language and usually offer only a limited range of responses to queries.
Several weeks ago Adobe added voice recognition technology developed by Autonomy, a British firm, to its Creative Suite software, allowing it to generate transcripts of video and audio recordings with a high degree of accuracy.
Mr. Gundotra said Google had been tackling the twin problems of entering and retrieving information with hand-held wireless devices.
“Solving those two problems in a world-class way is our goal,” he said.
The new iPhone search capability is not the first speech offering from Google. In March, it announced that GOOG-411, an experimental directory information service, had turned into a real product. The service allows users to ask for business phone and address information. The company said it had built on its experience and the data it collected through GOOG-411 in developing the iPhone service.
The new service is an example of the way Google tries to blend basic computer science research with product engineering. The company has hired many of the best speech recognition researchers in the world and now has teams working on different aspects of the problem in New York, London and its headquarters in Mountain View, Calif.
An intriguing part of the overall design of the service was contributed by a Google researcher in London, who found a way to use the iPhone accelerometer — the device that senses how the phone is held — to set the software to “listen” mode when the phone is raised to the user’s ear.
Google researchers said that another of the company’s advantages over competitors was the billions of queries its users have made over the years.
“One thing that has changed is the amount of computation and the amount of data that is available,” said Mike Cohen, a speech researcher who was a co-founder of Nuance Communications before coming to Google.
Past queries can be used to build a statistical model of the way words are frequently strung together, Mr. Cohen said. This is just one of the components of the speech recognition system, which also includes a sound analysis model and a mechanism for linking the basic components of language to actual words.
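The article does not describe how Google built its model. As a rough illustration of what “a statistical model of the way words are frequently strung together” means, here is a minimal bigram language model in Python. It counts adjacent word pairs in a hypothetical handful of past queries and uses those counts to rank two candidate transcriptions that a sound analysis model might consider acoustically similar. The function names, the sample queries and the add-one smoothing are all assumptions made for illustration; real systems train far larger models on vastly more data.

```python
import math
from collections import Counter


def train_bigram_model(queries):
    """Count word histories and adjacent word pairs over past queries.

    Illustrative sketch only; not Google's actual method.
    """
    histories = Counter()
    bigrams = Counter()
    for query in queries:
        # <s> and </s> mark the start and end of each query.
        words = ["<s>"] + query.lower().split() + ["</s>"]
        histories.update(words[:-1])          # every word that precedes another
        bigrams.update(zip(words, words[1:]))  # adjacent (previous, next) pairs
    vocab = {w for q in queries for w in q.lower().split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def log_prob(sentence, model):
    """Score a candidate transcription: higher means more query-like."""
    histories, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one (Laplace) smoothing gives unseen pairs a small nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        total += math.log(p)
    return total


# Hypothetical query log and two sound-alike candidate transcriptions.
past_queries = [
    "where is the nearest starbucks",
    "where is the nearest gas station",
    "how tall is mount everest",
    "nearest starbucks open now",
]
model = train_bigram_model(past_queries)
candidates = ["where is the nearest starbucks", "wear is the near rest star bucks"]
best = max(candidates, key=lambda c: log_prob(c, model))
print(best)  # where is the nearest starbucks
```

The word sequence seen often in earlier queries wins, which is the sense in which a large query history helps the recognizer choose among phrases that sound alike.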
Google recently published a technical paper on building large language models for machine translation. The researchers wrote that they had trained the system on two trillion “tokens,” or words.
[Source: The New York Times]