Thursday, April 27, 2006

Speech Recognition - Week of 04/27/2006 (Publisher)

In 1997 Gartner predicted, "By 2001, speech recognition will be a standard part of the computer environment for more than 30 percent of office workers (0.7 probability)." Clearly this did not happen, and in the past several months Gartner has issued reports admitting that the original prediction was off. However, despite at least one major obstacle common to virtually all speech recognition applications, Gartner and other groups agree there is tremendous potential for the technology.

The recent Gartner reports identify two types of voice recognition applications. One type, known broadly as speech recognition applications, uses voice to enact commands. Speech recognition of this type is found primarily in the consumer market, where it has been used for over a decade in applications such as hands-free operation of PCs and other digital devices, and in environments where dictation is an established practice. When Gartner made their prediction almost a decade ago, it was this type of application they were referring to, and they made it just as what seemed to be the last hurdle was cleared. In 1997 Dragon Systems and IBM introduced "continuous speech recognition" products, which allowed users to speak at natural speeds without pausing after each word. At that point the technology was good enough for Gartner to offer such a lofty prediction. Since then the technology has improved "significantly in accuracy, usability and product functionality." Vendors now claim fewer errors with speech recognition than in the average person's typing, and one leading vendor boasts 99 percent accuracy. Despite such credible statements, Gartner now says they do not believe "that speech recognition will become the dominant mode of text entry for at least the next decade." However, Gartner does state that "there are several areas where speech has the potential to augment keyboard and keypad entry for a significant number of users." Two of these areas are notable.

The growth of the mobile workforce, workers who operate under space and time constraints, is expected to be a key driver for speech recognition in coming years. In fact, this segment is already moving. According to Newsfactor Magazine, "Users of PDAs and smartphones are an especially attractive market for makers of speech-recognition products, because those customers already are using the devices for dictation or cell-phone calls, and many crave additional features that will spare them from typing on tiny keyboards." In this arena, top players are Ionoveo, which offers a voice-based biometric lock for PDAs, and Parrot Software, which is making inroads into the vehicle market with hands-free phone applications for use while driving.

The second key driver, according to Gartner, is referred to as "Application Control." "In the long run, the ability to fully control the computer (or other device) through voice, rather than simply by entering text, will be the strongest adoption driver. Voice is a natural shortcut through the complexities of menu and folder structures." Though this may be the "strongest adoption driver," it has further to go, and its potential will only be realized "when the system can respond back to users to help refine their requests through a natural language dialogue." This capability has yet to be fully commercialized.

The second type of speech recognition examined by Gartner is known as speech analytics. It includes applications that use voice to issue an appropriate response (the kind everyone has encountered in automated telephone 'conversations') and those that take voice data and convert it into an analyzable format. The number one segment implementing this type of application is call centers, where there is opportunity to radically improve customer service, analytics, and ultimately the bottom line. According to Tim Kraskey, vice president of Spanlink Communications, a company that develops customer-interaction tools for call centers, "Every document in your company can be tied to the IVR (Interactive Voice Response), and can be called up instantly . . . That's really powerful; it's the bleeding edge. Someone can have their call resolved accurately, quickly, and, most importantly, automatically." In the Newsfactor article mentioned above, Kevin Hegebarth, director of strategic analysis for Witness Systems, a contact-center vendor, goes a bit further: "We're really beginning to see speech recognition become intelligent enough that there's a conversational tone with a caller . . . That means we can gain insight into customer relationships instead of just routing calls where they need to go."

As for the analytic side, Gartner divides the applications into three categories: customer, agent, and corporate. Customer analytics provide insights into customer intent, action, and attitude: are customers generally content, or are they about to switch to a competitor? Agent analytics can be used to see whether an agent did a good job handling a call or situation. Corporate analytics tend to look at the big picture, such as what is generating the majority of calls. In this last category, more advanced solutions are beginning to combine their analytical insights with traditional operational data for incremental insight gains.

The term 'speech recognition' is a broad one, encompassing many applications, each with unique challenges. But two challenges apply across the board, and both hold a bit of irony. The first is the low rate of adoption. Despite the obvious potential of the technology, low adoption rates are holding it in check by slowing innovation and delaying improvements that will only come with a certain amount of standardization. However, the single greatest impediment is the user: people are still not comfortable using voice to type or operate their digital devices, all the more so in an office environment. In the end, people want people on the other end of the line.
