The past months have witnessed breakthrough announcements from Microsoft, IBM, and Google, each claiming a new mark in speech recognition accuracy: a word error rate of 5.1 percent, on par with human transcribers.
Still, that figure is hard to reconcile with everyday experience; the last time I spoke to a machine, the recognition did not feel nearly that good. Let's review the next generation of speech technologies, how they are enabling new analytics, and the growing role of these insights for businesses, to see whether the companies' claims hold up.
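To make the 5.1 percent figure concrete: word error rate (WER) is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch in plain Python (my own illustration, not any vendor's scoring code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("whether" -> "weather") in a five-word reference: 20% WER.
print(word_error_rate("check whether the claim holds",
                      "check weather the claim holds"))  # 0.2
```

A 5.1 percent WER means roughly one word in twenty is wrong, which is why even "human parity" systems can still feel imperfect in conversation.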
Some history of speech recognition
In the early 2000s, speech recognition reached roughly 80 percent accuracy. In the enterprise space, that milestone triggered adoption of Interactive Voice Response (IVR), initially deployed to streamline the handling of customer service calls.
But speech applications were heavily dependent on vocabularies and languages. They required sophisticated set-up by highly specialized system integrators, and each major language had its own speech recognition startup. The space was only consolidated in 2005, when Nuance snapped up 15 companies.
Aside from IVR, a second use case emerged: Quality Management (QM). Customer service organizations use quality management applications to listen to call center calls and rate them. The process used to be tedious, limited to a small sample of calls, and like looking for a needle in a haystack. With speech recognition, it became possible to automate parts of the process. Workforce optimization leaders NICE and Verint built or bought their way into speech analytics, followed by contact center infrastructure players like Avaya and Genesys.
These developments remained limited. IVR speech enablement has failed to transform the customer experience, and voice self-service experiences continue to be rated poorly. Speech for QM is often confined to compliance or script adherence verifications. At the beginning of the 2010s, it seemed that speech technology had stalled.
The machine learning transformation
While speech for customer service was developing, Amazon, Apple, Google, IBM, and Microsoft kept investing in speech research and development, driven by the vision that voice would eventually become critical for user interaction with machines.
Apple broke into the market with the introduction of Siri, which used machine learning to transform speech recognition. Artificial intelligence removed many of speech technology’s complications and intricacies, as well as the need to re-engineer the stack for new languages or new vocabulary sets.
Today, most digital disrupters, most notably China's "Big Three" (Alibaba, Baidu, and Tencent), are building their own speech stacks. Because generic machine learning engines can be used, barriers to entry have dropped dramatically. Open source options, such as CMUSphinx, HTK, Julius, Kaldi, and Simon, are also widely available.
Disrupting the customer service space
The AI breakthrough has paved the way for new entrants. Companies like iFLYTEK or Speechmatics are aggressively addressing the issues of usability, accuracy, and deployability, in particular beyond the dominant languages.
For customer service, the battle is now shifting to the other half of the equation, Natural Language Processing (NLP) and Natural Language Understanding (NLU).
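The distinction matters: speech recognition turns audio into words, while NLU turns those words into intent. As a deliberately simplified illustration of the NLU half, here is a toy keyword-based intent classifier; production systems use trained statistical models, and the intents and cue words below are invented for this sketch:

```python
# Toy intent classifier: scores each intent by how many of its cue
# words appear in the transcribed utterance. Real NLU stacks use
# trained models; these intents and cue words are illustrative only.
INTENT_CUES = {
    "check_balance": {"balance", "account", "much"},
    "transfer_funds": {"transfer", "send", "move"},
    "report_fraud": {"fraud", "stolen", "unauthorized"},
}

def classify_intent(utterance: str) -> str:
    words = set(utterance.lower().split())
    scores = {intent: len(cues & words) for intent, cues in INTENT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify_intent("How much is left in my account"))  # check_balance
```

Even this crude sketch shows why accurate transcription alone is not enough: a single mis-recognized cue word can flip the predicted intent, which is where the real NLP/NLU battle is being fought.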
Yactraq is applying its patented technology to democratize audio mining. By making the technology affordable for businesses, it enables innovation beyond compliance and script adherence, helping uncover best practices in customer interaction.
Deep learning has powered Yactraq's speech stack since 2013. The company now sees NLU as the next frontier, since it can understand more than one command from different, simultaneous speakers and prioritize them. After targeting automation and assistance for call centers, Yactraq is expanding into other industries and use cases. This industry focus is critical to finding workable solutions.
Omilia is another intriguing story. Originally formed as an IVR system integrator, it started developing its own speech technology in 2007, with the vision of leapfrogging IVR-directed dialogs and offering natural conversations. Omilia leveraged deep learning to assemble its stack; at Royal Bank of Canada, its technology delivered an impressive 59 percent reduction in IVR abandonment and a double-digit increase in self-service completion.
Sales communication leading the way?
Inside selling is on the rise, and as a result, sales communication has become an active and innovative space. A growing number of sales interactions are taking place over the phone, and sales executives are concerned about becoming blind to these conversations.
Chorus.ai is a pioneer of the sales intelligence space. Cultivating a solutions approach, it has assembled a vertically integrated stack that provides a broad range of indicators for assessing the effectiveness of conversations and correlates sales process elements with actual outcomes. It uses homegrown speech recognition, tuned and modeled for sales conversations. The company bakes its know-how into a three-step onboarding process, recording all conversations to uncover insights in a matter of days. These insights are then used to create dashboards that track performance drivers, which can eventually be monitored in real time and used to drive changes on the front lines.
Gong.io was born from a founder's experience building fast-growing sales teams. Frustrated with inaccurate and superficial tools for measuring performance, he reviewed the existing options and found them too complex and ill-suited to B2B sales, with its long, unscripted conversations that can involve more than two participants. Gong.io focuses on conversation intelligence, uncovering topics and recognizing performance patterns. Founded only in August 2015, Gong.io assembled its first solution in record time.
The sales technology space, though recent, is incredibly dynamic: in only a few years, it evolved beyond coaching and performance management to provide broader prospect and customer insights. Sales technology is poised to become a key element of Voice of the Customer (VoC) programs.
Across the three market landscapes I maintain with VB Profiles (Inside Sales, Interaction Management, and Intelligent Assistance and Bots), I have identified over 30 players.
Artificial intelligence is transforming the speech industry. Within a given domain or use case, voice recognition is now very accurate. Usability is still not perfected, and NLP and NLU have yet to reach their full potential, but speech technology has already matured enough to stimulate the creation of new applications and markets; more startups and innovations should be on the way.