In 1989, it was pure science fiction. Mere seconds after dictating cooking instructions into the oven, Marty’s aging mother pulled a fully cooked pizza out of the voice-controlled Black & Decker contraption and served her family. “Boy oh boy, mom you sure can hydrate a pizza!”
I’m referring, of course, to a scene from Back to the Future 2, but give or take a few details, it could have happened on the floor of this year’s Consumer Electronics Show (CES). From phones, tablets and TVs to cars and, yes, kitchen appliances, voice-controlled computing is weaving its way into our lives. And while some of the use cases may feel a little absurd at first, talking is a very natural way for us to request things and influence our surroundings.
Artificial intelligence-fueled voice control took its biggest step toward the mainstream in 2011 with the launch of iOS 5 and Siri. As is typically the case, Apple didn’t invent this technology, but rather found a way to polish it and fit it into a package that’s easily digestible by the masses. And while Siri’s “beta” label (and much of the early fun-poking) is fully deserved, it’s easy to see where technology like this is headed and why it’s a big deal.
I recently found myself riding across Los Angeles in a friend’s car. We had wrapped up our respective workdays and were en route to meet up with his wife for dinner. “Text Priya,” he said into his phone, adding “See you in 15 minutes.” He confirmed the message and sent it, all without removing his eyes from the road. As a tech writer, I was well aware of this functionality (and much more of what Siri can do), but here it was being used effortlessly in the wild by a self-described tech novice. It was so, so… normal.
It’s about to get way more normal. Only 15 months after Siri’s arrival, voice-controlled computing is barreling ahead into the future.
Apple, Google and Nuance: Advancing Voice Control in 2013
With each subsequent update to iOS, Apple is slowly nudging Siri forward, adding deeper intelligence, tying in additional data sources and improving its overall performance. Last year, Siri leaped from the iPhone to the iPad. Mac OS X now supports voice dictation, and few doubt the endlessly rumored-about Apple-branded HDTV will change channels when asked verbally. And if you think Siri can be a smart ass now, just wait. Apple is looking for people who can help inject a lot more personality into Siri.
Meanwhile, Google is building sophisticated voice control into its Android operating system as well as its mobile apps for iOS and other platforms. Google Now is looking more and more like an impressive competitor to Siri – some say it’s already better. The technology will soon find itself baked into hyper-futuristic wearable computing accessories like Google Glass. The recently released beta of Chrome 25 supports the Web Speech API, a standard that will allow developers to build voice control directly into Web apps.
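To get a feel for what the Web Speech API puts in developers’ hands, here is a minimal sketch of listening for speech in a page. It assumes Chrome’s vendor-prefixed `webkitSpeechRecognition` constructor from the draft spec; treat the exact event shape as an assumption, and note the browser will prompt the user for microphone access.

```javascript
// Hypothetical helper: pull the most likely transcript out of a
// SpeechRecognition result event (each result's [0] alternative).
function bestTranscript(event) {
  var phrases = [];
  for (var i = event.resultIndex; i < event.results.length; i++) {
    phrases.push(event.results[i][0].transcript);
  }
  return phrases.join(' ');
}

// Browser wiring, guarded so the sketch is safe to load anywhere.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // stream partial guesses as you talk
  recognition.onresult = function (event) {
    console.log('Heard: ' + bestTranscript(event));
  };
  recognition.start(); // triggers the microphone permission prompt
}
```

In a real Web app you would feed `bestTranscript` into whatever command handler the page exposes, which is exactly the kind of in-browser voice control the standard is meant to enable.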
At CES earlier this month, there was no shortage of voice-controlled gadgets for every room in the house, as well as the driveway. One of the most significant voice-related announcements came from Nuance. The company’s Wintermute project promises intelligent, cross-device voice commands that stay consistent and personalized across TVs, smartphones, tablets and whatever other voice-controlled devices fill your home. If you ask your phone for the football score and then later sit down and ask your TV to turn on “the game,” it will know precisely which game you mean, just as it knows your other preferences, tendencies and linguistic quirks.
The Future Of Computing With Our Voices
The cutesy personal assistants of today will evolve into what Nuance CTO Vlad Sejnoha has called “ubiquitous intelligent systems.” They will be much more nuanced, intelligent and discerning.
Sejnoha cites a number of trends that will define the future of voice control. The improved accuracy of voice recognition and smarter natural language understanding will combine with our devices’ increased awareness and ability to better discriminate between sounds to create a far more capable, intelligent system for communicating with machines.
“One area which we expect to gain more prominence is ‘ambient voice recognition,'” Sejnoha told ReadWrite. “This includes the high-accuracy transcription of natural multi-speaker conferences and other spontaneous conversations between multiple participants.”
This steady improvement of the underlying technology will coincide with the growing ubiquity of voice-controlled mobile interfaces (thanks primarily to Google and Apple). The result? Slowly but surely, voice control will become more accurate, useful and integrated into the gadgets we use – and eventually wear – every day.
In so many contexts, using our voices to control computers just makes sense. The in-car use case is one of the most obvious. I’d also much rather tell my connected television what I want to watch and make gestures on a tablet than fidget with keyboards or wave my arms around like a maniac in my living room.
The more intelligent and integrated these systems get, the more seamless interacting with them will become. The use cases will be plentiful and far more sophisticated than even Marty McFly could have imagined.
Nice work, humanity. Now how about those hoverboards?
Lead photo by Vasile Cotovanu.