Computers and other machines are fantastic tools that help us become more productive, access more information, and stay connected with each other. But to use them, we need to “communicate” with them in some way. Historically, this has meant the manual inputs of a mouse and keyboard (or a touchscreen), with a screen to read what the computer returns to us.
In the past decade or so, we’ve seen the gradual rise of a new way of talking to machines: voice and speech recognition. But will this mode of “talking to machines” persist into the future? And if so, how might it evolve?
The State of Technology
First, let’s take a look at the state of modern technology. People still use keyboards, mice, and touchscreens for much of their daily interactions, but increasingly, they’re turning to voice-based interactions. We can run searches on popular search engines with a simple phrase. We can say out loud what we’d like to type, and our phones can translate that into written text. We can even install digital signs that speak to customers and engage with them directly.
Over the years, voice-based interactions have grown to become incredibly sophisticated. In the early days of this technology’s development, it was basically a gamble; in most cases, the system wouldn’t “hear” you correctly, or it would misinterpret what you were trying to say. But these days, the most popular digital assistants and speech recognition programs can detect and understand human speech with human-like accuracy.
In line with this, human beings have gradually become accustomed to voice-based interactions. In 2010, you might have felt foolish saying something like “OK Google” or “Hey Siri” to one of your devices. But in 2020, this is commonplace. In fact, it now seems stranger to see someone who doesn’t regularly talk to their devices.
Why Voice Has Taken Over
Why has speech recognition seen such an impressive growth and development rate in recent years? There are a few possible explanations. The first is that voice is simply more convenient than using your hands for everything. If you’re driving a car and want to send a message without taking your hands off the wheel, you can simply dictate it aloud and take care of it. If your fingers are sore from a long day of typing, you can switch to voice-based inputs and give your hands a break. If you’re in the living room with no device nearby and you need to know the name of the actor in the show you just watched, you can speak your query aloud and get an answer in moments.
Voice is also low-hanging fruit when it comes to technological development. As we’ll see, there are alternative modes of machine-human communication that are much more sophisticated, and may take decades to fully develop—but we’ve practically mastered voice search in just a few years.
Consumers see the benefits, and the technology keeps getting better. So it makes sense why voice-based interactions with machines have become the new norm.
Potential Issues With Voice
That said, there are some potential issues with voice-based machine interactions, even over the long term:
- Data privacy. Every new technology brings concerns about privacy with it. Much of our voice-based search and speech recognition technology is with us at all times; we have a smartphone on our person and a smart speaker in the corner of our living room. Are these systems listening to our conversations when we don’t want them to? What kinds of data are they gathering and sending to their tech company masters?
- Misinterpretations. Even with sophisticated developments in recent years, speech recognition can fail. This is especially true when people speak with accents, or when they can’t articulate full thoughts for various reasons.
- The learning curve. Accessibility may also be an issue, especially for people who already struggle with speech. To get the best possible results, you have to speak in a clear, direct voice and articulate each word precisely. This isn’t intuitive for all users.
- Background noise. High-quality speech recognition can still get muddied if there are significant levels of background noise. This means speech recognition is only ideal in certain locations and contexts; you can’t use it at a rock concert or on a construction site, for example.
- Psychological effects. We’re still in the early days of voice search, but long-term, we may find that speech-based interactions with machines have psychological consequences. For example, we may find it hard to talk to machines without feeling some kind of emotional attachment to them, or we may condition ourselves to interact with the world in different ways because of our interactions with machines.
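The background-noise problem above is often discussed in terms of signal-to-noise ratio (SNR): the louder the noise relative to the speech, the harder recognition becomes. As a rough illustration, here is a minimal sketch of how SNR in decibels can be computed for an audio signal; the synthetic tone, sample rate, and noise level are arbitrary assumptions for the example, not real speech data.

```python
import math
import random

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from average power of each."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

random.seed(0)
# Stand-in for speech: a 440 Hz tone, one second at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
# Stand-in for background noise: low-level uniform random samples.
noise = [random.uniform(-0.1, 0.1) for _ in range(16000)]

print(f"{snr_db(tone, noise):.1f} dB")
```

Quiet rooms yield high SNR values and easy recognition; a rock concert or construction site drives the ratio down until the speech is effectively buried.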
How Voice Can Be Improved
Tech companies are continuously looking for ways they can improve their voice interactions and get an edge on the competition. These are some of the most important areas of focus:
- Accuracy. Already, speech recognition systems rival human transcription accuracy, with some systems exceeding human performance on benchmark tasks. However, there’s still room to improve, especially when it comes to edge cases.
- Predictive functionality. Combined with predictive analytics, voice- and speech-based interactions could become even more impressive. Machines could ask us prompting questions rather than relying on our one-way inputs, and make active suggestions about things we might need.
- Emotional context. It’s also worth considering the development of emotional context reading in digital assistants, or even mimicking human emotional content in their responses. For example, a digital assistant may be able to tell from your tone that you’re angry or afraid, and it may respond to you with a kind of technologically simulated empathy. Though the “creepy” factor may be high in this dimension, it could hypothetically lead to more natural interactions.
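The accuracy gains mentioned above are typically measured with word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system’s transcript into the reference transcript, divided by the reference word count. Here is a minimal sketch of that calculation; the function name and the sample phrases are my own, chosen only for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One word misrecognized out of six -> WER of about 0.17.
print(wer("turn on the living room lights",
          "turn on the living room light"))
```

A WER of zero means a perfect transcript; scores near or below typical human transcription error rates are what vendors mean when they claim “human parity.”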
Alternatives to Voice
So will we ever move away from voice as a mode of interaction with machines? That remains to be seen, but there are a handful of contenders that could one day replace both speech and manual entry—even if they’re years away from full development.
- Gestures. One of the most interesting possible developments is communication with machines in the form of gestures. Rather than explicitly instructing your device what it should do, you could move your eyes in a certain pattern to call up a specific function, or move your fingers through the air to manipulate a holographic interface. Gestures are silent and, in many contexts, faster and more accessible than voice. However, there may still be a steep learning curve, and the technology isn’t ready for the mainstream yet.
- Thoughts. A handful of companies are looking into the possibilities of direct brain-to-machine interaction; in other words, you may one day be able to control your computer with your thoughts alone, the same way you control the movements of your arms and legs. This is a scary thought to many, since it implies the connection could operate in both directions. However, this technology is still in its earliest phases, so its problems, present or absent, will be difficult to anticipate.
- Other communication methods. It’s hard to imagine what the future of machine and human communications might look like, so we can’t rule out the possibility of other, more abstract models. Some tech innovator might come up with a novel method of direct communication that we can’t even conceive of yet.
For now, voice-based controls and communications remain the dominant force in the ways we exchange information with machines. The technology is sophisticated enough that most people can harness its potential easily. There are problems with its use, including privacy concerns and occasional misinterpretations, but these may be mitigated (or eliminated) with further development.