Interest in AI is widespread, so it’s hardly surprising that there has been an influx of creative ways to test its abilities in the form of generators. Whether it’s music makers like Suno or video creators such as Sora, there are now many ways to experiment with these new technologies. The next wave of these tools includes voice generators, which can assist with tasks such as text-to-speech and voice cloning.
What are AI voice generators and how do they work?
AI voice generator software transforms written text into voices that closely resemble human speech. It can be customized for various speech styles, ages, genders, and accents, and can also translate text into multiple languages. An increasing number of people are using this technology to narrate YouTube videos, podcasts, and video games. There have even been reports of it being used to narrate audiobooks.
These generators rely on deep learning, a branch of artificial intelligence in which models improve by analyzing large volumes of data. A generator is first trained on a large dataset of voice recordings. Through this training, it learns to recognize speech patterns such as intonation, rhythm, and accent. The quality and variety of the training data influence how accurately it can produce different voices.
After the training phase, the AI uses text-to-speech (TTS) technology to convert written text into spoken words. This process starts with the AI breaking down the input text into its phonetic elements, and then synthesizing these components to construct complete words and sentences.
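As a rough illustration of the first step described above, the sketch below breaks input text into phonetic symbols before any audio would be synthesized. The tiny lexicon and the function name `text_to_phonemes` are hypothetical and exist only for this example; real generators typically rely on much larger pronunciation dictionaries or learned models, and none of the products discussed here publish this exact approach.

```python
# Toy pronunciation lexicon using ARPAbet-style symbols.
# Hypothetical and deliberately tiny: an assumption for illustration only.
LEXICON = {
    "the": ["DH", "AH"],
    "quick": ["K", "W", "IH", "K"],
    "fox": ["F", "AA", "K", "S"],
}


def text_to_phonemes(text):
    """Return a flat list of phonetic symbols for the given text."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        if not word:
            continue
        # Words missing from the lexicon fall back to being spelled out
        # letter by letter, a crude stand-in for a learned model.
        phonemes.extend(LEXICON.get(word, list(word.upper())))
    return phonemes


print(text_to_phonemes("The quick fox."))
```

A synthesizer would then turn each symbol into sound and join the results into words and sentences; that stage is omitted here.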
To make the output more realistic, some sophisticated AI voice generators integrate natural language processing (NLP) techniques. NLP enables the AI to grasp and process the subtleties of human language, allowing it to adjust its output for linguistic nuances such as sarcasm, questions, or excitement. This makes the synthesized speech sound more natural and human, and the results are expected to improve as these technologies evolve.
What are the best AI voice generators?
We tested several AI voice generators using a pangram, a sentence that contains every letter of the alphabet:
“The quick brown fox jumps over the lazy dog.”
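For readers who want to try their own test sentence, a minimal Python sketch for checking whether a sentence is a pangram might look like this. The function name `is_pangram` is our own, and the check covers only the 26 letters of the English alphabet.

```python
import string


def is_pangram(sentence):
    """Return True if the sentence uses every letter from a to z."""
    letters_used = set(sentence.lower())
    return set(string.ascii_lowercase) <= letters_used


print(is_pangram("The quick brown fox jumps over the lazy dog."))
```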
ElevenLabs
ElevenLabs is one of the most notable firms in this area of AI. Its free online software gives users access to 27 voice options and can translate into 29 languages, including Chinese, Hindi, and Russian. Audio can be downloaded on the free version. Users should be cautious when translating from English to other languages, as the translations are not always accurate and can significantly alter the intended meaning.
A single request on the platform can generate at most 2,500 characters for users who are not subscribed and 5,000 for those who are. There are five tiers, including the free membership, with prices ranging from $1 to $330 per month and offering between 10 minutes and 40 hours of audio. Audio quality varies across the packages, as does the right to distribute commercially.
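Because of the per-request character cap, a longer script has to be split before it is submitted. The sketch below shows one way to do that locally, breaking only at spaces so that no word is cut in half. It is an assumption-laden illustration: the function name `split_into_requests` is hypothetical, the default limit of 2,500 is the free-tier figure quoted above, and the code does not call any ElevenLabs service.

```python
def split_into_requests(text, limit=2500):
    """Split text into chunks of at most `limit` characters, breaking at spaces."""
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            # A single word longer than the cap cannot be placed in any chunk.
            raise ValueError("a single word exceeds the character limit")
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would exceed the cap, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


script = "word " * 1000  # roughly 5,000 characters of sample text
parts = split_into_requests(script)
print(len(parts), [len(p) for p in parts])
```

Splitting at sentence boundaries instead of spaces would probably give more natural pauses between chunks, at the cost of slightly more code.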
The UK-based company reached unicorn status in January 2024 after securing an $80 million Series B funding round, making it a serious player in AI voice generation. It also announced that it would be launching AI sound effects.
Mati Staniszewski, CEO and co-founder of ElevenLabs, said their goal is “to transform how we interact with content by breaking down language and communication barriers.” He added that the London-based voice cloning company hopes to build cutting-edge technology to make content accessible across languages and voices “to enable everyone to connect with information and stories that matter.”
The company has faced backlash in the past after its technology was blamed for deepfake robocalls imitating Joe Biden that were sent to New Hampshire voters.
VEED.IO
VEED.IO is generally known as video editing software – it’s even named after it. However, it has recently introduced realistic text-to-speech AI voiceovers as well. Users can choose from a wide range of AI voices in multiple languages, but they must first sign up, at least for the free plan. Unlike ElevenLabs, VEED.IO emphasizes certain words within sentences inconsistently. Currently, up to 1,000 characters can be added per video project. Users can also translate their text into 60 languages.
While there is a free option, its output comes with watermarks. The paid tiers, which range from £10 to £49 per month billed annually, cover the video component. The audio part of the software is free.
On their blog, VEED vice president of marketing Leila Woodington said: “The less time you have to spend on the routine parts of production, the more time you have to think about the storytelling and the craft.”
Murf.AI
Murf.AI offers 10 minutes in its free trial, providing access to over 120 voices in its studio. In theory, depending on the selected voice, users can alter the mood of the voice to angry, conversational, inspirational, or sad tones. The availability of UK regional accents was particularly exciting to see. The voice sounds somewhat robotic, although the accents on certain words are accurate. Users are not able to download the recordings for free.
A cool feature offered by Murf, which isn’t provided by any other text-to-speech converter, is that it allows users to change their voice while recording. The voiceovers can be personalized based on pitch, speed, and volume. It even offers a tool to create Spotify ads.
It offers three tiers, including its free plan, with prices ranging from $23 to $79 per month when billed annually. Only the most expensive membership allows people to change their voices and integrate their works with Google Slides. However, both paid plans permit users to utilize their recordings for commercial purposes.
PlayHT
Like VEED.IO and Murf.AI, PlayHT requires users to sign up. What’s interesting about PlayHT is that each sample is unique and can be downloaded. The recording sounds fairly natural, though a little morose, and the software provides around 12,400 free characters.
It also offers voice cloning, WordPress integrations, and custom pronunciations. However, these are not available on the free tier. The two paid plans, both billed yearly, cost $31.20 and $99.
A YouTuber was reported to have used PlayHT to give the AI-generated voice of a Pokédex the sound and cadence of the device in the show.
LOVO
LOVO also requires registering and paying for its service before recordings can be downloaded. However, users can test out 180 characters without signing up. One of LOVO Studio’s standout features is its ability to generate natural-sounding voices in various languages. Whether users need voiceovers in English or in other languages, LOVO Studio’s AI technology delivers voices that are remarkably human-like.
LOVO Studio provides a range of plans catering to different needs, starting with a free plan providing basic functionality. This allows users to explore the platform and its capabilities without any cost. The Pro plan is available for $48 per month for those seeking more features and customization options. The platform also offers premium voices for users looking for even higher quality and more distinct options, for $75 per month billed annually.
Featured image: DALL.E / Canva