Researchers at Nvidia have created a new generative AI model named Fugatto, which can generate ‘entirely new sounds’ from any mix of music, voices, and sounds.
The tool has been likened to a ‘Swiss Army knife’ for sound because it lets users control audio output using text alone.
The Fugatto model, short for Foundational Generative Audio Transformer Opus 1, can generate or transform any mix of music, voices and sounds described with prompts using any combination of text and audio files.
“This thing is wild,” said Ido Zmishlany, a multi-platinum producer and songwriter, and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups, in a blog post announcing the AI model.
“Sound is my inspiration. It’s what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible.”
Rafael Valle, a manager of applied audio research at the technology giant and one of the dozen-plus people behind Fugatto, said: “We wanted to create a model that understands and generates sound like humans do.
“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”
AI model Fugatto can create “new sounds on the fly”
The researchers have listed numerous use cases, including music producers using Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments.
In another situation, the team says an ad agency could use the model to quickly target an existing campaign for multiple regions or situations, using different accents and emotions. Video game developers could use Fugatto to modify prerecorded assets in their titles to fit the changing action as users play the game.
The full version of the tool uses 2.5 billion parameters and was trained on a bank of Nvidia DGX systems packing 32 Nvidia H100 Tensor Core GPUs.
The team behind the new model was drawn from around the world, including India, Brazil, China, Jordan and South Korea. This collaboration is said to have strengthened Fugatto’s multi-accent and multilingual capabilities.
Featured Image: AI-generated via Ideogram