Researchers reveal TANGO, an AI system that generates realistic human speakers

TLDR

  • Researchers launched TANGO, an AI that generates realistic talking videos with synchronized gestures.
  • The tool addresses audio-motion misalignment and creates high-fidelity videos from a reference video.
  • The team hopes to extend TANGO to dance and sports, as debate over synthetic media creation grows.

Researchers have unveiled a new AI system called TANGO, which can generate realistic full-body talking videos of people, showing how far synthetic media creation has come.

A series of showcase videos published on the project's website and YouTube demonstrates how movements can be generated to match any audio recording.

In one example, 10 separate videos of different individuals can be seen repeating the same script – all with expressive hand movements that look natural.

The team behind the AI tool has added it to the community-focused platform Hugging Face, where people can try it out for themselves using nine demo videos.

“Given a few-minute, single-speaker reference video and target speech audio, TANGO produces high-fidelity videos with synchronized body gestures,” write the researchers in a paper that was submitted on October 5.

How does TANGO AI work to generate realistic faux clips?

TANGO builds on Gesture Video Reenactment, which splits and retrieves video clips using a graph structure.
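To make the graph idea concrete, here is a minimal sketch in Python. It assumes each node is a short video clip and each edge marks a pair of clips that can be played back to back. The class and method names are hypothetical and are not taken from TANGO's code.

```python
class MotionGraph:
    """Toy directed graph: nodes are clip ids, edges are allowed transitions."""

    def __init__(self):
        # clip id -> set of clip ids that may follow it (illustrative layout)
        self.edges = {}

    def add_transition(self, src, dst):
        """Record that clip `dst` may be played directly after clip `src`."""
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def successors(self, clip_id):
        """Return the clips that may follow `clip_id`, in sorted order."""
        return sorted(self.edges.get(clip_id, ()))

    def is_valid_path(self, path):
        """Check that every consecutive pair of clips is joined by an edge."""
        return all(b in self.edges.get(a, ()) for a, b in zip(path, path[1:]))


graph = MotionGraph()
graph.add_transition("clip_a", "clip_b")
graph.add_transition("clip_b", "clip_c")
graph.add_transition("clip_a", "clip_c")
```

In this sketch, generating a new video amounts to choosing a path through the graph. Any sequence of clips that follows existing edges can be stitched together.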

The researchers address two limitations of that approach: audio-motion misalignment and visual artifacts in GAN-generated transition frames.

To improve cross-modal alignment, the team retrieves relevant gestures using latent feature distance. The latent features are designed to model the relationship between speech audio and gesture motion, so that realistic, audio-synchronized videos can be created.
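Retrieval by latent feature distance can be illustrated with a short Python sketch. It assumes the audio and each candidate gesture clip have already been encoded as vectors in a shared space, and it uses cosine distance, which is an assumption here because the article does not say which distance TANGO uses. The embeddings and function names are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; smaller means the vectors are closer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def retrieve_gesture(audio_embedding, candidates):
    """Return the id of the clip whose motion embedding is nearest the audio."""
    return min(
        candidates,
        key=lambda clip_id: cosine_distance(audio_embedding, candidates[clip_id]),
    )


# Hypothetical motion embeddings for three gesture clips.
candidates = {
    "wave": [0.9, 0.1, 0.0],
    "point": [0.1, 0.9, 0.2],
    "shrug": [0.0, 0.2, 0.9],
}
audio_embedding = [0.2, 0.8, 0.1]  # hypothetical embedding of the target speech

best = retrieve_gesture(audio_embedding, candidates)
print(best)  # prints "point"
```

The quality of this lookup depends entirely on the embeddings: if matching audio and motion do not land near each other in the shared space, the nearest clip will not fit the speech.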

The researchers believe this is the first work to present “CLIP-Like contrastive learning on audio and motion modalities, and it is the first open-source motion graph and audio-driven video generation pipeline.”
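For readers unfamiliar with CLIP-style contrastive learning, the following Python sketch shows the general form of the loss, applied to paired audio and motion embeddings. It is a generic illustration of the technique, not the researchers' implementation. The temperature of 0.07 and the toy vectors are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_style_loss(audio_batch, motion_batch, temperature=0.07):
    """Symmetric contrastive loss: pair i of audio and motion should match."""
    audio = [normalize(a) for a in audio_batch]
    motion = [normalize(m) for m in motion_batch]
    logits = [[dot(a, m) / temperature for m in motion] for a in audio]
    transposed = [list(col) for col in zip(*logits)]
    audio_to_motion = cross_entropy_diagonal(logits)
    motion_to_audio = cross_entropy_diagonal(transposed)
    return 0.5 * (audio_to_motion + motion_to_audio)


audio = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(audio, [[0.9, 0.1], [0.1, 0.9]])  # pairs match
swapped = clip_style_loss(audio, [[0.1, 0.9], [0.9, 0.1]])  # pairs mismatched
```

The loss is low when each audio vector is most similar to its own motion vector and high when the pairs are swapped. Training on it pushes matching audio and motion close together in the shared space.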

The team hopes to extend TANGO’s abilities in the future, so it can include dance, sports, and more.

The project arrives amid growing debate about the use of AI in video content creation, as several video editing programs now include some form of generative AI.

YouTube, arguably the most popular video-focused platform, introduced a disclosure tool in the Creator Studio back in March of this year.

Through this, creators are asked to disclose if their ‘realistic content’ has been made with altered or synthetic media, including generative AI.

Sophie Atkinson
Freelance Journalist
