Amazon Web Services (AWS) recently announced a major expansion of Amazon Transcribe, its cloud-based automatic speech recognition service, which can now transcribe more than 100 languages. According to an Amazon blog post, the new capabilities rely on generative AI models trained on millions of hours of speech data.
Previously, Amazon Transcribe supported 79 languages with 20-50% accuracy rates. The new self-supervised algorithms powering the service can now recognize distinct speech patterns and accents across a diverse range of languages. The training approach is also meant to keep any one language from being overrepresented in the training data, so that accuracy stays consistent regardless of how widely a language is used.
AI-driven transcription will let far more people have their speech transcribed in their own language.
The AI advancements significantly widen access to automatic transcription, which was previously only available for common languages like English and Spanish. AWS customers around the world can now use the service to build applications that require speech-to-text capabilities.
Features such as automatic punctuation, custom vocabulary, language identification, and content filtering make the service more useful for transcribing both audio and video recordings. The service can reportedly recognize speech even in noisy environments, making the technology well-suited for summarizing call center interactions.
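For developers, the features above correspond to fields in a transcription job request. The following is a minimal sketch, not AWS's reference code: the helper name build_transcription_request, the job name, the S3 URI, and the vocabulary names are all hypothetical, and the choice to mask filtered words is an assumption. The sketch only assembles the request, so it runs without an AWS account.

```python
from typing import Optional


def build_transcription_request(
    job_name: str,
    media_uri: str,
    language_code: Optional[str] = None,
    vocabulary_name: Optional[str] = None,
    vocabulary_filter_name: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a Transcribe batch job (hypothetical helper)."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }

    if language_code is None:
        # No language given: ask the service to identify it.
        if vocabulary_name or vocabulary_filter_name:
            # Assumption for this sketch: vocabularies are only attached
            # when the language is known up front.
            raise ValueError("set language_code to use a vocabulary or filter")
        request["IdentifyLanguage"] = True
        return request

    request["LanguageCode"] = language_code

    settings = {}
    if vocabulary_name:
        # Custom vocabulary: domain terms the recognizer should favor.
        settings["VocabularyName"] = vocabulary_name
    if vocabulary_filter_name:
        # Content filtering: mask listed words in the transcript.
        settings["VocabularyFilterName"] = vocabulary_filter_name
        settings["VocabularyFilterMethod"] = "mask"
    if settings:
        request["Settings"] = settings
    return request


if __name__ == "__main__":
    # Placeholder values; nothing is sent to AWS here.
    print(build_transcription_request("demo-job", "s3://example-bucket/call.wav"))
```

With AWS credentials configured, the resulting dictionary could be passed to boto3 as boto3.client("transcribe").start_transcription_job(**request).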
AWS’s Call Analytics platform already utilizes Amazon Transcribe to generate automated summaries of agent-customer call transcripts. This reduces the manual effort needed to interpret calls and extract meaningful insights. Industry experts believe that as speech recognition accuracy continues to improve, integration of such AI services will accelerate across various business applications.
While Amazon Transcribe remains a significant player in the cloud transcription space, it faces increasing competition from companies like Otter.ai, which offers its own AI summarization features. There is also growing interest in speech recognition from major technology players like Meta, which is developing a translation model capable of recognizing nearly 100 languages.
OpenAI also launched Whisper, its open-source transcription software, which remains near the cutting edge of transcription performance and can be run locally on consumer hardware. The company released the software in September 2022 and later made it available through an on-demand transcription service.
Featured Image Credit: Elias Tigiser;