AI model from OpenAI automatically recognizes speech and translates it to English

Benj Edwards / Ars Technica

On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more.

OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to “improved robustness to accents, background noise, and technical language.” It can also detect the spoken language and translate it to English.

OpenAI describes Whisper as an encoder-decoder transformer, a type of neural network that can use context gleaned from input data to learn associations that can then be translated into the model’s output. OpenAI presents this overview of Whisper’s operation:

Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
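To make the first preprocessing steps in that overview more concrete, here is a minimal sketch in plain Python. It is not OpenAI's code. The 16 kHz sample rate is an assumption, the function names are hypothetical, and the mel formula shown is one common (HTK-style) definition of the scale that may differ from the filter bank Whisper actually uses.

```python
import math

SAMPLE_RATE = 16_000                 # assumption: 16 kHz mono audio
CHUNK_SECONDS = 30                   # chunk length from OpenAI's overview
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples):
    """Split a sequence of samples into 30-second chunks.

    The final chunk is zero-padded so every chunk has the same length.
    (Hypothetical helper for illustration.)
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = list(samples[start:start + CHUNK_SAMPLES])
        chunk.extend([0.0] * (CHUNK_SAMPLES - len(chunk)))
        chunks.append(chunk)
    return chunks


def hz_to_mel(freq_hz):
    """Map a frequency in Hz onto the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# 45 seconds of silence yields two chunks: one full, one padded.
audio = [0.0] * (45 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))
```

A full log-Mel spectrogram would additionally compute a short-time Fourier transform of each chunk, pool the power spectrum into mel-spaced bands, and take the logarithm. Those steps are omitted here to keep the sketch dependency-free.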

By open-sourcing Whisper, OpenAI hopes to introduce a new foundation model that others can build on in the future to improve speech processing and accessibility tools. OpenAI has a significant track record on this front. In January 2021, OpenAI released CLIP, an open source computer vision model that arguably ignited the recent era of rapidly progressing image synthesis technology such as DALL-E 2 and Stable Diffusion.


At Ars Technica, we tested Whisper from code available on GitHub, and we fed it multiple samples, including a podcast episode and a particularly difficult-to-understand section of audio taken from a telephone interview. Although it took some time while running on a standard Intel desktop CPU (the technology doesn't work in real time yet), Whisper did a good job of transcribing the audio into text through the demonstration Python program, far better than some AI-powered audio transcription services we have tried in the past.

Example console output from OpenAI’s Whisper demonstration program as it transcribes a podcast.


With the proper setup, Whisper could easily be used to transcribe interviews and podcasts, and potentially to translate podcasts produced in non-English languages to English, on your own machine and for free. That’s a potent combination that might eventually disrupt the transcription industry.
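As a rough idea of what such a setup might look like, the sketch below assumes the open source `whisper` Python package from OpenAI's GitHub repository is installed, along with ffmpeg for audio decoding. The helper names (`format_timestamp`, `format_segments`, `transcribe_file`) and the file name in the usage comment are hypothetical; only `whisper.load_model` and `model.transcribe` come from the package itself.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Turn a transcription result into timestamped lines of text.

    Expects a dict with a "segments" list whose items carry
    "start", "end", and "text" keys.
    """
    lines = []
    for seg in result["segments"]:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


def transcribe_file(path, model_name="base", translate=False):
    """Transcribe (or translate to English) one audio file locally.

    Assumption: the third-party `whisper` package is installed.
    It is imported here so the helpers above work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    return format_segments(result)


# Example usage (hypothetical file name):
# for line in transcribe_file("podcast.mp3", translate=True):
#     print(line)
```

Larger model sizes generally trade speed for accuracy, so on a desktop CPU a smaller model may be the practical starting point.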

As with almost every major new AI model these days, Whisper brings positive advantages and the potential for misuse. On Whisper’s model card (under the “Broader Implications” section), OpenAI warns that Whisper could be used to automate surveillance or identify individual speakers in a conversation, but the company hopes it will be used “primarily for beneficial purposes.”

