Google speech to text online demo

12/2/2023

In its quest to develop AI that can understand a range of different dialects, Meta has created an AI model, SeamlessM4T, that can translate and transcribe close to 100 languages across text and speech.

SeamlessM4T is available in open source along with SeamlessAlign, a new translation dataset. Meta claims that the model represents a “significant breakthrough” in the field of AI-powered speech-to-speech and speech-to-text.

“Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta writes in a blog post shared with TechCrunch. “SeamlessM4T implicitly recognizes the source languages without the need for a separate language identification model.”

SeamlessM4T is something of a spiritual successor to Meta’s No Language Left Behind, a text-to-text machine translation model, and Universal Speech Translator, one of the few direct speech-to-speech translation systems to support the Hokkien language. It also builds on Massively Multilingual Speech, Meta’s framework that provides speech recognition, language identification and speech synthesis tech across more than 1,100 languages.

Meta isn’t the only one investing resources in developing sophisticated AI translation and transcription tools. Beyond the wealth of commercial services and open source models already available from Amazon, Microsoft, OpenAI and a number of startups, Google is creating what it calls the Universal Speech Model, part of the tech giant’s larger effort to build a model that can understand the world’s 1,000 most-spoken languages. Mozilla, meanwhile, spearheaded Common Voice, one of the largest multi-language collections of voices for training automatic speech recognition algorithms.

But SeamlessM4T is among the more ambitious efforts to date to combine translation and transcription capabilities into a single model.

In developing it, Meta says that it scraped publicly available text (on the order of “tens of billions” of sentences) and speech (4 million hours) from the web. In an interview with TechCrunch, Juan Pino, a research scientist at Meta’s AI research division and a contributor on the project, wouldn’t reveal the exact sources of the data, saying only that there was “a variety” of them.

Not every content creator agrees with the practice of leveraging public data to train models that could be used commercially. Some have filed lawsuits against companies building AI tools on top of publicly available data, arguing that the vendors should be compelled to provide credit, if not compensation, as well as clear ways to opt out.

But Meta claims that the data it mined (which might contain personally identifiable information, the company admits) wasn’t copyrighted and came primarily from open source or licensed sources.

Whatever the case, Meta used the scraped text and speech to create the training dataset for SeamlessM4T, called SeamlessAlign. Researchers aligned 443,000 hours of speech with texts and created 29,000 hours of “speech-to-speech” alignments, which “taught” SeamlessM4T how to transcribe speech to text, translate text, generate speech from text and even translate words spoken in one language into words in another language.

Meta claims that on an internal benchmark, SeamlessM4T performed better in the presence of background noise and “speaker variations” in speech-to-text tasks than the current state-of-the-art speech transcription model. It attributes this to the rich combination of speech and text data in the training dataset, which Meta believes gives SeamlessM4T a leg up over speech-only and text-only models.

“With state-of-the-art results, we believe SeamlessM4T is an important breakthrough in the AI community’s quest toward creating universal multitask systems,” Meta wrote in the blog post.

But one wonders what biases the model might contain.

A recent piece in The Conversation points out the many flaws in AI-powered translation, including different forms of gender bias. For example, Google Translate once presupposed that doctors were male while nurses were female in certain languages, while Bing’s translator translated phrases like “the table is soft” as the feminine “die Tabelle” in German, which refers to a table of figures.

Speech recognition algorithms, too, often contain biases.
