Meta unveils ‘Voicebox,’ a generative AI model for speech generation

San Francisco: Meta has introduced “Voicebox,” a generative AI model for speech generation. According to a Meta blog post, Voicebox can generalise to speech-generation tasks it was not specifically trained for and delivers state-of-the-art performance on them.

Where other generative systems produce text or images, Voicebox produces high-quality audio clips. It can synthesise speech in six languages (English, French, German, Spanish, Polish, and Portuguese) and can also remove noise, edit content, convert speaking style, and generate diverse speech samples.

The model employs a novel approach, learning from raw audio and its accompanying transcription. Unlike autoregressive audio models, Voicebox can edit any part of a sample, not just the end of an audio clip. Meta explained that the model is trained to predict a speech segment using surrounding speech and the transcript of the segment.
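
The training setup described above can be illustrated with a toy sketch. The code below is not Meta's implementation: the names `InfillExample`, `make_infill_example`, and `splice` are hypothetical, audio is stood in for by a plain list of numbers, and the transcript is passed through without being aligned to the hidden segment. It only shows the idea of hiding one span anywhere in a clip, keeping the surrounding audio visible, and later writing a prediction back into that span alone.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InfillExample:
    # Hypothetical container, not part of any Meta API.
    context: List[Optional[float]]  # audio frames, with the hidden span set to None
    transcript: str                 # text supplied alongside the audio
    target: List[float]             # the hidden frames a model would have to predict
    span: Tuple[int, int]           # (start, end) of the hidden span, end exclusive


def make_infill_example(frames: List[float], transcript: str,
                        rng: random.Random) -> InfillExample:
    """Hide one contiguous span; the frames on both sides stay visible."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames so that some context remains")
    length = rng.randint(1, n - 1)      # always leave at least one visible frame
    start = rng.randint(0, n - length)  # the span may sit at the start, middle, or end
    end = start + length
    context = [None if start <= i < end else f for i, f in enumerate(frames)]
    return InfillExample(context, transcript, frames[start:end], (start, end))


def splice(example: InfillExample, predicted: List[float]) -> List[float]:
    """Write predicted frames into the hidden span and leave the rest untouched."""
    start, end = example.span
    if len(predicted) != end - start:
        raise ValueError("prediction must be exactly as long as the hidden span")
    out = list(example.context)
    out[start:end] = predicted
    return out


if __name__ == "__main__":
    frames = [0.1, 0.2, 0.3, 0.4, 0.5]
    example = make_infill_example(frames, "hello world", random.Random(0))
    print("hidden span:", example.span)
    print("visible context:", example.context)
    # A perfect prediction restores the original clip.
    print("restored:", splice(example, example.target) == frames)
```

Because the hidden span can fall anywhere in the clip, the same setup covers editing the middle of a recording, which a model that only continues audio from the end cannot do directly.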

This ability to “infill” speech from context allows Voicebox to tackle various tasks, such as generating parts of an audio recording without recreating the entire file. Its versatility spans multiple applications, including in-context text-to-speech synthesis, cross-lingual style transfer, speech denoising and editing, and generating diverse speech samples.

With Voicebox, Meta aims to set a new benchmark in the field of AI-powered speech generation.

PNN
