Nvidia AI Model Generates, Manipulates Sounds And Voices

A cat wearing headphones and using AI to generate sounds and music. Image credit: Nvidia

Nvidia said it has developed a new artificial intelligence (AI) model that can create sounds or music from language prompts and change the way a voice sounds, combining multiple capabilities in a way that it said allows it to follow free-form instructions.

The company, which is the biggest provider of chips and software for developing AI models, said the model is a research project and that it has no plans to release it to the public.

Nevertheless, the technology has implications for fields including music, entertainment and translation services.

The model, called Fugatto (short for Foundational Generative Audio Transformer Opus 1), can generate audio from text prompts as well as modify audio uploaded by the user.

Nvidia chief executive Jensen Huang. Image credit: Nvidia

‘Sound machine’

For instance, audio of a voice speaking words could be translated into another language while still sounding like the same person was speaking.

A simple uploaded tune could be made to sound like an orchestral performance, or made the basis of a more complicated musical composition with added drums and other instruments.

An uploaded text could be read by a voice of the user’s choice, which could be modified to use different accents or emotions.

The model can generate new types of sounds, such as a trumpet that barks like a dog, according to the tech firm.

The technology is similar to that of start-ups such as Runway or large tech firms such as Meta, which generate audio or video from text prompts. But Nvidia said its model is distinguished by combining capabilities such as sound generation and sound manipulation in a single model, calling it the world’s “most flexible sound machine”.

“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia who worked on Fugatto.

Industry tensions

The company said it hopes such technology will become a new tool for artists, in the way that the electric guitar gave rise to rock and roll.

To date, AI has had a rocky reception from entertainers, who fear companies will use it to clone their voices or likenesses or put writers out of work.

The Writers Guild of America and SAG-AFTRA last year carried out strikes over such issues, finally reaching deals with studios to place limits on the technology.

In May, Scarlett Johansson accused OpenAI of adding a voice to its ChatGPT chatbot that was “eerily similar” to hers after she declined to allow her voice to be used.

OpenAI said the similarity was coincidental but quickly removed the voice.

Matthew Broersma

Matt Broersma is a long-standing tech freelance who has worked for Ziff-Davis, ZDNet and other leading publications.
