(FM-NVIDIA) Fugatto: Foundational Generative Audio Transformer Opus 1

About this listen

This episode covers Fugatto, a new generalist audio synthesis and transformation model developed by NVIDIA, and ComposableART, an inference-time technique designed to extend its capabilities. Fugatto distinguishes itself by following free-form text instructions, optionally accompanied by audio inputs, which addresses the challenge that audio data, unlike text, typically carries no inherent instructional information. The paper details a data and instruction generation strategy that uses large language models (LLMs) and audio understanding models to build diverse, richly annotated datasets, enabling Fugatto to handle a wide range of tasks including text-to-speech, text-to-audio, and audio transformations. ComposableART adds compositional abilities, such as combining, interpolating, or negating instructions, giving fine-grained control over audio outputs, including combinations outside the training distribution. The paper also presents experimental evaluations showing that Fugatto performs competitively against specialised models, and highlights its emergent capabilities, such as synthesising novel sounds or performing tasks it was not explicitly trained for.
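
To make the idea of combining, interpolating, or negating instructions more concrete, here is a minimal Python sketch of one common way such composition can be done: a weighted, classifier-free-guidance-style mix of model predictions. It is an illustration under assumptions, not the paper's implementation. The function name `compose_guidance`, the plain-list vectors, and the weighting formula are all hypothetical stand-ins chosen for this example.

```python
from typing import Sequence


def compose_guidance(
    unconditional: Sequence[float],
    conditionals: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Mix several instruction-conditioned predictions around an unconditional one.

    Hypothetical formula (an assumption for illustration):
        result = u + sum_k w_k * (c_k - u)
    where u is the unconditional prediction and c_k is the prediction
    conditioned on instruction k.
    """
    if len(conditionals) != len(weights):
        raise ValueError("need exactly one weight per conditional prediction")

    result = list(unconditional)
    for cond, weight in zip(conditionals, weights):
        if len(cond) != len(unconditional):
            raise ValueError("all predictions must have the same length")
        for i, (c, u) in enumerate(zip(cond, unconditional)):
            # Move along the direction that instruction k suggests,
            # scaled by its weight (negative weights move away from it).
            result[i] += weight * (c - u)
    return result


# Toy predictions standing in for model outputs (made-up numbers).
u = [0.0, 0.0]
instruction_a = [1.0, 0.0]
instruction_b = [0.0, 1.0]

blended = compose_guidance(u, [instruction_a, instruction_b], [0.5, 0.5])
negated = compose_guidance(u, [instruction_a, instruction_b], [1.0, -1.0])
```

In this sketch, positive weights that sum to one interpolate between instructions, several positive weights combine them, and a negative weight steers the output away from an instruction.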

link: https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.pdf