
(FM-NVIDIA) Fugatto: Foundational Generative Audio Transformer Opus 1
Failed to add items
Add to Cart failed.
Add to Wish List failed.
Remove from wishlist failed.
Adding to library failed
Follow podcast failed
Unfollow podcast failed
-
Narrated by:
-
By:
About this listen
Fugatto, a new generalist audio synthesis and transformation model developed by NVIDIA, and ComposableART, an inference-time technique designed to enhance its capabilities. Fugatto distinguishes itself by its ability to follow free-form text instructions, often with optional audio inputs, addressing the challenge that audio data, unlike text, typically lacks inherent instructional information. The document details a comprehensive data and instruction generation strategy that leverages large language models (LLMs) and audio understanding models to create diverse and rich datasets, enabling Fugatto to handle a wide array of tasks including text-to-speech, text-to-audio, and audio transformations. Furthermore, ComposableART allows for compositional abilities, such as combining, interpolating, or negating instructions, providing fine-grained control over audio outputs beyond the training distribution. The text presents experimental evaluations demonstrating Fugatto's competitive performance against specialised models and highlights its emergent capabilities, such as synthesising novel sounds or performing tasks not explicitly trained for.
link: https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.pdf