lucataco/magnet

MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer


Input

Input Text (string). Default: "80s electronic track with melodic synthesizers, catchy beat and groovy bass"

Model to use (string). Default: "facebook/magnet-small-10secs"

Number of variations to generate (integer, minimum 1, maximum 4). Default: 3

An enumeration (string). Default: "prod-stride1"

Temperature for sampling (number). Default: 3

Top p for sampling (number, minimum 0, maximum 1). Default: 0.9
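Temperature and top p control how each token is drawn from the model's output distribution. The sketch below shows generic temperature scaling followed by nucleus (top-p) filtering over the logits of a single position. It is an illustration of the standard technique, not MAGNeT's own sampling code, and the function names are hypothetical. A temperature above 1 (such as the default of 3) flattens the distribution; top p then keeps only the smallest set of most likely tokens whose probability mass reaches p.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float = 0.9) -> List[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits: List[float], temperature: float = 3.0, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one token index after temperature scaling and nucleus filtering."""
    rng = rng or random.Random()
    probs = top_p_filter(softmax(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The defaults of `sample_token` mirror this page's defaults (temperature 3, top p 0.9); passing a seeded `random.Random` makes a draw reproducible, e.g. `sample_token([2.0, 1.0, 0.1], rng=random.Random(0))`.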

Max CFG (classifier-free guidance) coefficient (number). Default: 10

Min CFG (classifier-free guidance) coefficient (number). Default: 1
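Having both a max and a min coefficient suggests that the classifier-free guidance strength is annealed over the decoding steps. The following is a minimal sketch that assumes the coefficient is interpolated between the two values according to a cosine masking schedule: guidance is strong at the first step, when every token is masked, and weak at the last. The schedule and the function names are assumptions for illustration; they are not taken from this page.

```python
import math
from typing import List


def mask_rate(step: int, total_steps: int) -> float:
    """Assumed cosine schedule: 1.0 at the first step, decaying to ~0.0 at the last."""
    progress = step / (total_steps - 1) if total_steps > 1 else 0.0
    return math.cos(progress * math.pi / 2)


def annealed_cfg(step: int, total_steps: int,
                 max_cfg: float = 10.0, min_cfg: float = 1.0) -> float:
    """Interpolate the guidance coefficient from max_cfg toward min_cfg."""
    p = mask_rate(step, total_steps)
    return max_cfg * p + min_cfg * (1.0 - p)


def guided_logits(cond: List[float], uncond: List[float], coef: float) -> List[float]:
    """Classifier-free guidance: push conditional logits away from unconditional ones."""
    return [u + coef * (c - u) for c, u in zip(cond, uncond)]
```

With the defaults, `[annealed_cfg(s, 20) for s in range(20)]` falls from 10.0 at the first stage-1 step to 1.0 at the last. A coefficient of 1 returns the conditional logits unchanged, so the min value of 1 effectively switches guidance off by the end of decoding.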

Number of decoding steps for stage 1 (integer). Default: 20

Number of decoding steps for stage 2 (integer). Default: 10

Number of decoding steps for stage 3 (integer). Default: 10

Number of decoding steps for stage 4 (integer). Default: 10
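Each stage runs its own number of decoding steps. The sketch below illustrates iterative masked decoding in the MaskGIT style: start with every position masked, predict all masked positions in parallel, then re-mask the least confident new predictions according to a cosine schedule until nothing is left masked. It simplifies heavily and rests on assumptions: each stage is treated as one token sequence (assumed to correspond to one audio codebook), single tokens are masked rather than spans, and the `predict` callable stands in for the transformer. All names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

# One prediction per position: (token id, confidence score).
Prediction = Tuple[int, float]
Predictor = Callable[[List[Optional[int]]], List[Prediction]]


def remaining_masked(step: int, total_steps: int, seq_len: int) -> int:
    """Assumed cosine schedule: positions still masked after `step` (0-indexed) completes."""
    progress = (step + 1) / total_steps
    return math.floor(math.cos(progress * math.pi / 2) * seq_len)


def decode_stage(seq_len: int, steps: int, predict: Predictor) -> List[int]:
    """Fill a fully masked sequence (None marks a masked position) in `steps` passes."""
    tokens: List[Optional[int]] = [None] * seq_len
    for step in range(steps):
        preds = predict(tokens)  # one parallel forward pass over every position
        confidence = [math.inf] * seq_len  # already-committed tokens are never re-masked
        for i in range(seq_len):
            if tokens[i] is None:
                tokens[i], confidence[i] = preds[i]
        # Re-mask the least confident of the tokens predicted in this step.
        n_mask = remaining_masked(step, steps, seq_len)
        for i in sorted(range(seq_len), key=lambda j: confidence[j])[:n_mask]:
            tokens[i] = None
    filled = [t for t in tokens if t is not None]
    assert len(filled) == seq_len, "the schedule reaches 0, so nothing stays masked"
    return filled


def decode_all_stages(seq_len: int, stage_steps: List[int],
                      predict_for_stage: Callable[[int, List[Optional[int]]], List[Prediction]]
                      ) -> List[List[int]]:
    """Decode stages one after another, each with its own step count."""
    results = []
    for stage, steps in enumerate(stage_steps):
        results.append(decode_stage(seq_len, steps,
                                    lambda toks, s=stage: predict_for_stage(s, toks)))
    return results
```

With the page's defaults you would call `decode_all_stages(seq_len, [20, 10, 10, 10], predictor)`. A real model would also see the tokens decoded in earlier stages; this toy driver passes only the stage index, so any such dependency is left to the hypothetical predictor.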


Run time and cost

This model costs approximately $0.0028 to run on Replicate, or 357 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
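The two cost figures are reciprocals: at roughly $0.0028 per run, one dollar covers about 1 / 0.0028 ≈ 357 runs. The small budgeting helper below assumes that the average per-run price holds, although, as noted above, it varies with the inputs. The function names are illustrative.

```python
import math

APPROX_COST_PER_RUN_USD = 0.0028  # approximate figure quoted on this page; varies with inputs


def runs_for_budget(budget_usd: float, cost_per_run: float = APPROX_COST_PER_RUN_USD) -> int:
    """Whole number of runs a budget covers at the approximate per-run price."""
    return math.floor(budget_usd / cost_per_run)


def cost_for_runs(n_runs: int, cost_per_run: float = APPROX_COST_PER_RUN_USD) -> float:
    """Approximate total cost of n_runs predictions."""
    return n_runs * cost_per_run
```

`runs_for_budget(1.0)` returns 357, matching the figure above.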

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 3 seconds, although prediction time varies significantly with the inputs.

Readme

Cog implementation of facebookresearch/MAGNeT

Model Details

Organization developing the model: The FAIR team of Meta AI.

Model date: MAGNeT was trained between November 2023 and January 2024.

Model version: This is version 1 of the model.

Model type: MAGNeT consists of an EnCodec model for audio tokenization and a non-autoregressive, transformer-based model for music modeling. The model comes in two sizes, 300M and 1.5B parameters, and in two variants: one trained for text-to-music generation and one trained for text-to-sound generation.
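Because the transformer is non-autoregressive, the number of sequential forward passes depends on the decoding-step settings rather than on the audio length. The back-of-the-envelope sketch below assumes an EnCodec tokenizer that produces 4 parallel codebooks at 50 frames per second. That assumption is consistent with the four decoding stages in the input form but is not stated on this page. It compares the result with a hypothetical autoregressive decoder that emits one frame per pass. All names are hypothetical.

```python
from typing import List, Tuple

FRAME_RATE_HZ = 50   # assumption: EnCodec token frames per second
NUM_CODEBOOKS = 4    # assumption: one codebook per decoding stage


def token_grid_shape(duration_s: float, frame_rate: int = FRAME_RATE_HZ,
                     num_codebooks: int = NUM_CODEBOOKS) -> Tuple[int, int]:
    """(codebooks, frames) of the discrete token grid for a clip of a given length."""
    return num_codebooks, round(duration_s * frame_rate)


def non_autoregressive_passes(stage_steps: List[int]) -> int:
    """Iterative masked decoding: one forward pass per decoding step per stage."""
    return sum(stage_steps)


def autoregressive_passes(duration_s: float, frame_rate: int = FRAME_RATE_HZ) -> int:
    """Hypothetical frame-by-frame decoder: one forward pass per token frame."""
    return round(duration_s * frame_rate)
```

Under these assumptions, a 10-second clip is a 4 × 500 token grid. The default step counts (20, 10, 10, 10) need 50 passes for any clip length, whereas the frame-by-frame decoder needs 500 passes for 10 seconds and 1500 for 30 seconds.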

Paper or resources for more information: More information can be found in the paper "Masked Audio Generation using a Single Non-Autoregressive Transformer" (arXiv:2401.04577).

Citation details: See our paper; its BibTeX entry is given at the end of this page.

License: Code is released under MIT, model weights are released under CC-BY-NC 4.0.

Where to send questions or comments about the model: Questions and comments about MAGNeT can be sent via the GitHub repository of the project, or by opening an issue.

Intended Use

Primary intended use: The primary use of MAGNeT is research on AI-based music generation, including:

  • Research efforts, such as probing and better understanding the limitations of generative models to further improve the state of science
  • Generation of music guided by text, so that machine learning amateurs can understand the current abilities of generative AI models

Primary intended users: The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models.

Out-of-scope use cases: The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

@misc{ziv2024masked,
      title={Masked Audio Generation using a Single Non-Autoregressive Transformer}, 
      author={Alon Ziv and Itai Gat and Gael Le Lan and Tal Remez and Felix Kreuk and Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
      year={2024},
      eprint={2401.04577},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}