pphu / musicgen-small

Generate music from a prompt or melody

  • Public
  • 150 runs
  • GitHub
  • Paper
  • License

Run time and cost

This model costs approximately $0.053 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.
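A run on Replicate can be kicked off from the Python client. This is a minimal sketch: the input field names (`prompt`, `duration`) and the bare model reference are assumptions, so check the model's API tab for the exact input schema and version hash.

```python
# Hypothetical sketch of calling this model via the Replicate Python client.
# Field names "prompt" and "duration" are assumed -- verify against the
# model's API tab before relying on them.
import os

model_input = {
    "prompt": "upbeat lo-fi hip hop with warm piano",  # text description (assumed field name)
    "duration": 8,                                     # seconds of audio (assumed field name)
}

# Only attempt the network call when an API token is configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    output = replicate.run("pphu/musicgen-small", input=model_input)
    print(output)  # typically a URL to the generated audio file
```

Since predictions can take several minutes, a real client would usually create the prediction asynchronously and poll for completion rather than block on `run`.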

Readme

Forked from https://replicate.com/facebookresearch/musicgen, which only supported the musicgen-large and melody models at the time of forking.

Model Architecture and Development

MusicGen is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. They used 20K hours of licensed music to train MusicGen. Specifically, they relied on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
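The delay pattern described above can be illustrated with a short sketch. This is not MusicGen's actual implementation, just a toy version of the idea: with K codebooks and a one-step delay between consecutive codebooks, codebook k's token for audio frame t is emitted at auto-regressive step t + k, so T frames need only T + K - 1 steps instead of T * K.

```python
# Toy illustration of the codebook delay pattern (not MusicGen's actual code).
PAD = None  # placeholder where a codebook has nothing to emit yet

def to_delay_pattern(frames, num_codebooks=4):
    """frames: list of T frames, each a list of num_codebooks token ids.
    Returns a (T + num_codebooks - 1) x num_codebooks grid of AR steps."""
    T = len(frames)
    steps = T + num_codebooks - 1
    grid = [[PAD] * num_codebooks for _ in range(steps)]
    for t, frame in enumerate(frames):
        for k in range(num_codebooks):
            grid[t + k][k] = frame[k]  # codebook k is delayed by k steps
    return grid

# 3 frames x 4 codebooks -> 6 auto-regressive steps
frames = [[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]]
pattern = to_delay_pattern(frames)
# step 0 carries only codebook 0; later codebooks lag behind by their delay
```

At a 50 Hz frame rate this is why a second of audio costs roughly 50 steps: the K - 1 extra steps are a one-time offset, not a per-second cost.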

Licenses

All code in this repository is licensed under the Apache License 2.0. The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file. The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.