sakemin / musicongen

"MusiConGen: Rhythm and chord control for Transformer-based text-to-music generation"




Run time and cost

This model runs on Nvidia A40 GPU hardware. We don't yet have enough runs of this model to provide performance information.



Yun-Han Lan, Wen-Yi Hsiao, Hao-Chung Cheng, and Yi-Hsuan Yang, “MusiConGen: Rhythm and chord control for Transformer-based text-to-music generation,” in Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), 2024.

MusiConGen builds on the pretrained MusicGen model and adds two controls: rhythm and chords. The project contains inference code, training code, and training data (provided as a YouTube list).

Text Based Chord Conditioning

Text Chord Condition Format

  • A space is the split token. Each resulting chunk is assigned to a single bar.
    • C G E:min A:min
  • To assign multiple chords to a single bar, separate them with a comma (,).
    • C G,G:7 E:min,E:min7 A:min
  • Chord type can be specified after :.
    • A single uppercase letter on its own (e.g. C, E) is treated as a major chord.
    • maj, min, dim, aug, min6, maj6, min7, minmaj7, maj7, 7, dim7, hdim7, sus2 and sus4 can be appended after the colon.
      • e.g. E:dim, B:sus2
  • Sharp and flat are specified with # and b.
    • e.g. E#:min Db
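
The format rules above can be sketched as a small parser. This is only an illustration of the text format as described here, not the project's own parsing code. The function names `parse_chord` and `parse_progression` are hypothetical, and the root pattern (letters A to G with an optional # or b) is an assumption based on the examples.

```python
import re

# Chord types listed in the format description above.
QUALITIES = {
    "maj", "min", "dim", "aug", "min6", "maj6", "min7", "minmaj7",
    "maj7", "7", "dim7", "hdim7", "sus2", "sus4",
}

# ASSUMPTION: a root is one uppercase letter A-G, optionally followed by '#' or 'b'.
ROOT_RE = re.compile(r"[A-G][#b]?")


def parse_chord(symbol):
    """Split one chord symbol such as 'E:min7' into (root, quality)."""
    root, sep, quality = symbol.partition(":")
    if not sep:
        quality = "maj"  # a bare root such as 'C' is treated as a major chord
    if ROOT_RE.fullmatch(root) is None:
        raise ValueError(f"invalid chord root in {symbol!r}")
    if quality not in QUALITIES:
        raise ValueError(f"unknown chord type in {symbol!r}")
    return root, quality


def parse_progression(text):
    """Return one list of (root, quality) tuples per bar."""
    return [
        [parse_chord(symbol) for symbol in bar.split(",")]  # ',' separates chords in a bar
        for bar in text.split()  # whitespace separates bars
    ]


if __name__ == "__main__":
    bars = parse_progression("C G,G:7 E:min,E:min7 A:min")
    for number, chords in enumerate(bars, start=1):
        print(number, chords)
```

Running the parser on a chord string before submitting it is a cheap way to catch typos such as a lowercase root or an unsupported chord type.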

BPM and Time Signature

  • To create chord chroma, bpm and time_sig values must be specified.
    • bpm can be a float value (e.g. 132, 60).
    • The format of time_sig is (int)/(int) (e.g. 4/4, 3/4, 6/8, 7/8, 5/4).
  • The bpm and time_sig values are automatically appended to the prompt description, so you don’t need to include BPM or time signature information in the description itself.
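
As a rough illustration of why both values are needed to place per-bar chords on a time axis, the sketch below validates the two inputs and estimates the length of one bar. The helper names `parse_time_sig` and `seconds_per_bar` are hypothetical. The duration formula assumes that one beat equals the note value named by the time signature's denominator, which may not match how the model itself interprets bpm.

```python
import re

# time_sig must look like "(int)/(int)", for example "4/4" or "6/8".
TIME_SIG_RE = re.compile(r"(\d+)/(\d+)")


def parse_time_sig(time_sig):
    """Parse a time signature string into (beats_per_bar, beat_unit)."""
    match = TIME_SIG_RE.fullmatch(time_sig.strip())
    if match is None:
        raise ValueError(f"time_sig must be (int)/(int), got {time_sig!r}")
    beats_per_bar, beat_unit = int(match.group(1)), int(match.group(2))
    if beats_per_bar == 0 or beat_unit == 0:
        raise ValueError(f"time_sig parts must be positive, got {time_sig!r}")
    return beats_per_bar, beat_unit


def seconds_per_bar(bpm, time_sig):
    """Estimate the duration of one bar in seconds.

    ASSUMPTION: bpm counts the note value named by the denominator,
    so a bar simply lasts beats_per_bar beats.
    """
    bpm = float(bpm)
    if not bpm > 0:  # also rejects NaN
        raise ValueError(f"bpm must be positive, got {bpm!r}")
    beats_per_bar, _ = parse_time_sig(time_sig)
    return beats_per_bar * 60.0 / bpm


if __name__ == "__main__":
    print(seconds_per_bar(120, "4/4"))
    print(seconds_per_bar(60, "3/4"))
```

Under that assumption, a four-bar chord progression in 4/4 at 120 bpm would span about eight seconds of audio.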


