cjwbw / voicecraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

  • Public
  • 10.4K runs
  • L40S
  • GitHub
  • Paper
  • License
  • Prediction

    cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080
    ID
    3wfg8mpzr5rgg0cf0azajq1x50
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)

    Input

    task
    zero-shot text-to-speech
    top_p
    0.8
    kvcache
    1
    orig_audio
    [audio clip]
    cut_off_sec
    3.01
    left_margin
    0.08
    temperature
    1
    right_margin
    0.08
    whisperx_model
    base.en
    orig_transcript
     
    stop_repetition
    3
    voicecraft_model
    giga330M_TTSEnhanced.pth
    sample_batch_size
    4
    target_transcript
    I cannot believe that the same model can also do text to speech synthesis too!

    Output

    generated_audio

    [audio clip]

    whisper_transcript_orig_audio

    But when I had approached so near to them, the common object, which the sense deceives, lost not by distance any of its marks.
  • Prediction

    cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080
    ID
    rs2x9bedvsrgm0cf0b1ax3tedc
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)

    Input

    task
    speech_editing-substitution
    top_p
    0.8
    kvcache
    1
    orig_audio
    [audio clip]
    cut_off_sec
    0
    left_margin
    0.08
    temperature
    1
    right_margin
    0.08
    whisperx_model
    base.en
    orig_transcript
     
    stop_repetition
    -1
    voicecraft_model
    giga330M.pth
    sample_batch_size
    1
    target_transcript
    But when I saw the mirage of the lake in the distance, which the sense deceives, Lost not by distance any of its marks

    Output

    generated_audio

    [audio clip]

    whisper_transcript_orig_audio

    But when I had approached so near to them, the common object, which the sense deceives, lost not by distance any of its marks.
  • Prediction

    cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080
    ID
    dgkjc72fp1rgp0cf0b1sscf154
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)

    Input

    task
    speech_editing-insertion
    top_p
    0.8
    kvcache
    1
    orig_audio
    [audio clip]
    cut_off_sec
    0
    left_margin
    0.08
    temperature
    1
    right_margin
    0.08
    whisperx_model
    base.en
    orig_transcript
     
    stop_repetition
    -1
    voicecraft_model
    giga830M.pth
    sample_batch_size
    1
    target_transcript
    But when I had approached so near to them, the common object, which is so amazing and the sense deceives, lost not by distance any of its marks.

    Output

    generated_audio

    [audio clip]

    whisper_transcript_orig_audio

    But when I had approached so near to them, the common object, which the sense deceives, lost not by distance any of its marks.
  • Prediction

    cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080
    ID
    dhv3qpj8mnrgp0cf0b29py1kh8
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)

    Input

    task
    speech_editing-deletion
    top_p
    0.8
    kvcache
    1
    orig_audio
    [audio clip]
    cut_off_sec
    0
    left_margin
    0.08
    temperature
    1
    right_margin
    0.08
    whisperx_model
    base.en
    orig_transcript
     
    stop_repetition
    -1
    voicecraft_model
    giga830M.pth
    sample_batch_size
    1
    target_transcript
    But when I had approached so near to them, the common object, lost not by distance any of its marks.

    Output

    generated_audio

    [audio clip]

    whisper_transcript_orig_audio

    But when I had approached so near to them, the common object, which the sense deceives, lost not by distance any of its marks.
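
The three speech-editing predictions above share one original transcript and differ only in how target_transcript departs from it. The sketch below makes that difference explicit with a word-level diff from Python's standard difflib module. It only illustrates the transcripts shown on this page; it is not how VoiceCraft itself locates the span to regenerate, and the helper names (tokens, edited_spans) are made up for this example.

```python
import difflib
import string

# Whisper transcript of the original audio, as shown in the predictions above.
ORIGINAL = (
    "But when I had approached so near to them, the common object, "
    "which the sense deceives, lost not by distance any of its marks."
)

# target_transcript values copied from the three editing predictions.
TARGETS = {
    "speech_editing-substitution": (
        "But when I saw the mirage of the lake in the distance, "
        "which the sense deceives, Lost not by distance any of its marks"
    ),
    "speech_editing-insertion": (
        "But when I had approached so near to them, the common object, "
        "which is so amazing and the sense deceives, "
        "lost not by distance any of its marks."
    ),
    "speech_editing-deletion": (
        "But when I had approached so near to them, the common object, "
        "lost not by distance any of its marks."
    ),
}


def tokens(text):
    """Split into lowercase words with surrounding punctuation stripped."""
    return [w.strip(string.punctuation).lower() for w in text.split()]


def edited_spans(original, target):
    """Return (operation, original words, target words) for each changed span."""
    a, b = tokens(original), tokens(target)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for task, target in TARGETS.items():
    print(task, edited_spans(ORIGINAL, target))
```

Because the diff works on words, a substitution can be reported as more than one "replace" span when a short word such as "the" happens to line up inside the rewritten region.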
    Generated in

Want to make some of these yourself?

Run this model
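
As a starting point, the first prediction above could be reproduced from Python roughly as follows. This is a sketch under assumptions: it uses the `replicate` Python client (installed separately, with REPLICATE_API_TOKEN set in the environment), the input names and values are copied from the zero-shot text-to-speech example on this page, and the audio URL in the usage comment is a placeholder rather than a real file. The helper names tts_inputs and run_prediction are invented for illustration.

```python
# Model reference copied from the predictions listed on this page.
MODEL_REF = (
    "cjwbw/voicecraft:"
    "6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080"
)


def tts_inputs(orig_audio, target_transcript):
    """Build the input payload used by the zero-shot TTS example above."""
    return {
        "task": "zero-shot text-to-speech",
        "orig_audio": orig_audio,      # URL or open file handle of the voice prompt
        "orig_transcript": "",         # left blank in the example on this page
        "target_transcript": target_transcript,
        "voicecraft_model": "giga330M_TTSEnhanced.pth",
        "whisperx_model": "base.en",
        "cut_off_sec": 3.01,
        "left_margin": 0.08,
        "right_margin": 0.08,
        "temperature": 1,
        "top_p": 0.8,
        "kvcache": 1,
        "stop_repetition": 3,
        "sample_batch_size": 4,
    }


def run_prediction(inputs):
    """Send the payload to Replicate; needs network access and an API token."""
    import replicate  # third-party client, imported lazily so the payload helper works without it

    return replicate.run(MODEL_REF, input=inputs)


# Usage (placeholder URL, shown for illustration only):
# output = run_prediction(
#     tts_inputs(
#         "https://example.com/voice_prompt.wav",
#         "I cannot believe that the same model can also do "
#         "text to speech synthesis too!",
#     )
# )
# print(output)
```

For the editing tasks, the same payload shape should apply with task, voicecraft_model, cut_off_sec, stop_repetition, and sample_batch_size changed to the values listed in the corresponding prediction.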