zsxkib/realistic-voice-cloning:bbdb9b99

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
|---|---|---|---|
| song_input | string | | Upload your audio file here. |
| rvc_model | string | Squidward | RVC model for a specific voice. If using a custom model, this should match the 'custom_rvc_model_download_name'. |
| custom_rvc_model_download_url | string | | URL to download a custom RVC model. To use the downloaded model, 'rvc_model' should be set to the same value as 'custom_rvc_model_download_name'. |
| custom_rvc_model_download_name | string | | The name of the custom RVC model. This should match the 'rvc_model' if you want to use the downloaded model. |
| pitch_change | number | 0 | Change pitch of AI vocals in octaves. Set to 0 for no change. Generally, use 1 for male-to-female conversions and -1 for vice versa. |
| keep_files | boolean | False | Keep all intermediate audio files generated, e.g. isolated AI vocals/instrumentals. Leave out to save space. |
| index_rate | number | 0.5 | Control how much of the AI's accent to leave in the vocals. 0 <= INDEX_RATE <= 1. |
| filter_radius | integer | 3 | If >=3: apply median filtering to the harvested pitch results. 0 <= FILTER_RADIUS <= 7. |
| rms_mix_rate | number | 0.25 | Control how much to use the original vocal's loudness (0) or a fixed loudness (1). 0 <= RMS_MIX_RATE <= 1. |
| pitch_detection_algorithm | None | rmvpe | Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals). |
| crepe_hop_length | integer | 128 | When `pitch_detection_algorithm` is set to `mangio-crepe`, this controls how often it checks for pitch changes in milliseconds. Lower values lead to longer conversions and higher risk of voice cracks, but better pitch accuracy. |
| protect | number | 0.33 | Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable. 0 <= PROTECT <= 0.5. |
| main_vocals_volume_change | number | 0 | Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels. |
| backup_vocals_volume_change | number | 0 | Control volume of backup AI vocals. |
| instrumental_volume_change | number | 0 | Control volume of the background music/instrumentals. |
| pitch_change_all | number | 0 | Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly. |
| reverb_size | number | 0.15 (max: 1) | The larger the room, the longer the reverb time. 0 <= REVERB_SIZE <= 1. |
| reverb_wetness | number | 0.2 (max: 1) | Level of AI vocals with reverb. 0 <= REVERB_WETNESS <= 1. |
| reverb_dryness | number | 0.8 (max: 1) | Level of AI vocals without reverb. 0 <= REVERB_DRYNESS <= 1. |
| reverb_damping | number | 0.7 (max: 1) | Absorption of high frequencies in the reverb. 0 <= REVERB_DAMPING <= 1. |
| output_format | None | mp3 | wav for best quality and large file size, mp3 for decent quality and small file size. |
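
The ranges and the custom-model naming rule in the table can be checked before a request is sent. The following is a minimal sketch, not part of the model's API: `build_input` and `RANGES` are hypothetical names introduced here, the bounds are copied from the field descriptions above, and treating a name mismatch as an error is an assumption (the table only says the two values should match for the downloaded model to be used).

```python
from typing import Any

# Inclusive bounds copied from the field descriptions in the table above.
RANGES = {
    "index_rate": (0.0, 1.0),
    "filter_radius": (0, 7),
    "rms_mix_rate": (0.0, 1.0),
    "protect": (0.0, 0.5),
    "reverb_size": (0.0, 1.0),
    "reverb_wetness": (0.0, 1.0),
    "reverb_dryness": (0.0, 1.0),
    "reverb_damping": (0.0, 1.0),
}


def build_input(song_input: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: assemble an input payload and check documented limits.

    Fields that are not passed are simply omitted, so the model's own
    default values apply to them.
    """
    payload: dict[str, Any] = {"song_input": song_input, **overrides}

    # Reject values outside the documented inclusive ranges.
    for field, (low, high) in RANGES.items():
        if field in payload and not (low <= payload[field] <= high):
            raise ValueError(
                f"{field} must be between {low} and {high}, got {payload[field]}"
            )

    # Assumption: treat a custom download whose name does not match
    # rvc_model as a mistake, since the downloaded model would not be used.
    if "custom_rvc_model_download_url" in payload:
        name = payload.get("custom_rvc_model_download_name")
        if not name or payload.get("rvc_model") != name:
            raise ValueError(
                "rvc_model must match custom_rvc_model_download_name"
            )

    return payload
```

For example, `build_input("https://example.com/song.mp3", pitch_change=1, protect=0.33)` returns a dictionary with just those three keys, while `protect=0.6` raises `ValueError`. The URL is a placeholder; the table only states that `song_input` is a string.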
            
              
                
              
            
Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'format': 'uri', 'title': 'Output', 'type': 'string'}
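
Since the schema describes the output as a single string in URI format, a caller may want to confirm that shape before using the value. This is a rough sketch using only the Python standard library; `is_output_uri` is a hypothetical helper, and the check is looser than a full URI validator.

```python
from urllib.parse import urlparse


def is_output_uri(value: object) -> bool:
    """Hypothetical check: True if value is a string with a URI scheme
    followed by a host or a path."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)
```

With this helper, `is_output_uri("https://example.com/out.mp3")` is true, while a bare file name such as `"output.mp3"` or a non-string value is rejected. The example URL is a placeholder, not a real model output.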