adirik/hierspeechpp:ff5bcc71

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

| Field | Type | Default value | Description |
|---|---|---|---|
| input_text | string | | Text input to the model. If provided, it will be used for the speech content of the output. |
| input_sound | string | | Sound input to the model in .wav format. If provided, it will be used for the speech content of the output. |
| target_voice | string | | A voice clip in .wav format containing the speaker to synthesize. |
| denoise_ratio | number | 0 (max: 1) | Noise control. 0 means no noise reduction, 1 means maximum noise reduction. If noise reduction is desired, a value between 0.6 and 0.8 is recommended. |
| text_to_vector_temperature | number | 0.33 (max: 1) | Temperature for the text-to-vector model. A larger value corresponds to slightly more random output. |
| voice_conversion_temperature | number | 0.33 (max: 1) | Temperature for the voice conversion model. A larger value corresponds to slightly more random output. |
| output_sample_rate | integer (enum) | 16000 (options: 16000, 24000, 48000) | Sample rate of the output audio file. |
| scale_output_volume | boolean | False | Scale normalization. If set to true, the output audio will be scaled according to the input sound, if provided. |
| seed | integer | | Random seed to use for reproducibility. |
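
The constraints in the input schema can be checked before a request is sent. The following is a minimal sketch, not part of the model's own tooling: the helper name `build_input` is hypothetical, and it assumes that `target_voice` is always supplied, that at least one of `input_text` or `input_sound` is needed, and that 0 is the lower bound for the three ratio and temperature fields (the schema only states the maximum of 1).

```python
from typing import Any, Dict, Optional

# Enum values listed for output_sample_rate in the schema.
ALLOWED_SAMPLE_RATES = (16000, 24000, 48000)


def build_input(
    target_voice: str,
    input_text: Optional[str] = None,
    input_sound: Optional[str] = None,
    denoise_ratio: float = 0.0,
    text_to_vector_temperature: float = 0.33,
    voice_conversion_temperature: float = 0.33,
    output_sample_rate: int = 16000,
    scale_output_volume: bool = False,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an input dict that follows the schema."""
    # Assumption: the model needs at least one source of speech content.
    if input_text is None and input_sound is None:
        raise ValueError("provide input_text or input_sound")

    # The schema states Max: 1; the lower bound of 0 is an assumption.
    bounded = {
        "denoise_ratio": denoise_ratio,
        "text_to_vector_temperature": text_to_vector_temperature,
        "voice_conversion_temperature": voice_conversion_temperature,
    }
    for name, value in bounded.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    if output_sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(
            f"output_sample_rate must be one of {ALLOWED_SAMPLE_RATES}"
        )

    payload: Dict[str, Any] = {
        "target_voice": target_voice,
        "output_sample_rate": output_sample_rate,
        "scale_output_volume": scale_output_volume,
        **bounded,
    }
    # Fields without a default are only included when a value is given.
    if input_text is not None:
        payload["input_text"] = input_text
    if input_sound is not None:
        payload["input_sound"] = input_sound
    if seed is not None:
        payload["seed"] = seed
    return payload
```

For example, `build_input(target_voice="speaker.wav", input_text="Hello", denoise_ratio=0.7)` returns a dict that an API client could send as the model input; the file name here is a placeholder.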

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"type": "string", "title": "Output", "format": "uri"}
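
Since the output is a single string in URI format, a caller can check that shape before trying to fetch the audio. The sketch below is illustrative only: `output_filename` is a hypothetical helper, the example URL is a placeholder, and the fallback name `output.wav` assumes a .wav result, which the schema does not state.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: object, default: str = "output.wav") -> str:
    """Hypothetical helper: validate the output URI and pick a local file name."""
    # The schema declares the output as a string.
    if not isinstance(output, str):
        raise TypeError("output must be a string")

    parsed = urlparse(output)
    # A usable URI needs a scheme plus a host or a path.
    if not parsed.scheme or not (parsed.netloc or parsed.path):
        raise ValueError(f"not a URI: {output!r}")

    # Use the last path segment as the file name, or the default if empty.
    name = PurePosixPath(parsed.path).name
    return name or default
```

With the placeholder `https://example.com/files/result.wav`, the helper returns `result.wav`; the file itself could then be downloaded with any HTTP client.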