zsxkib/dia:91a8c206 | Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field	Type	Default (range)	Description
text	string		Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and parenthesized descriptions for non-verbal cues, e.g. (laughs), (whispers).
audio_prompt	string		Optional audio file (.wav/.mp3/.flac) for voice cloning. The model will attempt to mimic this voice style.
max_new_tokens	integer	3072 (min 500, max 4096)	Controls the length of generated audio. Higher values create longer audio (86 tokens ≈ 1 second of audio).
cfg_scale	number	3 (min 1, max 5)	Controls how closely the audio follows your text. Higher values (3-5) follow the text more strictly; lower values may sound more natural but deviate more.
temperature	number	1.3 (min 0.1, max 2)	Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.
top_p	number	0.95 (min 0.1, max 1)	Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.
cfg_filter_top_k	integer	35 (min 10, max 100)	Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.
speed_factor	number	0.94 (min 0.5, max 1.5)	Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio, values above 1.0 speed it up, and 1.0 is the original speed.
seed	integer		Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.
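
As an illustration of how the fields above fit together, here is a minimal Python sketch that assembles an input dictionary and checks numeric fields against the ranges in the table. The helper names `build_input` and `tokens_for_seconds` are hypothetical and not part of any Replicate library; the 86 tokens per second figure is the approximation quoted for max_new_tokens, so durations are estimates only.

```python
# Allowed ranges copied from the input schema table above.
RANGES = {
    "max_new_tokens": (500, 4096),
    "cfg_scale": (1, 5),
    "temperature": (0.1, 2),
    "top_p": (0.1, 1),
    "cfg_filter_top_k": (10, 100),
    "speed_factor": (0.5, 1.5),
}
TOKENS_PER_SECOND = 86  # approximate, per the max_new_tokens description


def tokens_for_seconds(seconds):
    """Estimate max_new_tokens for a target duration, clamped to the allowed range."""
    low, high = RANGES["max_new_tokens"]
    return max(low, min(high, round(seconds * TOKENS_PER_SECOND)))


def build_input(text, **overrides):
    """Hypothetical helper: build an input dict, rejecting unknown or out-of-range fields."""
    allowed = set(RANGES) | {"audio_prompt", "seed"}
    payload = {"text": text}
    for name, value in overrides.items():
        if name not in allowed:
            raise ValueError(f"unknown field: {name}")
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        payload[name] = value
    return payload


# Fields that are left out fall back to their defaults on the server side.
payload = build_input(
    "[S1] Hello there. [S2] Hi! (laughs)",
    max_new_tokens=tokens_for_seconds(20),  # roughly 20 seconds of audio
    temperature=1.0,
    seed=42,
)
```

By the same approximation, the default of 3072 tokens corresponds to about 36 seconds of audio and the maximum of 4096 to about 48 seconds. The resulting dictionary is what would be sent as the input when running the model.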

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{"format": "uri", "title": "Output", "type": "string"}
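
Since the schema describes the output as a single string in URI format, a client may want to check the value before trying to download it. The sketch below is one way to do that with the Python standard library; the function name `audio_url_from_output` is hypothetical, and restricting the scheme to http or https is an assumption that the schema itself does not state.

```python
from urllib.parse import urlparse


def audio_url_from_output(output):
    """Return the output unchanged if it looks like an http(s) URI string."""
    if not isinstance(output, str):
        raise TypeError(f"expected a string, got {type(output).__name__}")
    parts = urlparse(output)
    # Assumption: the generated audio is served over http(s).
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URI: {output!r}")
    return output
```

The returned URI can then be passed to any HTTP client to fetch the generated audio file.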