Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
Field | Type | Default value | Description
---|---|---|---
audio | string | | Audio file to process.
mode | string (enum) | `transcription` | Choose processing mode: 'transcription' converts speech to text; 'understanding' analyzes audio content using prompts. Options: `transcription`, `understanding`.
prompt | string | What can you tell me about this audio? | Question or instruction for understanding mode (e.g., 'What is the speaker discussing?', 'Summarize this audio'). Ignored in transcription mode.
language | string (enum) | `Auto-detect` | Audio language. 'Auto-detect' works for most content, or choose a specific language for better accuracy. Options: `Auto-detect`, `English`, `French`, `German`, `Spanish`, `Italian`, `Portuguese`, `Dutch`, `Russian`, `Chinese`, `Japanese`, `Arabic`.
model_size | string (enum) | `mini` | Model selection: 'mini' (3B) is faster and uses less GPU memory; 'small' (24B) provides higher accuracy for complex audio. Options: `mini`, `small`.
max_tokens | integer | 500 | Maximum response length. Higher values allow longer outputs but increase processing time. Min: 50, Max: 1000.
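The field names, enum options, and `max_tokens` bounds above can be enforced client-side before sending a request. A minimal sketch, assuming the schema in the table; the `validate_input` helper is hypothetical and not part of the model's API:

```python
# Client-side validation sketch for this model's input fields.
# Enums, defaults, and bounds are copied from the input schema table;
# the helper itself is illustrative, not a published client.

MODES = {"transcription", "understanding"}
LANGUAGES = {
    "Auto-detect", "English", "French", "German", "Spanish", "Italian",
    "Portuguese", "Dutch", "Russian", "Chinese", "Japanese", "Arabic",
}
MODEL_SIZES = {"mini", "small"}

DEFAULTS = {
    "mode": "transcription",
    "prompt": "What can you tell me about this audio?",
    "language": "Auto-detect",
    "model_size": "mini",
    "max_tokens": 500,
}

def validate_input(audio, **overrides):
    """Merge overrides with schema defaults and check them against the table."""
    payload = {**DEFAULTS, "audio": audio, **overrides}
    if payload["mode"] not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if payload["language"] not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if payload["model_size"] not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {sorted(MODEL_SIZES)}")
    if not 50 <= payload["max_tokens"] <= 1000:
        raise ValueError("max_tokens must be between 50 and 1000")
    return payload

# Example: understanding mode on the larger model, with a longer output budget.
inputs = validate_input(
    "https://example.com/clip.wav",
    mode="understanding",
    model_size="small",
    max_tokens=800,
)
```

Note that `prompt` is accepted in any mode but, per the schema, is ignored when `mode` is `transcription`.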
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"title": "Output", "type": "string"}
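Since the output schema declares a bare string (not an object or array), a response can be checked with a one-line type test. A minimal sketch, assuming only the schema shown above:

```python
import json

# The declared output schema: the whole response is a single string
# (the transcription or the understanding-mode answer).
OUTPUT_SCHEMA = json.loads('{"title": "Output", "type": "string"}')

def check_output(value):
    """Confirm a response matches the declared schema."""
    if OUTPUT_SCHEMA["type"] == "string" and not isinstance(value, str):
        raise TypeError("expected a plain string response")
    return value
```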