Input schema
The fields you can use to run this model with an API. If you don’t provide a value for a field, its default value is used.
Field | Type | Default value | Description |
---|---|---|---|
audio_file | string | | Audio file to transcribe |
language | string | | Language code (e.g., 'en', 'fr', 'de') |
model | string (enum) | large-v3-turbo. Options: tiny, tiny-en, base, base-en, small, small-en, distil-small-en, medium, medium-en, distil-medium-en, large, large-v1, large-v2, distil-large-v2, large-v3, distil-large-v3, large-v3-turbo | Whisper model to use |
subtitle | boolean | False | Generate subtitles (.srt, .vtt) |
sub_length | integer | 5 (min: 1) | Subtitle segment length in words |
translate | boolean | False | Translate to English |
annotate | boolean | False | Enable speaker annotation (requires HF token) |
num_speakers | integer | (min: 2) | Number of speakers to annotate (auto-detection if None) |
hf_token | string | | HuggingFace access token for speaker annotation |
verbose | boolean | False | Print text chunks during transcription |
post_correction | string | | Path to YAML file for post-correction |
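
As an illustration of how the fields above fit together, here is a minimal sketch of a client-side helper that assembles an input dictionary and checks the constraints listed in the table (the `model` enum, `sub_length` of at least 1, `num_speakers` of at least 2, and a token when `annotate` is enabled). The helper name `build_input` is hypothetical, and treating `audio_file` as the only required field is an assumption, since the schema does not mark required fields.

```python
from typing import Any, Dict, Optional

# Model names copied from the "model" row of the input schema.
WHISPER_MODELS = {
    "tiny", "tiny-en", "base", "base-en", "small", "small-en",
    "distil-small-en", "medium", "medium-en", "distil-medium-en",
    "large", "large-v1", "large-v2", "distil-large-v2",
    "large-v3", "distil-large-v3", "large-v3-turbo",
}


def build_input(
    audio_file: str,  # assumed required; the schema does not say so explicitly
    *,
    language: Optional[str] = None,
    model: str = "large-v3-turbo",
    subtitle: bool = False,
    sub_length: int = 5,
    translate: bool = False,
    annotate: bool = False,
    num_speakers: Optional[int] = None,
    hf_token: Optional[str] = None,
    verbose: bool = False,
    post_correction: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate values locally and return an input dict."""
    if model not in WHISPER_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if sub_length < 1:
        raise ValueError("sub_length must be at least 1")
    if num_speakers is not None and num_speakers < 2:
        raise ValueError("num_speakers must be at least 2")
    if annotate and not hf_token:
        # The schema says annotation requires an HF token; rejecting the
        # call here, before it is sent, is this sketch's own choice.
        raise ValueError("annotate=True requires hf_token")

    payload = {
        "audio_file": audio_file,
        "language": language,
        "model": model,
        "subtitle": subtitle,
        "sub_length": sub_length,
        "translate": translate,
        "annotate": annotate,
        "num_speakers": num_speakers,
        "hf_token": hf_token,
        "verbose": verbose,
        "post_correction": post_correction,
    }
    # Drop unset optional fields so the server-side defaults apply.
    return {key: value for key, value in payload.items() if value is not None}


# Example: request subtitles with eight words per segment.
# The URL is a placeholder, not a real file.
example = build_input(
    "https://example.com/interview.wav", subtitle=True, sub_length=8
)
```

With the `replicate` Python client, a dictionary like `example` would typically be passed as the `input` argument of `replicate.run`, together with the model reference for this version, which is not reproduced here.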
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"type": "string", "title": "Output", "format": "uri"}
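
Since the output is a single string in URI format, a caller usually needs to fetch the file it points to. The sketch below, using only the Python standard library, derives a local filename from the URI and downloads it. The function names are hypothetical, and the schema does not state what kind of file the URI refers to, so the code makes no assumption about its extension or contents.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlparse


def output_filename(uri: str, default: str = "output") -> str:
    """Return the last path segment of the URI, or `default` if it is empty."""
    path = urlparse(uri).path  # query string and fragment are excluded
    name = posixpath.basename(unquote(path))
    return name or default


def download_output(uri: str, dest_dir: str = ".") -> str:
    """Download the file behind the output URI and return the local path."""
    dest = os.path.join(dest_dir, output_filename(uri))
    urllib.request.urlretrieve(uri, dest)
    return dest
```

Calling `download_output(result_uri)` with the string returned by the API would save the file in the current directory.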