🎙️ JigsawStack Speech-to-Text (STT)
This model wraps the JigsawStack Speech-to-Text API, which uses the Whisper V3 model to transcribe audio and video files and, optionally, translate the transcript.
It supports long files, speaker diarization, and webhook delivery for asynchronous processing, which makes it well suited to meetings, podcasts, interviews, and multilingual content.
🧠 What It Does
You provide an audio or video file (via `url` or `file_store_key`), and the model returns the full transcript. It can optionally:
- Auto-detect language
- Translate to English or any supported language
- Separate different speakers (speaker diarization)
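The features listed above map onto request fields from the inputs table. The following is a minimal sketch of calling the underlying API directly with Python's standard library. It assumes the endpoint is `https://api.jigsawstack.com/v1/ai/transcribe` and that the key travels in an `x-api-key` header; neither detail comes from this page, so check both against the official JigsawStack docs. `build_transcribe_request` is a hypothetical helper. It only builds the request and does not send it.

```python
import json
import urllib.request

# ASSUMPTION: endpoint and auth header are not stated on this page;
# verify both against the official JigsawStack API reference.
ASSUMED_ENDPOINT = "https://api.jigsawstack.com/v1/ai/transcribe"


def build_transcribe_request(api_key: str, url: str, *, translate: bool = False,
                             by_speaker: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request that transcribes a media URL.

    `language` is deliberately omitted, so the service auto-detects it.
    """
    body = {"url": url, "translate": translate, "by_speaker": by_speaker}
    return urllib.request.Request(
        ASSUMED_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Hypothetical usage: a diarized meeting recording, translated to English.
req = build_transcribe_request("YOUR_API_KEY", "https://example.com/meeting.mp3",
                               translate=True, by_speaker=True)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.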
📥 Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `url` | string | ❌ No | Public URL to the media file (audio/video) |
| `file_store_key` | string | ❌ No | Key to a file stored in JigsawStack's file storage |
| `language` | string | ❌ No | Language code to force the transcription language (auto-detected if omitted) |
| `translate` | bool | ❌ No | If `true`, translates the transcript into English (or the specified `language`) |
| `by_speaker` | bool | ❌ No | Enables speaker diarization to separate different speakers |
| `webhook_url` | string | ❌ No | Webhook URL for asynchronous delivery of results |
| `batch_size` | number | ❌ No | Controls audio chunking during processing (default: `30`, max: `40`) |
| `api_key` | string | ✅ Yes | Your JigsawStack API key |
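The optional fields in the table can be checked client-side before a request is sent. The following is a minimal sketch under stated assumptions. The default of 30 and the maximum of 40 for `batch_size` come from the table. The lower bound of 1 and the requirement that `batch_size` be an integer are assumptions, because the table only says "number". `build_options` is a hypothetical helper, not part of any JigsawStack SDK.

```python
from typing import Any, Optional

# From the inputs table: default 30, max 40.
BATCH_SIZE_DEFAULT = 30
BATCH_SIZE_MAX = 40
# ASSUMPTION: the table gives no minimum; 1 is used here.
BATCH_SIZE_MIN = 1


def build_options(language: Optional[str] = None, translate: bool = False,
                  by_speaker: bool = False, webhook_url: Optional[str] = None,
                  batch_size: Optional[int] = None) -> dict[str, Any]:
    """Validate the optional inputs and return them as a request-body fragment.

    `language` and `webhook_url` are omitted when None, so the service
    auto-detects the language and returns results synchronously.
    """
    if batch_size is None:
        batch_size = BATCH_SIZE_DEFAULT
    # ASSUMPTION: batch_size is an integer (bool is rejected explicitly,
    # since bool is a subclass of int in Python).
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise TypeError("batch_size must be an integer")
    if not BATCH_SIZE_MIN <= batch_size <= BATCH_SIZE_MAX:
        raise ValueError(
            f"batch_size must be between {BATCH_SIZE_MIN} and {BATCH_SIZE_MAX}")

    options: dict[str, Any] = {
        "translate": translate,
        "by_speaker": by_speaker,
        "batch_size": batch_size,
    }
    if language is not None:
        options["language"] = language
    if webhook_url is not None:
        options["webhook_url"] = webhook_url
    return options


# Hypothetical usage: force French transcription and translate to English.
print(build_options(language="fr", translate=True))
```

Omitting unset fields, instead of sending `null`, leaves the service's own defaults (such as language auto-detection) in effect.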
🔸 You must provide either `url` or `file_store_key`, but not both.
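The "exactly one source" rule above is a mutual-exclusion check. The following is a minimal sketch of enforcing it before calling the model. `select_source` is a hypothetical helper. Treating an empty string as "not provided" is an assumption about how callers pass missing values.

```python
from typing import Optional


def select_source(url: Optional[str] = None,
                  file_store_key: Optional[str] = None) -> dict[str, str]:
    """Return the media-source field, requiring exactly one of the two inputs."""
    # ASSUMPTION: empty strings count as "not provided".
    has_url = bool(url)
    has_key = bool(file_store_key)
    # Both present or both missing violates the rule.
    if has_url == has_key:
        raise ValueError("Provide exactly one of `url` or `file_store_key`.")
    return {"url": url} if has_url else {"file_store_key": file_store_key}


# Hypothetical usage with a public media URL.
print(select_source(url="https://example.com/interview.mp4"))
```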