Run time and cost

This model runs on CPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

🎙️ JigsawStack Speech-to-Text (STT)

This model wraps the JigsawStack Speech-to-Text API and uses the Whisper V3 model to transcribe and optionally translate audio and video files.

It supports long files, speaker diarization, and webhook delivery for async processing — ideal for meetings, podcasts, interviews, or multilingual content.

🧠 What It Does

You provide a video or audio file (via `url` or `file_store_key`), and the model returns the full transcript. It can optionally:

- Auto-detect the language
- Translate to English or any supported language
- Separate different speakers (speaker diarization)

🔑 Inputs

Name	Type	Required	Description
`url`	string	❌ No	Public URL to the media file (audio/video)
`file_store_key`	string	❌ No	Key to a file stored in JigsawStack’s file storage
`language`	string	❌ No	Language code to force transcription language (auto-detect if omitted)
`translate`	bool	❌ No	If `true`, translates transcript into English (or specified `language`)
`by_speaker`	bool	❌ No	Enables speaker diarization to separate different speakers
`webhook_url`	string	❌ No	A webhook URL for async delivery of results
`batch_size`	number	❌ No	Controls audio chunking during processing (default: `30`, max: `40`)
`api_key`	string	✅ Yes	Your JigsawStack API key

🔸 You must provide exactly one of `url` or `file_store_key`, not both.
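
The input rules above can be checked on the client side before a request is sent. The snippet below is a minimal sketch, not part of JigsawStack's SDK: `build_stt_input` is a hypothetical helper name, the lower bound of `1` for `batch_size` is an assumption (the table only states the default and the maximum), and optional fields are left out of the payload when unset so that no undocumented defaults are implied.

```python
from typing import Any, Dict, Optional

DEFAULT_BATCH_SIZE = 30  # default listed in the inputs table
MAX_BATCH_SIZE = 40      # maximum listed in the inputs table


def build_stt_input(
    api_key: str,
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    language: Optional[str] = None,
    translate: Optional[bool] = None,
    by_speaker: Optional[bool] = None,
    webhook_url: Optional[str] = None,
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the inputs and return the input dict."""
    if not api_key:
        raise ValueError("api_key is required")

    # Exactly one media source must be given: url or file_store_key, not both.
    if (url is None) == (file_store_key is None):
        raise ValueError("provide exactly one of url or file_store_key")

    # Assumption: batch_size must be at least 1; only the max (40) is documented.
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")

    payload: Dict[str, Any] = {"api_key": api_key, "batch_size": batch_size}

    # Include optional fields only when the caller set them explicitly.
    optional_fields = {
        "url": url,
        "file_store_key": file_store_key,
        "language": language,
        "translate": translate,
        "by_speaker": by_speaker,
        "webhook_url": webhook_url,
    }
    for name, value in optional_fields.items():
        if value is not None:
            payload[name] = value
    return payload


if __name__ == "__main__":
    # Placeholder values; replace with a real key and a public media URL.
    print(build_stt_input(
        api_key="YOUR_API_KEY",
        url="https://example.com/meeting.mp3",
        by_speaker=True,
    ))
```

The returned dictionary uses the same field names as the inputs table, so it can be passed as the model input with whichever client you use to run the model.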