jigsawstack / tts

Transform text into natural-sounding human-like AI voices with low latency and exceptional quality.


Run time and cost

This model runs on CPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

πŸŽ™οΈ JigsawStack Speech-to-Text (STT)

This model wraps the JigsawStack Speech-to-Text API, which uses the Whisper V3 model to transcribe and optionally translate audio and video files.

It supports long files, speaker diarization, and webhook delivery for asynchronous processing, which makes it well suited to meetings, podcasts, interviews, and multilingual content.


🧠 What It Does

You provide a video or audio file (via url or file_store_key), and the model returns the full transcript. It can optionally:
- Auto-detect the language
- Translate to English or any other supported language
- Separate different speakers (speaker diarization)
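As a minimal sketch, assuming the input names listed in the Inputs table, the optional behaviors map onto fields like this: leaving `language` unset relies on auto-detection, `translate` requests a translated transcript, and `by_speaker` requests diarization. The helper `build_stt_input` is hypothetical and is not part of any JigsawStack or Replicate SDK; the URL and key are placeholders.

```python
from typing import Any, Optional


def build_stt_input(
    api_key: str,
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    language: Optional[str] = None,
    translate: bool = False,
    by_speaker: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: assemble an input dict for the model.

    Optional fields are included only when set, so omitting `language`
    leaves language detection to the model.
    """
    payload: dict[str, Any] = {"api_key": api_key}
    if url is not None:
        payload["url"] = url
    if file_store_key is not None:
        payload["file_store_key"] = file_store_key
    if language is not None:
        payload["language"] = language  # force a transcription language
    if translate:
        payload["translate"] = True  # request a translated transcript
    if by_speaker:
        payload["by_speaker"] = True  # request speaker diarization
    return payload


# Example: auto-detect the language, translate, and separate speakers.
example = build_stt_input(
    api_key="YOUR_JIGSAWSTACK_API_KEY",
    url="https://example.com/meeting.mp3",
    translate=True,
    by_speaker=True,
)
print(example)
```

Sending only the fields you set keeps the request close to the model's own defaults instead of overriding them with guessed values.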


🔑 Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| url | string | ❌ No | Public URL to the media file (audio/video) |
| file_store_key | string | ❌ No | Key to a file stored in JigsawStack's file storage |
| language | string | ❌ No | Language code to force the transcription language (auto-detected if omitted) |
| translate | bool | ❌ No | If true, translates the transcript into English (or the specified language) |
| by_speaker | bool | ❌ No | Enables speaker diarization to separate different speakers |
| webhook_url | string | ❌ No | Webhook URL for asynchronous delivery of results |
| batch_size | number | ❌ No | Controls audio chunking during processing (default: 30, max: 40) |
| api_key | string | ✅ Yes | Your JigsawStack API key |

🔸 You must provide either url or file_store_key, but not both.
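The constraints in the table can be checked client-side before submitting a job. The sketch below is a hypothetical `validate_stt_input` function, not part of any official SDK. It enforces exactly one of `url` or `file_store_key`, a non-empty `api_key`, and the documented `batch_size` maximum of 40. Treating `batch_size` as a positive integer is an assumption, since the table only calls it a number.

```python
from typing import Any


def validate_stt_input(payload: dict[str, Any]) -> list[str]:
    """Hypothetical client-side check; returns a list of problems (empty if valid)."""
    errors: list[str] = []

    # api_key is the only required field.
    if not payload.get("api_key"):
        errors.append("api_key is required")

    # Exactly one media source: url or file_store_key, not both and not neither.
    has_url = bool(payload.get("url"))
    has_key = bool(payload.get("file_store_key"))
    if has_url == has_key:
        errors.append("provide exactly one of url or file_store_key")

    # batch_size: default 30, maximum 40 per the table.
    # Requiring a positive integer (and rejecting bools) is an assumption.
    batch_size = payload.get("batch_size", 30)
    is_int = isinstance(batch_size, int) and not isinstance(batch_size, bool)
    if not is_int or not 1 <= batch_size <= 40:
        errors.append("batch_size must be an integer from 1 to 40")

    # The boolean flags should be actual booleans.
    for flag in ("translate", "by_speaker"):
        if flag in payload and not isinstance(payload[flag], bool):
            errors.append(f"{flag} must be true or false")

    return errors


print(validate_stt_input({"api_key": "k", "url": "https://example.com/a.mp3"}))
print(validate_stt_input({"api_key": "k", "url": "u", "file_store_key": "f", "batch_size": 50}))
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.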