villesau / whisper-timestamped

Transcribes audio using Whisper Large V3 with precise word-level timestamps and confidence scores.

Whisper-Timestamped Transcription Model (Large V3)

Overview

This model provides speech recognition with word-level timestamps using the whisper-timestamped library and Whisper Large V3. It is designed for transcribing audio files and returns start and end times, along with a confidence score, for each transcribed word.

Features

  • Uses Whisper Large V3 for state-of-the-art speech recognition
  • Efficient and accurate word-level timestamps
  • Voice Activity Detection (VAD) to improve transcription accuracy
  • Confidence scores for each word
  • Detection of speech disfluencies
  • Support for multiple languages
  • Options for transcription or translation to English (the sketch after this list shows how these options are exposed by the underlying library)
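
For reference, here is a minimal sketch of how these features are typically exercised through the whisper-timestamped library if you run it locally rather than through this hosted model. The option names (`vad`, `detect_disfluencies`, `task`, `language`) follow the library’s transcribe interface; treat the exact values as assumptions and check the library’s README for the options supported by the version you install.

```python
# Minimal local sketch (assumes whisper-timestamped and its dependencies are installed):
# pip install whisper-timestamped
import whisper_timestamped as whisper

# Load the audio and the Large V3 checkpoint.
audio = whisper.load_audio("speech.mp3")
model = whisper.load_model("large-v3", device="cuda")

# Transcribe with VAD and disfluency detection enabled; set task="translate"
# to translate into English instead of transcribing in the source language.
result = whisper.transcribe(
    model,
    audio,
    language=None,              # None lets the model auto-detect the language
    task="transcribe",
    vad=True,                   # Voice Activity Detection to reduce spurious text in silence
    detect_disfluencies=True,   # mark hesitations/fillers in the output
)

# Each segment contains word entries with start/end times and a confidence score.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"], word["confidence"])
```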

Usage

To use this model, provide an audio file. The model will process the audio and return a JSON object containing the transcription with detailed timing information for segments and individual words.
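
As a concrete illustration, the snippet below calls the model through Replicate’s Python client. The input key name (`audio`) is an assumption for illustration; check the input schema on this page for the exact parameter names and any optional settings.

```python
# Sketch of calling the hosted model via the Replicate Python client
# (pip install replicate; set the REPLICATE_API_TOKEN environment variable).
import replicate

output = replicate.run(
    "villesau/whisper-timestamped",         # resolves to the latest published version
    input={
        "audio": open("speech.mp3", "rb"),  # "audio" is an assumed input name; see the schema on this page
    },
)

# The output is a JSON-like object with segment- and word-level timing information.
print(output)
```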

For detailed information on input parameters and output format, please refer to the model’s input/output specifications on this page.
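
As a rough guide, output from whisper-timestamped-based models generally follows the shape sketched below: the full transcription text plus a list of segments, each carrying per-word entries with start/end times (in seconds) and confidence scores. The values shown here are illustrative only; the output schema on this page is authoritative.

```python
# Illustrative output shape (values are made up, not real model output):
example_output = {
    "text": "Hello world.",
    "language": "en",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 1.2,
            "text": "Hello world.",
            "confidence": 0.95,
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.97},
                {"text": "world.", "start": 0.6, "end": 1.2, "confidence": 0.93},
            ],
        }
    ],
}
```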

About

This model is hosted on Replicate and uses the whisper-timestamped library, an extension of OpenAI’s Whisper, together with the Whisper Large V3 checkpoint. For more information about whisper-timestamped, visit the GitHub repository.