villesau / whisper-timestamped

Transcribes audio using Whisper Large V3 with precise word-level timestamps and confidence scores.

Whisper-Timestamped Transcription Model (Large V3)

Overview

This model provides speech recognition with word-level timestamps using the whisper-timestamped library and Whisper Large V3. It is designed for transcribing audio files and returns start and end times, along with a confidence score, for each transcribed word.

Features

  • Uses Whisper Large V3 for state-of-the-art speech recognition
  • Efficient and accurate word-level timestamps
  • Voice Activity Detection (VAD) to improve transcription accuracy
  • Confidence scores for each word
  • Detection of speech disfluencies
  • Support for multiple languages
  • Options for transcription or translation to English (the sketch after this list shows how these options are exposed by the underlying library)
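
For reference, here is a minimal sketch of how these features are typically exercised through the whisper-timestamped library if you run it locally rather than through this hosted model. The option names (`vad`, `detect_disfluencies`, `task`, `language`) follow the library’s transcribe interface; treat the exact values as assumptions and check the library’s README for the options supported by the version you install.

```python
# Minimal local sketch (assumes whisper-timestamped and its dependencies are installed):
# pip install whisper-timestamped
import whisper_timestamped as whisper

# Load the audio and the Large V3 checkpoint.
audio = whisper.load_audio("speech.mp3")
model = whisper.load_model("large-v3", device="cuda")

# Transcribe with VAD and disfluency detection enabled; set task="translate"
# to translate into English instead of transcribing in the source language.
result = whisper.transcribe(
    model,
    audio,
    language=None,              # None lets the model auto-detect the language
    task="transcribe",
    vad=True,                   # Voice Activity Detection to reduce spurious text in silence
    detect_disfluencies=True,   # mark hesitations/fillers in the output
)

# Each segment contains word entries with start/end times and a confidence score.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"], word["confidence"])
```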

Usage

To use this model, provide an audio file. The model will process the audio and return a JSON object containing the transcription with detailed timing information for segments and individual words.
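
As a concrete illustration, the snippet below calls the model through Replicate’s Python client. The input key name (`audio`) is an assumption for illustration; check the input schema on this page for the exact parameter names and any optional settings.

```python
# Sketch of calling the hosted model via the Replicate Python client
# (pip install replicate; set the REPLICATE_API_TOKEN environment variable).
import replicate

output = replicate.run(
    "villesau/whisper-timestamped",         # resolves to the latest published version
    input={
        "audio": open("speech.mp3", "rb"),  # "audio" is an assumed input name; see the schema on this page
    },
)

# The output is a JSON-like object with segment- and word-level timing information.
print(output)
```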

For detailed information on input parameters and output format, please refer to the model’s input/output specifications on this page.
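
As a rough guide, output from whisper-timestamped-based models generally follows the shape sketched below: the full transcription text plus a list of segments, each carrying per-word entries with start/end times (in seconds) and confidence scores. The values shown here are illustrative only; the output schema on this page is authoritative.

```python
# Illustrative output shape (values are made up, not real model output):
example_output = {
    "text": "Hello world.",
    "language": "en",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 1.2,
            "text": "Hello world.",
            "confidence": 0.95,
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.97},
                {"text": "world.", "start": 0.6, "end": 1.2, "confidence": 0.93},
            ],
        }
    ],
}
```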

About

This model is hosted on Replicate and uses the whisper-timestamped library, an extension of OpenAI’s Whisper, together with the Whisper Large V3 checkpoint. For more information about whisper-timestamped, visit the GitHub repository.