lucataco / voxtral-mini-3b

Voxtral builds upon Ministral-3B with powerful audio understanding capabilities

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Voxtral Mini 1.0 (3B) - 2507

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

Learn more about Voxtral in our blog post here.

Key Features

Voxtral builds upon Ministral-3B with powerful audio understanding capabilities.

- Dedicated transcription mode: Voxtral can operate in a pure speech transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: With a 32k token context length, Voxtral handles audio up to 30 minutes long for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: Supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: Automatic language detection and state-of-the-art performance in the world’s most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function-calling straight from voice: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: Retains the text understanding capabilities of its language model backbone, Ministral-3B.
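The duration limits in the feature list lend themselves to a simple pre-flight check before sending audio to the model. The following is a minimal sketch and not part of Voxtral or any official client: the names `fits_context` and `count_chunks` and the mode labels are hypothetical, and only the 30-minute and 40-minute figures come from the list above.

```python
import math

# Limits quoted in the feature list, in minutes of audio per request.
# The mode names used as keys are labels chosen for this sketch only.
MAX_MINUTES = {"transcription": 30, "understanding": 40}


def _limit_seconds(mode: str) -> int:
    """Look up the stated limit for a mode and convert it to seconds."""
    if mode not in MAX_MINUTES:
        raise ValueError(f"unknown mode: {mode!r}")
    return MAX_MINUTES[mode] * 60


def fits_context(duration_seconds: float, mode: str = "transcription") -> bool:
    """Return True if a clip of this length is within the stated limit."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds <= _limit_seconds(mode)


def count_chunks(duration_seconds: float, mode: str = "transcription") -> int:
    """Number of limit-sized pieces needed to cover a longer recording."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return max(1, math.ceil(duration_seconds / _limit_seconds(mode)))
```

For example, a 35-minute recording (2100 seconds) fails `fits_context` in transcription mode but passes in understanding mode, and `count_chunks` reports that it would need two pieces for transcription. How best to split a recording (for instance at silences) is outside the scope of this sketch.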

Benchmark Results

Audio

Average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks:

[Figure: average WER across the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks]
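For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model output, divided by the number of words in the reference. The sketch below illustrates that standard definition only. It is not the evaluation code behind the reported figures, and real benchmark pipelines normally apply text normalization first, which this example omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

As a usage example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Note that WER can exceed 1.0 when the output contains many inserted words.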

Text

[Figure: text benchmark results]

Usage

The model can be used with the following frameworks:

- vLLM (recommended): See here
- Transformers 🤗: See here

Transcription

Voxtral-Mini-3B-2507 has powerful transcription capabilities!

Make sure that your client has mistral-common installed with its audio extra, for example via `pip install --upgrade "mistral-common[audio]"`.

Transformers 🤗

Voxtral is supported in Transformers natively!