Distil-Whisper

Distil-Whisper is a distilled version of Whisper that is about 6 times faster, has roughly half the parameters (756M versus 1550M for distil-large-v2, a 51% reduction), and performs within 1% word error rate (WER) of Whisper on out-of-distribution evaluation sets.
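
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the evaluation code behind the reported numbers: the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenisation with no text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, assuming whitespace tokenisation (illustrative only)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, "within 1% WER" means the distilled model's score is at most one percentage point higher than the original model's.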

Model	Params (M)	Rel. Latency	Short-Form WER (%)	Long-Form WER (%)
whisper-large-v2	1550	1.0	9.1	11.7
distil-large-v2	756	5.8	10.1	11.6
distil-medium.en	394	6.8	11.1	12.4
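
The headline comparisons can be recomputed directly from the table. The following is a small sketch using only the figures above; the dictionary layout and the `compare` helper are illustrative names, and it reads the "Rel. Latency" column as a speed-up factor relative to whisper-large-v2.

```python
# Figures copied from the table:
# (parameters in millions, relative latency, short-form WER %, long-form WER %)
MODELS = {
    "whisper-large-v2": (1550, 1.0, 9.1, 11.7),
    "distil-large-v2": (756, 5.8, 10.1, 11.6),
    "distil-medium.en": (394, 6.8, 11.1, 12.4),
}


def compare(name: str, baseline: str = "whisper-large-v2") -> dict:
    """Compare one model against the baseline using the table values."""
    params, speed, short_wer, long_wer = MODELS[name]
    base_params, base_speed, base_short, base_long = MODELS[baseline]
    return {
        # Percentage of parameters removed relative to the baseline.
        "param_reduction_pct": round(100 * (1 - params / base_params), 1),
        # Ratio of the relative-latency figures.
        "speedup": round(speed / base_speed, 1),
        # Absolute WER differences in percentage points (negative = better).
        "short_form_wer_gap": round(short_wer - base_short, 1),
        "long_form_wer_gap": round(long_wer - base_long, 1),
    }


if __name__ == "__main__":
    for model in ("distil-large-v2", "distil-medium.en"):
        print(model, compare(model))
```

For distil-large-v2 this gives a 51.2% parameter reduction, a 5.8x speed-up, a short-form WER gap of 1.0 points and a long-form gap of -0.1 points.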

Acknowledgements

OpenAI for the Whisper model and original codebase
Hugging Face 🤗 Transformers for the model integration
Google’s TPU Research Cloud (TRC) programme for Cloud TPU v4s

Citation

If you use this model, please consider citing the Distil-Whisper paper:

@misc{gandhi2023distilwhisper,
      title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling}, 
      author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
      year={2023},
      eprint={2311.00430},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And also the Whisper paper:

@misc{radford2022robust,
      title={Robust Speech Recognition via Large-Scale Weak Supervision}, 
      author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
      year={2022},
      eprint={2212.04356},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}