lucataco / qwen2.5-omni-7b

Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

  • GitHub
  • Weights
  • Paper
  • License

Run this model
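
The snippet below is a minimal sketch of calling this model through the Replicate Python client, not an official usage guide. The input field names ("prompt", and any image/audio/video fields) are assumptions; check the model's API schema on Replicate for the exact parameters it accepts and whether a version suffix is required.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
# Input names below are assumptions -- consult this model's API schema on
# Replicate for the actual parameters and whether a ":<version>" suffix is
# needed in the model reference.
import replicate

output = replicate.run(
    "lucataco/qwen2.5-omni-7b",
    input={
        "prompt": "Describe what is happening in this clip.",
        # If the schema exposes image, audio, or video inputs, they would be
        # passed here as URLs or file handles.
    },
)
print(output)
```

The same call can be made through Replicate's HTTP API or its other client libraries; only the contents of the input dictionary depend on this model's schema.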