Official

wavespeedai / wan-2.1-t2v-720p

Accelerated high-resolution inference for Wan 2.1 14B, part of a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.


Accelerated Inference for Wan 2.1 14B with High Resolution

We are WaveSpeedAI, and we provide highly optimized inference for generative AI models.

We are excited to introduce our new product: a highly optimized inference endpoint for the Wan 2.1 14B model, part of a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

We use cutting-edge inference acceleration techniques to provide very fast inference for this model, and we are happy to bring it to you together with Replicate and DataCrunch.
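As a rough illustration of calling such an endpoint, the sketch below builds (but does not send) an HTTP request for a prediction through Replicate's model predictions route. The model identifier is taken from this page's title; the `prompt` input field and the helper name `build_prediction_request` are assumptions made for illustration, so check the model's published input schema before relying on them.

```python
import json
import urllib.request

# Model identifier as shown in this page's title.
MODEL = "wavespeedai/wan-2.1-t2v-720p"


def build_prediction_request(prompt, token):
    """Build, without sending, a POST request that would create a prediction.

    Hypothetical helper for illustration only.
    """
    url = f"https://api.replicate.com/v1/models/{MODEL}/predictions"
    # Assumption: the model accepts a text "prompt" input. The real input
    # field names are defined by the model's schema, not by this sketch.
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` with a valid API token would submit the job. Video generation is not instantaneous, so a real client would typically poll the prediction it gets back until it completes.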

Model Description ✨

Wan: Open and Advanced Large-Scale Video Generative Models

In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.1 offers these key features:

- 👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- 👍 Supports Consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
- 👍 Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- 👍 Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- 👍 Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.