wan-video / wan-2.1-1.3b

Generate 5-second 480p videos. Wan is an advanced and powerful visual generation model developed by Tongyi Lab of Alibaba Group.

Wan2.1

Wan2.1 is a suite of open video foundation models for video generation. The models support various tasks including Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio generation.

Key Features

  • High performance and quality video generation
  • Consumer-grade GPU compatibility (T2V-1.3B requires only 8.19 GB of VRAM)
  • Multiple task support
  • Visual text generation (supports both Chinese and English)
  • Efficient video VAE for encoding and decoding

Model Architecture

  • 3D Variational Autoencoder: a novel 3D causal VAE architecture (Wan-VAE) for improved video compression and generation
  • Video Diffusion Transformer (DiT): built on the Flow Matching framework, with a T5 encoder for text encoding and transformer blocks that use cross-attention
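
To make the Flow Matching idea concrete, here is a minimal sketch in plain Python. The text above names only the framework, so the details are assumptions: a straight-line path with noise at t = 0 and data at t = 1, a three-element list standing in for a latent video, and a hypothetical `oracle` function standing in for the trained DiT. It is not the Wan2.1 training or sampling code.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the network: d(x_t)/dt along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
data = [0.5, -1.0, 2.0]  # assumed stand-in for a latent video tensor
noise = [random.gauss(0.0, 1.0) for _ in data]

def oracle(x, t):
    """Hypothetical perfectly trained model: returns the exact target velocity."""
    return target_velocity(noise, data)

sample = euler_sample(oracle, noise, num_steps=10)
```

With an exact velocity field the Euler integration lands on `data` up to floating-point error. In a real model the network only approximates this field, and it is conditioned on the text embedding as well as on `x` and `t`.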

Model Specifications

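| Model | Dimension | Input Dimension | Output Dimension | Feedforward Dimension | Frequency Dimension | Number of Heads | Number of Layers |
|-------|-----------|-----------------|------------------|-----------------------|---------------------|-----------------|------------------|
| 1.3B  | 1536      | 16              | 16               | 8960                  | 256                 | 12              | 30               |
| 14B   | 5120      | 16              | 16               | 13824                 | 256                 | 40              | 40               |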
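
As a sanity check on these figures, the sketch below derives the per-head dimension and a rough transformer parameter count from the table. The class `WanConfig` and its methods are hypothetical names introduced for illustration. The count assumes each layer holds one self-attention block, one cross-attention block (four dim × dim projections each), and a two-matrix feedforward network; it ignores biases, normalization, embeddings, and any other layers, so treat it as an estimate rather than the official count.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WanConfig:
    """Hypothetical container for one row of the specification table."""
    name: str
    dim: int
    ffn_dim: int
    num_heads: int
    num_layers: int

    @property
    def head_dim(self) -> int:
        # Model dimension split evenly across attention heads.
        return self.dim // self.num_heads

    def approx_transformer_params(self) -> int:
        attn = 4 * self.dim * self.dim     # Q, K, V, and output projections
        ffn = 2 * self.dim * self.ffn_dim  # up and down projections
        # Assumed: one self-attention and one cross-attention per layer.
        return self.num_layers * (2 * attn + ffn)

CONFIGS = [
    WanConfig("1.3B", dim=1536, ffn_dim=8960, num_heads=12, num_layers=30),
    WanConfig("14B", dim=5120, ffn_dim=13824, num_heads=40, num_layers=40),
]

for cfg in CONFIGS:
    billions = cfg.approx_transformer_params() / 1e9
    print(f"{cfg.name}: head_dim={cfg.head_dim}, ~{billions:.2f}B parameters")
# 1.3B: head_dim=128, ~1.39B parameters
# 14B: head_dim=128, ~14.05B parameters
```

Both variants work out to 128 dimensions per head, and the rough totals land close to the nominal model sizes, which suggests the table columns are read correctly.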

Computational Efficiency

Performance varies by GPU. The 1.3B model is designed to run on consumer GPUs (the T2V-1.3B variant needs only 8.19 GB of VRAM), while the 14B model benefits from multi-GPU setups.
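
A back-of-the-envelope estimate of weight memory shows why the two sizes target different hardware. The sketch uses the nominal parameter counts implied by the model names and standard bytes-per-parameter values; which precisions the released checkpoints actually use is an assumption here, and `weight_gb` is a hypothetical helper.

```python
# Bytes per parameter for common storage precisions (assumed options).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Nominal parameter counts taken from the model names.
NOMINAL_PARAMS = {"1.3B": 1.3e9, "14B": 14e9}

def weight_gb(model: str, dtype: str) -> float:
    """Memory for the diffusion transformer weights alone, in GB (10^9 bytes)."""
    return NOMINAL_PARAMS[model] * BYTES_PER_PARAM[dtype] / 1e9

for model in NOMINAL_PARAMS:
    row = ", ".join(f"{d}: {weight_gb(model, d):.1f} GB" for d in BYTES_PER_PARAM)
    print(f"{model} weights -> {row}")
# 1.3B weights -> fp32: 5.2 GB, bf16: 2.6 GB, fp8: 1.3 GB
# 14B weights -> fp32: 56.0 GB, bf16: 28.0 GB, fp8: 14.0 GB
```

These numbers cover the transformer weights only. The text encoder, the VAE, and activations add to the total at inference time, so real usage is higher than the weight figure.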

License

The models are licensed under the Apache 2.0 License.