Official

wan-video / wan-2.1-1.3b

Generate 5-second 480p videos. Wan is a visual generation model developed by Tongyi Lab of Alibaba Group.
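As a rough illustration of how a model like this is typically called, the sketch below uses the Replicate Python client (`replicate.run`). The model reference comes from this page's title. The `prompt` input name is an assumption, and `build_input` and `generate_video` are hypothetical helpers, so check the model's input schema before relying on them.

```python
def build_input(prompt, **overrides):
    """Assemble the input payload for one generation request.

    Only `prompt` is assumed here; any extra keyword arguments are passed
    through untouched and must match the model's actual input schema.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    payload = {"prompt": prompt.strip()}
    payload.update(overrides)
    return payload


def generate_video(prompt, **overrides):
    """Run the model once and return whatever the client returns."""
    # Imported lazily so the helper above can be used without the client.
    # Install with `pip install replicate`; the client reads the
    # REPLICATE_API_TOKEN environment variable.
    import replicate

    return replicate.run(
        "wan-video/wan-2.1-1.3b",
        input=build_input(prompt, **overrides),
    )
```

With `REPLICATE_API_TOKEN` set, `generate_video("a red fox running through fresh snow")` would submit one request. Since this model is priced per generated video, each such call counts as one billed video.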

  • Public
  • 2.1K runs

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This model is priced by how many videos are generated.

Check out our docs for more information about how per-video pricing works on Replicate.

Readme

Wan2.1

Wan2.1 is a suite of open video foundation models for video generation. The models support various tasks including Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio generation.

Key Features

  • High performance and quality video generation
  • Consumer-grade GPU compatibility (T2V-1.3B requires only 8.19 GB of VRAM)
  • Multiple task support
  • Visual text generation (supports both Chinese and English)
  • Efficient video VAE for encoding and decoding

Model Architecture

  • 3D Variational Autoencoder: Novel 3D causal VAE architecture (Wan-VAE) for improved video compression and generation
  • Video Diffusion DiT: Flow Matching framework with T5 Encoder for text encoding and transformer blocks with cross-attention
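To make the Flow Matching idea concrete, here is a minimal, generic sketch of the straight-path (rectified flow) formulation with a fixed-step Euler sampler. It is not Wan2.1's code. The convention that noise sits at t=0 and data at t=1 is an assumption, and `oracle_velocity` is a toy stand-in for the trained transformer that only works because the "dataset" is a single point.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def velocity_target(noise, data):
    """Regression target for the network on that path: data - noise."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(velocity_fn, noise, num_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data distribution: a single point. For that case the exact velocity
# field at (x, t) is (data - x) / (1 - t), so no training is needed here.
TARGET = [0.5, -1.0, 2.0]


def oracle_velocity(x, t):
    return [(d - xi) / (1.0 - t) for xi, d in zip(x, TARGET)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in TARGET]
sample = euler_sample(oracle_velocity, noise, num_steps=20)
```

In a real model the velocity function would be the diffusion transformer, conditioned on the text embedding through cross-attention, and the integrated state would be a video latent that the VAE decodes into frames.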

Model Specifications

Model | Dimension | Input Dimension | Output Dimension | Feedforward Dimension | Frequency Dimension | Number of Heads | Number of Layers
1.3B  | 1536      | 16              | 16               | 8960                  | 256                 | 12              | 30
14B   | 5120      | 16              | 16               | 13824                 | 256                 | 40              | 40
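The specification values can be sanity-checked with a little arithmetic. The sketch below derives the per-head dimension and a rough weight count from the table. The estimate assumes each layer holds a self-attention block and a cross-attention block (four dimension-by-dimension projections each) plus a two-layer feedforward network, and it ignores biases, normalization, embeddings, and the text encoder and VAE. Those structural details are assumptions, not figures from this page.

```python
# Values copied from the specification table.
SPECS = {
    "1.3B": {"dim": 1536, "ffn_dim": 8960, "num_heads": 12, "num_layers": 30},
    "14B": {"dim": 5120, "ffn_dim": 13824, "num_heads": 40, "num_layers": 40},
}


def head_dim(spec):
    """Per-head width: the model dimension split evenly across heads."""
    dim, heads = spec["dim"], spec["num_heads"]
    if dim % heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    return dim // heads


def approx_block_params(spec):
    """Rough transformer weight count under the assumptions stated above."""
    d, f = spec["dim"], spec["ffn_dim"]
    attention = 2 * 4 * d * d   # self-attention + cross-attention projections
    feedforward = 2 * d * f     # up-projection + down-projection
    return (attention + feedforward) * spec["num_layers"]


for name, spec in SPECS.items():
    billions = approx_block_params(spec) / 1e9
    print(f"{name}: head_dim={head_dim(spec)}, approx_params={billions:.2f}B")
```

Both variants come out with a head dimension of 128, and the rough counts (about 1.39B and 14.05B) land close to the nominal model sizes, which suggests the table columns are being read correctly.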

Computational Efficiency

Performance varies by GPU. The 1.3B model is designed to run on consumer GPUs, while the 14B model benefits from multi-GPU setups.

License

The models are licensed under the Apache 2.0 License.