zsxkib / step-video-t2v

Generate high-quality videos from text prompts using StepVideo

Step Video T2V - Text to Video Magic ✨

Transform your text descriptions into captivating videos with StepFun’s StepVideo model - now optimized to run on a single GPU! 🚀

About

This model turns your words into fluid, high-quality videos. Using StepFun’s approach to video generation, it creates remarkably coherent motion and impressive visuals from simple text prompts. 🎬

What makes this implementation special? 🌟

  • Single GPU power: Unlike the original implementation that required 4 GPUs, this version runs efficiently on just one H100! 💪
  • FP8 quantization: The diffusion model uses optimized FP8 precision for:
      • Faster generation on modern hardware 🏎️
      • Reduced memory footprint 🧠
      • Quicker creative iterations ⚡

While quantization introduces a slight quality trade-off compared to running the model at higher precision, the speed and accessibility gains make this a great fit for most creative projects!
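
For a rough sense of the memory savings: an FP8 weight takes 1 byte versus 2 bytes in BF16, so holding the weights takes about half the memory. The Python sketch below assumes the diffusion transformer has roughly 30B parameters (the size StepFun reports for Step-Video-T2V) and counts the weights alone, ignoring activations, the text encoders, and the VAE, so actual GPU usage will be higher.

```python
# Rough weight-memory estimate for a diffusion transformer at different precisions.
# ASSUMPTION: ~30B parameters, the size StepFun reports for Step-Video-T2V.
# Only the weights are counted; activations, text encoders, and the VAE are ignored.

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp8": 1,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 30e9  # assumed parameter count
    for precision in ("bf16", "fp8"):
        print(f"{precision}: {weight_memory_gb(params, precision):.0f} GB")
```

At these assumed sizes, the BF16 weights alone would take about 60 GB of an 80 GB H100, while FP8 brings that to about 30 GB, leaving more room for activations and the other components.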

Tips for stunning results 🎯

  • Be descriptive: “A golden retriever puppy playing with a red ball in a sunny park” works better than “dog playing”
  • Specify motion: Mention the action you want to see
  • Adjust frames: More frames make a longer video, but may reduce per-frame quality
  • Play with FPS: Higher FPS creates smoother motion, but with the same number of frames it also makes the clip shorter
  • Use negative prompts: Add things you don’t want to see in the “negative prompt” field
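
Frame count and FPS together set the clip length: the duration in seconds is the number of frames divided by the frame rate. The Python sketch below shows that arithmetic alongside a request payload. The field names (`prompt`, `negative_prompt`, `num_frames`, `fps`) and the `VideoRequest` class are hypothetical stand-ins, so check the model’s API schema on Replicate for the real input names and defaults.

```python
# Sketch: relate frame count, FPS, and clip length, and assemble a request payload.
# ASSUMPTION: the field names below are hypothetical; the model's real input
# schema may use different names and defaults.

from dataclasses import dataclass, asdict


@dataclass
class VideoRequest:
    """Hypothetical container for one text-to-video request."""
    prompt: str
    num_frames: int
    fps: int
    negative_prompt: str = ""

    def duration_seconds(self) -> float:
        """Playback length: frame count divided by frames per second."""
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        return self.num_frames / self.fps

    def to_input(self) -> dict:
        """Payload dict using the hypothetical field names above."""
        return asdict(self)


if __name__ == "__main__":
    req = VideoRequest(
        prompt="A golden retriever puppy playing with a red ball in a sunny park",
        num_frames=102,
        fps=25,
        negative_prompt="blurry, low quality, distorted faces",
    )
    print(f"{req.duration_seconds():.2f} s")  # 102 / 25 = 4.08 s
    # Same frame count at a higher frame rate: smoother, but shorter.
    faster = VideoRequest(req.prompt, req.num_frames, fps=34,
                          negative_prompt=req.negative_prompt)
    print(f"{faster.duration_seconds():.2f} s")  # 102 / 34 = 3.00 s
```

Raising the FPS without adding frames shortens the clip, so keeping the same length at a higher frame rate means asking for proportionally more frames.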

Example prompts 💡

  • “A spaceship landing on a distant planet with two moons in the sky”
  • “Timelapse of a flower blooming in a lush garden”
  • “A robot chef preparing a gourmet meal in a futuristic kitchen”
  • “Waves crashing against a rocky shore during sunset”
  • “A panda doing kung fu moves in a bamboo forest”

Limitations 🚧

  • Text rendering isn’t perfect - avoid prompts that require specific text
  • Very complex scenes might lose some details
  • Faces can sometimes look a bit uncanny
  • The quantized version prioritizes speed over absolute quality

Coming soon… 📆

  • Multi-GPU support for even faster generation
  • Fine-tuned quality improvements while maintaining speed
  • Additional creative controls

Credits 🙏

Model adaptation and quantization by @zsakib_ - making high-end video generation accessible on single GPUs.

Based on StepFun’s StepVideo-T2V-Turbo with optimizations for Replicate’s infrastructure.

Happy video creating! 🎥✨