zsxkib / create-video-dataset

Easily create video datasets with auto-captioning for Hunyuan-Video LoRA finetuning


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Create Video Dataset

A tool to easily prepare video datasets with automatic captioning for AI training. This tool processes videos (from URLs or local files), generates high-quality captions using QWEN-VL, and packages everything into a training-ready format.

Features

  • 🎥 Process YouTube URLs or local video files
  • 🤖 Automatic video captioning using QWEN-VL
  • ✍️ Support for custom captions
  • 🏷️ Configurable trigger words for training
  • 📝 Prefix/suffix support for caption formatting
  • 🗃️ Clean output in zip format

Input Parameters

| Parameter          | Description                                          | Default |
|--------------------|------------------------------------------------------|---------|
| video_url          | YouTube/video URL to process                         | None    |
| video_file         | Local video file to process                          | None    |
| trigger_word       | Training trigger word (e.g., TOK, STYLE3D)           | "TOK"   |
| autocaption        | Use AI to generate captions                          | True    |
| custom_caption     | Your custom caption (required if autocaption=False)  | None    |
| autocaption_prefix | Text to add before captions                          | None    |
| autocaption_suffix | Text to add after captions                           | None    |
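
For reference, the model can be invoked with the Replicate Python client. The sketch below is a minimal example rather than canonical usage: the input URL and caption prefix are placeholders, and the exact return type (a URL string or a file-like object) depends on your client version.

```python
# Minimal sketch of calling zsxkib/create-video-dataset via the Replicate
# Python client. Requires REPLICATE_API_TOKEN in the environment.
# The video URL and caption prefix below are placeholder values.
import replicate

output = replicate.run(
    "zsxkib/create-video-dataset",
    input={
        "video_url": "https://example.com/my-clip.mp4",  # or a YouTube URL
        "trigger_word": "TOK",
        "autocaption": True,
        "autocaption_prefix": "a video of TOK, ",
    },
)

print(output)  # typically a URL pointing to the generated zip archive
```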

Output

The tool produces a zip file containing:

  • Processed video file
  • Caption files (.txt) for each video
  • Proper directory structure for training
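
As a rough sketch of consuming that output, assuming the model returned a URL to the zip (the URL and local paths below are placeholders, not values guaranteed by the tool), the archive can be downloaded and unpacked with the Python standard library:

```python
# Sketch of downloading and unpacking the output archive. The zip URL and
# local directory names are assumptions, not values produced by the tool.
import urllib.request
import zipfile

zip_url = "https://replicate.delivery/.../dataset.zip"  # URL returned by the model
urllib.request.urlretrieve(zip_url, "dataset.zip")

with zipfile.ZipFile("dataset.zip") as zf:
    zf.extractall("training_dataset")
    for name in zf.namelist():
        print(name)  # each video should sit alongside a matching .txt caption
```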