Refer to the GitHub repository for more information.

nateraw/video-llava

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Public, 487.1K runs