cuuupid / qwen2-vl-2b

SOTA open-source model for chatting with videos and the newest model in the Qwen family

  • Public
  • 576 runs
  • A100 (80GB)
  • Prediction

    cuuupid/qwen2-vl-2b:d312c2be33a5ec3ab4787973aae74a485e194f4fccf106086b637afd6dd694b4
    ID: f2des9xkpnrj00chmt1sf9mdgw
    Status: Succeeded
    Source: Web
    Hardware: A100 (80GB)

    Input

    video: (not shown)
    width: 128
    height: 128
    prompt: Describe the video in great detail.
    max_tokens: 512
    temperature: 0.7
    max_duration: 180

    Output

    [ "The video features a woman standing behind a podium, speaking to the camera. She is wearing a blue shirt and appears to be giving a presentation or lecture. The woman's facial expression suggests that she is engaged and passionate about the topic she is discussing. The background of the video is not visible, but it can be assumed that it is an indoor setting, possibly a conference room or lecture hall. The woman's speech is not audible, but her body language and gestures suggest that she is using hand movements to emphasize certain points. Overall, the video seems to be a formal presentation or lecture, with the woman as the main speaker." ]
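The inputs of this prediction can be assembled programmatically. The following is a minimal sketch, not the model author's code: `build_input` and `run_prediction` are hypothetical helper names, the defaults simply mirror the values shown in this example, and the video URL in the usage note is a placeholder because the original video file is not shown on this page. It assumes the third-party `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment.

```python
from typing import Any, Dict, Optional

# Version string copied from the prediction above.
MODEL_VERSION = (
    "cuuupid/qwen2-vl-2b:"
    "d312c2be33a5ec3ab4787973aae74a485e194f4fccf106086b637afd6dd694b4"
)


def build_input(
    video: str,
    prompt: str,
    *,
    width: int = 128,
    height: int = 128,
    max_tokens: int = 512,
    temperature: float = 0.7,
    max_duration: int = 180,
    repetition_penalty: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the input payload (hypothetical helper; defaults mirror the example)."""
    payload: Dict[str, Any] = {
        "video": video,
        "width": width,
        "height": height,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "max_duration": max_duration,
    }
    # Only send repetition_penalty when the caller sets it explicitly.
    if repetition_penalty is not None:
        payload["repetition_penalty"] = repetition_penalty
    return payload


def run_prediction(payload: Dict[str, Any], version: str = MODEL_VERSION) -> Any:
    """Submit the payload with the replicate client (requires network and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(version, input=payload)
```

A call might look like `run_prediction(build_input("https://example.com/clip.mp4", "Describe the video in great detail."))`, where the URL stands in for your own video.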
  • Prediction

    cuuupid/qwen2-vl-2b:39932615ead975306d29c9ccb411a4ad1ec7df6c8adfafb68d565dccfa408883
    ID: m1qcx0fywnrj40chmtgaj60bd4
    Status: Succeeded
    Source: Web
    Hardware: A100 (80GB)

    Input

    video: (not shown)
    width: 128
    height: 128
    prompt: what's the video about?
    max_tokens: 512
    temperature: 0.3
    max_duration: 360
    repetition_penalty: 1.1

    Output

    [ "The video is about principles of derivative classification and avoiding over-classification, as well as access to classified information." ]
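Both outputs on this page are JSON arrays holding a single string. A small sketch of turning such an output into plain text follows; `output_to_text` is a hypothetical helper, and the assumption that longer outputs would arrive as several string chunks to be concatenated is not confirmed by this page.

```python
from typing import Iterable, Union


def output_to_text(output: Union[str, Iterable[str]]) -> str:
    """Collapse a model output (a string or a list of string chunks) into one string."""
    if isinstance(output, str):
        return output.strip()
    # Assumption: multiple chunks, if present, are meant to be concatenated in order.
    return "".join(str(chunk) for chunk in output).strip()


if __name__ == "__main__":
    example = [
        "The video is about principles of derivative classification and "
        "avoiding over-classification, as well as access to classified information."
    ]
    print(output_to_text(example))
```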
