
cjwbw/unival:a05f2bc5

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field | Type | Default value | Description
input_image | string | - | Input image.
input_audio | string | - | Input audio.
input_video | string | - | Input video.
task_type | string (enum) | Image Captioning | Choose a task. Options: Image Captioning, Video Captioning, Audio Captioning, Visual Grounding, General, General Video.
instruction | string | - | Provide a question for the VQA task, a region for the Visual Grounding task, or an instruction for General tasks. The default instruction for captioning tasks is 'What does the image/video/audio describe?'
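As an illustration of how the fields above fit together, here is a minimal sketch that assembles an input payload and checks `task_type` against the listed options. The helper name `build_input` and the example URL are hypothetical and not part of the model's API. The sketch makes no assumption about which media field each task requires, and it only builds the dictionary; sending it to the model is left to whichever client you use.

```python
from typing import Optional

# Options listed for the task_type enum in the input schema.
TASK_TYPES = (
    "Image Captioning",
    "Video Captioning",
    "Audio Captioning",
    "Visual Grounding",
    "General",
    "General Video",
)


def build_input(
    task_type: str = "Image Captioning",  # default value from the schema
    input_image: Optional[str] = None,
    input_audio: Optional[str] = None,
    input_video: Optional[str] = None,
    instruction: Optional[str] = None,
) -> dict:
    """Build an input dictionary, omitting fields that were not given.

    Hypothetical helper for illustration only.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(
            f"Unknown task_type {task_type!r}; expected one of {TASK_TYPES}"
        )

    payload = {"task_type": task_type}
    optional_fields = {
        "input_image": input_image,
        "input_audio": input_audio,
        "input_video": input_video,
        "instruction": instruction,
    }
    # Leave out unset fields so the model falls back to its own defaults.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload


# Example: a Visual Grounding request with a placeholder image URL.
payload = build_input(
    task_type="Visual Grounding",
    input_image="https://example.com/cat.png",  # placeholder, not a real asset
    instruction="the cat on the left",
)
```

Omitting unset fields, rather than sending empty strings, keeps the request aligned with the schema's statement that a missing field takes its default value.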

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'title': 'Output'}