yorickvp / llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

  • Public
  • 19.1M runs
  • GitHub
  • Paper
  • License

Create training

Trainings for this model run on Nvidia A100 (80GB) GPU hardware, which costs $0.0014 per second. Upon creation, you will be redirected to the training detail page where you can monitor your training's progress, and eventually download the weights and run the trained model.
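
At that rate, an hour of training works out to roughly $5 (0.0014 × 3,600 ≈ $5.04), so the total cost depends on how long your training job runs.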

Note: versions of this model with fast booting use the hardware set by the base model they were trained from.

If you haven’t yet trained a model on Replicate, we recommend reading the fine-tuning guides in the Replicate docs before you start.


You can fine-tune LLaVA with your own dataset using LoRA! Training data is passed to cog train via the train_data parameter. Your training dataset should be a zip file with the following structure (a sketch of how to assemble one follows the list below):

  • ./images/: A folder with training data images.
  • ./data.json: A JSON file that links images to conversations. For details, see the dataset format instructions in the GitHub repository.
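
If you are assembling that zip programmatically, here is a minimal sketch in Python. The data.json entries follow the conversation format documented in the LLaVA repository (an id, a relative image path, and alternating human/gpt turns with an <image> token in the first prompt); the file names, conversation text, and output zip name below are placeholders, so check the dataset format instructions in the GitHub repository for the authoritative schema.

import json
import zipfile
from pathlib import Path

# Placeholder entries in LLaVA's conversation format; replace with your own data.
entries = [
    {
        "id": "0001",
        "image": "images/0001.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nWhat is shown in this picture?"},
            {"from": "gpt", "value": "A short description of the picture."},
        ],
    },
]

# Write data.json and the contents of a local images/ folder into one zip file.
with zipfile.ZipFile("my-input-images.zip", "w") as zf:
    zf.writestr("data.json", json.dumps(entries, indent=2))
    for path in Path("images").glob("*"):
        zf.write(path, arcname=f"images/{path.name}")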

Example code for training:

import replicate

training = replicate.trainings.create(
    # Full version ID of the base model (replace [version_id] accordingly).
    version="yorickvp/llava-13b:[version_id]",
    input={
        # Publicly accessible URL to the zipped dataset described above.
        "train_data": "https://my-domain/my-input-images.zip",
    },
    # Model on your Replicate account that the trained weights are pushed to.
    destination="my-name/my-model"
)
print(training)
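
Once the training has been created, you can also poll its status from Python instead of only watching the training detail page. The sketch below assumes the training object from the snippet above, that the new version is reported in training.output["version"], and that the fine-tuned model keeps the base model's image and prompt inputs; adjust those names if your setup differs.

import time

import replicate

# Poll the training until it reaches a terminal state.
while training.status not in {"succeeded", "failed", "canceled"}:
    time.sleep(60)
    training = replicate.trainings.get(training.id)

if training.status == "succeeded":
    # The trained weights are pushed to the destination model as a new version,
    # which can then be run like any other Replicate model.
    output = replicate.run(
        training.output["version"],
        input={
            "image": "https://my-domain/example.jpg",  # placeholder image URL
            "prompt": "Describe this image.",
        },
    )
    print("".join(output))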

You can find more information about fine-tuning image models in the Replicate docs. The tutorial on fine-tuning SDXL with your own images is a good starting point.