HTTP API reference

input (required)

The model’s input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the “API” tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

Files should be passed as HTTP URLs or data URLs.

Use an HTTP URL when:

you have a large file > 256kb
you want to be able to use the file multiple times
you want your prediction metadata to be associable with your input files

Use a data URL when:

you have a small file <= 256kb
you don’t want to upload and host the file somewhere
you don’t need to use the file again (Replicate will not store it)
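
The choice between the two URL forms can be sketched in Python. This is a minimal illustration, not part of the API: the function names are hypothetical, and the threshold assumes "256kb" means 256 * 1024 bytes, which the text above does not specify.

```python
import base64
import mimetypes
from typing import Optional

# Assumption: "256kb" is read as 256 * 1024 bytes.
DATA_URL_LIMIT_BYTES = 256 * 1024


def to_data_url(content: bytes, filename: str) -> str:
    """Encode file bytes as a base64 data URL."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_input(content: bytes, filename: str,
               hosted_url: Optional[str] = None) -> str:
    """Return the string to place in the prediction input for a file."""
    if hosted_url is not None:
        # The file is already hosted, so it can be reused across predictions.
        return hosted_url
    if len(content) <= DATA_URL_LIMIT_BYTES:
        return to_data_url(content, filename)
    raise ValueError("file is larger than 256 KB; host it and pass an HTTP URL")
```

The returned string is used as the value of the relevant input field, for example an image input of the model you are running.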

version (required)

The ID of the model version that you want to run.

stream

Request a URL to receive streaming output using server-sent events (SSE).

If the requested model version supports streaming, the returned prediction will have a stream entry in its urls property with an HTTPS URL that you can use to construct an EventSource.
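
Outside a browser there is no built-in EventSource, so a client has to read the SSE text format itself. The following is a minimal sketch of a parser for that format. It is not a Replicate client, and the event names in the usage note are hypothetical, since this reference does not list the events the stream emits.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Feeding it the decoded lines of a response from the stream URL would yield pairs such as ("output", "hello"), assuming the stream uses an event named output.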

webhook

An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
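
Because a webhook may be delivered more than once, the receiving code should produce the same result when it sees a repeated body. One possible approach is sketched below. The WebhookReceiver class is hypothetical and keeps its state in memory; a real service would more likely use a database.

```python
from typing import Dict

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


class WebhookReceiver:
    """Keep the latest prediction body per ID and ignore repeated deliveries."""

    def __init__(self) -> None:
        self.latest: Dict[str, dict] = {}

    def handle(self, body: dict) -> bool:
        """Store the body; return False when the delivery changes nothing."""
        prediction_id = body["id"]
        previous = self.latest.get(prediction_id)
        if previous is not None and previous["status"] in TERMINAL_STATUSES:
            return False  # already finished: a retry or a late delivery
        if previous == body:
            return False  # exact repeat of the last delivery
        self.latest[prediction_id] = body
        return True
```

Calling handle twice with the same completed prediction stores it once and returns False the second time, so side effects such as sending a notification can be tied to the True result.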

webhook_events_filter

By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:

start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they’ll always be sent regardless of throttling.
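
A request body with a webhook filter can be assembled and checked before it is sent. The sketch below is illustrative only: the function name is hypothetical, and the checks simply mirror the four event names and the HTTPS requirement described above.

```python
from typing import Iterable, Optional

ALLOWED_EVENTS = ("start", "output", "logs", "completed")


def prediction_request(version: str, model_input: dict,
                       webhook: Optional[str] = None,
                       events: Optional[Iterable[str]] = None) -> dict:
    """Build the JSON body for creating a prediction."""
    body = {"version": version, "input": model_input}
    if webhook is not None:
        if not webhook.startswith("https://"):
            raise ValueError("webhook must be an HTTPS URL")
        body["webhook"] = webhook
    if events is not None:
        selected = list(events)
        unknown = [e for e in selected if e not in ALLOWED_EVENTS]
        if unknown:
            raise ValueError(f"unknown webhook events: {unknown}")
        body["webhook_events_filter"] = selected
    return body
```

Passing events=["start", "completed"] produces the same body as the JSON example above.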

Get a prediction

GET https://api.replicate.com/v1/predictions/{prediction_id}

Get the current state of a prediction.

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu

The response will be the prediction object:

{
  "id": "gm3qorzdhgbfurvjtvhg6dckhu",
  "model": "replicate/hello-world",
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "logs": "",
  "output": "hello Alice",
  "error": null,
  "status": "succeeded",
  "created_at": "2023-09-08T16:19:34.765994Z",
  "data_removed": false,
  "started_at": "2023-09-08T16:19:34.779176Z",
  "completed_at": "2023-09-08T16:19:34.791859Z",
  "metrics": {
    "predict_time": 0.012683
  },
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel",
    "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu"
  }
}

status will be one of:

starting: the prediction is starting up. If this status lasts longer than a few seconds, then it’s typically because a new worker is being started to run the prediction.
processing: the predict() method of the model is currently running.
succeeded: the prediction completed successfully.
failed: the prediction encountered an error during processing.
canceled: the prediction was canceled by its creator.

In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You’ll need to pass the Authorization header to request them.

In the case of failure, error will contain the error encountered during the prediction.

Terminated predictions (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won’t include time waiting for the prediction to start.
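
The status values above suggest a simple polling loop: fetch the prediction until its status is terminal. The sketch below assumes a caller-supplied fetch function that performs the GET request and returns the decoded prediction object. The function name, the one-second interval, and the five-minute timeout are illustrative choices, not values from this reference.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(fetch: Callable[[], dict],
                        interval: float = 1.0,
                        timeout: float = 300.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the prediction reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction is still {prediction['status']}")
        sleep(interval)
```

The returned object can then be inspected: output on success, error on failure, and metrics["predict_time"] for the run time in seconds.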

All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.

You must save a copy of any data or files in the output if you’d like to continue using them. The output key will still be present, but its value will be null after the output has been removed.

Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.
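
If your application enforces the allow list itself, the host check might look like the sketch below. The function name is hypothetical, and requiring the https scheme is an assumption based on output files being represented as HTTPS URLs.

```python
from urllib.parse import urlparse


def is_replicate_delivery_url(url: str) -> bool:
    """True if the URL is HTTPS and its host is replicate.delivery or a subdomain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and (
        host == "replicate.delivery" or host.endswith(".replicate.delivery")
    )
```

Matching on the parsed host name, rather than on a substring of the URL, avoids accepting look-alike hosts such as replicate.delivery.example.com.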

Request path parameters

prediction_id (required)

The ID of the prediction to get.

List predictions

GET https://api.replicate.com/v1/predictions

Get a paginated list of predictions that you’ve created. This will include predictions created from the API and the website. It will return 100 records per page.

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/predictions

The response will be a paginated JSON array of prediction objects, sorted with the most recent prediction first:

{
  "next": null,
  "previous": null,
  "results": [
    {
      "completed_at": "2023-09-08T16:19:34.791859Z",
      "created_at": "2023-09-08T16:19:34.907244Z",
      "data_removed": false,
      "error": null,
      "id": "gm3qorzdhgbfurvjtvhg6dckhu",
      "input": {
        "text": "Alice"
      },
      "metrics": {
        "predict_time": 0.012683
      },
      "output": "hello Alice",
      "started_at": "2023-09-08T16:19:34.779176Z",
      "source": "api",
      "status": "succeeded",
      "urls": {
        "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
        "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
      },
      "model": "replicate/hello-world",
      "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa"
    }
  ]
}

id will be the unique ID of the prediction.

source will indicate how the prediction was created. Possible values are web or api.

status will be the status of the prediction. Refer to get a single prediction for possible values.

urls will be a convenience object that can be used to construct new API requests for the given prediction.

model will be the model identifier string in the format of {model_owner}/{model_name}.

version will be the unique ID of the model version used to create the prediction.

data_removed will be true if the input and output data have been deleted.
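
The paginated response can be walked by following next until it is null. The sketch below assumes a caller-supplied fetch_page function that performs the authenticated GET request for a given URL (or for the first page when passed None) and returns the decoded JSON object. Both names are hypothetical.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a paginated response by following "next"."""
    url: Optional[str] = None  # None stands for the first page
    while True:
        page = fetch_page(url)
        yield from page["results"]
        url = page.get("next")
        if not url:
            return
```

The same loop applies to the other list endpoints in this reference that return next, previous, and results.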

Cancel a prediction

POST https://api.replicate.com/v1/predictions/{prediction_id}/cancel

Request path parameters

prediction_id (required)

The ID of the prediction to cancel.

Create a model

POST https://api.replicate.com/v1/models

Create a model.

Example cURL request:

curl -s -X POST \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  -d '{"owner": "alice", "name": "my-model", "description": "An example model", "visibility": "public", "hardware": "cpu"}' \
  https://api.replicate.com/v1/models

The response will be a model object in the following format:

{
  "url": "https://replicate.com/alice/my-model",
  "owner": "alice",
  "name": "my-model",
  "description": "An example model",
  "visibility": "public",
  "github_url": null,
  "paper_url": null,
  "license_url": null,
  "run_count": 0,
  "cover_image_url": null,
  "default_example": null,
  "latest_version": null
}

Note that there is a limit of 1,000 models per account. For most purposes, we recommend using a single model and pushing new versions of the model as you make changes to it.

Request body

hardware (required)

The SKU for the hardware used to run the model. Possible values can be retrieved from the hardware.list endpoint.

name (required)

The name of the model. This must be unique among all models owned by the user or organization.

owner (required)

The name of the user or organization that will own the model. This must be the same as the user or organization that is making the API request. In other words, the API token used in the request must belong to this user or organization.

visibility (required)

Whether the model should be public or private. A public model can be viewed and run by anyone, whereas a private model can be viewed and run only by the user or organization members that own the model.

cover_image_url

A URL for the model’s cover image. This should be an image file.

description

A description of the model.

github_url

A URL for the model’s source code on GitHub.

license_url

A URL for the model’s license.

paper_url

A URL for the model’s paper.
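
The request body fields above can be checked on the client before the request is sent. This is a rough sketch with a hypothetical function name; the server remains the authority on what is valid, for example which hardware SKUs exist.

```python
REQUIRED_FIELDS = ("owner", "name", "visibility", "hardware")
OPTIONAL_FIELDS = ("description", "github_url", "paper_url",
                   "license_url", "cover_image_url")


def create_model_body(**fields) -> dict:
    """Build the JSON body for creating a model, checking field names."""
    missing = [k for k in REQUIRED_FIELDS if not fields.get(k)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [k for k in fields if k not in REQUIRED_FIELDS + OPTIONAL_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if fields["visibility"] not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    # Drop optional fields that were passed as None.
    return {k: v for k, v in fields.items() if v is not None}
```

Serializing the result with json.dumps gives a body like the one passed to -d in the cURL example.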

Get a model

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world

The response will be a model object in the following format:

{
  "url": "https://replicate.com/replicate/hello-world",
  "owner": "replicate",
  "name": "hello-world",
  "description": "A tiny model that says hello",
  "visibility": "public",
  "github_url": "https://github.com/replicate/cog-examples",
  "paper_url": null,
  "license_url": null,
  "run_count": 5681081,
  "cover_image_url": "...",
  "default_example": {...},
  "latest_version": {...}
}

The cover_image_url string is an HTTPS URL for an image file. This can be:

An image uploaded by the model author.
The output file of the example prediction, if the model author has not set a cover image.
The input file of the example prediction, if the model author has not set a cover image and the example prediction has no output file.
A generic fallback image.

The default_example object is a prediction created with this model.

The latest_version object is the model’s most recently pushed version.

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

List public models

GET https://api.replicate.com/v1/models

Get a paginated list of public models.

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models

The response will be a paginated JSON array of model objects:

{
  "next": null,
  "previous": null,
  "results": [
    {
      "url": "https://replicate.com/acme/hello-world",
      "owner": "acme",
      "name": "hello-world",
      "description": "A tiny model that says hello",
      "visibility": "public",
      "github_url": "https://github.com/replicate/cog-examples",
      "paper_url": null,
      "license_url": null,
      "run_count": 5681081,
      "cover_image_url": "...",
      "default_example": {...},
      "latest_version": {...}
    }
  ]
}

The cover_image_url string is an HTTPS URL for an image file. This can be:

An image uploaded by the model author.
The output file of the example prediction, if the model author has not set a cover image.
The input file of the example prediction, if the model author has not set a cover image and the example prediction has no output file.
A generic fallback image.

Delete a model

DELETE https://api.replicate.com/v1/models/{model_owner}/{model_name}

Delete a model.

Model deletion has some restrictions:

You can only delete models you own.
You can only delete private models.
You can only delete models that have no versions associated with them. Currently you’ll need to delete the model’s versions before you can delete the model itself.

Example cURL request:

curl -s -X DELETE \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world

The response will be an empty 204, indicating the model has been deleted.

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

Get a model version

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa

The response will be the version object:

{
  "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "created_at": "2022-04-26T19:29:04.418669Z",
  "cog_version": "0.3.0",
  "openapi_schema": {...}
}

Every model describes its inputs and outputs with OpenAPI Schema Objects in the openapi_schema property.

The openapi_schema.components.schemas.Input property for the replicate/hello-world model looks like this:

{
  "type": "object",
  "title": "Input",
  "required": [
    "text"
  ],
  "properties": {
    "text": {
      "x-order": 0,
      "type": "string",
      "title": "Text",
      "description": "Text to prefix with 'hello '"
    }
  }
}

The openapi_schema.components.schemas.Output property for the replicate/hello-world model looks like this:

{
  "type": "string",
  "title": "Output"
}

For more details, see the docs on Cog’s supported input and output types.
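
An Input schema like the one above can be used to catch obvious mistakes before creating a prediction. The sketch below is a partial check with a hypothetical function name. It looks only at required names and a few scalar types, and it is not a full JSON Schema validator.

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_input(schema: dict, model_input: dict) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in model_input:
            problems.append(f"missing required input: {name}")
    for name, value in model_input.items():
        if name not in properties:
            problems.append(f"unknown input: {name}")
            continue
        type_name = properties[name].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # arrays, objects, and untyped properties are not checked
        # bool is a subclass of int in Python, so reject it for other types.
        wrong_bool = isinstance(value, bool) and type_name != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"input {name} should be of type {type_name}")
    return problems
```

With the replicate/hello-world schema, {"text": "Alice"} yields no problems, while an empty input reports that text is missing.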

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

version_id (required)

The ID of the version.

List model versions

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world/versions

The response will be a JSON array of model version objects, sorted with the most recent version first:

{
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
      "created_at": "2022-04-26T19:29:04.418669Z",
      "cog_version": "0.3.0",
      "openapi_schema": {...}
    }
  ]
}

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

Delete a model version

DELETE https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}

Delete a model version and all associated predictions, including all output files.

Model version deletion has some restrictions:

You can only delete versions from models you own.
You can only delete versions from private models.
You cannot delete a version if someone other than you has run predictions with it.
You cannot delete a version if it is being used as the base model for a fine tune/training.
You cannot delete a version if it has an associated deployment.
You cannot delete a version if another model version is overridden to use it.

Example cURL request:

curl -s -X DELETE \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa

The response will be an empty 202, indicating the deletion request has been accepted. It might take a few minutes to be processed.

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

version_id (required)

The ID of the version.

Create a prediction using an official model

POST https://api.replicate.com/v1/models/{model_owner}/{model_name}/predictions

Start a new prediction for an official model using the inputs you provide.

Example request body:

{
  "input": {
    "prompt": "Write a short poem about the weather."
  }
}

Example cURL request:

curl -s -X POST \
  -d '{"input": {"prompt": "Write a short poem about the weather."}}' \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/models/meta/meta-llama-3-70b-instruct/predictions

The response will be the prediction object:

{
  "id": "25s2s4n7rdrgg0cf1httb3myk0",
  "model": "replicate-internal/llama3-70b-chat-vllm-unquantized",
  "version": "dp-cf04fe09351e25db628e8b6181276547",
  "input": {
    "prompt": "Write a short poem about the weather."
  },
  "logs": "",
  "error": null,
  "status": "starting",
  "created_at": "2024-04-23T19:36:28.355Z",
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/25s2s4n7rdrgg0cf1httb3myk0/cancel",
    "get": "https://api.replicate.com/v1/predictions/25s2s4n7rdrgg0cf1httb3myk0"
  }
}

As models can take several seconds or more to run, the output will not be available immediately. To get the final result of the prediction you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a prediction endpoint until it has finished.
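
The same request can be prepared with Python's standard library. The sketch below only builds the request object and does not send it. The function name is hypothetical, and the token is a placeholder you would replace with your own.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def official_model_request(owner: str, name: str, model_input: dict,
                           token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    url = f"{API_BASE}/models/{owner}/{name}/predictions"
    data = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends it. The decoded response contains urls.get, which is the URL to poll until the prediction has finished.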

All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.

Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

Request body

input (required)

The model’s input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the “API” tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

Files should be passed as HTTP URLs or data URLs.

Use an HTTP URL when:

you have a large file > 256kb
you want to be able to use the file multiple times
you want your prediction metadata to be associable with your input files

Use a data URL when:

you have a small file <= 256kb
you don’t want to upload and host the file somewhere
you don’t need to use the file again (Replicate will not store it)

stream

Request a URL to receive streaming output using server-sent events (SSE).

If the requested model version supports streaming, the returned prediction will have a stream entry in its urls property with an HTTPS URL that you can use to construct an EventSource.

webhook

An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.

webhook_events_filter

By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:

start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they’ll always be sent regardless of throttling.

Create a deployment

POST https://api.replicate.com/v1/deployments

Create a new deployment.

Example cURL request:

curl -s \
  -X POST \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-app-image-generator",
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "hardware": "gpu-t4",
        "min_instances": 0,
        "max_instances": 3
      }' \
  https://api.replicate.com/v1/deployments

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 0,
      "max_instances": 3
    }
  }
}

Request body

hardware (required)

The SKU for the hardware used to run the model. Possible values can be retrieved from the hardware.list endpoint.

max_instances (required)

The maximum number of instances for scaling.

min_instances (required)

The minimum number of instances for scaling.

model (required)

The full name of the model that you want to deploy e.g. stability-ai/sdxl.

name (required)

The name of the deployment.

version (required)

The 64-character string ID of the model version that you want to deploy.

Response body

Property	Type	Description
current_release	object
current_release.configuration	object
current_release.configuration.hardware	string	The SKU for the hardware used to run the model.
current_release.configuration.max_instances	integer	The maximum number of instances for scaling.
current_release.configuration.min_instances	integer	The minimum number of instances for scaling.
current_release.created_at	string	The time the release was created.
current_release.created_by	object
current_release.created_by.github_url	string	The GitHub URL of the account that created the release.
current_release.created_by.name	string	The name of the account that created the release.
current_release.created_by.type	string (enum)	The account type of the creator. Can be a user or an organization. Options: organization, user
current_release.created_by.username	string	The username of the account that created the release.
current_release.model	string	The model identifier string in the format of `{model_owner}/{model_name}`.
current_release.number	integer	The release number.
current_release.version	string	The ID of the model version used in the release.
name	string	The name of the deployment.
owner	string	The owner of the deployment.

Get a deployment

GET https://api.replicate.com/v1/deployments/{deployment_owner}/{deployment_name}

Get information about a deployment by name including the current release.

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}

Request path parameters

deployment_owner (required)

The name of the user or organization that owns the deployment.

deployment_name (required)

The name of the deployment.

Response body

Property	Type	Description
current_release	object
current_release.configuration	object
current_release.configuration.hardware	string	The SKU for the hardware used to run the model.
current_release.configuration.max_instances	integer	The maximum number of instances for scaling.
current_release.configuration.min_instances	integer	The minimum number of instances for scaling.
current_release.created_at	string	The time the release was created.
current_release.created_by	object
current_release.created_by.github_url	string	The GitHub URL of the account that created the release.
current_release.created_by.name	string	The name of the account that created the release.
current_release.created_by.type	string (enum)	The account type of the creator. Can be a user or an organization. Options: organization, user
current_release.created_by.username	string	The username of the account that created the release.
current_release.model	string	The model identifier string in the format of `{model_owner}/{model_name}`.
current_release.number	integer	The release number.
current_release.version	string	The ID of the model version used in the release.
name	string	The name of the deployment.
owner	string	The owner of the deployment.

List deployments

GET https://api.replicate.com/v1/deployments

Get a list of deployments associated with the current account, including the latest release configuration for each deployment.

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/deployments

The response will be a paginated JSON array of deployment objects, sorted with the most recent deployment first:

{
  "next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw",
  "previous": null,
  "results": [
    {
      "owner": "replicate",
      "name": "my-app-image-generator",
      "current_release": {
        "number": 1,
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "created_at": "2024-02-15T16:32:57.018467Z",
        "created_by": {
          "type": "organization",
          "username": "acme",
          "name": "Acme Corp, Inc.",
          "github_url": "https://github.com/acme"
        },
        "configuration": {
          "hardware": "gpu-t4",
          "min_instances": 1,
          "max_instances": 5
        }
      }
    }
  ]
}

Response body

Property	Type	Description
next	string	A URL pointing to the next page of deployment objects if any
previous	string	A URL pointing to the previous page of deployment objects if any
results	array	An array containing a page of deployment objects

Update a deployment

PATCH https://api.replicate.com/v1/deployments/{deployment_owner}/{deployment_name}

Update properties of an existing deployment, including hardware, min/max instances, and the deployment’s underlying model version.

Example cURL request:

curl -s \
  -X PATCH \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H "Content-Type: application/json" \
  -d '{"min_instances": 3, "max_instances": 10}' \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 2,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 3,
      "max_instances": 10
    }
  }
}

Updating any deployment properties will increment the number field of the current_release.

Request path parameters

deployment_owner (required)

The name of the user or organization that owns the deployment.

deployment_name (required)

The name of the deployment.

Request body

hardware

The SKU for the hardware used to run the model. Possible values can be retrieved from the hardware.list endpoint.

max_instances

The maximum number of instances for scaling.

min_instances

The minimum number of instances for scaling.

version

The ID of the model version that you want to deploy.
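
Since every field in the update body is optional, a client can send only the properties it wants to change. The sketch below builds such a body. The function name is hypothetical, and rejecting an empty update is an assumption made here rather than a rule stated in this reference.

```python
from typing import Optional


def deployment_update_body(hardware: Optional[str] = None,
                           min_instances: Optional[int] = None,
                           max_instances: Optional[int] = None,
                           version: Optional[str] = None) -> dict:
    """Build a PATCH body that contains only the properties being changed."""
    candidates = {
        "hardware": hardware,
        "min_instances": min_instances,
        "max_instances": max_instances,
        "version": version,
    }
    # Compare against None so that a value of 0 is still sent.
    body = {k: v for k, v in candidates.items() if v is not None}
    if not body:
        raise ValueError("provide at least one property to update")
    return body
```

Calling deployment_update_body(min_instances=3, max_instances=10) gives the body used in the cURL example for this endpoint.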

Response body

Property	Type	Description
current_release	object
current_release.configuration	object
current_release.configuration.hardware	string	The SKU for the hardware used to run the model.
current_release.configuration.max_instances	integer	The maximum number of instances for scaling.
current_release.configuration.min_instances	integer	The minimum number of instances for scaling.
current_release.created_at	string	The time the release was created.
current_release.created_by	object
current_release.created_by.github_url	string	The GitHub URL of the account that created the release.
current_release.created_by.name	string	The name of the account that created the release.
current_release.created_by.type	string (enum)	The account type of the creator. Can be a user or an organization. Options: organization, user
current_release.created_by.username	string	The username of the account that created the release.
current_release.model	string	The model identifier string in the format of `{model_owner}/{model_name}`.
current_release.number	integer	The release number.
current_release.version	string	The ID of the model version used in the release.
name	string	The name of the deployment.
owner	string	The owner of the deployment.

Delete a deployment

DELETE https://api.replicate.com/v1/deployments/{deployment_owner}/{deployment_name}

Delete a deployment.

Deployment deletion has some restrictions:

You can only delete deployments that have been offline and unused for at least 15 minutes.

Example cURL request:

curl -s -X DELETE \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be an empty 204, indicating the deployment has been deleted.

Request path parameters

deployment_owner (required)

The name of the user or organization that owns the deployment.

deployment_name (required)

The name of the deployment.

Create a prediction using a deployment

POST https://api.replicate.com/v1/deployments/{deployment_owner}/{deployment_name}/predictions

Start a new prediction for a deployment of a model using inputs you provide.

Example request body:

{
  "input": {
    "text": "Alice"
  }
}

Example cURL request:

curl -s -X POST \
  -d '{"input": {"text": "Alice"}}' \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  "https://api.replicate.com/v1/deployments/replicate/hello-world/predictions"

The response will be the prediction object:

{
  "id": "86b6trbv99rgp0cf1h886f69ew",
  "model": "replicate/hello-world",
  "version": "dp-8e43d61c333b5ddc7a921130bc3ab3ea",
  "input": {
    "text": "Alice"
  },
  "logs": "",
  "error": null,
  "status": "starting",
  "created_at": "2024-04-23T18:55:52.138Z",
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/86b6trbv99rgp0cf1h886f69ew/cancel",
    "get": "https://api.replicate.com/v1/predictions/86b6trbv99rgp0cf1h886f69ew"
  }
}

As models can take several seconds or more to run, the output will not be available immediately. To get the final result of the prediction you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a prediction endpoint until it has finished.

Input and output (including any files) will be automatically deleted after an hour, so you must save a copy of any files in the output if you’d like to continue using them.

Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.

Request path parameters

deployment_owner (required)

The name of the user or organization that owns the deployment.

deployment_name (required)

The name of the deployment.

Request body

input (required)

The model’s input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the “API” tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

Files should be passed as HTTP URLs or data URLs.

Use an HTTP URL when:

you have a large file > 256kb
you want to be able to use the file multiple times
you want your prediction metadata to be associable with your input files

Use a data URL when:

you have a small file <= 256kb
you don’t want to upload and host the file somewhere
you don’t need to use the file again (Replicate will not store it)
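As a rough illustration of the data URL option, the sketch below encodes a small file's bytes inline and refuses anything above the 256 KB threshold. It assumes that threshold means 256 * 1024 bytes, and to_data_url is a hypothetical helper name; the MIME type is guessed from the file name and falls back to application/octet-stream.

```python
import base64
import mimetypes

# Assumption: the 256 KB threshold is read as 256 * 1024 bytes.
DATA_URL_LIMIT = 256 * 1024


def to_data_url(data: bytes, filename: str) -> str:
    """Encode small file contents as a base64 data URL for use as a model input."""
    if len(data) > DATA_URL_LIMIT:
        raise ValueError("file exceeds 256 KB; host it and pass an HTTP URL instead")
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be placed directly in the input object wherever the model expects a file.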

stream

Request a URL to receive streaming output using server-sent events (SSE).

If the requested model version supports streaming, the returned prediction will have a stream entry in its urls property with an HTTPS URL that you can use to construct an EventSource.
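EventSource is a browser API; outside a browser, the same stream can be read by parsing the server-sent events text format directly. The sketch below is a minimal parser for that format (event and data fields, comment lines, blank-line dispatch). It is not a Replicate client, and the event name in the usage example is only an illustrative assumption.

```python
def parse_sse(lines):
    """Yield (event_name, data) tuples from an iterable of SSE text lines.

    Minimal sketch: handles 'event' and 'data' fields and comment lines,
    and ignores other fields such as 'id' and 'retry'.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch only if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Usage with hand-written sample lines (the event name is hypothetical):
sample = ["event: output\n", "data: Hello\n", "\n"]
events = list(parse_sse(sample))
```

In practice the lines would come from iterating over the HTTP response for the stream URL, decoded as UTF-8.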

webhook

An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
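Because a webhook may be retried, the receiving code should tolerate duplicates. One possible approach, sketched below, is to remember which prediction IDs have already been finalized and ignore repeats. The class name and the in-memory set are illustrative assumptions; a real service would likely keep this state in a database.

```python
TERMINAL_STATUSES = {"succeeded", "canceled", "failed"}


class WebhookProcessor:
    """Hypothetical handler that records each finished prediction exactly once."""

    def __init__(self):
        self._finished_ids = set()
        self.results = {}

    def handle(self, payload):
        """Process a decoded webhook body; return True only if it was newly recorded."""
        if payload["status"] not in TERMINAL_STATUSES:
            return False  # intermediate update; nothing to finalize yet
        prediction_id = payload["id"]
        if prediction_id in self._finished_ids:
            return False  # duplicate delivery, safe to ignore
        self._finished_ids.add(prediction_id)
        self.results[prediction_id] = payload.get("output")
        return True
```

Returning a success response for duplicates as well as first deliveries keeps the sender from retrying further.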

webhook_events_filter

By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:

start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they’ll always be sent regardless of throttling.
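A request body with these fields can be assembled and checked on the client before it is sent. The sketch below validates the event names against the four listed above and checks that the webhook URL uses HTTPS. The function name prediction_body is hypothetical, and the client-side checks are a convenience rather than a description of how the server validates requests.

```python
from urllib.parse import urlparse

# The four event types listed in this reference.
VALID_WEBHOOK_EVENTS = ("start", "output", "logs", "completed")


def prediction_body(model_input, webhook=None, webhook_events_filter=None):
    """Build the JSON-serializable request body for a prediction."""
    body = {"input": model_input}
    if webhook is not None:
        if urlparse(webhook).scheme != "https":
            raise ValueError("webhook must be an HTTPS URL")
        body["webhook"] = webhook
    if webhook_events_filter is not None:
        unknown = [e for e in webhook_events_filter if e not in VALID_WEBHOOK_EVENTS]
        if unknown:
            raise ValueError(f"unknown webhook events: {unknown}")
        body["webhook_events_filter"] = list(webhook_events_filter)
    return body


body = prediction_body(
    {"text": "Alice"},
    webhook="https://example.com/my-webhook",
    webhook_events_filter=["start", "completed"],
)
```

Here body matches the example request shown above and can be passed to json.dumps as is.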

List available hardware for models

GET https://api.replicate.com/v1/hardware

Example cURL request:

curl -s \
  -H "Authorization: Bearer <paste-your-token-here>" \
  https://api.replicate.com/v1/hardware

The response will be a JSON array of hardware objects:

[
    {"name": "CPU", "sku": "cpu"},
    {"name": "Nvidia T4 GPU", "sku": "gpu-t4"},
    {"name": "Nvidia A40 GPU", "sku": "gpu-a40-small"},
    {"name": "Nvidia A40 (Large) GPU", "sku": "gpu-a40-large"}
]
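Since each hardware object carries a name and a sku, a client will often want to look hardware up by SKU. A minimal sketch, assuming the response body has already been read as text (hardware_by_sku is a hypothetical helper name):

```python
import json


def hardware_by_sku(response_text):
    """Map each hardware SKU to its display name from the JSON array response."""
    return {item["sku"]: item["name"] for item in json.loads(response_text)}


sample = '[{"name": "CPU", "sku": "cpu"}, {"name": "Nvidia T4 GPU", "sku": "gpu-t4"}]'
lookup = hardware_by_sku(sample)
```

With the sample above, lookup["gpu-t4"] gives the display name for the T4 entry.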

Create a training

POST https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}/trainings

Start a new training of the model version you specify.

Example request body:

{
  "destination": "{new_owner}/{new_name}",
  "input": {
    "input_images": "https://example.com/my-input-images.zip"
  },
  "webhook": "https://example.com/my-webhook"
}

Example cURL request:

curl -s -X POST \
  -d '{"destination": "{new_owner}/{new_name}", "input": {"input_images": "https://example.com/my-input-images.zip"}}' \
  -H "Authorization: Bearer <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/models/stability-ai/sdxl/versions/da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf/trainings

The response will be the training object:

{
  "id": "zz4ibbonubfz7carwiefibzgga",
  "model": "stability-ai/sdxl",
  "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
  "input": {
    "input_images": "https://example.com/my-input-images.zip"
  },
  "logs": "",
  "error": null,
  "status": "starting",
  "created_at": "2023-09-08T16:32:56.990893084Z",
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga/cancel",
    "get": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga"
  }
}

Because models can take several minutes or more to train, the result will not be available immediately. To get the final result of the training, either provide a webhook HTTPS URL for us to call when the results are ready, or poll the “get a training” endpoint until the training has finished.

When a training completes, it creates a new version of the model at the specified destination.

To find some models to train on, check out the trainable language models collection.

Request path parameters

model_owner (required)

The name of the user or organization that owns the model.

model_name (required)

The name of the model.

version_id (required)

The ID of the version.

Request body

destination (required)

A string representing the desired model to push to in the format {destination_model_owner}/{destination_model_name}. This should be an existing model owned by the user or organization making the API request. If the destination is invalid, the server will return an appropriate 4XX response.
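The destination format can be checked on the client before the request is made, which gives a clearer error than waiting for a 4XX response. The sketch below validates the owner/name shape and then builds the training request without sending it. Both function names and the example destination my-org/my-sdxl are hypothetical, and the check only covers the string format, not whether the model exists.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.replicate.com/v1"


def split_destination(destination):
    """Split '{owner}/{name}' into its two parts, rejecting any other shape."""
    owner, separator, name = destination.partition("/")
    if not separator or not owner or not name or "/" in name:
        raise ValueError("destination must look like '{owner}/{name}'")
    return owner, name


def build_training_request(model_owner, model_name, version_id,
                           destination, training_input, token, webhook=None):
    """Build (but do not send) the POST request that starts a training."""
    split_destination(destination)  # fail early on a malformed destination
    body = {"destination": destination, "input": training_input}
    if webhook is not None:
        body["webhook"] = webhook
    path = "/".join(quote(part, safe="") for part in (model_owner, model_name))
    url = f"{API_BASE}/models/{path}/versions/{quote(version_id, safe='')}/trainings"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The resulting request can be sent with urllib.request.urlopen, and the returned training object then polled or awaited through a webhook as described above.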

input (required)

An object containing inputs to the Cog model’s train() function.

webhook

An HTTPS URL for receiving a webhook when the training completes. The webhook will be a POST request where the request body is the same as the response body of the get training operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.