# BERT Sentiment Analysis on Replicate
This repository wraps the Hugging Face DistilBERT model (`distilbert-base-uncased-finetuned-sst-2-english`) using Cog for easy deployment on Replicate. The model performs sentiment analysis on input text and supports evaluation of entire batches of texts for increased efficiency and lower compute costs.
## Contents
- `predict.py`: Contains the `Predictor` class that loads the model and performs inference. The predictor is now designed to process a batch of text inputs in one API call.
- `cog.yaml`: Specifies the build and runtime environment for Cog.
- `requirements.txt`: Lists the Python dependencies (for local development and testing).
## Batched Evaluation

### Overview
To reduce overhead and cost on Replicate, the model was modified to support batched evaluation. Instead of processing one text per API call, you can now pass an entire batch of texts at once. This minimizes container start-up overhead and makes better use of your available compute resources.
**Key Details:**

- **Input format:**
  Due to limitations with Cog's input deserialization, the input must be provided as a JSON-formatted string representing a list of texts.
  Example (CLI):

  ```bash
  cog predict -i texts='["I love apple", "I hate apple"]'
  ```
Python API
output = replicate.run("halstonblim/distilbert-base-uncased-finetuned-sst-2-english:3a88ff44062350c53b0c1f00b9a91475643ecab79bd0c2a292cf2a8cf2cfb897", input={"texts": '["I love apple","I hate apple"]'} )
print(output)
  The output is a dictionary containing the lists of predicted labels and probabilities:

  ```
  {'confidences': [0.9998000264167786, 0.9993000030517578], 'predicted_labels': [1, 0]}
  ```
- **Passing the input:**
  When calling the model via the Cog command-line interface, supply the batch input using proper JSON syntax. For example:

  ```bash
  cog predict -i texts='["I hate apple", "I love apple"]'
  ```

  Alternatively, you can store the input in a file (e.g., `input.json`) containing the JSON array above and pass it with:

  ```bash
  cog predict -i texts=@input.json
  ```
- **Output structure:**
  The Predictor returns a dictionary with two lists (a short example of pairing them with the input texts follows this list):
  - `"predicted_labels"`: the predicted sentiment labels (0 for negative, 1 for positive).
  - `"confidences"`: the confidence (probability) associated with the prediction for each input.
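Because both lists are index-aligned with the input texts, pairing them back up is straightforward. A minimal, hypothetical Python example (the variable names and hard-coded output values are illustrative only):

```python
# Pair each input text with its predicted label and confidence.
texts = ["I love apple", "I hate apple"]
output = {"predicted_labels": [1, 0], "confidences": [0.9998, 0.9993]}  # example values

label_names = {0: "negative", 1: "positive"}
for text, label, conf in zip(texts, output["predicted_labels"], output["confidences"]):
    print(f"{text!r} -> {label_names[label]} ({conf:.2%})")
```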
### How It Works Internally

- **Manual batching:**
  Inside `predict.py`, the predictor accepts a JSON-formatted string. Since Cog does not natively support a list input type, the predictor converts that JSON string into a Python list. It then processes the texts in batches (default batch size is 32) by tokenizing them together, running inference in a vectorized manner, and finally splitting the results (see the sketch after this list).
- **Efficiency gains:**
  - **Fewer API calls:** Passing a whole batch in a single API call avoids the overhead of starting up multiple container instances.
  - **Optimized GPU utilization:** By processing many texts at once, the GPU can be used more efficiently, reducing the overall cost per inference.
- **Data conversion:**
  When running in environments like WSL on Windows, the file input is automatically wrapped as a data URI. The Predictor code decodes the base64 data URI to obtain the JSON list before evaluation.
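For reference, here is a minimal sketch of what this flow can look like. It is not a copy of the actual `predict.py`; the constant names, the data-URI handling, and the model-loading details are assumptions made for illustration.

```python
import base64
import json

import torch
from cog import BasePredictor, Input
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
BATCH_SIZE = 32  # default batch size mentioned above (assumed constant name)


class Predictor(BasePredictor):
    def setup(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self.model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
        self.model.to(self.device).eval()

    def predict(self, texts: str = Input(description="JSON-encoded list of texts")) -> dict:
        # Under WSL, `-i texts=@input.json` may arrive as a base64 data URI; decode it first.
        if texts.startswith("data:"):
            texts = base64.b64decode(texts.split(",", 1)[1]).decode("utf-8")

        items = json.loads(texts)  # JSON string -> Python list of texts

        predicted_labels, confidences = [], []
        for start in range(0, len(items), BATCH_SIZE):
            chunk = items[start:start + BATCH_SIZE]
            # Tokenize the whole chunk together and run one vectorized forward pass.
            enc = self.tokenizer(chunk, padding=True, truncation=True, return_tensors="pt").to(self.device)
            with torch.no_grad():
                probs = torch.softmax(self.model(**enc).logits, dim=-1)
            conf, labels = probs.max(dim=-1)
            predicted_labels.extend(labels.tolist())
            confidences.extend(conf.tolist())

        return {"predicted_labels": predicted_labels, "confidences": confidences}
```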
## Converting a Python List to JSON

If you need to convert a Python list of strings into a proper JSON-formatted string (for testing or to prepare a file), you can use Python's built-in `json` module:

```python
import json

my_list = ["I hate apple", "I love apple"]
json_string = json.dumps(my_list)
print(json_string)
```

This will output:

```
["I hate apple", "I love apple"]
```
You can then use this JSON string directly with the Cog CLI or store it in a file.
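To prepare a file for the `cog predict -i texts=@input.json` form shown earlier, a minimal sketch (the filename `input.json` simply matches the example above):

```python
import json

texts = ["I hate apple", "I love apple"]

# Write the JSON array to input.json for use with: cog predict -i texts=@input.json
with open("input.json", "w") as f:
    json.dump(texts, f)
```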
## Running Locally

You can test the batching functionality from the command line or within the Cog container. For example, using the CLI:

```bash
cog predict -i texts='["I hate apple", "I love apple"]'
```

The output should be similar to:

```json
{
  "predicted_labels": [0, 1],
  "confidences": [0.9923, 0.9876]
}
```
This confirms that both texts were processed in a single batch.
## Conclusion

This repository now supports efficient batch evaluation of sentiment analysis inputs. By passing a JSON-formatted list of texts, you can evaluate entire batches in one API call, reducing overhead and lowering compute costs on Replicate.
For further details on deploying models with Cog and on Replicate, refer to the official documentation.