# BERT Sentiment Analysis on Replicate
This repository wraps the Hugging Face DistilBERT model (`distilbert-base-uncased-finetuned-sst-2-english`) using Cog for easy deployment on Replicate. The model performs sentiment analysis on input text and supports evaluation of entire batches of texts for increased efficiency and lower compute costs.
## Contents
- `predict.py`: Contains the `Predictor` class that loads the model and performs inference. The predictor is now designed to process a batch of text inputs in one API call.
- `cog.yaml`: Specifies the build and runtime environment for Cog.
- `requirements.txt`: Lists the Python dependencies (for local development and testing).
## Batched Evaluation

### Overview
To reduce overhead and cost on Replicate, the model was modified to support batched evaluation. Instead of processing one text per API call, you can now pass an entire batch of texts at once. This minimizes container start-up overhead and makes better use of your available compute resources.
**Key Details:**

- **Input format:**
  Due to limitations with Cog's input deserialization, the input must be provided as a JSON-formatted string representing a list of texts.
  Example (CLI):

  ```bash
  cog predict -i texts='["I love apple", "I hate apple"]'
  ```
Python API
output = replicate.run("halstonblim/distilbert-base-uncased-finetuned-sst-2-english:3a88ff44062350c53b0c1f00b9a91475643ecab79bd0c2a292cf2a8cf2cfb897", input={"texts": '["I love apple","I hate apple"]'} )
print(output)
  The output is a dictionary containing the lists of predicted labels and probabilities:

  ```
  {'confidences': [0.9998000264167786, 0.9993000030517578], 'predicted_labels': [1, 0]}
  ```
- **Passing the input:**
  When calling the model via the Cog command-line interface, supply the batch input using proper JSON syntax. For example:

  ```bash
  cog predict -i texts='["I hate apple", "I love apple"]'
  ```

  Alternatively, you can store the input in a file (e.g., `input.json`) containing the JSON array above and pass it with:

  ```bash
  cog predict -i texts=@input.json
  ```
- **Output structure:**
  The Predictor returns a dictionary with two lists (a short example of pairing them with the input texts follows this list):
  - `"predicted_labels"`: the predicted sentiment labels (0 for negative, 1 for positive).
  - `"confidences"`: the confidence (probability) associated with the prediction for each input.
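Because both lists are index-aligned with the input texts, pairing them back up is straightforward. A minimal, hypothetical Python example (the variable names and hard-coded output values are illustrative only):

```python
# Pair each input text with its predicted label and confidence.
texts = ["I love apple", "I hate apple"]
output = {"predicted_labels": [1, 0], "confidences": [0.9998, 0.9993]}  # example values

label_names = {0: "negative", 1: "positive"}
for text, label, conf in zip(texts, output["predicted_labels"], output["confidences"]):
    print(f"{text!r} -> {label_names[label]} ({conf:.2%})")
```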
### How It Works Internally

- **Manual batching:**
  Inside `predict.py`, the predictor accepts a JSON-formatted string. Since Cog does not natively support a list input type, the predictor converts that JSON string into a Python list. It then processes the texts in batches (default batch size is 32) by tokenizing them together, running inference in a vectorized manner, and finally splitting the results (see the sketch after this list).
- **Efficiency gains:**
  - **Fewer API calls:** Passing a whole batch in a single API call avoids the overhead of starting up multiple container instances.
  - **Optimized GPU utilization:** By processing many texts at once, the GPU can be used more efficiently, reducing the overall cost per inference.
- **Data conversion:**
  When running in environments like WSL on Windows, the file input is automatically wrapped as a data URI. The Predictor code decodes the base64 data URI to obtain the JSON list before evaluation.
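For reference, here is a minimal sketch of what this flow can look like. It is not a copy of the actual `predict.py`; the constant names, the data-URI handling, and the model-loading details are assumptions made for illustration.

```python
import base64
import json

import torch
from cog import BasePredictor, Input
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
BATCH_SIZE = 32  # default batch size mentioned above (assumed constant name)


class Predictor(BasePredictor):
    def setup(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self.model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
        self.model.to(self.device).eval()

    def predict(self, texts: str = Input(description="JSON-encoded list of texts")) -> dict:
        # Under WSL, `-i texts=@input.json` may arrive as a base64 data URI; decode it first.
        if texts.startswith("data:"):
            texts = base64.b64decode(texts.split(",", 1)[1]).decode("utf-8")

        items = json.loads(texts)  # JSON string -> Python list of texts

        predicted_labels, confidences = [], []
        for start in range(0, len(items), BATCH_SIZE):
            chunk = items[start:start + BATCH_SIZE]
            # Tokenize the whole chunk together and run one vectorized forward pass.
            enc = self.tokenizer(chunk, padding=True, truncation=True, return_tensors="pt").to(self.device)
            with torch.no_grad():
                probs = torch.softmax(self.model(**enc).logits, dim=-1)
            conf, labels = probs.max(dim=-1)
            predicted_labels.extend(labels.tolist())
            confidences.extend(conf.tolist())

        return {"predicted_labels": predicted_labels, "confidences": confidences}
```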
## Converting a Python List to JSON

If you need to convert a Python list of strings into a proper JSON-formatted string (for testing or to prepare a file), you can use Python's built-in `json` module:

```python
import json

my_list = ["I hate apple", "I love apple"]
json_string = json.dumps(my_list)
print(json_string)
```

This will output:

```
["I hate apple", "I love apple"]
```
You can then use this JSON string directly with the Cog CLI or store it in a file.
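To prepare a file for the `cog predict -i texts=@input.json` form shown earlier, a minimal sketch (the filename `input.json` simply matches the example above):

```python
import json

texts = ["I hate apple", "I love apple"]

# Write the JSON array to input.json for use with: cog predict -i texts=@input.json
with open("input.json", "w") as f:
    json.dump(texts, f)
```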
## Running Locally

You can test the batching functionality from the command line or within the Cog container. For example, using the CLI:

```bash
cog predict -i texts='["I hate apple", "I love apple"]'
```

The output should be similar to:

```json
{
  "predicted_labels": [0, 1],
  "confidences": [0.9923, 0.9876]
}
```
This confirms that both texts were processed in a single batch.
## Conclusion

This repository now supports efficient batch evaluation of sentiment analysis inputs. By passing a JSON-formatted list of texts, you can evaluate entire batches in one API call, reducing overhead and lowering compute costs on Replicate.
For further details on deploying models with Cog and on Replicate, refer to the official documentation.