ibm-granite/granite-3.2-8b-instruct
Prediction
ibm-granite/granite-3.2-8b-instruct (Official model)

- ID: 7kqz6aczz5rme0cns1rrsj94yr
- Status: Succeeded
- Source: Web

Input
- top_p: 0.9
- prompt: How is perplexity measured for LLMs and why is it useful?
- max_tokens: 512
- min_tokens: 0
- temperature: 0.6
- system_prompt: You are a helpful assistant.
- presence_penalty: 0
- frequency_penalty: 0
Output
Perplexity is a common metric used to evaluate the performance of language models, including large language models (LLMs). It is a measure of how well a model predicts a sample, and it is derived from cross-entropy. In simpler terms, it measures how surprised the model is by the test data: the lower the perplexity, the less surprised the model is, and the better it predicts the data.

Here's a simple breakdown of how it's calculated:

1. The model is given a test set of sentences.
2. For each word in a sentence, the model calculates the probability of that word given all the previous words.
3. The perplexity is then the inverse probability of the test set, normalized by the number of words.

The formula for perplexity (PP) is:

PP(W) = exp(-1/N * Σ log P(wi | w1...wi-1))

Where:

- W is the test set of sentences,
- N is the number of words in the test set,
- wi is each word in the test set,
- P(wi | w1...wi-1) is the probability assigned by the model to word wi given the previous words.

Perplexity is useful for several reasons:

1. **Model Comparison**: It provides a standard way to compare different models. A lower perplexity score generally indicates a better model.
2. **Model Improvement**: It helps identify areas where the model is struggling. If a certain type of sentence or vocabulary consistently results in high perplexity, that indicates a weakness that can be addressed through further training or adjustments.
3. **Understanding Model Behavior**: It gives insight into how the model handles language. Lower perplexity suggests a better grasp of the language's structure and usage.
4. **Evaluation on Unseen Data**: Although the model is trained on a specific corpus, perplexity can be calculated on unseen data to evaluate the model's generalization capability.

However, it's important to note that while perplexity is a widely used metric, it's not without limitations. For instance, it doesn't directly correlate with human judgment of fluency or coherence, and

- Input tokens: 22
- Output tokens: 510
- Tokens per second: 140.55
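The formula in the output above maps directly to a few lines of code. Below is a minimal Python sketch of the calculation, assuming you already have the per-token conditional probabilities from a model; the function name and toy values are illustrative, not part of any library.

```python
import math

def perplexity(token_probs):
    """Compute perplexity from per-token conditional probabilities.

    token_probs: the probabilities P(wi | w1...wi-1) the model assigned
    to each word in the test set, in order.
    """
    n = len(token_probs)
    # Average negative log-probability = cross-entropy over the test set.
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    # Perplexity is the exponential of the cross-entropy.
    return math.exp(avg_neg_log_prob)

# Toy example: a model that assigns high probabilities to the observed
# words (less "surprised") scores a lower perplexity.
confident = [0.9, 0.8, 0.95, 0.85]
uncertain = [0.3, 0.2, 0.25, 0.4]
print(perplexity(confident))  # ~1.15
print(perplexity(uncertain))  # ~3.59
```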
Want to make some of these yourself? Run this model.
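As a starting point, here is a minimal sketch of reproducing the prediction shown above with the Replicate Python client, assuming the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in your environment. The input values are the ones from the Input section; the join-the-chunks step assumes the model streams its output as pieces of text.

```python
import replicate

# Same input values as the prediction above.
output = replicate.run(
    "ibm-granite/granite-3.2-8b-instruct",
    input={
        "prompt": "How is perplexity measured for LLMs and why is it useful?",
        "system_prompt": "You are a helpful assistant.",
        "max_tokens": 512,
        "min_tokens": 0,
        "temperature": 0.6,
        "top_p": 0.9,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# The client returns the generated text in chunks; join them into one string.
print("".join(output))
```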