Run time and cost

This model costs approximately $0.00098 per run on Replicate (about 1,020 runs per $1), though the cost varies with your inputs. It is also open source, and you can run it on your own computer with Docker.
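
The relationship between the per-run price and the runs-per-dollar figure can be checked with a few lines of arithmetic. This is only a rough sketch: the helper names are made up for illustration, and the price is the approximate figure quoted above, which varies with inputs.

```python
from decimal import Decimal

# Approximate per-run price quoted above; actual cost depends on inputs.
COST_PER_RUN_USD = Decimal("0.00098")


def runs_per_dollar(cost_per_run: Decimal) -> int:
    """Whole number of runs that fit into one US dollar."""
    return int(Decimal("1") / cost_per_run)


def estimated_cost(n_runs: int, cost_per_run: Decimal = COST_PER_RUN_USD) -> Decimal:
    """Estimated total cost in USD for n_runs predictions."""
    return cost_per_run * n_runs


print(runs_per_dollar(COST_PER_RUN_USD))  # 1020
print(estimated_cost(10_000))             # 9.80000
```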

This model runs on NVIDIA L40S GPU hardware. Predictions typically complete within 1 second.

Readme

Qwen2-1.5B-Instruct on Replicate

This Replicate model provides access to the Qwen2-1.5B-Instruct model, part of the Qwen2 language model series. It offers three variants:

Qwen/Qwen2-1.5B-Instruct: Full precision model
Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8: 8-bit quantized model
Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4: 4-bit quantized model

Introduction

Qwen2 is the latest series of Qwen large language models, offering both pretrained and instruction-tuned models in five sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. This Replicate implementation focuses on the instruction-tuned 1.5B Qwen2 model.

Qwen2 demonstrates competitive performance against state-of-the-art open-source and proprietary models across various benchmarks, including language understanding, generation, multilingual capability, coding, mathematics, and reasoning.

For more details about Qwen2, see the official Qwen repository.

Model Details

Qwen2 is based on the Transformer architecture and incorporates:

- SwiGLU activation
- Attention QKV bias
- Group query attention
- Improved tokenizer for multiple natural languages and code
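
To make two of these components concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward layer and of the query-to-KV head mapping used by group query attention. It follows the commonly published definitions and is not taken from the Qwen2 source code. The function names and the toy sizes in the usage note are assumptions chosen for illustration.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Group query attention: each block of consecutive query heads shares one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size
```

For example, with 12 query heads and 2 KV heads (illustrative numbers), `kv_head_for_query_head(5, 12, 2)` returns `0` and `kv_head_for_query_head(6, 12, 2)` returns `1`, so the KV cache only needs to hold 2 heads instead of 12.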

Training Details

The model underwent pretraining with a large dataset, followed by post-training using both supervised fine-tuning and direct preference optimization.
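
For readers unfamiliar with direct preference optimization (DPO), the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the published objective only. The function names and the `beta=0.1` default are assumptions, and nothing here reflects the Qwen team's actual training code or hyperparameters.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each term is the log-probability of a full response.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)), i.e. softplus(-m).
    return softplus(-margin)
```

When the policy matches the reference model, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the chosen response than the reference does.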

Quickstart

To use this Replicate implementation:

Visit the Replicate model page.
Use the web interface or API to run a prediction with your desired parameters.
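
As a rough illustration of the API route, the sketch below uses the `replicate` Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set in the environment). The input names mirror those in the Cog command in this README. `MODEL_REF` is a placeholder, not the real identifier, so copy the actual one from the model page. The helper names are hypothetical.

```python
# Placeholder: replace with the identifier shown on the Replicate model page.
MODEL_REF = "owner/model-name"


def build_input(prompt: str,
                system_prompt: str = "You are a funny and helpful assistant.",
                model_type: str = "Qwen2-1.5B-Instruct") -> dict:
    """Assemble the prediction inputs, mirroring the Cog example's parameters."""
    return {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "model_type": model_type,
        "top_k": 1,
        "top_p": 1,
        "temperature": 1,
        "max_new_tokens": 512,
        "repetition_penalty": 1,
    }


def run_prediction(prompt: str, model_ref: str = MODEL_REF) -> str:
    """Run one prediction and return the generated text as a single string."""
    import replicate  # imported lazily so build_input works without the client

    output = replicate.run(model_ref, input=build_input(prompt))
    # Language models on Replicate typically yield text in chunks.
    return "".join(output)
```

Calling `run_prediction("Tell me a funny joke about cowboys in the style of Yoda from Star Wars")` sends the request. Billing and rate limits apply as for any Replicate prediction.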

For local testing or development:

Clone the repository:

    git clone -b Qwen2-1.5B-Instruct https://github.com/zsxkib/cog-qwen-2.git
    cd cog-qwen-2

Run a prediction using Cog:

    cog predict \
      -i 'top_k=1' \
      -i 'top_p=1' \
      -i 'prompt="Tell me a funny joke about cowboys in the style of Yoda from Star Wars"' \
      -i 'model_type="Qwen2-1.5B-Instruct"' \
      -i 'temperature=1' \
      -i 'system_prompt="You are a funny and helpful assistant."' \
      -i 'max_new_tokens=512' \
      -i 'repetition_penalty=1'
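
The sampling inputs in that command interact in a way that is easy to miss: with `top_k=1` only the single most likely token survives filtering, so decoding is effectively greedy regardless of `temperature` or `top_p`. The sketch below shows the usual temperature, top-k, and top-p filtering order on a toy set of logits. It is an illustration of the common convention, not the code this implementation runs, and `filter_probs` is a hypothetical helper.

```python
import math


def filter_probs(logits, temperature: float = 1.0, top_k: int = 0, top_p: float = 1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    temperature must be > 0; top_k=0 disables top-k; top_p=1.0 disables top-p.
    """
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


print(filter_probs([2.0, 1.0, 0.1], temperature=1.0, top_k=1, top_p=1.0))  # {0: 1.0}
```

To get varied outputs, raise `top_k` (or disable it, if the implementation allows) and lower `top_p` below 1. A `repetition_penalty` of 1 conventionally leaves the logits unchanged.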

Evaluation

Performance comparison between Qwen2-1.5B-Instruct and Qwen1.5-1.8B-Chat, with the corresponding 0.5B models included for reference:

Dataset	Qwen1.5-0.5B-Chat	Qwen2-0.5B-Instruct	Qwen1.5-1.8B-Chat	Qwen2-1.5B-Instruct
MMLU	35.0	37.9	43.7	52.4
HumanEval	9.1	17.1	25.0	37.8
GSM8K	11.3	40.1	35.3	61.6
C-Eval	37.2	45.2	55.3	63.8
IFEval (Prompt Strict-Acc.)	14.6	20.0	16.8	29.0
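
To read the gains at a glance, the short script below computes the point difference between Qwen2-1.5B-Instruct and Qwen1.5-1.8B-Chat on each benchmark. The scores are copied from the table above; the variable and function names are illustrative.

```python
# (Qwen1.5-1.8B-Chat, Qwen2-1.5B-Instruct) scores copied from the table.
SCORES = {
    "MMLU": (43.7, 52.4),
    "HumanEval": (25.0, 37.8),
    "GSM8K": (35.3, 61.6),
    "C-Eval": (55.3, 63.8),
    "IFEval (Prompt Strict-Acc.)": (16.8, 29.0),
}


def improvements(scores: dict) -> dict:
    """Point gain of the newer model over the older one, rounded to 0.1."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


for name, delta in improvements(SCORES).items():
    print(f"{name}: {delta:+.1f}")
```

The largest gain is on GSM8K (+26.3 points) and the smallest on C-Eval (+8.5 points).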

Citation

If you find the Qwen2 model helpful in your work, please cite:

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}

License

The Qwen2 model is licensed under the Apache 2.0 License.

Credits and Support

The Qwen2 model was developed by the Qwen team.
This Replicate implementation was created by @zsakib_.
For issues related to the Replicate implementation, please use the GitHub issue tracker.
For questions about the underlying Qwen2 model, refer to the official Qwen repository.