zsxkib / qwen2-1.5b-instruct

Qwen 2: A 1.5 billion parameter language model from Alibaba Cloud, fine-tuned for chat completions

  • Public
  • 68 runs

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 1 second.

Readme

Qwen2-1.5B-Instruct on Replicate

This Replicate model provides access to the Qwen2-1.5B-Instruct model, part of the Qwen2 language model series. It offers three variants:

  • Qwen/Qwen2-1.5B-Instruct: Full precision model
  • Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8: 8-bit quantized model
  • Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4: 4-bit quantized model
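To give a rough sense of what the three variants trade off, the sketch below estimates weight storage only. It assumes a nominal 1.5e9 parameters and 16-bit storage for the unquantized checkpoint; neither figure comes from this page, and the estimate ignores quantization metadata, activations, and the KV cache.

```python
# Rough weight-only memory estimate for the three variants.
# Assumptions (not stated in this README): ~1.5e9 parameters, 16-bit
# storage for the unquantized checkpoint, and no allowance for
# quantization metadata, activations, or the KV cache.
NOMINAL_PARAMS = 1.5e9

BITS_PER_WEIGHT = {
    "Qwen/Qwen2-1.5B-Instruct": 16,
    "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8": 8,
    "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4": 4,
}


def weight_gigabytes(bits_per_weight, n_params=NOMINAL_PARAMS):
    """Return approximate weight storage in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


ESTIMATES = {name: weight_gigabytes(bits) for name, bits in BITS_PER_WEIGHT.items()}

if __name__ == "__main__":
    for name, gigabytes in ESTIMATES.items():
        print(f"{name}: ~{gigabytes:.2f} GB of weights")
```

Under these assumptions each halving of the bit width halves the weight footprint; real memory use on the GPU will be higher.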

Introduction

Qwen2 is the latest series of Qwen large language models, offering both pretrained and instruction-tuned models in five sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. This Replicate implementation focuses on the instruction-tuned 1.5B Qwen2 model.

Qwen2 demonstrates competitive performance against state-of-the-art open-source and proprietary models across various benchmarks, including language understanding, generation, multilingual capability, coding, mathematics, and reasoning.

For more details about Qwen2, visit:

Model Details

Qwen2 is based on the Transformer architecture and incorporates:

  • SwiGLU activation
  • Attention QKV bias
  • Group query attention
  • Improved tokenizer for multiple natural languages and code
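Two of the components listed above can be illustrated in a few lines of plain Python. This is a toy sketch, not Qwen2's implementation: the matrix sizes, weights, and head counts are made up for illustration, and the QKV bias and tokenizer are not covered.

```python
import math


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block with SwiGLU: W_down (silu(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one KV head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Toy sizes and weights chosen for illustration only.
    x = [1.0, -2.0]
    w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
    w_down = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]
    print(swiglu_ffn(x, w_gate, w_up, w_down))
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Sharing key/value heads across groups of query heads shrinks the KV cache, which is the usual motivation for grouped-query attention.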

Training Details

The model was pretrained on a large dataset, then post-trained using both supervised fine-tuning and direct preference optimization.
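For readers unfamiliar with direct preference optimization, the sketch below computes the standard DPO loss for a single preference pair. It shows the general technique, not Qwen2's training code; the `beta` default of 0.1 is a hypothetical value chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen reference
    model. beta=0.1 is an illustrative value, not Qwen2's setting.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logit = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logit)), written in a numerically stable form.
    return max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, and equals log 2 when the two models agree.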

Quickstart

To use this Replicate implementation:

  1. Visit the Replicate model page.

  2. Use the web interface or API to run a prediction with your desired parameters.
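As one way to take the API route, the sketch below uses the `replicate` Python client. The input names are taken from the `cog predict` command later in this README; the assumptions are that the hosted model accepts the same inputs, that the model reference resolves without a pinned version hash, and that the output arrives as a string or a sequence of string chunks.

```python
import os

# Model reference taken from this page's title.
MODEL_REF = "zsxkib/qwen2-1.5b-instruct"


def build_input(prompt,
                system_prompt="You are a funny and helpful assistant.",
                model_type="Qwen2-1.5B-Instruct",
                max_new_tokens=512, temperature=1, top_p=1, top_k=1,
                repetition_penalty=1):
    """Collect the inputs used in this README's cog example into one dict."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "model_type": model_type,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
    }


def run_prediction(prompt, **overrides):
    """Call the hosted model; needs network access and an API token."""
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling the API")
    import replicate  # third-party client: pip install replicate

    output = replicate.run(MODEL_REF, input=build_input(prompt, **overrides))
    # Assumption: text output is a string or an iterable of string chunks.
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)
```

A call such as `run_prediction("Tell me a joke", max_new_tokens=64)` would then return the generated text. If the bare model reference is rejected, appending a version hash copied from the model page (`owner/name:version`) may be required.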

For local testing or development:

  1. Clone the repository:

     git clone -b Qwen2-1.5B-Instruct https://github.com/zsxkib/cog-qwen-2.git
     cd cog-qwen-2

  2. Run a prediction using Cog:

     cog predict \
       -I 'top_k=1' \
       -I 'top_p=1' \
       -I 'prompt="Tell me a funny joke about cowboys in the style of Yoda from Star Wars"' \
       -I 'model_type="Qwen2-1.5B-Instruct"' \
       -I 'temperature=1' \
       -I 'system_prompt="You are a funny and helpful assistant."' \
       -I 'max_new_tokens=512' \
       -I 'repetition_penalty=1'
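The sampling inputs in the command above interact in a way that is easy to miss. The sketch below follows the common temperature, then top-k, then top-p filtering order; it assumes this implementation passes the values to a generator that behaves this way, which the README does not confirm. Under that scheme `top_k=1` leaves only the most likely token, so `temperature` and `top_p` have no further effect.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # stable softmax numerators
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    if top_k > 0:
        # Keep the k most likely tokens and renormalize.
        ranked = ranked[:top_k]
        kept_mass = sum(p for p, _ in ranked)
        ranked = [(p / kept_mass, i) for p, i in ranked]
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(p for p, _ in kept)
    return {index: prob / norm for prob, index in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]
```

To get varied output from the example command, raising `top_k` (or disabling it, if the implementation allows) would be the first setting to change.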

Evaluation

Benchmark scores for Qwen2-1.5B-Instruct and Qwen1.5-1.8B-Chat, with the 0.5B models from both series shown for reference:

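| Dataset                     | Qwen1.5-0.5B-Chat | Qwen2-0.5B-Instruct | Qwen1.5-1.8B-Chat | Qwen2-1.5B-Instruct |
|-----------------------------|-------------------|---------------------|-------------------|---------------------|
| MMLU                        | 35.0              | 37.9                | 43.7              | 52.4                |
| HumanEval                   | 9.1               | 17.1                | 25.0              | 37.8                |
| GSM8K                       | 11.3              | 40.1                | 35.3              | 61.6                |
| C-Eval                      | 37.2              | 45.2                | 55.3              | 63.8                |
| IFEval (Prompt Strict-Acc.) | 14.6              | 20.0                | 16.8              | 29.0                |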
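To make the comparison concrete, the short sketch below computes the absolute gain of Qwen2-1.5B-Instruct over Qwen1.5-1.8B-Chat on each benchmark. The scores are copied from the table above; nothing else is assumed.

```python
# (Qwen1.5-1.8B-Chat, Qwen2-1.5B-Instruct) scores copied from the table.
SCORES = {
    "MMLU": (43.7, 52.4),
    "HumanEval": (25.0, 37.8),
    "GSM8K": (35.3, 61.6),
    "C-Eval": (55.3, 63.8),
    "IFEval (Prompt Strict-Acc.)": (16.8, 29.0),
}

# Absolute gain in points, rounded to the table's one-decimal precision.
GAINS = {name: round(new - old, 1) for name, (old, new) in SCORES.items()}

if __name__ == "__main__":
    for name, gain in sorted(GAINS.items(), key=lambda item: -item[1]):
        print(f"{name}: +{gain} points")
```

The largest gain is on GSM8K (26.3 points) and the smallest on C-Eval (8.5 points).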

Citation

If you find the Qwen2 model helpful in your work, please cite:

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}

License

The Qwen2 model is licensed under the Apache 2.0 License.

Credits and Support