Test out fast inference with ExLlama and 4-bit quantization!
Model: https://huggingface.co/TheBloke/airoboros-7B-gpt4-1.4-GPTQ
Fast inference thanks to https://github.com/turboderp/exllama
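To actually try this out, a minimal setup sketch follows. It assumes a CUDA-capable GPU and clones both the exllama repo and the GPTQ weights; the local paths and the use of exllama's bundled `example_chatbot.py` script are assumptions based on the repo's examples, and exact script names or flags may differ between exllama versions.

```shell
# Clone exllama and install its Python requirements
# (assumes a CUDA-capable GPU and a working CUDA toolchain).
git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt

# Fetch the 4-bit GPTQ weights; the target directory is an arbitrary choice.
git clone https://huggingface.co/TheBloke/airoboros-7B-gpt4-1.4-GPTQ \
    ./models/airoboros-7B-gpt4-1.4-GPTQ

# Point one of exllama's example scripts at the model directory.
python example_chatbot.py -d ./models/airoboros-7B-gpt4-1.4-GPTQ
```

The `-d` flag points exllama at the directory holding the model's `config.json`, tokenizer, and quantized `.safetensors` weights.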