# ExLlama

## About

ExLlama is a highly optimized GPTQ backend for LLaMA models. It uses much less VRAM and runs much faster because it does not rely on unoptimized transformers code.

## Installation

1) Clone the ExLlama repository into your `repositories` folder:

```
cd repositories
git clone https://github.com/turboderp/exllama
```

2) Follow the remaining setup instructions in the official README: https://github.com/turboderp/exllama#exllama
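
To confirm that step 1 put the checkout in the right place, a quick check along these lines may help. This is a minimal sketch, not part of ExLlama or the web UI: `check_exllama` is a hypothetical helper name, and it assumes you run it from the root of your text-generation-webui folder.

```shell
# Hypothetical helper (not shipped with ExLlama or the web UI):
# reports whether a git checkout exists at repositories/exllama.
# Assumes the current directory is the text-generation-webui root.
check_exllama() {
    if [ -d "repositories/exllama/.git" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

check_exllama
```

If this prints `missing`, repeat step 1 from inside the `repositories` folder before continuing with the README instructions.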