Small fixes

2026-04-04 14:17:28 +00:00 · 2024-02-06 06:26:27 -08:00 · 2024-02-06 06:26:27 -08:00 · 8a6d9abb41
commit 8a6d9abb41
parent 2a1063eff5
3 changed files with 6 additions and 2 deletions
--- a/docs/04
+++ b/docs/04
@@ -47,6 +47,10 @@ Examples:
 * **no_flash_attn**: Disables flash attention. Otherwise, it is automatically used as long as the library is installed.
 * **cache_8bit**: Create an 8-bit precision cache instead of a 16-bit one. This saves VRAM but increases perplexity (I don't know by how much).

+### ExLlamav2
+
+The same as ExLlamav2_HF but using the internal samplers of ExLlamav2 instead of the ones in the Transformers library.
+
 ### AutoGPTQ

 Loads: GPTQ models.
--- a/docs/What
+++ b/docs/What
@@ -6,6 +6,7 @@
 | llama.cpp      |       ❌       |           ❌            |       ❌       |          ❌          |    use llamacpp_HF    |
 | llamacpp_HF    |       ❌       |           ❌            |       ❌       |          ❌          |           ✅          |
 | ExLlamav2_HF   |       ✅       |           ✅            |       ❌       |          ❌          |           ✅          |
+| ExLlamav2      |       ✅       |           ✅            |       ❌       |          ❌          |   use ExLlamav2_HF    |
 | AutoGPTQ       |       ✅       |           ❌            |       ❌       |          ✅          |           ✅          |
 | AutoAWQ        |       ?        |           ❌            |       ?        |          ?           |           ✅          |
 | GPTQ-for-LLaMa |       ✅\*\*   |           ✅\*\*\*      |       ✅       |          ✅          |           ✅          |