oobabooga
7f485274eb
Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
(fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
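The multi-EOS fix described in the commit body can be sketched roughly as follows. This is a minimal, hypothetical helper: the `eos_token_id_list` attribute name comes from the commit message, and the fallback to a single `eos_token_id` is an assumption, not verified against the ExLlamaV3 API.

```python
def collect_stop_token_ids(config):
    """Gather every EOS token ID as a stop condition.

    Models like Llama-3 define several EOS IDs (e.g. <|end_of_text|>
    and <|eot_id|>); stopping on only one lets generation run past
    the intended end of a turn.
    """
    # eos_token_id_list is the attribute named in the commit message;
    # fall back to the single-ID field when the list is absent (assumed).
    ids = getattr(config, "eos_token_id_list", None)
    if not ids:
        ids = [config.eos_token_id]
    # Deduplicate while returning a stable, sorted list of stop IDs.
    return sorted(set(ids))
```

Registering the full list (rather than a single ID) as stop conditions is what prevents generation from blowing past a turn boundary on models with multiple EOS tokens.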
oobabooga
d8af0505a8
ExLlamav3_HF: Optimize prefill and fix CFG cache initialization
2026-03-04 11:09:58 -08:00
oobabooga
a156ebbf76
Lint
2025-10-15 13:15:01 -07:00
oobabooga
c871d9cdbd
Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga
deb37b821b
Same as 7f06aec3a1 but for exllamav3_hf
2025-10-09 13:02:38 -07:00
oobabooga
9e9ab39892
Make exllamav3_hf and exllamav2_hf functional again
2025-09-17 12:29:22 -07:00
oobabooga
1972479610
Add the TP option to exllamav3_HF
2025-08-19 06:48:22 -07:00
oobabooga
219f0a7731
Fix exllamav3_hf models failing to unload (closes #7031)
2025-05-30 12:05:49 -07:00
oobabooga
f3da45f65d
ExLlamaV3_HF: Change max_chunk_size to 256
2025-05-04 20:37:15 -07:00
oobabooga
ee0592473c
Fix ExLlamaV3_HF leaking memory (attempt)
2025-04-27 21:04:02 -07:00
oobabooga
d4017fbb6d
ExLlamaV3: Add kv cache quantization (#6903)
2025-04-25 21:32:00 -03:00
oobabooga
d4b1e31c49
Use --ctx-size to specify the context size for all loaders
Old flags are still recognized as alternatives.
2025-04-25 16:59:03 -07:00
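The flag unification described in this commit can be sketched with argparse option aliases. A minimal sketch: `--ctx-size` is the flag named in the commit, while `--max_seq_len` and `--n_ctx` stand in for the old per-loader flags and are illustrative assumptions, as is the default value.

```python
import argparse

parser = argparse.ArgumentParser()
# --ctx-size is the unified flag; the old per-loader names are kept
# as aliases (via extra option strings) so existing command lines
# keep working and all spellings land in the same destination.
parser.add_argument(
    "--ctx-size", "--max_seq_len", "--n_ctx",
    dest="ctx_size", type=int, default=8192,
    help="Context size for all loaders (old flags accepted as aliases).",
)

args = parser.parse_args(["--max_seq_len", "4096"])
```

Passing several option strings to a single `add_argument` call is the idiomatic argparse way to accept legacy flag names without duplicating the option's logic.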
oobabooga
b3bf7a885d
Fix ExLlamaV2_HF and ExLlamaV3_HF after ae02ffc605
2025-04-20 11:32:48 -07:00
oobabooga
1c4a2c9a71
Make exllamav3 safer as well
2025-04-18 06:17:58 -07:00
oobabooga
8b8d39ec4e
Add ExLlamaV3 support (#6832)
2025-04-09 00:07:08 -03:00