oobabooga
7f485274eb
Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
(fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
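The multi-EOS fix described in the commit body can be sketched roughly as follows. This is a minimal, hypothetical helper: the `eos_token_id_list` attribute name comes from the commit message, and the fallback to a single `eos_token_id` is an assumption, not verified against the ExLlamaV3 API.

```python
def collect_stop_token_ids(config):
    """Gather every EOS token ID as a stop condition.

    Models like Llama-3 define several EOS IDs (e.g. <|end_of_text|>
    and <|eot_id|>); stopping on only one lets generation run past
    the intended end of a turn.
    """
    # eos_token_id_list is the attribute named in the commit message;
    # fall back to the single-ID field when the list is absent (assumed).
    ids = getattr(config, "eos_token_id_list", None)
    if not ids:
        ids = [config.eos_token_id]
    # Deduplicate while returning a stable, sorted list of stop IDs.
    return sorted(set(ids))
```

Registering the full list (rather than a single ID) as stop conditions is what prevents generation from blowing past a turn boundary on models with multiple EOS tokens.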
oobabooga
d8af0505a8
ExLlamav3_HF: Optimize prefill and fix CFG cache initialization
2026-03-04 11:09:58 -08:00
oobabooga
a156ebbf76
Lint
2025-10-15 13:15:01 -07:00
oobabooga
c871d9cdbd
Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga
deb37b821b
Same as 7f06aec3a1 but for exllamav3_hf
2025-10-09 13:02:38 -07:00
oobabooga
9e9ab39892
Make exllamav3_hf and exllamav2_hf functional again
2025-09-17 12:29:22 -07:00
oobabooga
1972479610
Add the TP option to exllamav3_HF
2025-08-19 06:48:22 -07:00
oobabooga
219f0a7731
Fix exllamav3_hf models failing to unload (closes #7031)
2025-05-30 12:05:49 -07:00
oobabooga
f3da45f65d
ExLlamaV3_HF: Change max_chunk_size to 256
2025-05-04 20:37:15 -07:00
oobabooga
ee0592473c
Fix ExLlamaV3_HF leaking memory (attempt)
2025-04-27 21:04:02 -07:00
oobabooga
d4017fbb6d
ExLlamaV3: Add kv cache quantization (#6903)
2025-04-25 21:32:00 -03:00
oobabooga
d4b1e31c49
Use --ctx-size to specify the context size for all loaders
Old flags are still recognized as alternatives.
2025-04-25 16:59:03 -07:00
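The flag unification described in this commit can be sketched with argparse option aliases. A minimal sketch: `--ctx-size` is the flag named in the commit, while `--max_seq_len` and `--n_ctx` stand in for the old per-loader flags and are illustrative assumptions, as is the default value.

```python
import argparse

parser = argparse.ArgumentParser()
# --ctx-size is the unified flag; the old per-loader names are kept
# as aliases (via extra option strings) so existing command lines
# keep working and all spellings land in the same destination.
parser.add_argument(
    "--ctx-size", "--max_seq_len", "--n_ctx",
    dest="ctx_size", type=int, default=8192,
    help="Context size for all loaders (old flags accepted as aliases).",
)

args = parser.parse_args(["--max_seq_len", "4096"])
```

Passing several option strings to a single `add_argument` call is the idiomatic argparse way to accept legacy flag names without duplicating the option's logic.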
oobabooga
b3bf7a885d
Fix ExLlamaV2_HF and ExLlamaV3_HF after ae02ffc605
2025-04-20 11:32:48 -07:00
oobabooga
1c4a2c9a71
Make exllamav3 safer as well
2025-04-18 06:17:58 -07:00
oobabooga
8b8d39ec4e
Add ExLlamaV3 support (#6832)
2025-04-09 00:07:08 -03:00