Commit graph

155 commits

Author SHA1 Message Date
oobabooga 1972479610 Add the TP option to exllamav3_HF 2025-08-19 06:48:22 -07:00
oobabooga dbabe67e77 ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option 2025-08-17 13:19:11 -07:00
oobabooga 2238302b49 ExLlamaV3: Add speculative decoding 2025-08-12 08:50:45 -07:00
oobabooga d86b0ec010 Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga 544c3a7c9f Polish the new exllamav3 loader 2025-08-08 21:15:53 -07:00
Katehuuh 88127f46c1 Add multimodal support (ExLlamaV3) (#7174) 2025-08-08 23:31:16 -03:00
oobabooga 498778b8ac Add a new 'Reasoning effort' UI element 2025-08-05 15:19:11 -07:00
oobabooga 1d1b20bd77 Remove the --torch-compile option (it doesn't do anything currently) 2025-07-11 10:51:23 -07:00
oobabooga 6c2bdda0f0 Transformers loader: replace use_flash_attention_2/use_eager_attention with a unified attn_implementation (closes #7107) 2025-07-09 18:39:37 -07:00
oobabooga 9ec46b8c44 Remove the HQQ loader (HQQ models can be loaded through Transformers) 2025-05-19 09:23:24 -07:00
oobabooga 93e1850a2c Only show the VRAM info for llama.cpp 2025-05-15 21:42:15 -07:00
oobabooga 3fa1a899ae UI: Fix gpu-layers being ignored (closes #6973) 2025-05-13 12:07:59 -07:00
oobabooga a2ab42d390 UI: Remove the exllamav2 info message 2025-05-08 08:00:38 -07:00
oobabooga c4f36db0d8 llama.cpp: remove tfs (it doesn't get used) 2025-05-06 08:41:13 -07:00
oobabooga 1927afe894 Fix top_n_sigma not showing for llama.cpp 2025-05-06 08:18:49 -07:00
oobabooga d10bded7f8 UI: Add an enable_thinking option to enable/disable Qwen3 thinking 2025-04-28 22:37:01 -07:00
oobabooga 4a32e1f80c UI: show draft_max for ExLlamaV2 2025-04-26 18:01:44 -07:00
oobabooga d4017fbb6d ExLlamaV3: Add kv cache quantization (#6903) 2025-04-25 21:32:00 -03:00
oobabooga d4b1e31c49 Use --ctx-size to specify the context size for all loaders (old flags are still recognized as alternatives) 2025-04-25 16:59:03 -07:00
oobabooga 98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
oobabooga ae1fe87365 ExLlamaV2: Add speculative decoding (#6899) 2025-04-25 00:11:04 -03:00
oobabooga e99c20bcb0 llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
oobabooga 15989c2ed8 Make llama.cpp the default loader 2025-04-21 16:36:35 -07:00
oobabooga ae02ffc605 Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
oobabooga 8144e1031e Remove deprecated command-line flags 2025-04-18 06:02:28 -07:00
oobabooga ae54d8faaa New llama.cpp loader (#6846) 2025-04-18 09:59:37 -03:00
oobabooga 2c2d453c8c Revert "Use ExLlamaV2 (instead of the HF one) for EXL2 models for now" (reverts commit 0ef1b8f8b4) 2025-04-17 21:31:32 -07:00
oobabooga 0ef1b8f8b4 Use ExLlamaV2 (instead of the HF one) for EXL2 models for now (it doesn't seem to have the "OverflowError" bug) 2025-04-17 05:47:40 -07:00
oobabooga 8b8d39ec4e Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga 5bcd2d7ad0 Add the top N-sigma sampler (#6796) 2025-03-14 16:45:11 -03:00
oobabooga 83c426e96b Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga 7157257c3f Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga 11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
oobabooga 3967520e71 Connect XTC, DRY, smoothing_factor, and dynatemp to ExLlamaV2 loader (non-HF) 2025-01-04 16:25:06 -08:00
oobabooga 39a5c9a49c UI organization (#6618) 2024-12-29 11:16:17 -03:00
Diner Burger addad3c63e Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga 93c250b9b6 Add a UI element for enable_tp 2024-10-01 11:16:15 -07:00
Philipp Emanuel Weidmann 301375834e Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335) 2024-09-27 22:50:12 -03:00
oobabooga e6181e834a Remove AutoAWQ as a standalone loader (it works better through transformers) 2024-07-23 15:31:17 -07:00
oobabooga aa809e420e Bump llama-cpp-python to 0.2.83, add back tensorcore wheels (also add back the progress bar patch) 2024-07-22 18:05:11 -07:00
oobabooga 11bbf71aa5 Bump back llama-cpp-python (#6257) 2024-07-22 16:19:41 -03:00
oobabooga 0f53a736c1 Revert the llama-cpp-python update 2024-07-22 12:02:25 -07:00
oobabooga a687f950ba Remove the tensorcores llama.cpp wheels (they are not faster than the default wheels anymore and they use a lot of space) 2024-07-22 11:54:35 -07:00
oobabooga e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
GralchemOz 8a39f579d8 transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga 4ea260098f llama.cpp: add 4-bit/8-bit kv cache options 2024-06-29 09:10:33 -07:00
oobabooga 577a8cd3ee Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga 536f8d58d4 Do not expose alpha_value to llama.cpp & rope_freq_base to transformers (to avoid confusion) 2024-06-23 22:09:24 -07:00
Forkoz 1d79aa67cf Fix flash-attn UI parameter to actually store true. (#6076) 2024-06-13 00:34:54 -03:00