oobabooga
80d0c03bab
llama.cpp: Change the default --fit-target from 1024 to 512
2026-03-15 09:29:25 -07:00
oobabooga
f0c16813ef
Remove the rope scaling parameters
Now models have 131k+ context length. The parameters can still be
passed to llama.cpp through --extra-flags.
2026-03-14 19:43:25 -07:00
oobabooga
2d3a3794c9
Add a Top-P preset, make it the new default, clean up the built-in presets
2026-03-14 19:22:12 -07:00
oobabooga
4ae2bd86e2
Change the default ctx-size to 0 (auto) for llama.cpp
2026-03-14 15:30:01 -07:00
oobabooga
4b6c9db1c9
UI: Fix stale tool_sequence after edit and chat-instruct tool rendering
2026-03-12 13:12:18 -03:00
oobabooga
cf9ad8eafe
Initial tool-calling support in the UI
2026-03-12 01:16:19 -03:00
oobabooga
307c085d1b
Minor warning change
2026-03-09 21:44:53 -07:00
oobabooga
c604ca66de
Update the --multi-user warning
2026-03-09 21:36:04 -07:00
oobabooga
40f1837b42
README: Minor updates
2026-03-08 08:38:29 -07:00
oobabooga
f5acf55207
Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
oobabooga
66fb79fe15
llama.cpp: Add --fit-target param
2026-03-06 01:55:48 -03:00
oobabooga
e81a47f708
Improve the API generation defaults --help message
2026-03-05 20:41:45 -08:00
oobabooga
27bcc45c18
API: Add command-line flags to override default generation parameters
2026-03-06 01:36:45 -03:00
oobabooga
e2548f69a9
Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
oobabooga
f52d9336e5
TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support
2026-03-05 18:09:45 -08:00
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
2f08dce7b0
Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga
268cc3f100
Update TensorRT-LLM to v1.1.0
2026-03-05 09:32:28 -03:00
oobabooga
69fa4dd0b1
llama.cpp: allow ctx_size=0 for auto context via --fit
2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0
llama.cpp: Use -1 instead of 0 for auto gpu_layers
2026-03-04 19:21:45 -08:00
oobabooga
387cf9d8df
Remove obsolete DeepSpeed inference code (2023 relic)
2026-03-04 17:20:34 -08:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
q5sys (JT)
7493fe7841
feat: Add a dropdown to save/load user personas (#7367)
2026-01-14 20:35:08 -03:00
oobabooga
e7c8b51fec
Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga
85f2df92e9
Use flash_attention_2 by default for Transformers models
2025-12-07 06:56:58 -08:00
oobabooga
11937de517
Use flash attention for image generation by default
2025-12-05 12:13:24 -08:00
oobabooga
c11c14590a
Image: Better LLM variation default prompt
2025-12-05 08:08:11 -08:00
oobabooga
8eac99599a
Image: Better LLM variation default prompt
2025-12-04 19:58:06 -08:00
oobabooga
b4f06a50b0
fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga
a90739f498
Image: Better LLM variation default prompt
2025-12-04 10:50:40 -08:00
oobabooga
ffef3c7b1d
Image: Make the LLM Variations prompt configurable
2025-12-04 10:44:35 -08:00
oobabooga
2793153717
Image: Add LLM-generated prompt variations
2025-12-04 08:10:24 -08:00
oobabooga
c357eed4c7
Image: Remove the flash_attention_3 option (no idea how to get it working)
2025-12-03 18:40:34 -08:00
oobabooga
9448bf1caa
Image generation: add torchao quantization (supports torch.compile)
2025-12-02 14:22:51 -08:00
oobabooga
6291e72129
Remove quanto for now (requires messy compilation)
2025-12-02 09:57:18 -08:00
oobabooga
b3666e140d
Add image generation support (#7328)
2025-12-02 14:55:38 -03:00
oobabooga
5327bc9397
Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
0d4eff284c
Add a --cpu-moe option for llama.cpp
2025-11-19 05:23:43 -08:00
oobabooga
b5a6904c4a
Make --trust-remote-code immutable from the UI/API
2025-10-14 20:47:01 -07:00
oobabooga
78ff21d512
Organize the --help message
2025-10-10 15:21:08 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
0b4518e61c
"Text generation web UI" -> "Text Generation Web UI"
2025-08-27 05:53:09 -07:00
oobabooga
02ca96fa44
Multiple fixes
2025-08-25 22:17:22 -07:00
oobabooga
6c165d2e55
Fix the chat template
2025-08-25 18:28:43 -07:00
oobabooga
dbabe67e77
ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option
2025-08-17 13:19:11 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
d86b0ec010
Add multimodal support (llama.cpp) (#7027)
2025-08-10 01:27:25 -03:00