text-generation-webui

mirror of https://github.com/oobabooga/text-generation-webui.git synced 2026-03-26 07:14:39 +01:00

Author	SHA1	Message	Date
oobabooga	5b18be8582	Training: unify instruction training through apply_chat_template(). Instead of two separate paths (format files vs Chat Template), all instruction training now uses apply_chat_template() with assistant-only label masking. Users pick a Jinja2 template from the dropdown or use the model's built-in chat template; both work identically.	2026-03-05 14:39:37 -03:00
oobabooga	d337ba0390	Training: fix apply_chat_template returning BatchEncoding instead of list	2026-03-05 13:45:28 -03:00
oobabooga	1c2548fd89	Training: use dynamic padding (pad to batch max instead of cutoff_len) - Remove pre-padding from tokenize() and tokenize_conversation() - Collate function now right-pads each batch to the longest sequence - Set tokenizer padding_side to "right" (standard for training) - Remove dead natural_keys import - Reduces wasted compute on batches with short sequences - Aligns with axolotl/unsloth approach	2026-03-05 12:45:32 -03:00
oobabooga	da2d4f1a6a	Training: replace raw text file with JSONL text dataset, re-add stride overlap - Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row - Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens) - Pad remainder chunks instead of dropping them - Remove hard_cut_string, min_chars, raw_text_file parameters - Remove .txt file and directory loading support	2026-03-05 12:33:12 -03:00
oobabooga	d278bb46a2	Add apply_chat_template() support for LoRA training - Support multi-turn conversations (OpenAI messages + ShareGPT formats) - Automatic assistant-only label masking via incremental tokenization - Use tokenizer.apply_chat_template() for proper special token handling - Add "Chat Template" option to the Data Format dropdown - Also accept instruction/output datasets (auto-converted to messages) - Validate chat template availability and dataset format upfront - Fix after_tokens[-1] IndexError when train_only_after is at end of prompt - Update docs	2026-03-05 11:47:25 -03:00
oobabooga	45188eccef	Overhaul LoRA training tab - Use peft's "all-linear" for target modules instead of the old model_to_lora_modules mapping (only knew ~39 model types) - Add "Target all linear layers" checkbox, on by default - Fix labels in tokenize() — were [1]s instead of actual token IDs - Replace DataCollatorForLanguageModeling with custom collate_fn - Raw text: concatenate-and-split instead of overlapping chunks - Adapter backup/loading: check safetensors before bin - Fix report_to=None crash on transformers 5.x - Fix no_cuda deprecation for transformers 5.x (use use_cpu) - Move torch.compile before Trainer init - Add remove_unused_columns=False (torch.compile breaks column detection) - Guard against no target modules selected - Set tracked.did_save so we don't always save twice - pad_token_id: fall back to eos_token_id instead of hardcoding 0 - Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline - Update docs	2026-03-05 10:52:59 -03:00
oobabooga	268cc3f100	Update TensorRT-LLM to v1.1.0	2026-03-05 09:32:28 -03:00
oobabooga	69fa4dd0b1	llama.cpp: allow ctx_size=0 for auto context via --fit	2026-03-04 19:33:20 -08:00
oobabooga	fbfcd59fe0	llama.cpp: Use -1 instead of 0 for auto gpu_layers	2026-03-04 19:21:45 -08:00
oobabooga	d45aa6606a	Fix blank prompt dropdown in Notebook/Default tabs on first startup	2026-03-04 19:07:55 -08:00
oobabooga	0804296f4d	Revert "UI: Remove unnecessary server round-trips from button click chains" This reverts commit `ff48956cb0`.	2026-03-04 18:41:30 -08:00
oobabooga	ff48956cb0	UI: Remove unnecessary server round-trips from button click chains	2026-03-04 18:19:56 -08:00
oobabooga	387cf9d8df	Remove obsolete DeepSpeed inference code (2023 relic)	2026-03-04 17:20:34 -08:00
oobabooga	da3010c3ed	tiny improvements to llama_cpp_server.py	2026-03-04 15:54:37 -08:00
Sense_wang	7bf15ad933	fix: replace bare except clauses with except Exception (#7400)	2026-03-04 18:06:17 -03:00
mamei16	1d1f4dfc88	Disable uncommonly used indented codeblocks (#7401)	2026-03-04 17:51:00 -03:00
mamei16	68109bc5da	Improve `process_markdown_content` (#7403)	2026-03-04 17:26:13 -03:00
oobabooga	cdf0e392e6	llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults	2026-03-04 12:05:08 -08:00
oobabooga	eb90daf098	ExLlamaV2: Don't expose unused seed parameter	2026-03-04 11:14:50 -08:00
oobabooga	d8af0505a8	ExLlamav3_HF: Optimize prefill and fix CFG cache initialization	2026-03-04 11:09:58 -08:00
oobabooga	9b916f02cd	ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed	2026-03-04 10:51:15 -08:00
oobabooga	5d93f4e800	Fix requires_grad warning in logits API	2026-03-04 10:43:23 -08:00
oobabooga	64eb77e782	Fix the logits API endpoint with transformers	2026-03-04 10:41:47 -08:00
oobabooga	65de4c30c8	Add adaptive-p sampler and n-gram speculative decoding support	2026-03-04 09:41:29 -08:00
oobabooga	f010aa1612	Replace PyPDF2 with pymupdf for PDF text extraction. pymupdf produces cleaner text (e.g. no concatenated words in headers), handles encrypted and malformed PDFs that PyPDF2 failed on, and supports non-Latin scripts.	2026-03-04 06:43:37 -08:00
oobabooga	f4d787ab8d	Delegate GPU layer allocation to llama.cpp's --fit	2026-03-04 06:37:50 -08:00
oobabooga	8a3d866401	Fix temperature_last having no effect in llama.cpp server sampler order	2026-03-04 06:10:51 -08:00
oobabooga	b3fd0d16e0	Use a new gr.Headless component for efficient chat streaming	2026-03-03 18:12:03 -08:00
oobabooga	2260e530c9	Remove gradio monkey-patches (moved to gradio fork)	2026-03-03 17:17:36 -08:00
oobabooga	c54e8a2b3d	Try to spawn llama.cpp on port 5001 instead of random port	2026-01-28 08:23:55 -08:00
oobabooga	dc2bbf1861	Refactor thinking block detection and add Solar Open support	2026-01-28 08:21:34 -08:00
q5sys (JT)	7493fe7841	feat: Add a dropdown to save/load user personas (#7367)	2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan	6e2c4e9c23	Fix loading models which have their eos token disabled (#7363)	2026-01-06 11:31:10 -03:00
oobabooga	e7c8b51fec	Revert "Use flash_attention_2 by default for Transformers models" This reverts commit `85f2df92e9`.	2025-12-07 18:48:41 -08:00
oobabooga	b758059e95	Revert "Clear the torch cache between sequential image generations" This reverts commit `1ec9f708e5`.	2025-12-07 12:23:19 -08:00
oobabooga	1ec9f708e5	Clear the torch cache between sequential image generations	2025-12-07 11:49:22 -08:00
oobabooga	85f2df92e9	Use flash_attention_2 by default for Transformers models	2025-12-07 06:56:58 -08:00
oobabooga	1762312fb4	Use random instead of np.random for image seeds (makes it work on Windows)	2025-12-06 20:10:32 -08:00
oobabooga	02518a96a9	Lint	2025-12-06 06:55:06 -08:00
oobabooga	455dc06db0	Serve the original PNG images in the UI instead of webp	2025-12-06 05:43:00 -08:00
oobabooga	6ca99910ba	Image: Quantize the text encoder for lower VRAM	2025-12-05 13:08:46 -08:00
oobabooga	11937de517	Use flash attention for image generation by default	2025-12-05 12:13:24 -08:00
oobabooga	c11c14590a	Image: Better LLM variation default prompt	2025-12-05 08:08:11 -08:00
oobabooga	0dd468245c	Image: Add back the gallery cache (for performance)	2025-12-05 07:11:38 -08:00
oobabooga	b63d57158d	Image: Add TGW as a prefix to output images	2025-12-05 05:59:54 -08:00
oobabooga	afa29b9554	Image: Several fixes	2025-12-05 05:58:57 -08:00
oobabooga	8eac99599a	Image: Better LLM variation default prompt	2025-12-04 19:58:06 -08:00
oobabooga	b4f06a50b0	fix: Pass bos_token and eos_token from metadata to jinja2. Fixes loading Seed-Instruct-36B	2025-12-04 19:11:31 -08:00
oobabooga	56f2a9512f	Revert "Image: Add the LLM-generated prompt to the API result" This reverts commit `c7ad28a4cd`.	2025-12-04 17:34:27 -08:00
oobabooga	c7ad28a4cd	Image: Add the LLM-generated prompt to the API result	2025-12-04 17:22:08 -08:00
oobabooga	b451bac082	Image: Improve a log message	2025-12-04 16:33:46 -08:00
oobabooga	47a0fcd614	Image: PNG metadata improvements	2025-12-04 16:25:48 -08:00
oobabooga	ac31a7c008	Image: Organize the UI	2025-12-04 15:45:04 -08:00
oobabooga	a90739f498	Image: Better LLM variation default prompt	2025-12-04 10:50:40 -08:00
oobabooga	ffef3c7b1d	Image: Make the LLM Variations prompt configurable	2025-12-04 10:44:35 -08:00
oobabooga	5763947c37	Image: Simplify the API code, add the llm_variations option	2025-12-04 10:23:00 -08:00
oobabooga	2793153717	Image: Add LLM-generated prompt variations	2025-12-04 08:10:24 -08:00
oobabooga	7fb9f19bd8	Progress bar style improvements	2025-12-04 06:20:45 -08:00
oobabooga	a838223d18	Image: Add a progress bar during generation	2025-12-04 05:49:57 -08:00
oobabooga	14dbc3488e	Image: Clear the torch cache after generation, not before	2025-12-04 05:32:58 -08:00
oobabooga	c357eed4c7	Image: Remove the flash_attention_3 option (no idea how to get it working)	2025-12-03 18:40:34 -08:00
oobabooga	fbca54957e	Image generation: Yield partial results for batch count > 1	2025-12-03 16:13:07 -08:00
oobabooga	49c60882bf	Image generation: Safer image uploading	2025-12-03 16:07:51 -08:00
oobabooga	59285d501d	Image generation: Small UI improvements	2025-12-03 16:03:31 -08:00
oobabooga	373baa5c9c	UI: Minor image gallery improvements	2025-12-03 14:45:02 -08:00
oobabooga	9448bf1caa	Image generation: add torchao quantization (supports torch.compile)	2025-12-02 14:22:51 -08:00
oobabooga	97281ff831	UI: Fix an index error in the new image gallery	2025-12-02 11:20:52 -08:00
oobabooga	9d07d3a229	Make portable builds functional again after `b3666e140d`	2025-12-02 10:06:57 -08:00
oobabooga	6291e72129	Remove quanto for now (requires messy compilation)	2025-12-02 09:57:18 -08:00
oobabooga	b3666e140d	Add image generation support (#7328)	2025-12-02 14:55:38 -03:00
oobabooga	5327bc9397	Update modules/shared.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-28 22:48:05 -03:00
GodEmperor785	400bb0694b	Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)	2025-11-21 16:56:02 -03:00
oobabooga	8f0048663d	More modular HTML generator	2025-11-21 07:09:16 -08:00
oobabooga	0d4eff284c	Add a --cpu-moe model for llama.cpp	2025-11-19 05:23:43 -08:00
Trenten Miller	6871484398	fix: Rename 'evaluation_strategy' to 'eval_strategy' in training	2025-10-28 16:48:04 -03:00
oobabooga	a156ebbf76	Lint	2025-10-15 13:15:01 -07:00
oobabooga	c871d9cdbd	Revert "Same as `7f06aec3a1` but for exllamav3_hf" This reverts commit `deb37b821b`.	2025-10-15 13:05:41 -07:00
oobabooga	b5a6904c4a	Make --trust-remote-code immutable from the UI/API	2025-10-14 20:47:01 -07:00
mamei16	308e726e11	log error when llama-server request exceeds context size (#7263)	2025-10-12 23:00:11 -03:00
oobabooga	655c3e86e3	Fix "continue" missing an initial space in chat-instruct/chat modes	2025-10-11 17:00:25 -07:00
oobabooga	c7dd920dc8	Fix metadata leaking into branched chats	2025-10-11 14:12:05 -07:00
oobabooga	78ff21d512	Organize the --help message	2025-10-10 15:21:08 -07:00
oobabooga	0d03813e98	Update modules/chat.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-09 21:01:13 -03:00
oobabooga	deb37b821b	Same as `7f06aec3a1` but for exllamav3_hf	2025-10-09 13:02:38 -07:00
oobabooga	7f06aec3a1	exllamav3: Implement the logits function for /v1/internal/logits	2025-10-09 11:24:25 -07:00
oobabooga	218dc01b51	Add fallbacks after `93aa7b3ed3`	2025-10-09 10:59:34 -07:00
oobabooga	282aa19189	Safer profile picture uploading	2025-10-09 09:26:35 -07:00
oobabooga	93aa7b3ed3	Better handle multigpu setups with transformers + bitsandbytes	2025-10-09 08:49:44 -07:00
Remowylliams	38a7fd685d	chat.py fixes Instruct mode History	2025-10-05 11:34:47 -03:00
oobabooga	1e863a7113	Fix exllamav3 ignoring the stop button	2025-09-19 16:12:50 -07:00
stevenxdavis	dd6d2223a5	Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217)	2025-09-17 16:39:04 -03:00
oobabooga	9e9ab39892	Make exllamav3_hf and exllamav2_hf functional again	2025-09-17 12:29:22 -07:00
oobabooga	f3829b268a	llama.cpp: Always pass --flash-attn on	2025-09-02 12:12:17 -07:00
oobabooga	c6ea67bbdb	Lint	2025-09-02 10:22:03 -07:00
oobabooga	00ed878b05	Slightly more robust model loading	2025-09-02 10:16:26 -07:00
oobabooga	387e249dec	Change an info message	2025-08-31 16:27:10 -07:00
oobabooga	8028d88541	Lint	2025-08-30 21:29:20 -07:00
oobabooga	13876a1ee8	llama.cpp: Remove the --flash-attn flag (it's always on now)	2025-08-30 20:28:26 -07:00
oobabooga	3a3e247f3c	Even better way to handle continue for thinking blocks	2025-08-30 12:36:35 -07:00
oobabooga	cf1aad2a68	Fix "continue" for Byte-OSS for partial thinking blocks	2025-08-30 12:16:45 -07:00
oobabooga	96136ea760	Fix LaTeX rendering for equations with asterisks	2025-08-30 10:13:32 -07:00
oobabooga	a3eb67e466	Fix the UI failing to launch if the Notebook prompt is too long	2025-08-30 08:42:26 -07:00
oobabooga	a2b37adb26	UI: Preload the correct fonts for chat mode	2025-08-29 09:25:44 -07:00
oobabooga	cb8780a4ce	Safer check for is_multimodal when loading models. Avoids unrelated multimodal error when a model fails to load due to lack of memory.	2025-08-28 11:13:19 -07:00
oobabooga	cfc83745ec	UI: Improve right sidebar borders in light mode	2025-08-28 08:34:48 -07:00
oobabooga	ba6041251d	UI: Minor change	2025-08-28 06:20:00 -07:00
oobabooga	a92758a144	llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS	2025-08-27 16:15:40 -07:00
oobabooga	030ba7bfeb	UI: Mention that Seed-OSS uses enable_thinking	2025-08-27 07:44:35 -07:00
oobabooga	0b4518e61c	"Text generation web UI" -> "Text Generation Web UI"	2025-08-27 05:53:09 -07:00
oobabooga	02ca96fa44	Multiple fixes	2025-08-25 22:17:22 -07:00
oobabooga	6a7166fffa	Add support for the Seed-OSS template	2025-08-25 19:46:48 -07:00
oobabooga	8fcb4b3102	Make bot_prefix extensions functional again	2025-08-25 19:10:46 -07:00
oobabooga	8f660aefe3	Fix chat-instruct replies leaking the bot name sometimes	2025-08-25 18:50:16 -07:00
oobabooga	a531328f7e	Fix the GPT-OSS stopping string	2025-08-25 18:41:58 -07:00
oobabooga	6c165d2e55	Fix the chat template	2025-08-25 18:28:43 -07:00
oobabooga	b657be7381	Obtain stopping strings in chat mode	2025-08-25 18:22:08 -07:00
oobabooga	ded6c41cf8	Fix impersonate for chat-instruct	2025-08-25 18:16:17 -07:00
oobabooga	c1aa4590ea	Code simplifications, fix impersonate	2025-08-25 18:05:40 -07:00
oobabooga	b330ec3517	Simplifications	2025-08-25 17:54:15 -07:00
oobabooga	3ad5970374	Make the llama.cpp --verbose output less verbose	2025-08-25 17:43:21 -07:00
oobabooga	adeca8a658	Remove changes to the jinja2 templates	2025-08-25 17:36:01 -07:00
oobabooga	aad0104c1b	Remove a function	2025-08-25 17:33:13 -07:00
oobabooga	f919cdf881	chat.py code simplifications	2025-08-25 17:20:51 -07:00
oobabooga	d08800c359	chat.py improvements	2025-08-25 17:03:37 -07:00
oobabooga	3bc48014a5	chat.py code simplifications	2025-08-25 16:48:21 -07:00
oobabooga	2478294c06	UI: Preload the instruct and chat fonts	2025-08-24 12:37:41 -07:00
oobabooga	8be798e15f	llama.cpp: Fix stderr deadlock while loading some multimodal models	2025-08-24 12:20:05 -07:00
oobabooga	7fe8da8944	Minor simplification after `f247c2ae62`	2025-08-22 14:42:56 -07:00
oobabooga	f247c2ae62	Make --model work with absolute paths, eg --model /tmp/gemma-3-270m-it-IQ4_NL.gguf	2025-08-22 11:47:33 -07:00
oobabooga	9e7b326e34	Lint	2025-08-19 06:50:40 -07:00
oobabooga	1972479610	Add the TP option to exllamav3_HF	2025-08-19 06:48:22 -07:00
oobabooga	e0f5905a97	Code formatting	2025-08-19 06:34:05 -07:00
oobabooga	5b06284a8a	UI: Keep ExLlamav3_HF selected if already selected for EXL3 models	2025-08-19 06:23:21 -07:00
oobabooga	cbba58bef9	UI: Fix code blocks having an extra empty line	2025-08-18 15:50:09 -07:00
oobabooga	7d23a55901	Fix model unloading when switching loaders (closes #7203)	2025-08-18 09:05:47 -07:00
oobabooga	64eba9576c	mtmd: Fix a bug when "include past attachments" is unchecked	2025-08-17 14:08:40 -07:00
oobabooga	dbabe67e77	ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option	2025-08-17 13:19:11 -07:00
oobabooga	d771ca4a13	Fix web search (attempt)	2025-08-14 12:05:14 -07:00
altoiddealer	57f6e9af5a	Set multimodal status during Model Loading (#7199)	2025-08-13 16:47:27 -03:00
oobabooga	41b95e9ec3	Lint	2025-08-12 13:37:37 -07:00
oobabooga	7301452b41	UI: Minor info message change	2025-08-12 13:23:24 -07:00
oobabooga	8d7b88106a	Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)" This reverts commit `d8fcc71616`.	2025-08-12 13:20:16 -07:00
oobabooga	2238302b49	ExLlamaV3: Add speculative decoding	2025-08-12 08:50:45 -07:00
oobabooga	d8fcc71616	mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)	2025-08-11 18:02:33 -07:00
oobabooga	e6447cd24a	mtmd: Update the llama-server request	2025-08-11 17:42:35 -07:00
oobabooga	0e3def449a	llama.cpp: --swa-full to llama-server when streaming-llm is checked	2025-08-11 15:17:25 -07:00
oobabooga	0e88a621fd	UI: Better organize the right sidebar	2025-08-11 15:16:03 -07:00
oobabooga	a78ca6ffcd	Remove a comment	2025-08-11 12:33:38 -07:00
oobabooga	999471256c	Lint	2025-08-11 12:32:17 -07:00
oobabooga	b62c8845f3	mtmd: Fix /chat/completions for llama.cpp	2025-08-11 12:01:59 -07:00

2119 commits