Commit graph

2080 commits

Author SHA1 Message Date
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 5c02b7f603 Allow the fetch_webpage tool to return links 2026-03-12 17:08:30 -07:00
oobabooga 09d5e049d6 UI: Improve the Tools checkbox list style 2026-03-12 16:53:49 -07:00
oobabooga 4f82b71ef3 UI: Bump the ctx-size max from 131072 to 262144 (256K) 2026-03-12 14:56:35 -07:00
oobabooga bbd43d9463 UI: Correctly propagate truncation_length when ctx_size is auto 2026-03-12 14:54:05 -07:00
oobabooga 3e6bd1a310 UI: Prepend thinking tag when template appends it to prompt
Makes Qwen models show a thinking block immediately during streaming.
2026-03-12 14:30:51 -07:00
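The idea in this commit can be sketched roughly as follows; this is an illustrative function, not the repository's actual code, and the function name and `tag` default are assumptions:

```python
def maybe_prepend_think(prompt: str, reply: str, tag: str = "<think>") -> str:
    # If the chat template already appended an opening thinking tag to the
    # prompt, prepend the same tag to the streamed reply so the UI renders
    # a thinking block from the first token.
    if prompt.rstrip().endswith(tag) and not reply.startswith(tag):
        return tag + reply
    return reply
```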
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
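A minimal sketch of the extraction this commit describes, assuming a simple regex over a `<think>...</think>` block; the function name and regex are illustrative, not the project's implementation:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(content: str) -> dict:
    """Return an assistant message with any thinking block moved into a
    separate reasoning_content field, DeepSeek-style."""
    match = THINK_RE.search(content)
    if not match:
        return {"role": "assistant", "content": content}
    return {
        "role": "assistant",
        "content": THINK_RE.sub("", content, count=1).strip(),
        "reasoning_content": match.group(1).strip(),
    }
```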
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
  both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
  backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
  instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
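For reference, the OpenAI-spec logprobs shape for chat completions that this commit targets looks roughly like the sketch below: one entry per generated token, each with the requested number of top alternatives. The function and its inputs are hypothetical, not the repository's code:

```python
def format_logprobs(tokens, top_candidates, top_logprobs_n):
    """tokens: list of (token_str, logprob) for each generated token.
    top_candidates: per-token list of (token_str, logprob) alternatives,
    assumed already sorted by logprob descending."""
    content = []
    for (tok, lp), cands in zip(tokens, top_candidates):
        content.append({
            "token": tok,
            "logprob": lp,
            # Honor the requested N instead of always returning 1.
            "top_logprobs": [
                {"token": t, "logprob": l} for t, l in cands[:top_logprobs_n]
            ],
        })
    return {"content": content}
```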
oobabooga 4b6c9db1c9 UI: Fix stale tool_sequence after edit and chat-instruct tool rendering 2026-03-12 13:12:18 -03:00
oobabooga b5cac2e3b2 Fix swipes and edit for tool calling in the UI 2026-03-12 01:53:37 -03:00
oobabooga 0d62038710 Add tools refresh button and _tool_turn comment 2026-03-12 01:36:07 -03:00
oobabooga cf9ad8eafe Initial tool-calling support in the UI 2026-03-12 01:16:19 -03:00
oobabooga 980a9d1657 UI: Minor defensive changes to autosave 2026-03-11 15:50:16 -07:00
oobabooga bb00d96dc3 Use a new gr.DragDrop element for Sampler priority + update gradio 2026-03-11 19:35:12 -03:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
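The "n>1 via seed increment" approach can be sketched as below, assuming a backend that produces one completion per call; `generate_once` is a hypothetical callable standing in for the llama.cpp request:

```python
def generate_n(prompt, n, seed, generate_once):
    # Request n completions by bumping the seed on each call, so
    # otherwise-identical requests still produce diverse outputs.
    return [generate_once(prompt, seed=seed + i) for i in range(n)]
```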
oobabooga 6ec4ca8b10 Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3 2026-03-10 09:58:00 -03:00
oobabooga 307c085d1b Minor warning change 2026-03-09 21:44:53 -07:00
oobabooga c604ca66de Update the --multi-user warning 2026-03-09 21:36:04 -07:00
oobabooga 7f485274eb Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
  (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
  for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
  sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
oobabooga 39e6c997cc Refactor to not import gradio in --nowebui mode 2026-03-09 19:29:24 -07:00
oobabooga 40f1837b42 README: Minor updates 2026-03-08 08:38:29 -07:00
oobabooga f6ffecfff2 Add guard against training with llama.cpp loader 2026-03-08 10:47:59 -03:00
oobabooga 5a91b8462f Remove ctx_size_draft from ExLlamav3 loader 2026-03-08 09:53:48 -03:00
oobabooga 7a8ca9f2b0 Fix passing adaptive-p to llama-server 2026-03-08 04:09:40 -07:00
oobabooga baf4e13ff1 ExLlamav3: fix draft cache size to match main cache 2026-03-07 22:34:48 -03:00
oobabooga 6ff111d18e ExLlamav3: handle exceptions in ConcurrentGenerator iterate loop 2026-03-07 22:05:31 -03:00
oobabooga 304510eb3d ExLlamav3: route all generation through ConcurrentGenerator 2026-03-07 05:54:14 -08:00
oobabooga abc699db9b Minor UI change 2026-03-06 19:03:38 -08:00
oobabooga 7ea5513263 Handle Qwen 3.5 thinking blocks 2026-03-06 19:01:28 -08:00
oobabooga 5fa709a3f4 llama.cpp server: use port+5 offset and suppress "No parser definition detected" logs 2026-03-06 18:52:34 -08:00
oobabooga 1eead661c3 Portable mode: always use ../user_data if it exists 2026-03-06 18:04:48 -08:00
oobabooga d48b53422f Training: Optimize _peek_json_keys to avoid loading entire file into memory 2026-03-06 15:39:08 -08:00
oobabooga 5f6754c267 Fix stop button being ignored when token throttling is off 2026-03-06 17:12:34 -03:00
oobabooga b8b4471ab5 Security: restrict file writes to user_data_dir, block extra_flags from API 2026-03-06 16:58:11 -03:00
oobabooga d03923924a Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
oobabooga 044566d42d API: Add tool call parsing for DeepSeek, GLM, MiniMax, and Kimi models 2026-03-06 15:06:56 -03:00
oobabooga f5acf55207 Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
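The stated priority order can be sketched as a simple resolution function; names are illustrative, not the actual code:

```python
def resolve_template(request_template, cli_template_file, model_template):
    # Priority: per-request params > --chat-template-file > model template.
    if request_template:
        return request_template
    if cli_template_file:
        return cli_template_file
    return model_template
```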
oobabooga 93ebfa2b7e Fix llama-server output filter for new log format 2026-03-06 02:38:13 -03:00
oobabooga eba262d47a Security: prevent path traversal in character/user/file save and delete 2026-03-06 02:00:10 -03:00
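A common guard for the path-traversal class of bug fixed above looks like this sketch (illustrative only, assuming Python 3.9+ for `is_relative_to`; not the repository's code):

```python
from pathlib import Path

def safe_join(base_dir: str, filename: str) -> Path:
    # Resolve both paths, then refuse any target that escapes the base
    # directory (e.g. via "../" segments in a user-supplied filename).
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    if not target.is_relative_to(base):
        raise ValueError("Path escapes the allowed directory")
    return target
```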
oobabooga 66fb79fe15 llama.cpp: Add --fit-target param 2026-03-06 01:55:48 -03:00
oobabooga e81a47f708 Improve the API generation defaults --help message 2026-03-05 20:41:45 -08:00
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
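Parsing the Qwen-style format shown above into OpenAI-style `tool_calls` entries might look like this sketch; the closing tags and regexes are assumptions inferred from the truncated format string, not the actual parser:

```python
import re

FUNC_RE = re.compile(r"<tool_call>\s*<function=([\w.-]+)>(.*?)</tool_call>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([\w.-]+)>(.*?)</parameter>", re.DOTALL)

def parse_qwen_tool_call(text: str):
    # Collect each <tool_call> block's function name and its
    # <parameter=key>value</parameter> pairs.
    calls = []
    for name, body in FUNC_RE.findall(text):
        args = {k: v.strip() for k, v in PARAM_RE.findall(body)}
        calls.append({"type": "function",
                      "function": {"name": name, "arguments": args}})
    return calls
```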
oobabooga e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00