Commit graph

5516 commits

Author SHA1 Message Date
oobabooga fef95b9e56 UI: Fix an autoscroll race condition during chat streaming 2026-03-13 03:05:09 -07:00
oobabooga 5833d94d7f UI: Prevent word breaks in tables 2026-03-13 02:56:49 -07:00
oobabooga a4bef860b6 UI: Optimize chat streaming by batching morphdom to one update per animation frame 2026-03-13 06:45:47 -03:00
    The monitor physically cannot paint faster than its refresh rate, so
    intermediate morphdom calls between frames do redundant parsing, diffing,
    and patching work that is never displayed.
oobabooga 5ddc1002d2 Update ExLlamaV3 to 0.0.25 2026-03-13 02:40:17 -07:00
oobabooga c094bc943c UI: Skip output extensions on intermediate tool-calling turns 2026-03-12 21:45:38 -07:00
oobabooga 85ec85e569 UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls 2026-03-12 20:22:35 -07:00
oobabooga 04213dff14 Address copilot feedback 2026-03-12 19:55:20 -07:00
oobabooga 24fdcc52b3 Merge branch 'main' into dev 2026-03-12 19:33:03 -07:00
oobabooga 58f26a4cc7 UI: Skip redundant work in chat loop when no tools are selected 2026-03-12 19:18:55 -07:00
oobabooga 0e35421593 API: Always extract reasoning_content, even with tool calls 2026-03-12 18:52:41 -07:00
oobabooga 1ed56aee85 Add a calculate tool 2026-03-12 18:45:19 -07:00
oobabooga 286ae475f6 UI: Clean up tool calling code 2026-03-12 22:39:38 -03:00
oobabooga 4c7a56c18d Add num_pages and max_tokens kwargs to web search tools 2026-03-12 22:17:23 -03:00
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 1b7e6c5705 Add the fetch_webpage tool source 2026-03-12 17:11:05 -07:00
oobabooga f8936ec47c Truncate web_search and fetch_webpage tools to 8192 tokens 2026-03-12 17:10:41 -07:00
oobabooga 5c02b7f603 Allow the fetch_webpage tool to return links 2026-03-12 17:08:30 -07:00
oobabooga 09d5e049d6 UI: Improve the Tools checkbox list style 2026-03-12 16:53:49 -07:00
oobabooga fdd8e5b1fd Make repeated Ctrl+C force a shutdown 2026-03-12 15:48:50 -07:00
oobabooga 4f82b71ef3 UI: Bump the ctx-size max from 131072 to 262144 (256K) 2026-03-12 14:56:35 -07:00
oobabooga bbd43d9463 UI: Correctly propagate truncation_length when ctx_size is auto 2026-03-12 14:54:05 -07:00
oobabooga 3e6bd1a310 UI: Prepend thinking tag when template appends it to prompt 2026-03-12 14:30:51 -07:00
    Makes Qwen models have a thinking block straight away during streaming.
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions 2026-03-12 16:30:46 -03:00
    Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
    separate reasoning_content field on the assistant message, matching
    the convention used by DeepSeek, llama.cpp, and SGLang.
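The reasoning_content convention described in 2d0cc7726e amounts to a small post-processing step on the generated text. An illustrative regex-based sketch (function and field names chosen for this example, not taken from the repo):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(text):
    """Move a <think>...</think> block into a separate field, mirroring
    the reasoning_content convention used by DeepSeek, llama.cpp, and
    SGLang: the visible reply stays in content, the thinking goes to
    reasoning_content."""
    match = THINK_RE.search(text)
    if match is None:
        return {"content": text, "reasoning_content": None}
    reasoning = match.group(1).strip()
    content = THINK_RE.sub("", text, count=1).strip()
    return {"content": content, "reasoning_content": reasoning}
```

A real implementation would also need to handle an unclosed tag when generation is cut off mid-thought.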
oobabooga d45c9b3c59 API: Minor logprobs fixes 2026-03-12 16:09:49 -03:00
oobabooga 2466305f76 Add tool examples 2026-03-12 16:03:57 -03:00
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends 2026-03-12 14:17:32 -03:00
    - Rewrite logprobs output format to match the OpenAI specification for
      both chat completions and completions endpoints
    - Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
      backends in chat completions (always returned 1 instead of requested N)
    - Fix non-streaming responses only returning logprobs for the last token
      instead of all generated tokens (affects all HF-based loaders)
    - Fix logprobs returning null for non-streaming chat requests on HF loaders
    - Fix off-by-one returning one extra top alternative on HF loaders
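For reference, the OpenAI chat-completions logprobs shape that fb1b3b6ddf targets looks roughly like the structure below: one entry per generated token under choices[0].logprobs.content, each carrying the chosen token and its top-N alternatives. This is a hand-built illustration of the spec's layout, not code from the repo:

```python
import math

def make_logprobs_entry(token, logprob, top):
    """Build one per-token entry in the OpenAI chat-completions logprobs
    format: the sampled token, its log probability, its UTF-8 bytes, and
    the requested top alternatives."""
    return {
        "token": token,
        "logprob": logprob,
        "bytes": list(token.encode("utf-8")),
        "top_logprobs": [
            {"token": t, "logprob": lp, "bytes": list(t.encode("utf-8"))}
            for t, lp in top
        ],
    }

# A non-streaming response carries one such entry for every generated
# token, not just the last one.
entry = make_logprobs_entry("Hello", math.log(0.7),
                            [("Hello", math.log(0.7)), ("Hi", math.log(0.2))])
```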
oobabooga 5a017aa338 API: Several OpenAI spec compliance fixes 2026-03-12 13:30:38 -03:00
    - Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
    - Send data: [DONE] at the end of SSE streams
    - Fix finish_reason so "tool_calls" takes priority over "length"
    - Stop including usage in streaming chunks when include_usage is not set
    - Handle "developer" role in messages (treated same as "system")
    - Add logprobs and top_logprobs parameters for chat completions
    - Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
    - Add max_completion_tokens as an alias for max_tokens in chat completions
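Two of the fixes in 5a017aa338 are easy to illustrate: when a generation both emits tool calls and hits the token limit, the spec-conformant finish_reason is "tool_calls", not "length"; and an SSE stream must terminate with a literal `data: [DONE]` event. A sketch with hypothetical helper names:

```python
import json

def resolve_finish_reason(has_tool_calls, hit_token_limit, hit_stop=False):
    """Pick the OpenAI finish_reason: "tool_calls" wins over "length"."""
    if has_tool_calls:
        return "tool_calls"
    if hit_stop:
        return "stop"
    if hit_token_limit:
        return "length"
    return "stop"

def sse_stream(chunks):
    """Yield chat-completion chunks as SSE events, ending with data: [DONE]."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```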
oobabooga 4b6c9db1c9 UI: Fix stale tool_sequence after edit and chat-instruct tool rendering 2026-03-12 13:12:18 -03:00
oobabooga 09723c9988 API: Include /v1 in the printed API URL for easier integration 2026-03-12 12:43:15 -03:00
oobabooga 2549f7c33b API: Add tool_choice support and fix tool_calls spec compliance 2026-03-12 10:29:23 -03:00
oobabooga b5cac2e3b2 Fix swipes and edit for tool calling in the UI 2026-03-12 01:53:37 -03:00
oobabooga 0d62038710 Add tools refresh button and _tool_turn comment 2026-03-12 01:36:07 -03:00
oobabooga cf9ad8eafe Initial tool-calling support in the UI 2026-03-12 01:16:19 -03:00
oobabooga 980a9d1657 UI: Minor defensive changes to autosave 2026-03-11 15:50:16 -07:00
oobabooga bb00d96dc3 Use a new gr.DragDrop element for Sampler priority + update gradio 2026-03-11 19:35:12 -03:00
oobabooga 66c976e995 Update README with ROCm 7.2 torch install URL 2026-03-11 19:35:12 -03:00
oobabooga 24977846fb Update AMD ROCm from 6.4 to 7.2 2026-03-11 13:14:26 -07:00
oobabooga 7a63a56043 Update llama.cpp 2026-03-11 12:53:19 -07:00
oobabooga f1cfeae372 API: Improve OpenAI spec compliance in streaming and non-streaming responses 2026-03-10 20:55:49 -07:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend 2026-03-10 10:41:45 -03:00
    - Forward logit_bias and logprobs natively to llama.cpp
    - Support n>1 completions with seed increment for diversity
    - Fix logprobs returning empty dict when not requested
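The n>1 handling in 8aeaa76365 (seed increment for diversity) amounts to running one generation per requested completion with a distinct seed, so identical requests do not collapse into n identical outputs. A sketch with a stand-in sampler; the function names are invented for this example:

```python
import random

def generate_n(prompt, n, seed, generate_one=None):
    """Produce n completions, bumping the seed per sample so each
    completion draws from a different random stream."""
    if generate_one is None:
        # Stand-in sampler for illustration only: a real backend would
        # run the model with the given seed.
        def generate_one(prompt, seed):
            rng = random.Random(seed)
            return f"{prompt}-{rng.randint(0, 9999)}"
    return [generate_one(prompt, seed + i) for i in range(n)]
```

Seeded per-sample generation keeps the whole batch reproducible: the same request with the same base seed yields the same n outputs.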
oobabooga 6ec4ca8b10 Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3 2026-03-10 09:58:00 -03:00
oobabooga 307c085d1b Minor warning change 2026-03-09 21:44:53 -07:00
oobabooga c604ca66de Update the --multi-user warning 2026-03-09 21:36:04 -07:00
oobabooga 15792c3cb8 Update ExLlamaV3 to 0.0.24 2026-03-09 20:31:05 -07:00
oobabooga 3b71932658 Update README 2026-03-09 20:18:09 -07:00
oobabooga 83b7e47d77 Update README 2026-03-09 20:12:54 -07:00
oobabooga 7f485274eb Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation 2026-03-09 23:56:38 -03:00
    - Use config.eos_token_id_list for all EOS tokens as stop conditions
      (fixes models like Llama-3 that define multiple EOS token IDs)
    - Load vision/draft models before main model so autosplit accounts
      for their VRAM usage
    - Fix loss computation in ExLlamav3_HF: use cache across chunks so
      sequences longer than 2048 tokens get correct perplexity values
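On the perplexity fix in 7f485274eb: when a long sequence is evaluated in fixed-size chunks, the per-chunk losses must be combined weighted by token count, and (as the commit notes) the cache must be carried across chunks so each chunk is conditioned on the full prefix. The aggregation step can be sketched with plain floats; this is an illustration of the arithmetic, not the ExLlamav3_HF code:

```python
import math

def perplexity_from_chunks(chunk_losses):
    """Combine (mean_nll, num_tokens) pairs from sequential chunks into a
    single perplexity over the whole sequence: token-weighted mean of the
    negative log-likelihood, then exp."""
    total_nll = sum(mean_nll * n for mean_nll, n in chunk_losses)
    total_tokens = sum(n for _, n in chunk_losses)
    return math.exp(total_nll / total_tokens)
```

Averaging the per-chunk perplexities directly (rather than the NLLs) would overweight short final chunks and give the wrong value.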