oobabooga
737ded6959
Web search: Fix SSRF validation to block all non-global IPs
2026-03-16 05:37:46 -07:00
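The fix above can be sketched as follows. This is a minimal illustration using Python's `ipaddress` module, not the project's actual code; `ip_is_allowed` and `is_safe_url` are hypothetical names. The key idea is to allow a host only when every address it resolves to is globally routable, rather than blocklisting known-private ranges one by one.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def ip_is_allowed(ip_text):
    """Allow only globally routable addresses.

    `is_global` rejects loopback, private, link-local, and other
    reserved ranges in one check, for both IPv4 and IPv6.
    """
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:
        return False

def is_safe_url(url):
    """Reject a URL if any address its host resolves to is non-global."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return all(ip_is_allowed(info[4][0]) for info in infos)
```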
oobabooga
c0de1d176c
UI: Add an incognito chat option
2026-03-15 17:57:31 -07:00
oobabooga
92d376e420
web_search: Return all results and improve URL extraction
2026-03-15 13:14:53 -07:00
oobabooga
bfea49b197
Move top_p and top_k higher up in the UI and CLI help
2026-03-15 09:34:17 -07:00
oobabooga
80d0c03bab
llama.cpp: Change the default --fit-target from 1024 to 512
2026-03-15 09:29:25 -07:00
oobabooga
9119ce0680
llama.cpp: Use --fit-ctx 8192 when --fit on is used
This sets the minimum acceptable context length, which by default is 4096.
2026-03-15 09:24:14 -07:00
oobabooga
5763cab3c4
Fix a crash loading the MiniMax-M2.5 jinja template
2026-03-15 07:13:26 -07:00
oobabooga
f0c16813ef
Remove the rope scaling parameters
Models with 131k+ context lengths now work without these overrides. The parameters can still be
passed to llama.cpp through --extra-flags.
2026-03-14 19:43:25 -07:00
oobabooga
2d3a3794c9
Add a Top-P preset, make it the new default, clean up the built-in presets
2026-03-14 19:22:12 -07:00
oobabooga
b9bdbd638e
Fix after 4ae2bd86e2
2026-03-14 18:18:33 -07:00
oobabooga
e11425d5f8
Fix relative redirect handling in web page fetcher
2026-03-14 15:46:21 -07:00
oobabooga
4ae2bd86e2
Change the default ctx-size to 0 (auto) for llama.cpp
2026-03-14 15:30:01 -07:00
oobabooga
573617157a
Optimize tool call detection
Skips templates that don't contain a required keyword
2026-03-14 12:09:41 -07:00
oobabooga
d0a4993cf4
UI: Increase ctx-size slider maximum to 1M and step to 1024
2026-03-14 09:53:12 -07:00
oobabooga
c908ac00d7
Replace html2text with trafilatura for better web content extraction
After this change, a lot of boilerplate is removed from web pages, saving tokens in agentic loops.
2026-03-14 09:29:17 -07:00
oobabooga
8bff331893
UI: Fix tool call markup flashing before accordion appears during streaming
2026-03-14 09:26:20 -07:00
oobabooga
cb08ba63dc
Fix GPT-OSS channel markup leaking into UI when model skips analysis block
2026-03-14 09:08:05 -07:00
oobabooga
09a6549816
API: Stream reasoning_content separately from content in OpenAI-compatible responses
2026-03-14 06:52:40 -07:00
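On the client side, the separated fields can be accumulated like this. A sketch assuming OpenAI-style streamed chunk deltas; the chunk data below is made up for illustration.

```python
# Hypothetical streamed chunks, shaped like OpenAI-compatible deltas:
# reasoning arrives in `reasoning_content`, the answer in `content`.
chunks = [
    {"delta": {"reasoning_content": "Let me think."}},
    {"delta": {"content": "Hello"}},
    {"delta": {"content": "!"}},
]

reasoning, content = "", ""
for chunk in chunks:
    delta = chunk["delta"]
    reasoning += delta.get("reasoning_content", "")
    content += delta.get("content", "")
```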
oobabooga
accb2ef661
UI/API: Prevent tool call markup from leaking into streamed UI output (closes #7427)
2026-03-14 06:26:47 -07:00
oobabooga
e8d1c66303
Clean up tool calling code
2026-03-13 18:27:01 -07:00
oobabooga
24e7e77b55
Clean up
2026-03-13 12:37:10 -07:00
oobabooga
5362bbb413
Make web_search not download the page contents, use fetch_webpage instead
2026-03-13 12:09:08 -07:00
oobabooga
aab2596d29
UI: Fix multiple thinking blocks rendering as raw text in HTML generator
2026-03-13 15:47:11 -03:00
oobabooga
e0a38da9f3
Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns
2026-03-13 11:04:06 -03:00
oobabooga
c39c187f47
UI: Improve the style of table scrollbars
2026-03-13 03:21:47 -07:00
oobabooga
c094bc943c
UI: Skip output extensions on intermediate tool-calling turns
2026-03-12 21:45:38 -07:00
oobabooga
85ec85e569
UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls
2026-03-12 20:22:35 -07:00
oobabooga
04213dff14
Address copilot feedback
2026-03-12 19:55:20 -07:00
oobabooga
58f26a4cc7
UI: Skip redundant work in chat loop when no tools are selected
2026-03-12 19:18:55 -07:00
oobabooga
286ae475f6
UI: Clean up tool calling code
2026-03-12 22:39:38 -03:00
oobabooga
a09f21b9de
UI: Fix tool calling for GPT-OSS and Continue
2026-03-12 22:17:20 -03:00
oobabooga
5c02b7f603
Allow the fetch_webpage tool to return links
2026-03-12 17:08:30 -07:00
oobabooga
09d5e049d6
UI: Improve the Tools checkbox list style
2026-03-12 16:53:49 -07:00
oobabooga
4f82b71ef3
UI: Bump the ctx-size max from 131072 to 262144 (256K)
2026-03-12 14:56:35 -07:00
oobabooga
bbd43d9463
UI: Correctly propagate truncation_length when ctx_size is auto
2026-03-12 14:54:05 -07:00
oobabooga
3e6bd1a310
UI: Prepend thinking tag when template appends it to prompt
Makes Qwen models show a thinking block immediately during streaming.
2026-03-12 14:30:51 -07:00
oobabooga
9a7428b627
UI: Add collapsible accordions for tool calling steps
2026-03-12 14:16:04 -07:00
oobabooga
2d0cc7726e
API: Add reasoning_content field to non-streaming chat completions
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
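The extraction described above can be sketched as follows. A minimal illustration of the convention, assuming only the `<think>...</think>` tag pair; the project's actual parser handles more tag variants and streaming.

```python
import re

# Matches one thinking block, including any whitespace after the close tag.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(text):
    """Split a raw completion into (reasoning_content, content).

    Thinking blocks are joined into reasoning_content (None if absent);
    content is the completion with the blocks removed.
    """
    parts = THINK_RE.findall(text)
    reasoning = "\n".join(p.strip() for p in parts) or None
    content = THINK_RE.sub("", text).strip()
    return reasoning, content
```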
oobabooga
a916fb0e5c
API: Preserve mid-conversation system message positions
2026-03-12 14:27:24 -03:00
oobabooga
fb1b3b6ddf
API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
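For reference, the OpenAI-spec shape targeted above looks roughly like this. A sketch of a builder for the chat-completions `logprobs` object; the function name and inputs are illustrative, not the project's actual code.

```python
def to_openai_logprobs(tokens, top_alternatives):
    """Build a chat-completions `logprobs` object.

    tokens: list of (token_str, logprob) for the generated tokens.
    top_alternatives: per position, the top-N (token_str, logprob)
    candidates requested via top_logprobs.
    """
    content = []
    for (tok, lp), alts in zip(tokens, top_alternatives):
        content.append({
            "token": tok,
            "logprob": lp,
            "bytes": list(tok.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": l, "bytes": list(t.encode("utf-8"))}
                for t, l in alts
            ],
        })
    return {"content": content}
```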
oobabooga
4b6c9db1c9
UI: Fix stale tool_sequence after edit and chat-instruct tool rendering
2026-03-12 13:12:18 -03:00
oobabooga
b5cac2e3b2
Fix swipes and edit for tool calling in the UI
2026-03-12 01:53:37 -03:00
oobabooga
0d62038710
Add tools refresh button and _tool_turn comment
2026-03-12 01:36:07 -03:00
oobabooga
cf9ad8eafe
Initial tool-calling support in the UI
2026-03-12 01:16:19 -03:00
oobabooga
980a9d1657
UI: Minor defensive changes to autosave
2026-03-11 15:50:16 -07:00
oobabooga
bb00d96dc3
Use a new gr.DragDrop element for Sampler priority + update gradio
2026-03-11 19:35:12 -03:00
oobabooga
3304b57bdf
Add native logit_bias and logprobs support for ExLlamav3
2026-03-10 11:03:25 -03:00
oobabooga
8aeaa76365
Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
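The seed-increment idea noted above can be sketched as follows. A hypothetical helper, not the project's actual code: each of the n completions gets a distinct seed so the results differ.

```python
import random

def expand_seeds(base_seed, n):
    """Derive one seed per completion for n>1 requests.

    A missing or negative base seed is replaced by a random one;
    subsequent completions use consecutive increments.
    """
    if base_seed is None or base_seed < 0:
        base_seed = random.randint(0, 2**31 - 1)
    return [base_seed + i for i in range(n)]
```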
oobabooga
6ec4ca8b10
Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3
2026-03-10 09:58:00 -03:00
oobabooga
307c085d1b
Minor warning change
2026-03-09 21:44:53 -07:00