oobabooga
c908ac00d7
Replace html2text with trafilatura for better web content extraction
...
After this change a lot of boilerplate is removed from web pages, saving tokens on agentic loops.
2026-03-14 09:29:17 -07:00
oobabooga
8bff331893
UI: Fix tool call markup flashing before accordion appears during streaming
2026-03-14 09:26:20 -07:00
oobabooga
cb08ba63dc
Fix GPT-OSS channel markup leaking into UI when model skips analysis block
2026-03-14 09:08:05 -07:00
oobabooga
09a6549816
API: Stream reasoning_content separately from content in OpenAI-compatible responses
2026-03-14 06:52:40 -07:00
oobabooga
accb2ef661
UI/API: Prevent tool call markup from leaking into streamed UI output ( closes #7427 )
2026-03-14 06:26:47 -07:00
oobabooga
998b9bfb2a
UI: Make all chat styles better match instruct style
2026-03-13 21:07:40 -07:00
oobabooga
5f1707af35
UI: Increase the width of non-instruct chat styles
2026-03-13 20:38:40 -07:00
oobabooga
16636c04b8
UI: Minor fix/optimization
2026-03-13 19:06:04 -07:00
oobabooga
e8d1c66303
Clean up tool calling code
2026-03-13 18:27:01 -07:00
oobabooga
cb88066d15
Update llama.cpp
2026-03-13 13:17:41 -07:00
oobabooga
0cd245bcbb
UI: Make autoscroll more robust after the optimizations
2026-03-13 12:58:56 -07:00
oobabooga
24e7e77b55
Clean up
2026-03-13 12:37:10 -07:00
oobabooga
cabb95f0d6
UI: Increase the instruct width to 768px
2026-03-13 12:24:48 -07:00
oobabooga
5362bbb413
Make web_search not download the page contents, use fetch_webpage instead
2026-03-13 12:09:08 -07:00
oobabooga
d4c22ced83
UI: Optimize syntax highlighting and autoscroll by moving from MutationObserver to morphdom updates
2026-03-13 15:47:14 -03:00
oobabooga
aab2596d29
UI: Fix multiple thinking blocks rendering as raw text in HTML generator
2026-03-13 15:47:11 -03:00
oobabooga
e0a38da9f3
Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns
2026-03-13 11:04:06 -03:00
oobabooga
e50b823eee
Update llama.cpp
2026-03-13 06:22:28 -07:00
oobabooga
b7670cc762
Add a tool calling tutorial
2026-03-13 04:35:42 -07:00
oobabooga
d0b72c73c0
Update diffusers to 0.37
2026-03-13 03:43:02 -07:00
oobabooga
c39c187f47
UI: Improve the style of table scrollbars
2026-03-13 03:21:47 -07:00
oobabooga
4628825651
Better solution to fef95b9e56
2026-03-13 03:17:36 -07:00
oobabooga
fef95b9e56
UI: Fix an autoscroll race condition during chat streaming
2026-03-13 03:05:09 -07:00
oobabooga
5833d94d7f
UI: Prevent word breaks in tables
2026-03-13 02:56:49 -07:00
oobabooga
a4bef860b6
UI: Optimize chat streaming by batching morphdom to one update per animation frame
...
The monitor physically cannot paint faster than its refresh rate, so
intermediate morphdom calls between frames do redundant parsing, diffing,
and patching work that is never displayed.
2026-03-13 06:45:47 -03:00
oobabooga
5ddc1002d2
Update ExLlamaV3 to 0.0.25
2026-03-13 02:40:17 -07:00
oobabooga
c094bc943c
UI: Skip output extensions on intermediate tool-calling turns
2026-03-12 21:45:38 -07:00
oobabooga
85ec85e569
UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls
2026-03-12 20:22:35 -07:00
oobabooga
04213dff14
Address copilot feedback
2026-03-12 19:55:20 -07:00
oobabooga
24fdcc52b3
Merge branch 'main' into dev
2026-03-12 19:33:03 -07:00
oobabooga
58f26a4cc7
UI: Skip redundant work in chat loop when no tools are selected
2026-03-12 19:18:55 -07:00
oobabooga
0e35421593
API: Always extract reasoning_content, even with tool calls
2026-03-12 18:52:41 -07:00
oobabooga
1ed56aee85
Add a calculate tool
2026-03-12 18:45:19 -07:00
oobabooga
286ae475f6
UI: Clean up tool calling code
2026-03-12 22:39:38 -03:00
oobabooga
4c7a56c18d
Add num_pages and max_tokens kwargs to web search tools
2026-03-12 22:17:23 -03:00
oobabooga
a09f21b9de
UI: Fix tool calling for GPT-OSS and Continue
2026-03-12 22:17:20 -03:00
oobabooga
1b7e6c5705
Add the fetch_webpage tool source
2026-03-12 17:11:05 -07:00
oobabooga
f8936ec47c
Truncate web_search and fetch_webpage tools to 8192 tokens
2026-03-12 17:10:41 -07:00
oobabooga
5c02b7f603
Allow the fetch_webpage tool to return links
2026-03-12 17:08:30 -07:00
oobabooga
09d5e049d6
UI: Improve the Tools checkbox list style
2026-03-12 16:53:49 -07:00
oobabooga
fdd8e5b1fd
Make repeated Ctrl+C force a shutdown
2026-03-12 15:48:50 -07:00
oobabooga
4f82b71ef3
UI: Bump the ctx-size max from 131072 to 262144 (256K)
2026-03-12 14:56:35 -07:00
oobabooga
bbd43d9463
UI: Correctly propagate truncation_length when ctx_size is auto
2026-03-12 14:54:05 -07:00
oobabooga
3e6bd1a310
UI: Prepend thinking tag when template appends it to prompt
...
Makes Qwen models have a thinking block straight away during streaming.
2026-03-12 14:30:51 -07:00
oobabooga
9a7428b627
UI: Add collapsible accordions for tool calling steps
2026-03-12 14:16:04 -07:00
oobabooga
2d0cc7726e
API: Add reasoning_content field to non-streaming chat completions
...
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
oobabooga
d45c9b3c59
API: Minor logprobs fixes
2026-03-12 16:09:49 -03:00
oobabooga
2466305f76
Add tool examples
2026-03-12 16:03:57 -03:00
oobabooga
a916fb0e5c
API: Preserve mid-conversation system message positions
2026-03-12 14:27:24 -03:00
oobabooga
fb1b3b6ddf
API: Rewrite logprobs for OpenAI spec compliance across all backends
...
- Rewrite logprobs output format to match the OpenAI specification for
both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00