5411 commits

Author SHA1 Message Date
oobabooga
f8ff7cf99e Update the custom gradio wheels 2026-03-15 14:12:59 -07:00
oobabooga
92d376e420 web_search: Return all results and improve URL extraction 2026-03-15 13:14:53 -07:00
oobabooga
f6a749a151 API: Fix /v1/models to only list the currently loaded model 2026-03-15 10:17:31 -07:00
oobabooga
1a2b840938 UI: Fix scroll jump when toggling thinking blocks during streaming 2026-03-15 09:52:31 -07:00
oobabooga
bfea49b197 Move top_p and top_k higher up in the UI and CLI help 2026-03-15 09:34:17 -07:00
oobabooga
80d0c03bab llama.cpp: Change the default --fit-target from 1024 to 512 2026-03-15 09:29:25 -07:00
oobabooga
9119ce0680 llama.cpp: Use --fit-ctx 8192 when --fit on is used
This sets the minimum acceptable context length, which by default is 4096.
2026-03-15 09:24:14 -07:00
oobabooga
5763cab3c4 Fix a crash loading the MiniMax-M2.5 jinja template 2026-03-15 07:13:26 -07:00
oobabooga
f0c16813ef Remove the rope scaling parameters
Now models have 131k+ context length. The parameters can still be
passed to llama.cpp through --extra-flags.
2026-03-14 19:43:25 -07:00
oobabooga
2d3a3794c9 Add a Top-P preset, make it the new default, clean up the built-in presets 2026-03-14 19:22:12 -07:00
oobabooga
9955e54a1f UI: Fix autoscroll not engaging when regenerating short chats 2026-03-14 18:51:12 -07:00
oobabooga
d1aba08561 UI: Set chat widths to 724px 2026-03-14 18:35:44 -07:00
oobabooga
c126530061 UI: Minor color change 2026-03-14 18:22:41 -07:00
oobabooga
b9bdbd638e Fix after 4ae2bd86e2 2026-03-14 18:18:33 -07:00
oobabooga
9eacd4a207 UI: Minor morphdom optimizations 2026-03-14 16:07:16 -07:00
oobabooga
e11425d5f8 Fix relative redirect handling in web page fetcher 2026-03-14 15:46:21 -07:00
oobabooga
4ae2bd86e2 Change the default ctx-size to 0 (auto) for llama.cpp 2026-03-14 15:30:01 -07:00
oobabooga
9f657d3976 UI: Fix a minor glitch 2026-03-14 14:19:12 -07:00
oobabooga
c09a367c64 UI: Fix dark theme using light theme syntax highlighting 2026-03-14 14:08:03 -07:00
oobabooga
beab346f48 UI: Fix a minor glitch 2026-03-14 12:45:37 -07:00
oobabooga
573617157a Optimize tool call detection
Avoids templates that don't contain a given necessary keyword
2026-03-14 12:09:41 -07:00
oobabooga
d0a4993cf4 UI: Increase ctx-size slider maximum to 1M and step to 1024 2026-03-14 09:53:12 -07:00
oobabooga
c7953fb923 Add ROCm version to portable package filenames 2026-03-14 09:44:37 -07:00
oobabooga
c908ac00d7 Replace html2text with trafilatura for better web content extraction
After this change a lot of boilerplate is removed from web pages, saving tokens on agentic loops.
2026-03-14 09:29:17 -07:00
oobabooga
8bff331893 UI: Fix tool call markup flashing before accordion appears during streaming 2026-03-14 09:26:20 -07:00
oobabooga
cb08ba63dc Fix GPT-OSS channel markup leaking into UI when model skips analysis block 2026-03-14 09:08:05 -07:00
oobabooga
09a6549816 API: Stream reasoning_content separately from content in OpenAI-compatible responses 2026-03-14 06:52:40 -07:00
oobabooga
accb2ef661 UI/API: Prevent tool call markup from leaking into streamed UI output (closes #7427) 2026-03-14 06:26:47 -07:00
oobabooga
998b9bfb2a UI: Make all chat styles better match instruct style 2026-03-13 21:07:40 -07:00
oobabooga
5f1707af35 UI: Increase the width of non-instruct chat styles 2026-03-13 20:38:40 -07:00
oobabooga
16636c04b8 UI: Minor fix/optimization 2026-03-13 19:06:04 -07:00
oobabooga
e8d1c66303 Clean up tool calling code 2026-03-13 18:27:01 -07:00
oobabooga
cb88066d15 Update llama.cpp 2026-03-13 13:17:41 -07:00
oobabooga
0cd245bcbb UI: Make autoscroll more robust after the optimizations 2026-03-13 12:58:56 -07:00
oobabooga
24e7e77b55 Clean up 2026-03-13 12:37:10 -07:00
oobabooga
cabb95f0d6 UI: Increase the instruct width to 768px 2026-03-13 12:24:48 -07:00
oobabooga
5362bbb413 Make web_search not download the page contents, use fetch_webpage instead 2026-03-13 12:09:08 -07:00
oobabooga
d4c22ced83 UI: Optimize syntax highlighting and autoscroll by moving from MutationObserver to morphdom updates 2026-03-13 15:47:14 -03:00
oobabooga
aab2596d29 UI: Fix multiple thinking blocks rendering as raw text in HTML generator 2026-03-13 15:47:11 -03:00
oobabooga
e0a38da9f3 Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns 2026-03-13 11:04:06 -03:00
oobabooga
e50b823eee Update llama.cpp 2026-03-13 06:22:28 -07:00
oobabooga
b7670cc762 Add a tool calling tutorial 2026-03-13 04:35:42 -07:00
oobabooga
d0b72c73c0 Update diffusers to 0.37 2026-03-13 03:43:02 -07:00
oobabooga
c39c187f47 UI: Improve the style of table scrollbars 2026-03-13 03:21:47 -07:00
oobabooga
4628825651 Better solution to fef95b9e56 2026-03-13 03:17:36 -07:00
oobabooga
fef95b9e56 UI: Fix an autoscroll race condition during chat streaming 2026-03-13 03:05:09 -07:00
oobabooga
5833d94d7f UI: Prevent word breaks in tables 2026-03-13 02:56:49 -07:00
oobabooga
a4bef860b6 UI: Optimize chat streaming by batching morphdom to one update per animation frame
The monitor physically cannot paint faster than its refresh rate, so
intermediate morphdom calls between frames do redundant parsing, diffing,
and patching work that is never displayed.
2026-03-13 06:45:47 -03:00
oobabooga
5ddc1002d2 Update ExLlamaV3 to 0.0.25 2026-03-13 02:40:17 -07:00
oobabooga
c094bc943c UI: Skip output extensions on intermediate tool-calling turns 2026-03-12 21:45:38 -07:00