Commit graph

5418 commits

Author SHA1 Message Date
oobabooga 737ded6959 Web search: Fix SSRF validation to block all non-global IPs 2026-03-16 05:37:46 -07:00
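The SSRF fix above reflects a standard pattern: before fetching a user-supplied URL, resolve the host and reject any address that is not globally routable (loopback, private ranges, link-local, and so on). A minimal stdlib sketch of that check — illustrative only, not the project's actual code, and the function name is hypothetical:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host resolves to any non-global IP address,
    mitigating SSRF against internal services."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        # Check every resolved address, not just the first one.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if not ip.is_global:
            return False
    return True
```

Note that `ipaddress.is_global` covers loopback, private, link-local, and reserved ranges in one check, which is why "block all non-global IPs" is stricter than a hand-maintained denylist.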
oobabooga 50685c93f2 Update README 2026-03-16 05:29:27 -07:00
oobabooga 9d9f5d9860 Update README 2026-03-15 20:27:44 -07:00
oobabooga 5cfe9fe295 Update README 2026-03-15 20:12:22 -07:00
oobabooga b76a289e04 API: Respect --listen-host for the OpenAI API server
Closes #7429
2026-03-15 18:04:34 -07:00
oobabooga c0de1d176c UI: Add an incognito chat option 2026-03-15 17:57:31 -07:00
oobabooga 4f80b20859 UI: Follow-up to beab346f (fix scroll deadlock on chat-parent) 2026-03-15 16:38:54 -07:00
oobabooga f8ff7cf99e Update the custom gradio wheels 2026-03-15 14:12:59 -07:00
oobabooga 92d376e420 web_search: Return all results and improve URL extraction 2026-03-15 13:14:53 -07:00
oobabooga f6a749a151 API: Fix /v1/models to only list the currently loaded model 2026-03-15 10:17:31 -07:00
oobabooga 1a2b840938 UI: Fix scroll jump when toggling thinking blocks during streaming 2026-03-15 09:52:31 -07:00
oobabooga bfea49b197 Move top_p and top_k higher up in the UI and CLI help 2026-03-15 09:34:17 -07:00
oobabooga 80d0c03bab llama.cpp: Change the default --fit-target from 1024 to 512 2026-03-15 09:29:25 -07:00
oobabooga 9119ce0680 llama.cpp: Use --fit-ctx 8192 when --fit on is used
This sets the minimum acceptable context length, which by default is 4096.
2026-03-15 09:24:14 -07:00
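As the commit body describes, `--fit-ctx` raises the floor on the context length that the automatic VRAM fit may settle on. An illustrative sketch of that clamping logic — a hypothetical helper, not the actual implementation:

```python
def resolve_ctx_size(fit_estimate: int, fit_ctx: int = 8192) -> int:
    """Accept the context length proposed by the VRAM fit estimator,
    but never go below the --fit-ctx floor (previously 4096)."""
    if fit_estimate < fit_ctx:
        # The estimate is below the acceptable minimum; use the floor
        # and let the loader compensate elsewhere (e.g. fewer GPU layers).
        return fit_ctx
    return fit_estimate
```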
oobabooga 5763cab3c4 Fix a crash loading the MiniMax-M2.5 jinja template 2026-03-15 07:13:26 -07:00
oobabooga f0c16813ef Remove the rope scaling parameters
Models now have 131k+ context lengths. The parameters can still be
passed to llama.cpp through --extra-flags.
2026-03-14 19:43:25 -07:00
oobabooga 2d3a3794c9 Add a Top-P preset, make it the new default, clean up the built-in presets 2026-03-14 19:22:12 -07:00
oobabooga 9955e54a1f UI: Fix autoscroll not engaging when regenerating short chats 2026-03-14 18:51:12 -07:00
oobabooga d1aba08561 UI: Set chat widths to 724px 2026-03-14 18:35:44 -07:00
oobabooga c126530061 UI: Minor color change 2026-03-14 18:22:41 -07:00
oobabooga b9bdbd638e Fix after 4ae2bd86e2 2026-03-14 18:18:33 -07:00
oobabooga 9eacd4a207 UI: Minor morphdom optimizations 2026-03-14 16:07:16 -07:00
oobabooga e11425d5f8 Fix relative redirect handling in web page fetcher 2026-03-14 15:46:21 -07:00
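Relative redirects are a common fetcher pitfall: an HTTP `Location` header may be a bare path like `/new` or even a relative segment rather than an absolute URL, and must be resolved against the URL that produced the response. A minimal stdlib sketch (illustrative, not the project's code):

```python
from urllib.parse import urljoin

def resolve_redirect(request_url: str, location: str) -> str:
    """Resolve a Location header against the requesting URL.
    urljoin handles absolute URLs, absolute paths, and relative paths."""
    return urljoin(request_url, location)
```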
oobabooga 4ae2bd86e2 Change the default ctx-size to 0 (auto) for llama.cpp 2026-03-14 15:30:01 -07:00
oobabooga 9f657d3976 UI: Fix a minor glitch 2026-03-14 14:19:12 -07:00
oobabooga c09a367c64 UI: Fix dark theme using light theme syntax highlighting 2026-03-14 14:08:03 -07:00
oobabooga beab346f48 UI: Fix a minor glitch 2026-03-14 12:45:37 -07:00
oobabooga 573617157a Optimize tool call detection
Skips templates that don't contain a given required keyword
2026-03-14 12:09:41 -07:00
oobabooga d0a4993cf4 UI: Increase ctx-size slider maximum to 1M and step to 1024 2026-03-14 09:53:12 -07:00
oobabooga c7953fb923 Add ROCm version to portable package filenames 2026-03-14 09:44:37 -07:00
oobabooga c908ac00d7 Replace html2text with trafilatura for better web content extraction
After this change, a lot of boilerplate is removed from web pages, saving tokens in agentic loops.
2026-03-14 09:29:17 -07:00
oobabooga 8bff331893 UI: Fix tool call markup flashing before accordion appears during streaming 2026-03-14 09:26:20 -07:00
oobabooga cb08ba63dc Fix GPT-OSS channel markup leaking into UI when model skips analysis block 2026-03-14 09:08:05 -07:00
oobabooga 09a6549816 API: Stream reasoning_content separately from content in OpenAI-compatible responses 2026-03-14 06:52:40 -07:00
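Streaming reasoning separately means each delta chunk carries either a `reasoning_content` field or a `content` field rather than mixing the two streams. An illustrative chunk generator — the field name follows the widely used `reasoning_content` convention, and the helper itself is hypothetical:

```python
def stream_deltas(segments):
    """Yield OpenAI-style streaming deltas, routing text tagged as
    reasoning into `reasoning_content` and answer text into `content`."""
    for kind, text in segments:
        field = "reasoning_content" if kind == "reasoning" else "content"
        yield {"choices": [{"index": 0, "delta": {field: text}}]}

chunks = list(stream_deltas([("reasoning", "Let me think."),
                             ("answer", "The result is 4.")]))
```

A client can then render the reasoning stream in a collapsible block while appending only `content` deltas to the visible reply.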
oobabooga accb2ef661 UI/API: Prevent tool call markup from leaking into streamed UI output (closes #7427) 2026-03-14 06:26:47 -07:00
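Keeping tool-call markup out of the streamed UI is typically done by holding back any trailing text that could be the start of the markup until enough tokens arrive to confirm or rule it out. A simplified sketch of that hold-back logic — the `<tool_call>` tag is just an example, not necessarily the markup the project parses:

```python
TAG = "<tool_call>"

def visible_text(buffer: str) -> str:
    """Return the part of the streamed buffer that is safe to display:
    hide everything from a complete tag onward, and hold back a trailing
    fragment that might still grow into the tag."""
    start = buffer.find(TAG)
    if start != -1:
        return buffer[:start]
    # Check whether the buffer ends with a partial prefix of the tag.
    for i in range(min(len(TAG) - 1, len(buffer)), 0, -1):
        if buffer.endswith(TAG[:i]):
            return buffer[:-i]
    return buffer
```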
oobabooga 998b9bfb2a UI: Make all chat styles better match instruct style 2026-03-13 21:07:40 -07:00
oobabooga 5f1707af35 UI: Increase the width of non-instruct chat styles 2026-03-13 20:38:40 -07:00
oobabooga 16636c04b8 UI: Minor fix/optimization 2026-03-13 19:06:04 -07:00
oobabooga e8d1c66303 Clean up tool calling code 2026-03-13 18:27:01 -07:00
oobabooga cb88066d15 Update llama.cpp 2026-03-13 13:17:41 -07:00
oobabooga 0cd245bcbb UI: Make autoscroll more robust after the optimizations 2026-03-13 12:58:56 -07:00
oobabooga 24e7e77b55 Clean up 2026-03-13 12:37:10 -07:00
oobabooga cabb95f0d6 UI: Increase the instruct width to 768px 2026-03-13 12:24:48 -07:00
oobabooga 5362bbb413 Make web_search not download the page contents, use fetch_webpage instead 2026-03-13 12:09:08 -07:00
oobabooga d4c22ced83 UI: Optimize syntax highlighting and autoscroll by moving from MutationObserver to morphdom updates 2026-03-13 15:47:14 -03:00
oobabooga aab2596d29 UI: Fix multiple thinking blocks rendering as raw text in HTML generator 2026-03-13 15:47:11 -03:00
oobabooga e0a38da9f3 Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns 2026-03-13 11:04:06 -03:00
oobabooga e50b823eee Update llama.cpp 2026-03-13 06:22:28 -07:00
oobabooga b7670cc762 Add a tool calling tutorial 2026-03-13 04:35:42 -07:00
oobabooga d0b72c73c0 Update diffusers to 0.37 2026-03-13 03:43:02 -07:00