Commit graph

5516 commits

Author SHA1 Message Date
oobabooga fef95b9e56 UI: Fix an autoscroll race condition during chat streaming 2026-03-13 03:05:09 -07:00
oobabooga 5833d94d7f UI: Prevent word breaks in tables 2026-03-13 02:56:49 -07:00
oobabooga a4bef860b6 UI: Optimize chat streaming by batching morphdom to one update per animation frame 2026-03-13 06:45:47 -03:00
    The monitor physically cannot paint faster than its refresh rate, so
    intermediate morphdom calls between frames do redundant parsing, diffing,
    and patching work that is never displayed.
oobabooga 5ddc1002d2 Update ExLlamaV3 to 0.0.25 2026-03-13 02:40:17 -07:00
oobabooga c094bc943c UI: Skip output extensions on intermediate tool-calling turns 2026-03-12 21:45:38 -07:00
oobabooga 85ec85e569 UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls 2026-03-12 20:22:35 -07:00
oobabooga 04213dff14 Address copilot feedback 2026-03-12 19:55:20 -07:00
oobabooga 24fdcc52b3 Merge branch 'main' into dev 2026-03-12 19:33:03 -07:00
oobabooga 58f26a4cc7 UI: Skip redundant work in chat loop when no tools are selected 2026-03-12 19:18:55 -07:00
oobabooga 0e35421593 API: Always extract reasoning_content, even with tool calls 2026-03-12 18:52:41 -07:00
oobabooga 1ed56aee85 Add a calculate tool 2026-03-12 18:45:19 -07:00
oobabooga 286ae475f6 UI: Clean up tool calling code 2026-03-12 22:39:38 -03:00
oobabooga 4c7a56c18d Add num_pages and max_tokens kwargs to web search tools 2026-03-12 22:17:23 -03:00
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 1b7e6c5705 Add the fetch_webpage tool source 2026-03-12 17:11:05 -07:00
oobabooga f8936ec47c Truncate web_search and fetch_webpage tools to 8192 tokens 2026-03-12 17:10:41 -07:00
oobabooga 5c02b7f603 Allow the fetch_webpage tool to return links 2026-03-12 17:08:30 -07:00
oobabooga 09d5e049d6 UI: Improve the Tools checkbox list style 2026-03-12 16:53:49 -07:00
oobabooga fdd8e5b1fd Make repeated Ctrl+C force a shutdown 2026-03-12 15:48:50 -07:00
oobabooga 4f82b71ef3 UI: Bump the ctx-size max from 131072 to 262144 (256K) 2026-03-12 14:56:35 -07:00
oobabooga bbd43d9463 UI: Correctly propagate truncation_length when ctx_size is auto 2026-03-12 14:54:05 -07:00
oobabooga 3e6bd1a310 UI: Prepend thinking tag when template appends it to prompt 2026-03-12 14:30:51 -07:00
    Makes Qwen models have a thinking block straight away during streaming.
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions 2026-03-12 16:30:46 -03:00
    Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
    separate reasoning_content field on the assistant message, matching
    the convention used by DeepSeek, llama.cpp, and SGLang.
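The reasoning_content convention described in 2d0cc7726e amounts to a small post-processing step on the generated text. An illustrative regex-based sketch (function and field names chosen for this example, not taken from the repo):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(text):
    """Move a <think>...</think> block into a separate field, mirroring
    the reasoning_content convention used by DeepSeek, llama.cpp, and
    SGLang: the visible reply stays in content, the thinking goes to
    reasoning_content."""
    match = THINK_RE.search(text)
    if match is None:
        return {"content": text, "reasoning_content": None}
    reasoning = match.group(1).strip()
    content = THINK_RE.sub("", text, count=1).strip()
    return {"content": content, "reasoning_content": reasoning}
```

A real implementation would also need to handle an unclosed tag when generation is cut off mid-thought.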
oobabooga d45c9b3c59 API: Minor logprobs fixes 2026-03-12 16:09:49 -03:00
oobabooga 2466305f76 Add tool examples 2026-03-12 16:03:57 -03:00
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends 2026-03-12 14:17:32 -03:00
    - Rewrite logprobs output format to match the OpenAI specification for
      both chat completions and completions endpoints
    - Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
      backends in chat completions (always returned 1 instead of requested N)
    - Fix non-streaming responses only returning logprobs for the last token
      instead of all generated tokens (affects all HF-based loaders)
    - Fix logprobs returning null for non-streaming chat requests on HF loaders
    - Fix off-by-one returning one extra top alternative on HF loaders
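For reference, the OpenAI chat-completions logprobs shape that fb1b3b6ddf targets looks roughly like the structure below: one entry per generated token under choices[0].logprobs.content, each carrying the chosen token and its top-N alternatives. This is a hand-built illustration of the spec's layout, not code from the repo:

```python
import math

def make_logprobs_entry(token, logprob, top):
    """Build one per-token entry in the OpenAI chat-completions logprobs
    format: the sampled token, its log probability, its UTF-8 bytes, and
    the requested top alternatives."""
    return {
        "token": token,
        "logprob": logprob,
        "bytes": list(token.encode("utf-8")),
        "top_logprobs": [
            {"token": t, "logprob": lp, "bytes": list(t.encode("utf-8"))}
            for t, lp in top
        ],
    }

# A non-streaming response carries one such entry for every generated
# token, not just the last one.
entry = make_logprobs_entry("Hello", math.log(0.7),
                            [("Hello", math.log(0.7)), ("Hi", math.log(0.2))])
```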
oobabooga 5a017aa338 API: Several OpenAI spec compliance fixes 2026-03-12 13:30:38 -03:00
    - Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
    - Send data: [DONE] at the end of SSE streams
    - Fix finish_reason so "tool_calls" takes priority over "length"
    - Stop including usage in streaming chunks when include_usage is not set
    - Handle "developer" role in messages (treated same as "system")
    - Add logprobs and top_logprobs parameters for chat completions
    - Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
    - Add max_completion_tokens as an alias for max_tokens in chat completions
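Two of the fixes in 5a017aa338 are easy to illustrate: when a generation both emits tool calls and hits the token limit, the spec-conformant finish_reason is "tool_calls", not "length"; and an SSE stream must terminate with a literal `data: [DONE]` event. A sketch with hypothetical helper names:

```python
import json

def resolve_finish_reason(has_tool_calls, hit_token_limit, hit_stop=False):
    """Pick the OpenAI finish_reason: "tool_calls" wins over "length"."""
    if has_tool_calls:
        return "tool_calls"
    if hit_stop:
        return "stop"
    if hit_token_limit:
        return "length"
    return "stop"

def sse_stream(chunks):
    """Yield chat-completion chunks as SSE events, ending with data: [DONE]."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```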
oobabooga 4b6c9db1c9 UI: Fix stale tool_sequence after edit and chat-instruct tool rendering 2026-03-12 13:12:18 -03:00
oobabooga 09723c9988 API: Include /v1 in the printed API URL for easier integration 2026-03-12 12:43:15 -03:00
oobabooga 2549f7c33b API: Add tool_choice support and fix tool_calls spec compliance 2026-03-12 10:29:23 -03:00
oobabooga b5cac2e3b2 Fix swipes and edit for tool calling in the UI 2026-03-12 01:53:37 -03:00
oobabooga 0d62038710 Add tools refresh button and _tool_turn comment 2026-03-12 01:36:07 -03:00
oobabooga cf9ad8eafe Initial tool-calling support in the UI 2026-03-12 01:16:19 -03:00
oobabooga 980a9d1657 UI: Minor defensive changes to autosave 2026-03-11 15:50:16 -07:00
oobabooga bb00d96dc3 Use a new gr.DragDrop element for Sampler priority + update gradio 2026-03-11 19:35:12 -03:00
oobabooga 66c976e995 Update README with ROCm 7.2 torch install URL 2026-03-11 19:35:12 -03:00
oobabooga 24977846fb Update AMD ROCm from 6.4 to 7.2 2026-03-11 13:14:26 -07:00
oobabooga 7a63a56043 Update llama.cpp 2026-03-11 12:53:19 -07:00
oobabooga f1cfeae372 API: Improve OpenAI spec compliance in streaming and non-streaming responses 2026-03-10 20:55:49 -07:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend 2026-03-10 10:41:45 -03:00
    - Forward logit_bias and logprobs natively to llama.cpp
    - Support n>1 completions with seed increment for diversity
    - Fix logprobs returning empty dict when not requested
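The n>1 handling in 8aeaa76365 (seed increment for diversity) amounts to running one generation per requested completion with a distinct seed, so identical requests do not collapse into n identical outputs. A sketch with a stand-in sampler; the function names are invented for this example:

```python
import random

def generate_n(prompt, n, seed, generate_one=None):
    """Produce n completions, bumping the seed per sample so each
    completion draws from a different random stream."""
    if generate_one is None:
        # Stand-in sampler for illustration only: a real backend would
        # run the model with the given seed.
        def generate_one(prompt, seed):
            rng = random.Random(seed)
            return f"{prompt}-{rng.randint(0, 9999)}"
    return [generate_one(prompt, seed + i) for i in range(n)]
```

Seeded per-sample generation keeps the whole batch reproducible: the same request with the same base seed yields the same n outputs.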
oobabooga 6ec4ca8b10 Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3 2026-03-10 09:58:00 -03:00
oobabooga 307c085d1b Minor warning change 2026-03-09 21:44:53 -07:00
oobabooga c604ca66de Update the --multi-user warning 2026-03-09 21:36:04 -07:00
oobabooga 15792c3cb8 Update ExLlamaV3 to 0.0.24 2026-03-09 20:31:05 -07:00
oobabooga 3b71932658 Update README 2026-03-09 20:18:09 -07:00
oobabooga 83b7e47d77 Update README 2026-03-09 20:12:54 -07:00
oobabooga 7f485274eb Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation 2026-03-09 23:56:38 -03:00
    - Use config.eos_token_id_list for all EOS tokens as stop conditions
      (fixes models like Llama-3 that define multiple EOS token IDs)
    - Load vision/draft models before main model so autosplit accounts
      for their VRAM usage
    - Fix loss computation in ExLlamav3_HF: use cache across chunks so
      sequences longer than 2048 tokens get correct perplexity values
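On the perplexity fix in 7f485274eb: when a long sequence is evaluated in fixed-size chunks, the per-chunk losses must be combined weighted by token count, and (as the commit notes) the cache must be carried across chunks so each chunk is conditioned on the full prefix. The aggregation step can be sketched with plain floats; this is an illustration of the arithmetic, not the ExLlamav3_HF code:

```python
import math

def perplexity_from_chunks(chunk_losses):
    """Combine (mean_nll, num_tokens) pairs from sequential chunks into a
    single perplexity over the whole sequence: token-weighted mean of the
    negative log-likelihood, then exp."""
    total_nll = sum(mean_nll * n for mean_nll, n in chunk_losses)
    total_tokens = sum(n for _, n in chunk_losses)
    return math.exp(total_nll / total_tokens)
```

Averaging the per-chunk perplexities directly (rather than the NLLs) would overweight short final chunks and give the wrong value.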