oobabooga
e0a38da9f3
Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns
2026-03-13 11:04:06 -03:00
oobabooga
c39c187f47
UI: Improve the style of table scrollbars
2026-03-13 03:21:47 -07:00
oobabooga
c094bc943c
UI: Skip output extensions on intermediate tool-calling turns
2026-03-12 21:45:38 -07:00
oobabooga
85ec85e569
UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls
2026-03-12 20:22:35 -07:00
oobabooga
04213dff14
Address copilot feedback
2026-03-12 19:55:20 -07:00
oobabooga
58f26a4cc7
UI: Skip redundant work in chat loop when no tools are selected
2026-03-12 19:18:55 -07:00
oobabooga
286ae475f6
UI: Clean up tool calling code
2026-03-12 22:39:38 -03:00
oobabooga
a09f21b9de
UI: Fix tool calling for GPT-OSS and Continue
2026-03-12 22:17:20 -03:00
oobabooga
5c02b7f603
Allow the fetch_webpage tool to return links
2026-03-12 17:08:30 -07:00
oobabooga
09d5e049d6
UI: Improve the Tools checkbox list style
2026-03-12 16:53:49 -07:00
oobabooga
4f82b71ef3
UI: Bump the ctx-size max from 131072 to 262144 (256K)
2026-03-12 14:56:35 -07:00
oobabooga
bbd43d9463
UI: Correctly propagate truncation_length when ctx_size is auto
2026-03-12 14:54:05 -07:00
oobabooga
3e6bd1a310
UI: Prepend thinking tag when template appends it to prompt
...
Makes Qwen models have a thinking block straight away during streaming.
2026-03-12 14:30:51 -07:00
oobabooga
9a7428b627
UI: Add collapsible accordions for tool calling steps
2026-03-12 14:16:04 -07:00
oobabooga
2d0cc7726e
API: Add reasoning_content field to non-streaming chat completions
...
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
oobabooga
a916fb0e5c
API: Preserve mid-conversation system message positions
2026-03-12 14:27:24 -03:00
oobabooga
fb1b3b6ddf
API: Rewrite logprobs for OpenAI spec compliance across all backends
...
- Rewrite logprobs output format to match the OpenAI specification for
both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
oobabooga
4b6c9db1c9
UI: Fix stale tool_sequence after edit and chat-instruct tool rendering
2026-03-12 13:12:18 -03:00
oobabooga
b5cac2e3b2
Fix swipes and edit for tool calling in the UI
2026-03-12 01:53:37 -03:00
oobabooga
0d62038710
Add tools refresh button and _tool_turn comment
2026-03-12 01:36:07 -03:00
oobabooga
cf9ad8eafe
Initial tool-calling support in the UI
2026-03-12 01:16:19 -03:00
oobabooga
980a9d1657
UI: Minor defensive changes to autosave
2026-03-11 15:50:16 -07:00
oobabooga
bb00d96dc3
Use a new gr.DragDrop element for Sampler priority + update gradio
2026-03-11 19:35:12 -03:00
oobabooga
3304b57bdf
Add native logit_bias and logprobs support for ExLlamav3
2026-03-10 11:03:25 -03:00
oobabooga
8aeaa76365
Forward logit_bias, logprobs, and n to llama.cpp backend
...
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
oobabooga
6ec4ca8b10
Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3
2026-03-10 09:58:00 -03:00
oobabooga
307c085d1b
Minor warning change
2026-03-09 21:44:53 -07:00
oobabooga
c604ca66de
Update the --multi-user warning
2026-03-09 21:36:04 -07:00
oobabooga
7f485274eb
Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
...
- Use config.eos_token_id_list for all EOS tokens as stop conditions
(fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
oobabooga
39e6c997cc
Refactor to not import gradio in --nowebui mode
2026-03-09 19:29:24 -07:00
oobabooga
40f1837b42
README: Minor updates
2026-03-08 08:38:29 -07:00
oobabooga
f6ffecfff2
Add guard against training with llama.cpp loader
2026-03-08 10:47:59 -03:00
oobabooga
5a91b8462f
Remove ctx_size_draft from ExLlamav3 loader
2026-03-08 09:53:48 -03:00
oobabooga
7a8ca9f2b0
Fix passing adaptive-p to llama-server
2026-03-08 04:09:40 -07:00
oobabooga
baf4e13ff1
ExLlamav3: fix draft cache size to match main cache
2026-03-07 22:34:48 -03:00
oobabooga
6ff111d18e
ExLlamav3: handle exceptions in ConcurrentGenerator iterate loop
2026-03-07 22:05:31 -03:00
oobabooga
304510eb3d
ExLlamav3: route all generation through ConcurrentGenerator
2026-03-07 05:54:14 -08:00
oobabooga
abc699db9b
Minor UI change
2026-03-06 19:03:38 -08:00
oobabooga
7ea5513263
Handle Qwen 3.5 thinking blocks
2026-03-06 19:01:28 -08:00
oobabooga
5fa709a3f4
llama.cpp server: use port+5 offset and suppress No parser definition detected logs
2026-03-06 18:52:34 -08:00
oobabooga
1eead661c3
Portable mode: always use ../user_data if it exists
2026-03-06 18:04:48 -08:00
oobabooga
d48b53422f
Training: Optimize _peek_json_keys to avoid loading entire file into memory
2026-03-06 15:39:08 -08:00
oobabooga
5f6754c267
Fix stop button being ignored when token throttling is off
2026-03-06 17:12:34 -03:00
oobabooga
b8b4471ab5
Security: restrict file writes to user_data_dir, block extra_flags from API
2026-03-06 16:58:11 -03:00
oobabooga
d03923924a
Several small fixes
...
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
oobabooga
044566d42d
API: Add tool call parsing for DeepSeek, GLM, MiniMax, and Kimi models
2026-03-06 15:06:56 -03:00
oobabooga
f5acf55207
Add --chat-template-file flag to override the default instruction template for API requests
...
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
oobabooga
93ebfa2b7e
Fix llama-server output filter for new log format
2026-03-06 02:38:13 -03:00
oobabooga
eba262d47a
Security: prevent path traversal in character/user/file save and delete
2026-03-06 02:00:10 -03:00
oobabooga
66fb79fe15
llama.cpp: Add --fit-target param
2026-03-06 01:55:48 -03:00