oobabooga
fb1b3b6ddf
API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix an off-by-one that returned one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
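The logprobs rewrite above targets the structure the OpenAI chat completions spec defines: one entry per generated token with its text, logprob, UTF-8 bytes, and exactly the requested number of top alternatives. A minimal sketch of assembling that structure (the helper name and its inputs are hypothetical, not the project's actual code):

```python
def build_logprobs_content(tokens, token_logprobs, top_alternatives, top_n):
    """Assemble an OpenAI-style `choices[i].logprobs.content` list.

    tokens:           generated token strings
    token_logprobs:   log-probability of each generated token
    top_alternatives: per position, a {token: logprob} dict of candidates
    top_n:            requested top_logprobs count, honored exactly
                      (avoiding the off-by-one extra alternative)
    """
    content = []
    for tok, lp, alts in zip(tokens, token_logprobs, top_alternatives):
        ranked = sorted(alts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        content.append({
            "token": tok,
            "logprob": lp,
            "bytes": list(tok.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": l, "bytes": list(t.encode("utf-8"))}
                for t, l in ranked
            ],
        })
    return content
```

Per the spec, a non-streaming response carries one such entry per generated token, not just the last one.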
oobabooga
5a017aa338
API: Several OpenAI spec compliance fixes
- Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
- Send data: [DONE] at the end of SSE streams
- Fix finish_reason so "tool_calls" takes priority over "length"
- Stop including usage in streaming chunks when include_usage is not set
- Handle "developer" role in messages (treated same as "system")
- Add logprobs and top_logprobs parameters for chat completions
- Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
- Add max_completion_tokens as an alias for max_tokens in chat completions
2026-03-12 13:30:38 -03:00
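Two of the fixes above are pure wire-format details: validation failures must come back as an OpenAI-style error envelope with a 4xx status rather than a bare HTTP 500, and SSE streams must end with the literal data: [DONE] sentinel. A hedged sketch of both, with hypothetical helper names:

```python
import json

def openai_error(message, err_type="invalid_request_error", param=None, code=None):
    # OpenAI-style error envelope; the server returns this JSON body
    # with an appropriate 4xx status code instead of HTTP 500.
    return {"error": {"message": message, "type": err_type,
                      "param": param, "code": code}}

def sse_stream(chunks):
    # Serialize each streaming chunk as a Server-Sent Event, then emit
    # the spec-mandated terminal sentinel.
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```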
oobabooga
4b6c9db1c9
UI: Fix stale tool_sequence after edit and chat-instruct tool rendering
2026-03-12 13:12:18 -03:00
oobabooga
09723c9988
API: Include /v1 in the printed API URL for easier integration
2026-03-12 12:43:15 -03:00
oobabooga
2549f7c33b
API: Add tool_choice support and fix tool_calls spec compliance
2026-03-12 10:29:23 -03:00
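For reference, tool_choice in the OpenAI spec takes either a string ("none", "auto", "required") or an object forcing a specific function. A sketch of how a server might normalize it (hypothetical helper, not the project's actual code):

```python
def resolve_tool_choice(tool_choice, tools):
    """Normalize the OpenAI `tool_choice` parameter.

    Returns None (tools disabled), a mode string ("auto"/"required"),
    or the name of a forced tool.
    """
    if not tools or tool_choice == "none":
        return None
    if tool_choice in ("auto", "required", None):
        return tool_choice or "auto"
    if isinstance(tool_choice, dict):
        # {"type": "function", "function": {"name": ...}} forces one tool
        name = tool_choice.get("function", {}).get("name")
        available = {t["function"]["name"] for t in tools}
        if name not in available:
            raise ValueError(f"unknown tool: {name}")
        return name
    raise ValueError(f"invalid tool_choice: {tool_choice!r}")
```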
oobabooga
b5cac2e3b2
Fix swipes and editing for tool calling in the UI
2026-03-12 01:53:37 -03:00
oobabooga
0d62038710
Add tools refresh button and _tool_turn comment
2026-03-12 01:36:07 -03:00
oobabooga
cf9ad8eafe
Initial tool-calling support in the UI
2026-03-12 01:16:19 -03:00
oobabooga
980a9d1657
UI: Minor defensive changes to autosave
2026-03-11 15:50:16 -07:00
oobabooga
bb00d96dc3
Use a new gr.DragDrop element for Sampler priority + update gradio
2026-03-11 19:35:12 -03:00
oobabooga
66c976e995
Update README with ROCm 7.2 torch install URL
2026-03-11 19:35:12 -03:00
oobabooga
24977846fb
Update AMD ROCm from 6.4 to 7.2
2026-03-11 13:14:26 -07:00
oobabooga
7a63a56043
Update llama.cpp
2026-03-11 12:53:19 -07:00
oobabooga
f1cfeae372
API: Improve OpenAI spec compliance in streaming and non-streaming responses
2026-03-10 20:55:49 -07:00
oobabooga
3304b57bdf
Add native logit_bias and logprobs support for ExLlamav3
2026-03-10 11:03:25 -03:00
oobabooga
8aeaa76365
Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
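The n>1 fix above rests on a simple idea: on a deterministic backend, every completion sampled with the same seed is identical, so each of the n sequences needs its own seed. A sketch of the seed-increment scheme (hypothetical helper name):

```python
def completion_seeds(base_seed, n):
    # One seed per requested completion: the request as a whole stays
    # reproducible, but the n sampling streams differ from one another.
    return [base_seed + i for i in range(n)]
```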
oobabooga
6ec4ca8b10
Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3
2026-03-10 09:58:00 -03:00
oobabooga
307c085d1b
Minor warning change
2026-03-09 21:44:53 -07:00
oobabooga
c604ca66de
Update the --multi-user warning
2026-03-09 21:36:04 -07:00
oobabooga
15792c3cb8
Update ExLlamaV3 to 0.0.24
2026-03-09 20:31:05 -07:00
oobabooga
3b71932658
Update README
2026-03-09 20:18:09 -07:00
oobabooga
83b7e47d77
Update README
2026-03-09 20:12:54 -07:00
oobabooga
7f485274eb
Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
(fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
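The EOS fix above matters because newer models declare several end tokens (Llama-3 defines both <|end_of_text|> and <|eot_id|>); stopping on only the first one lets generation run past the intended end of turn. A sketch of normalizing the config to a deduplicated stop list (hypothetical helper, not the project's actual code):

```python
def collect_stop_token_ids(eos_token_id_list=None, eos_token_id=None):
    """Normalize EOS configuration to a list of stop-token IDs.

    Prefers the full `eos_token_id_list` when present, falls back to the
    single `eos_token_id`, and deduplicates while preserving order.
    """
    ids = eos_token_id_list
    if ids is None:
        ids = eos_token_id
    if ids is None:
        return []
    if isinstance(ids, int):
        ids = [ids]
    return list(dict.fromkeys(ids))
```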
oobabooga
39e6c997cc
Refactor to not import gradio in --nowebui mode
2026-03-09 19:29:24 -07:00
oobabooga
970055ca00
Update Intel GPU support to use native PyTorch XPU wheels
PyTorch 2.9+ includes native XPU support, making
intel-extension-for-pytorch and the separate oneAPI conda
install unnecessary.
Closes #7308
2026-03-09 17:08:59 -03:00
oobabooga
d6643bb4bc
One-click installer: Optimize wheel downloads to only re-download changed wheels
2026-03-09 12:30:43 -07:00
oobabooga
9753b2342b
Fix crash on non-UTF-8 Windows locales (e.g. Chinese GBK)
Closes #7416
2026-03-09 16:22:37 -03:00
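The locale crash happens because open() without an explicit encoding uses the locale-dependent ANSI code page on Windows (cp936/GBK on Chinese systems), so UTF-8 files containing non-GBK byte sequences raise UnicodeDecodeError. The defensive pattern, sketched with hypothetical helper names:

```python
def read_text(path):
    # Always decode project text files as UTF-8 rather than relying on
    # the locale default, which differs per Windows installation.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def write_text(path, text):
    # Mirror helper: always encode as UTF-8 on write as well.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```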
oobabooga
eb4a20137a
Update README
2026-03-08 20:38:50 -07:00
oobabooga
634609acca
Fix pip installing to system Miniconda on Windows, revert 0132966d
2026-03-08 20:35:41 -07:00
oobabooga
40f1837b42
README: Minor updates
2026-03-08 08:38:29 -07:00
oobabooga
f6ffecfff2
Add guard against training with llama.cpp loader
2026-03-08 10:47:59 -03:00
oobabooga
5a91b8462f
Remove ctx_size_draft from ExLlamav3 loader
2026-03-08 09:53:48 -03:00
oobabooga
7a8ca9f2b0
Fix passing adaptive-p to llama-server
2026-03-08 04:09:40 -07:00
oobabooga
0132966d09
Add PyPI fallback for PyTorch install commands
2026-03-07 23:06:15 -03:00
oobabooga
baf4e13ff1
ExLlamav3: fix draft cache size to match main cache
2026-03-07 22:34:48 -03:00
oobabooga
6ff111d18e
ExLlamav3: handle exceptions in ConcurrentGenerator iterate loop
2026-03-07 22:05:31 -03:00
oobabooga
0cecc0a041
Use tar.gz for Linux/macOS portable builds to preserve symlinks
2026-03-07 06:59:48 -08:00
oobabooga
e1bf0b866f
Update the macos workflow
2026-03-07 06:46:46 -08:00
oobabooga
b686193fe2
Reapply "Update Miniforge from 25.3.0 to 26.1.0"
This reverts commit 085c4ef5d7.
2026-03-07 06:10:05 -08:00
oobabooga
328215b0c7
API: Stop generation on client disconnect for non-streaming requests
2026-03-07 06:06:13 -08:00
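The disconnect fix above amounts to polling the client connection while generation runs, so a request whose client has gone away stops consuming the backend. A hedged asyncio sketch (the helper and its wiring are hypothetical; an ASGI server would pass something like Starlette's request.is_disconnected as the check):

```python
import asyncio

async def run_until_disconnect(generate, is_disconnected, poll=0.01):
    """Drive a blocking generate() in a worker thread while polling an
    async disconnect check; returns the result, or None if the client
    went away first.  Sketch only, not the project's implementation.
    """
    task = asyncio.create_task(asyncio.to_thread(generate))
    while not task.done():
        if await is_disconnected():
            task.cancel()
            # Best effort: wait for the worker to unwind.  Real code must
            # also signal the backend to stop generating tokens.
            await asyncio.gather(task, return_exceptions=True)
            return None
        await asyncio.sleep(poll)
    return task.result()
```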
oobabooga
304510eb3d
ExLlamav3: route all generation through ConcurrentGenerator
2026-03-07 05:54:14 -08:00
oobabooga
085c4ef5d7
Revert "Update Miniforge from 25.3.0 to 26.1.0"
This reverts commit 9576c5a5f4.
2026-03-07 05:09:49 -08:00
oobabooga
aa634c77c0
Update llama.cpp
2026-03-06 21:00:36 -08:00
oobabooga
abc699db9b
Minor UI change
2026-03-06 19:03:38 -08:00
oobabooga
f2fe001cc4
Fix message copy buttons not working over HTTP
2026-03-06 19:01:38 -08:00
oobabooga
7ea5513263
Handle Qwen 3.5 thinking blocks
2026-03-06 19:01:28 -08:00
oobabooga
5fa709a3f4
llama.cpp server: use port+5 offset and suppress "No parser definition detected" logs
2026-03-06 18:52:34 -08:00
oobabooga
e8e0d02406
Remove outdated ROCm environment variable overrides from one_click.py
2026-03-06 18:15:05 -08:00
oobabooga
1eead661c3
Portable mode: always use ../user_data if it exists
2026-03-06 18:04:48 -08:00
oobabooga
d48b53422f
Training: Optimize _peek_json_keys to avoid loading entire file into memory
2026-03-06 15:39:08 -08:00
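The optimization above follows from an observation: for a dataset file shaped like a JSON array of uniform records, the keys of the first object are enough to learn the schema, so there is no need to load a multi-gigabyte file to inspect it. A sketch of the idea (hypothetical code, not the project's actual _peek_json_keys):

```python
import json

def peek_json_keys(path, max_bytes=65536):
    """Return the keys of the first JSON object in a possibly huge file,
    reading at most `max_bytes` instead of the whole file.

    Limitation: returns [] if the first object does not fit inside the
    prefix, or if the file contains no object at all.
    """
    with open(path, "r", encoding="utf-8") as f:
        prefix = f.read(max_bytes)
    start = prefix.find("{")
    if start == -1:
        return []
    try:
        # raw_decode parses the first complete value and ignores the rest
        obj, _ = json.JSONDecoder().raw_decode(prefix[start:])
        return list(obj)
    except json.JSONDecodeError:
        return []
```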