Commit graph

5341 commits

Author SHA1 Message Date
oobabooga 0e35421593 API: Always extract reasoning_content, even with tool calls 2026-03-12 18:52:41 -07:00
oobabooga 1ed56aee85 Add a calculate tool 2026-03-12 18:45:19 -07:00
oobabooga 286ae475f6 UI: Clean up tool calling code 2026-03-12 22:39:38 -03:00
oobabooga 4c7a56c18d Add num_pages and max_tokens kwargs to web search tools 2026-03-12 22:17:23 -03:00
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 1b7e6c5705 Add the fetch_webpage tool source 2026-03-12 17:11:05 -07:00
oobabooga f8936ec47c Truncate web_search and fetch_webpage tools to 8192 tokens 2026-03-12 17:10:41 -07:00
oobabooga 5c02b7f603 Allow the fetch_webpage tool to return links 2026-03-12 17:08:30 -07:00
oobabooga 09d5e049d6 UI: Improve the Tools checkbox list style 2026-03-12 16:53:49 -07:00
oobabooga fdd8e5b1fd Make repeated Ctrl+C force a shutdown 2026-03-12 15:48:50 -07:00
oobabooga 4f82b71ef3 UI: Bump the ctx-size max from 131072 to 262144 (256K) 2026-03-12 14:56:35 -07:00
oobabooga bbd43d9463 UI: Correctly propagate truncation_length when ctx_size is auto 2026-03-12 14:54:05 -07:00
oobabooga 3e6bd1a310 UI: Prepend thinking tag when template appends it to prompt
Makes Qwen models have a thinking block straight away during streaming.
2026-03-12 14:30:51 -07:00
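The behavior described above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual code; the tag list and function name are assumptions.

```python
# If the chat template ends the prompt with an opening thinking tag,
# carry that tag into the streamed reply so the UI can render a
# thinking block immediately (as with Qwen models).
THINK_TAGS = ("<think>", "<thinking>")  # illustrative; real list may differ

def seed_reply_with_thinking_tag(prompt: str, reply_so_far: str) -> str:
    """Prepend the opening thinking tag when the template appended it to the prompt."""
    for tag in THINK_TAGS:
        if prompt.rstrip().endswith(tag) and not reply_so_far.startswith(tag):
            return tag + "\n" + reply_so_far
    return reply_so_far
```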
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
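A minimal sketch of the extraction described above, assuming a single `<think>...</think>` block; the actual implementation likely handles more tag variants and the streaming case.

```python
import re

# Split a DeepSeek-style thinking block out of the assistant message
# into a separate reasoning_content field.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> dict:
    m = THINK_RE.search(text)
    if not m:
        return {"role": "assistant", "content": text, "reasoning_content": None}
    content = (text[:m.start()] + text[m.end():]).strip()
    return {
        "role": "assistant",
        "content": content,
        "reasoning_content": m.group(1).strip(),
    }
```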
oobabooga d45c9b3c59 API: Minor logprobs fixes 2026-03-12 16:09:49 -03:00
oobabooga 2466305f76 Add tool examples 2026-03-12 16:03:57 -03:00
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
  both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
  backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
  instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
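For reference, the OpenAI chat-completions logprobs shape that this rewrite targets looks roughly like the sketch below: one entry per generated token, each carrying up to the requested number of alternatives. The helper name and inputs are illustrative.

```python
def format_logprobs(tokens, token_logprobs, top_alternatives, top_n):
    """Build an OpenAI-spec logprobs object: one entry per generated token,
    each with at most top_n alternatives (not a hardcoded 1)."""
    content = []
    for tok, lp, alts in zip(tokens, token_logprobs, top_alternatives):
        content.append({
            "token": tok,
            "logprob": lp,
            "bytes": list(tok.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": l, "bytes": list(t.encode("utf-8"))}
                for t, l in alts[:top_n]  # honor the requested N
            ],
        })
    return {"content": content}
```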
oobabooga 5a017aa338 API: Several OpenAI spec compliance fixes
- Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
- Send data: [DONE] at the end of SSE streams
- Fix finish_reason so "tool_calls" takes priority over "length"
- Stop including usage in streaming chunks when include_usage is not set
- Handle "developer" role in messages (treated same as "system")
- Add logprobs and top_logprobs parameters for chat completions
- Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
- Add max_completion_tokens as an alias for max_tokens in chat completions
2026-03-12 13:30:38 -03:00
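Two of the wire conventions fixed above can be sketched like this: the OpenAI error envelope and SSE framing that terminates with `data: [DONE]`. Function names are illustrative, not the repo's.

```python
import json

def error_body(message: str, err_type: str = "invalid_request_error") -> str:
    """OpenAI-style error envelope instead of a bare HTTP 500."""
    return json.dumps({"error": {"message": message, "type": err_type,
                                 "param": None, "code": None}})

def sse_stream(chunks):
    """Frame each chunk as an SSE event and close with data: [DONE]."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```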
oobabooga 4b6c9db1c9 UI: Fix stale tool_sequence after edit and chat-instruct tool rendering 2026-03-12 13:12:18 -03:00
oobabooga 09723c9988 API: Include /v1 in the printed API URL for easier integration 2026-03-12 12:43:15 -03:00
oobabooga 2549f7c33b API: Add tool_choice support and fix tool_calls spec compliance 2026-03-12 10:29:23 -03:00
oobabooga b5cac2e3b2 Fix swipes and edit for tool calling in the UI 2026-03-12 01:53:37 -03:00
oobabooga 0d62038710 Add tools refresh button and _tool_turn comment 2026-03-12 01:36:07 -03:00
oobabooga cf9ad8eafe Initial tool-calling support in the UI 2026-03-12 01:16:19 -03:00
oobabooga 980a9d1657 UI: Minor defensive changes to autosave 2026-03-11 15:50:16 -07:00
oobabooga bb00d96dc3 Use a new gr.DragDrop element for Sampler priority + update gradio 2026-03-11 19:35:12 -03:00
oobabooga 66c976e995 Update README with ROCm 7.2 torch install URL 2026-03-11 19:35:12 -03:00
oobabooga 24977846fb Update AMD ROCm from 6.4 to 7.2 2026-03-11 13:14:26 -07:00
oobabooga 7a63a56043 Update llama.cpp 2026-03-11 12:53:19 -07:00
oobabooga f1cfeae372 API: Improve OpenAI spec compliance in streaming and non-streaming responses 2026-03-10 20:55:49 -07:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
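The n>1 strategy mentioned above amounts to repeating the request with an incremented seed, as in this sketch; `generate_fn` stands in for a single-completion backend call and is an assumption, not the actual interface.

```python
def generate_n(generate_fn, prompt, n=1, seed=0):
    """Serve n completions from a single-completion backend,
    bumping the seed each time for diversity."""
    return [generate_fn(prompt, seed=seed + i) for i in range(n)]
```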
oobabooga 6ec4ca8b10 Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3 2026-03-10 09:58:00 -03:00
oobabooga 307c085d1b Minor warning change 2026-03-09 21:44:53 -07:00
oobabooga c604ca66de Update the --multi-user warning 2026-03-09 21:36:04 -07:00
oobabooga 15792c3cb8 Update ExLlamaV3 to 0.0.24 2026-03-09 20:31:05 -07:00
oobabooga 3b71932658 Update README 2026-03-09 20:18:09 -07:00
oobabooga 83b7e47d77 Update README 2026-03-09 20:12:54 -07:00
oobabooga 7f485274eb Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
  (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
  for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
  sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
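The multi-EOS fix above can be sketched as treating every ID in the config's EOS list as a stop condition rather than only the first one. The dict-based config here is a simplification for illustration.

```python
def get_stop_ids(config: dict) -> set:
    """Collect all EOS token IDs as stop conditions, falling back to the
    single eos_token_id when no list is defined."""
    eos = config.get("eos_token_id_list") or [config["eos_token_id"]]
    return set(eos)

def should_stop(token_id: int, stop_ids: set) -> bool:
    return token_id in stop_ids
```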
oobabooga 39e6c997cc Refactor to not import gradio in --nowebui mode 2026-03-09 19:29:24 -07:00
oobabooga 970055ca00 Update Intel GPU support to use native PyTorch XPU wheels
PyTorch 2.9+ includes native XPU support, making
intel-extension-for-pytorch and the separate oneAPI conda
install unnecessary.

Closes #7308
2026-03-09 17:08:59 -03:00
oobabooga d6643bb4bc One-click installer: Optimize wheel downloads to only re-download changed wheels 2026-03-09 12:30:43 -07:00
oobabooga 9753b2342b Fix crash on non-UTF-8 Windows locales (e.g. Chinese GBK)
Closes #7416
2026-03-09 16:22:37 -03:00
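The class of bug fixed above: `open()` without an explicit encoding uses the locale's codec (e.g. GBK on Chinese Windows), so UTF-8 files fail to decode. A minimal sketch of the locale-proof pattern, with illustrative helper names:

```python
def decode_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8 regardless of the system locale,
    replacing rather than crashing on invalid sequences."""
    return raw.decode("utf-8", errors="replace")

def open_utf8(path, mode="r"):
    """open() wrapper that never depends on locale.getpreferredencoding()."""
    return open(path, mode, encoding="utf-8", errors="replace")
```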
oobabooga eb4a20137a Update README 2026-03-08 20:38:50 -07:00
oobabooga 634609acca Fix pip installing to system Miniconda on Windows, revert 0132966d 2026-03-08 20:35:41 -07:00
oobabooga 40f1837b42 README: Minor updates 2026-03-08 08:38:29 -07:00
oobabooga f6ffecfff2 Add guard against training with llama.cpp loader 2026-03-08 10:47:59 -03:00
oobabooga 5a91b8462f Remove ctx_size_draft from ExLlamav3 loader 2026-03-08 09:53:48 -03:00