Commit graph

222 commits

Author SHA1 Message Date
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
oobabooga d45c9b3c59 API: Minor logprobs fixes 2026-03-12 16:09:49 -03:00
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
  both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
  backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
  instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
oobabooga 5a017aa338 API: Several OpenAI spec compliance fixes
- Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
- Send data: [DONE] at the end of SSE streams
- Fix finish_reason so "tool_calls" takes priority over "length"
- Stop including usage in streaming chunks when include_usage is not set
- Handle "developer" role in messages (treated same as "system")
- Add logprobs and top_logprobs parameters for chat completions
- Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
- Add max_completion_tokens as an alias for max_tokens in chat completions
2026-03-12 13:30:38 -03:00
oobabooga 09723c9988 API: Include /v1 in the printed API URL for easier integration 2026-03-12 12:43:15 -03:00
oobabooga 2549f7c33b API: Add tool_choice support and fix tool_calls spec compliance 2026-03-12 10:29:23 -03:00
oobabooga f1cfeae372 API: Improve OpenAI spec compliance in streaming and non-streaming responses 2026-03-10 20:55:49 -07:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
oobabooga 39e6c997cc Refactor to not import gradio in --nowebui mode 2026-03-09 19:29:24 -07:00
oobabooga 328215b0c7 API: Stop generation on client disconnect for non-streaming requests 2026-03-07 06:06:13 -08:00
oobabooga b8b4471ab5 Security: restrict file writes to user_data_dir, block extra_flags from API 2026-03-06 16:58:11 -03:00
oobabooga d03923924a Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
oobabooga 044566d42d API: Add tool call parsing for DeepSeek, GLM, MiniMax, and Kimi models 2026-03-06 15:06:56 -03:00
oobabooga f5acf55207 Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
oobabooga 3531069824 API: Support Llama 4 tool calling and fix tool calling edge cases 2026-03-06 13:12:14 -03:00
oobabooga f9ed8820de API: Make tool function description and parameters optional 2026-03-05 21:43:33 -08:00
oobabooga 3880c1a406 API: Accept content:null and complex tool definitions in tool calling requests 2026-03-06 02:41:38 -03:00
oobabooga d0ac58ad31 API: Fix tool_calls placement and other response compatibility issues 2026-03-05 21:25:03 -08:00
oobabooga f06583b2b9 API: Use \n instead of \r\n as the SSE separator to match OpenAI 2026-03-05 21:16:37 -08:00
oobabooga 521ddbb722 Security: restrict API model loading args to UI-exposed parameters
The /v1/internal/model/load endpoint previously allowed setting any
shared.args attribute, including security-sensitive flags like
trust_remote_code. Now only keys from list_model_elements() are accepted.
2026-03-06 01:57:02 -03:00
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga 9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
Sense_wang 7bf15ad933
fix: replace bare except clauses with except Exception (#7400) 2026-03-04 18:06:17 -03:00
oobabooga 65de4c30c8 Add adaptive-p sampler and n-gram speculative decoding support 2026-03-04 09:41:29 -08:00
oobabooga c026dbaf64 Fix API requests always returning the same 'created' time 2025-12-06 08:23:21 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 15c6e43597 Image: Add a revised_prompt field to API results for OpenAI compatibility 2025-12-04 17:41:09 -08:00
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga 3ef428efaa Image: Remove llm_variations from the API 2025-12-04 17:34:17 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 4468c49439 Add semaphore to image generation API endpoint 2025-12-03 12:02:47 -08:00
oobabooga 5433ef3333 Add an API endpoint for generating images 2025-12-03 11:50:56 -08:00
oobabooga 765af1ba17 API: Improve a validation 2025-08-11 12:39:48 -07:00
oobabooga b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
oobabooga 6fbf162d71 Default max_tokens to 512 in the API instead of 16 2025-08-10 07:21:55 -07:00
oobabooga 1fb5807859 mtmd: Fix API text completion when no images are sent 2025-08-10 06:54:44 -07:00
oobabooga 2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga d86b0ec010
Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga d9db8f63a7 mtmd: Simplifications 2025-08-09 07:25:42 -07:00
Katehuuh 88127f46c1
Add multimodal support (ExLlamaV3) (#7174) 2025-08-08 23:31:16 -03:00
oobabooga 498778b8ac Add a new 'Reasoning effort' UI element 2025-08-05 15:19:11 -07:00
oobabooga 84617abdeb Properly fix the /v1/models endpoint 2025-06-19 10:25:55 -07:00