Commit graph

56 commits

Author SHA1 Message Date
oobabooga
d03923924a Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
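Two of the fixes above are easy to illustrate. A minimal sketch, assuming hypothetical helper names (`gradient_accumulation_steps`, `normalized_sampler_priority`) and internal sampler names (`presence`, `frequency`) that are not taken from the actual codebase:

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    # Clamp the quotient so micro_batch_size > batch_size can no longer
    # produce a division by zero (or a step count of 0) during training.
    return max(batch_size // micro_batch_size, 1)


def normalized_sampler_priority(sampler_priority: list[str]) -> list[str]:
    # Copy before touching the list so the caller's settings are unchanged.
    priority = list(sampler_priority)
    # Map OpenAI-style penalty names onto the sampler's internal names
    # so they sort into the right position.
    aliases = {"presence_penalty": "presence", "frequency_penalty": "frequency"}
    return [aliases.get(name, name) for name in priority]
```

The same copy-before-mutate pattern is what the ExLlamaV3 fix describes: the shared `sampler_priority` list must not be modified in place.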
oobabooga
f5acf55207 Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
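The priority chain described in the commit is a first-non-empty lookup. A sketch of that resolution order (function and parameter names are illustrative, not the project's actual code):

```python
def resolve_chat_template(request_template, file_template, model_template):
    # Priority: per-request params > --chat-template-file > model's built-in.
    for template in (request_template, file_template, model_template):
        if template:
            return template
    return None
```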
oobabooga
3531069824 API: Support Llama 4 tool calling and fix tool calling edge cases 2026-03-06 13:12:14 -03:00
oobabooga
3880c1a406 API: Accept content:null and complex tool definitions in tool calling requests 2026-03-06 02:41:38 -03:00
oobabooga
d0ac58ad31 API: Fix tool_calls placement and other response compatibility issues 2026-03-05 21:25:03 -08:00
oobabooga
8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
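The Mistral/Devstral format above (`functionName{"arg": "value"}`) is the simplest of the three to parse: a function name immediately followed by a JSON object. A hedged sketch of such a parser, with an assumed regex and return shape rather than the project's actual implementation:

```python
import json
import re

# Illustrative pattern: a bare identifier followed by a JSON arguments object.
MISTRAL_CALL = re.compile(r'(\w+)\s*(\{.*\})', re.DOTALL)


def parse_mistral_tool_call(text: str):
    # Returns {"name": ..., "arguments": ...} or None if no valid call found.
    match = MISTRAL_CALL.search(text)
    if not match:
        return None
    name, raw_args = match.group(1), match.group(2)
    try:
        return {"name": name, "arguments": json.loads(raw_args)}
    except json.JSONDecodeError:
        return None
```

The Qwen and GPT-OSS formats would need their own patterns for the `<tool_call>`/`<parameter=...>` and `<|channel|>`/`<|message|>` delimiters, but the shape is the same: extract the name and arguments, then emit a structured `tool_calls` entry.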
oobabooga
4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga
9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga
b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
oobabooga
1fb5807859 mtmd: Fix API text completion when no images are sent 2025-08-10 06:54:44 -07:00
oobabooga
2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga
d9db8f63a7 mtmd: Simplifications 2025-08-09 07:25:42 -07:00
Katehuuh
88127f46c1 Add multimodal support (ExLlamaV3) (#7174) 2025-08-08 23:31:16 -03:00
oobabooga
fd61297933 Lint 2025-05-15 21:19:19 -07:00
Jonas
fa960496d5 Tools support for OpenAI compatible API (#6827) 2025-05-08 12:30:27 -03:00
oobabooga
f82667f0b4 Remove more multimodal extension references 2025-05-05 14:17:00 -07:00
oobabooga
85bf2e15b9 API: Remove obsolete multimodal extension handling
Multimodal support will be added back once it's implemented in llama-server.
2025-05-05 14:14:48 -07:00
oobabooga
ae02ffc605 Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
oobabooga
f01cc079b9 Lint 2025-01-29 14:00:59 -08:00
FP HAM
4bd260c60d Give SillyTavern a bit of leeway in the way they do OpenAI (#6685) 2025-01-22 12:01:44 -03:00
hronoas
9b3a3d8f12 openai extension fix: Handle Multiple Content Items in Messages (#6528) 2024-11-18 11:59:52 -03:00
Jean-Sylvain Boige
4924ee2901 Fix typo in OpenAI response format (#6365) 2024-09-05 21:42:23 -03:00
Stefan Merettig
9a150c3368 API: Relax multimodal format, fixes HuggingFace Chat UI (#6353) 2024-09-02 23:03:15 -03:00
oobabooga
addcb52c56 Make --idle-timeout work for API requests 2024-07-28 18:31:40 -07:00
oobabooga
f27e1ba302 Add a /v1/internal/chat-prompt endpoint (#5879) 2024-04-19 00:24:46 -03:00
oobabooga
c37f792afa Better way to handle user_bio default in the API (alternative to bdcf31035f) 2024-03-29 10:54:01 -07:00
oobabooga
abcdd0ad5b API: don't use settings.yaml for default values 2024-03-10 16:15:52 -07:00
Kevin Pham
10df23efb7 Remove message.content from openai streaming API (#5503) 2024-02-19 18:50:27 -03:00
Forkoz
528318b700 API: Remove tiktoken from logit bias (#5391) 2024-01-28 21:42:03 -03:00
oobabooga
aa575119e6 API: minor fix 2024-01-22 04:38:43 -08:00
oobabooga
821dd65fb3 API: add a comment 2024-01-22 04:15:51 -08:00
oobabooga
6247eafcc5 API: better handle temperature = 0 2024-01-22 04:12:23 -08:00
oobabooga
817866c9cf Lint 2024-01-22 04:07:25 -08:00
oobabooga
aad73667af Lint 2024-01-22 03:25:55 -08:00
Cohee
fbf8ae39f8 API: Allow content arrays for multimodal OpenAI requests (#5277) 2024-01-22 08:10:26 -03:00
Ercan
166fdf09f3 API: Properly handle images with RGBA color format (#5332) 2024-01-22 08:08:51 -03:00
lmg-anon
db1da9f98d Fix logprobs tokens in OpenAI API (#5339) 2024-01-22 08:07:42 -03:00
oobabooga
bb2c4707c4 API: fix bug after previous commit 2024-01-09 19:08:02 -08:00
kabachuha
dbe438564e Support for sending images into OpenAI chat API (#4827) 2023-12-22 22:45:53 -03:00
oobabooga
39d2fe1ed9 Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga
2c5a1e67f9 Parameters: change max_new_tokens & repetition_penalty_range defaults (#4842) 2023-12-07 20:04:52 -03:00
Jordan Tucker
cb836dd49c fix: use shared chat-instruct_command with api (#4653) 2023-11-19 01:19:10 -03:00
oobabooga
510a01ef46 Lint 2023-11-16 18:03:06 -08:00
oobabooga
e6f44d6d19 Print context length / instruction template to terminal when loading models 2023-11-15 16:00:51 -08:00
oobabooga
0777b0d3c7 Add system_message parameter, document model (unused) parameter 2023-11-10 06:47:18 -08:00
oobabooga
6e2e0317af Separate context and system message in instruction formats (#4499) 2023-11-07 20:02:58 -03:00
oobabooga
3d59346871 Implement echo/suffix parameters 2023-11-07 08:43:45 -08:00
oobabooga
97c21e5667 Don't strip leading spaces in OpenAI API 2023-11-06 19:09:41 -08:00
oobabooga
28fd535f9c Make chat API more robust 2023-11-06 05:22:01 -08:00
oobabooga
ec17a5d2b7 Make OpenAI API the default API (#4430) 2023-11-06 02:38:29 -03:00