oobabooga
d03923924a
Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
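Two of the fixes above (the division-by-zero guard and the copy-before-mutate on the sampler priority list) can be sketched as follows. This is a minimal illustration with hypothetical function names, not the repository's actual code:

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    # Guard: if micro_batch_size > batch_size, the naive division
    # batch_size // micro_batch_size would be 0 and later divide by zero.
    micro_batch_size = min(micro_batch_size, batch_size)
    return batch_size // micro_batch_size


def reorder_samplers(sampler_priority: list[str]) -> list[str]:
    # Work on a copy: sorting the caller's list in place would permanently
    # reorder the shared default list for every subsequent request.
    priority = list(sampler_priority)
    priority.sort()
    return priority
```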
2026-03-06 16:52:13 -03:00
oobabooga
f5acf55207
Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
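The priority chain described above amounts to a first-non-empty lookup. A minimal sketch (hypothetical names, not the actual implementation):

```python
def resolve_chat_template(request_template, file_template, model_template):
    # Priority: per-request params > --chat-template-file > model built-in.
    for template in (request_template, file_template, model_template):
        if template:
            return template
    return None
```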
2026-03-06 14:04:16 -03:00
oobabooga
3531069824
API: Support Llama 4 tool calling and fix tool calling edge cases
2026-03-06 13:12:14 -03:00
oobabooga
3880c1a406
API: Accept content:null and complex tool definitions in tool calling requests
2026-03-06 02:41:38 -03:00
oobabooga
d0ac58ad31
API: Fix tool_calls placement and other response compatibility issues
2026-03-05 21:25:03 -08:00
oobabooga
8d43123f73
API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.
Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}
Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
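As an illustration of one of the non-JSON formats above, a parser for the Mistral/Devstral shape `functionName{"arg": "value"}` could look roughly like this. This is a hedged sketch with a hypothetical function name and a simplified regex, not the repository's actual parser:

```python
import json
import re

# Matches an identifier immediately followed by a JSON object, e.g.
# get_weather{"city": "Paris"} (Mistral/Devstral-style tool call).
MISTRAL_CALL = re.compile(r'([A-Za-z_]\w*)\s*(\{.*\})', re.DOTALL)

def parse_mistral_tool_call(text: str) -> list[dict]:
    match = MISTRAL_CALL.search(text)
    if not match:
        return []
    name, raw_args = match.groups()
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError:
        return []
    # Note "index" as an int, matching the tool_calls[].index fix above.
    return [{
        "index": 0,
        "type": "function",
        "function": {"name": name, "arguments": arguments},
    }]
```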
2026-03-06 00:55:33 -03:00
oobabooga
4c406e024f
API: Speed up chat completions by ~85ms per request
2026-03-05 18:36:07 -08:00
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
b62c8845f3
mtmd: Fix /chat/completions for llama.cpp
2025-08-11 12:01:59 -07:00
oobabooga
1fb5807859
mtmd: Fix API text completion when no images are sent
2025-08-10 06:54:44 -07:00
oobabooga
2f90ac9880
Move the new image_utils.py file to modules/
2025-08-09 21:41:38 -07:00
oobabooga
d9db8f63a7
mtmd: Simplifications
2025-08-09 07:25:42 -07:00
Katehuuh
88127f46c1
Add multimodal support (ExLlamaV3) ( #7174 )
2025-08-08 23:31:16 -03:00
oobabooga
fd61297933
Lint
2025-05-15 21:19:19 -07:00
Jonas
fa960496d5
Tools support for OpenAI compatible API ( #6827 )
2025-05-08 12:30:27 -03:00
oobabooga
f82667f0b4
Remove more multimodal extension references
2025-05-05 14:17:00 -07:00
oobabooga
85bf2e15b9
API: Remove obsolete multimodal extension handling
Multimodal support will be added back once it's implemented in llama-server.
2025-05-05 14:14:48 -07:00
oobabooga
ae02ffc605
Refactor the transformers loader ( #6859 )
2025-04-20 13:33:47 -03:00
oobabooga
f01cc079b9
Lint
2025-01-29 14:00:59 -08:00
FP HAM
4bd260c60d
Give SillyTavern a bit of leeway in the way they do OpenAI ( #6685 )
2025-01-22 12:01:44 -03:00
hronoas
9b3a3d8f12
openai extension fix: Handle Multiple Content Items in Messages ( #6528 )
2024-11-18 11:59:52 -03:00
Jean-Sylvain Boige
4924ee2901
typo in OpenAI response format ( #6365 )
2024-09-05 21:42:23 -03:00
Stefan Merettig
9a150c3368
API: Relax multimodal format, fixes HuggingFace Chat UI ( #6353 )
2024-09-02 23:03:15 -03:00
oobabooga
addcb52c56
Make --idle-timeout work for API requests
2024-07-28 18:31:40 -07:00
oobabooga
f27e1ba302
Add a /v1/internal/chat-prompt endpoint ( #5879 )
2024-04-19 00:24:46 -03:00
oobabooga
c37f792afa
Better way to handle user_bio default in the API (alternative to bdcf31035f)
2024-03-29 10:54:01 -07:00
oobabooga
abcdd0ad5b
API: don't use settings.yaml for default values
2024-03-10 16:15:52 -07:00
Kevin Pham
10df23efb7
Remove message.content from openai streaming API ( #5503 )
2024-02-19 18:50:27 -03:00
Forkoz
528318b700
API: Remove tiktoken from logit bias ( #5391 )
2024-01-28 21:42:03 -03:00
oobabooga
aa575119e6
API: minor fix
2024-01-22 04:38:43 -08:00
oobabooga
821dd65fb3
API: add a comment
2024-01-22 04:15:51 -08:00
oobabooga
6247eafcc5
API: better handle temperature = 0
2024-01-22 04:12:23 -08:00
oobabooga
817866c9cf
Lint
2024-01-22 04:07:25 -08:00
oobabooga
aad73667af
Lint
2024-01-22 03:25:55 -08:00
Cohee
fbf8ae39f8
API: Allow content arrays for multimodal OpenAI requests ( #5277 )
2024-01-22 08:10:26 -03:00
Ercan
166fdf09f3
API: Properly handle Images with RGBA color format ( #5332 )
2024-01-22 08:08:51 -03:00
lmg-anon
db1da9f98d
Fix logprobs tokens in OpenAI API ( #5339 )
2024-01-22 08:07:42 -03:00
oobabooga
bb2c4707c4
API: fix bug after previous commit
2024-01-09 19:08:02 -08:00
kabachuha
dbe438564e
Support for sending images into OpenAI chat API ( #4827 )
2023-12-22 22:45:53 -03:00
oobabooga
39d2fe1ed9
Jinja templates for Instruct and Chat ( #4874 )
2023-12-12 17:23:14 -03:00
oobabooga
2c5a1e67f9
Parameters: change max_new_tokens & repetition_penalty_range defaults ( #4842 )
2023-12-07 20:04:52 -03:00
Jordan Tucker
cb836dd49c
fix: use shared chat-instruct_command with api ( #4653 )
2023-11-19 01:19:10 -03:00
oobabooga
510a01ef46
Lint
2023-11-16 18:03:06 -08:00
oobabooga
e6f44d6d19
Print context length / instruction template to terminal when loading models
2023-11-15 16:00:51 -08:00
oobabooga
0777b0d3c7
Add system_message parameter, document model (unused) parameter
2023-11-10 06:47:18 -08:00
oobabooga
6e2e0317af
Separate context and system message in instruction formats ( #4499 )
2023-11-07 20:02:58 -03:00
oobabooga
3d59346871
Implement echo/suffix parameters
2023-11-07 08:43:45 -08:00
oobabooga
97c21e5667
Don't strip leading spaces in OpenAI API
2023-11-06 19:09:41 -08:00
oobabooga
28fd535f9c
Make chat API more robust
2023-11-06 05:22:01 -08:00
oobabooga
ec17a5d2b7
Make OpenAI API the default API ( #4430 )
2023-11-06 02:38:29 -03:00