- Rewrite logprobs output format to match the OpenAI specification for
both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
- Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
- Send data: [DONE] at the end of SSE streams
- Fix finish_reason so "tool_calls" takes priority over "length"
- Stop including usage in streaming chunks when include_usage is not set
- Handle "developer" role in messages (treated same as "system")
- Add logprobs and top_logprobs parameters for chat completions
- Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
- Add max_completion_tokens as an alias for max_tokens in chat completions
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
The /v1/internal/model/load endpoint previously allowed setting any
shared.args attribute, including security-sensitive flags like
trust_remote_code. Now only keys from list_model_elements() are accepted.
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.
Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}
Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
The built-in training tab now covers its essential functionality
with a more modern and correct implementation (apply_chat_template,
dynamic padding, JSONL datasets, stride overlap).