Commit graph

5271 commits

Author SHA1 Message Date
oobabooga b8b4471ab5 Security: restrict file writes to user_data_dir, block extra_flags from API 2026-03-06 16:58:11 -03:00
oobabooga d03923924a Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
oobabooga 044566d42d API: Add tool call parsing for DeepSeek, GLM, MiniMax, and Kimi models 2026-03-06 15:06:56 -03:00
oobabooga f5acf55207 Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
oobabooga 3531069824 API: Support Llama 4 tool calling and fix tool calling edge cases 2026-03-06 13:12:14 -03:00
oobabooga 160f7ad6b4 Handle SIGTERM to stop llama-server on pkill 2026-03-06 12:56:33 -03:00
oobabooga 8e24a20873 Installer: Fix libstdcxx-ng version pin causing conda solver to hang on Python 3.13 2026-03-06 07:39:50 -08:00
oobabooga 3bab7fbfd4 Update Colab notebook: new default model, direct GGUF URL support 2026-03-06 06:52:49 -08:00
oobabooga e7e0df0101 Fix hover menu shifting down when chat input grows 2026-03-06 11:52:16 -03:00
oobabooga 3323dedd08 Update llama.cpp 2026-03-06 06:30:01 -08:00
oobabooga 36dbc4ccce Remove unused colorama and psutil requirements 2026-03-06 06:28:35 -08:00
oobabooga 86d59b4404 Installer: Fix edge case in wheel re-download caching 2026-03-06 06:16:57 -08:00
oobabooga 0e0e3ceb97 Update the custom gradio wheels 2026-03-06 05:46:08 -08:00
oobabooga 6d7018069c Installer: Use absolute Python path in Windows batch scripts 2026-03-05 21:56:01 -08:00
oobabooga f9ed8820de API: Make tool function description and parameters optional 2026-03-05 21:43:33 -08:00
oobabooga 3880c1a406 API: Accept content:null and complex tool definitions in tool calling requests 2026-03-06 02:41:38 -03:00
oobabooga 93ebfa2b7e Fix llama-server output filter for new log format 2026-03-06 02:38:13 -03:00
oobabooga d0ac58ad31 API: Fix tool_calls placement and other response compatibility issues 2026-03-05 21:25:03 -08:00
oobabooga f06583b2b9 API: Use \n instead of \r\n as the SSE separator to match OpenAI 2026-03-05 21:16:37 -08:00
oobabooga 8be444a559 Update the custom gradio wheels 2026-03-05 21:05:15 -08:00
oobabooga 1729fb07b9 Update llama.cpp 2026-03-05 21:04:24 -08:00
oobabooga eba262d47a Security: prevent path traversal in character/user/file save and delete 2026-03-06 02:00:10 -03:00
oobabooga 521ddbb722 Security: restrict API model loading args to UI-exposed parameters
The /v1/internal/model/load endpoint previously allowed setting any
shared.args attribute, including security-sensitive flags like
trust_remote_code. Now only keys from list_model_elements() are accepted.
2026-03-06 01:57:02 -03:00
oobabooga 66fb79fe15 llama.cpp: Add --fit-target param 2026-03-06 01:55:48 -03:00
oobabooga e81a47f708 Improve the API generation defaults --help message 2026-03-05 20:41:45 -08:00
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga 2e7e966ef2 Docs: Better Tool/Function calling examples 2026-03-05 20:06:34 -08:00
oobabooga ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
oobabooga e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga 249bd6eea2 UI: Update the parallel info message 2026-03-05 18:11:55 -08:00
oobabooga f52d9336e5 TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support 2026-03-05 18:09:45 -08:00
oobabooga 9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga 2f08dce7b0 Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga 134ac8fc29 Update README 2026-03-05 12:30:28 -08:00
oobabooga 409db3df1e Training: Docs improvements 2026-03-05 11:30:57 -08:00
oobabooga 86d8291e58 Training: UI cleanup and better defaults 2026-03-05 11:20:55 -08:00
oobabooga 33ff3773a0 Clean up LoRA loading parameter handling 2026-03-05 16:00:13 -03:00
oobabooga 7a1fa8c9ea Training: fix checkpoint resume and surface training errors to UI 2026-03-05 15:50:39 -03:00
oobabooga 275810c843 Training: wire up HF Trainer checkpoint resumption for full state recovery 2026-03-05 15:32:49 -03:00
oobabooga 438e59498e Update ExLlamaV3 to v0.0.23 2026-03-05 10:24:31 -08:00
oobabooga 63f28cb4a2 Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on) 2026-03-05 15:12:32 -03:00
oobabooga 33a38d7ece Training: drop conversations exceeding cutoff length instead of truncating 2026-03-05 14:56:27 -03:00
oobabooga c2e494963f Training: fix silent error on model reload failure, minor cleanups 2026-03-05 14:41:44 -03:00
oobabooga 5b18be8582 Training: unify instruction training through apply_chat_template()
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
2026-03-05 14:39:37 -03:00
oobabooga d337ba0390 Training: fix apply_chat_template returning BatchEncoding instead of list 2026-03-05 13:45:28 -03:00
oobabooga 5be68cc073 Remove Training_PRO extension
The built-in training tab now covers its essential functionality
with a more modern and correct implementation (apply_chat_template,
dynamic padding, JSONL datasets, stride overlap).
2026-03-05 12:55:07 -03:00
oobabooga 1ffe540c97 Full documentation update to match current codebase 2026-03-05 12:46:54 -03:00