5248 commits

Author SHA1 Message Date
oobabooga 66fb79fe15 llama.cpp: Add --fit-target param 2026-03-06 01:55:48 -03:00
oobabooga e81a47f708 Improve the API generation defaults --help message 2026-03-05 20:41:45 -08:00
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga 2e7e966ef2 Docs: Better Tool/Function calling examples 2026-03-05 20:06:34 -08:00
oobabooga ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
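As an illustration of two of the non-JSON formats listed in 8d43123f73, here is a minimal sketch, not the parser added in that commit. The function names, the returned dictionary shape, and the closing `</function>` tag are assumptions made for the example; only the opening-tag and `functionName{...}` shapes come from the commit message.

```python
import json
import re
from typing import Dict, List, Optional

# Assumed: each call is closed by </function>; the commit message only shows the opening tags.
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)
# Mistral/Devstral-style shape: a bare function name directly followed by a JSON object.
NAME_JSON_RE = re.compile(r"\s*([A-Za-z_][\w.\-]*)\s*(\{.*\})\s*", re.DOTALL)


def parse_xml_style_tool_calls(text: str) -> List[Dict]:
    """Extract <function=name><parameter=key>value</parameter> calls (hypothetical helper)."""
    calls = []
    for name, body in FUNCTION_RE.findall(text):
        # Parameter values are kept as stripped strings; no type coercion is attempted.
        arguments = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


def parse_name_json_tool_call(text: str) -> Optional[Dict]:
    """Parse functionName{"arg": "value"}; return None if the text does not fit (hypothetical helper)."""
    match = NAME_JSON_RE.fullmatch(text)
    if match is None:
        return None
    try:
        arguments = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None
    if not isinstance(arguments, dict):
        return None
    return {"name": match.group(1), "arguments": arguments}
```

Returning an empty list or `None` for text that does not match lets a caller try each format in turn before concluding that the reply contains no tool call.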
oobabooga e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
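The auto-detection rule described in e2548f69a9 can be sketched roughly as follows. The helper name and its arguments are hypothetical, and the sketch additionally assumes that `../user_data` is only chosen when it actually exists and that `./user_data` remains the default otherwise; the commit message does not spell out either detail.

```python
from pathlib import Path
from typing import Optional


def resolve_user_data_dir(cli_value: Optional[str], install_dir: Path) -> Path:
    """Pick the user data directory (hypothetical helper, not the project's code)."""
    # An explicit --user-data-dir value always wins.
    if cli_value:
        return Path(cli_value)

    local = install_dir / "user_data"
    shared = install_dir.parent / "user_data"

    # Assumed rule: prefer the shared folder one level up only when the
    # local folder is missing and the shared one is present.
    if not local.exists() and shared.exists():
        return shared
    return local
```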
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga 249bd6eea2 UI: Update the parallel info message 2026-03-05 18:11:55 -08:00
oobabooga f52d9336e5 TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support 2026-03-05 18:09:45 -08:00
oobabooga 9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga 2f08dce7b0 Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga 134ac8fc29 Update README 2026-03-05 12:30:28 -08:00
oobabooga 409db3df1e Training: Docs improvements 2026-03-05 11:30:57 -08:00
oobabooga 86d8291e58 Training: UI cleanup and better defaults 2026-03-05 11:20:55 -08:00
oobabooga 33ff3773a0 Clean up LoRA loading parameter handling 2026-03-05 16:00:13 -03:00
oobabooga 7a1fa8c9ea Training: fix checkpoint resume and surface training errors to UI 2026-03-05 15:50:39 -03:00
oobabooga 275810c843 Training: wire up HF Trainer checkpoint resumption for full state recovery 2026-03-05 15:32:49 -03:00
oobabooga 438e59498e Update ExLlamaV3 to v0.0.23 2026-03-05 10:24:31 -08:00
oobabooga 63f28cb4a2 Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on) 2026-03-05 15:12:32 -03:00
oobabooga 33a38d7ece Training: drop conversations exceeding cutoff length instead of truncating 2026-03-05 14:56:27 -03:00
oobabooga c2e494963f Training: fix silent error on model reload failure, minor cleanups 2026-03-05 14:41:44 -03:00
oobabooga 5b18be8582 Training: unify instruction training through apply_chat_template()
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
2026-03-05 14:39:37 -03:00
oobabooga d337ba0390 Training: fix apply_chat_template returning BatchEncoding instead of list 2026-03-05 13:45:28 -03:00
oobabooga 5be68cc073 Remove Training_PRO extension
The built-in training tab now covers its essential functionality
with a more modern and correct implementation (apply_chat_template,
dynamic padding, JSONL datasets, stride overlap).
2026-03-05 12:55:07 -03:00
oobabooga 1ffe540c97 Full documentation update to match current codebase 2026-03-05 12:46:54 -03:00
oobabooga 1c2548fd89 Training: use dynamic padding (pad to batch max instead of cutoff_len)
- Remove pre-padding from tokenize() and tokenize_conversation()
- Collate function now right-pads each batch to the longest sequence
- Set tokenizer padding_side to "right" (standard for training)
- Remove dead natural_keys import
- Reduces wasted compute on batches with short sequences
- Aligns with axolotl/unsloth approach
2026-03-05 12:45:32 -03:00
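To make the dynamic padding idea in 1c2548fd89 concrete, here is a minimal sketch that uses plain Python lists rather than tensors. The function name and the batch layout are assumptions, as is padding `labels` with -100 (the conventional ignore index for cross-entropy loss in PyTorch); the commit message only states that each batch is right-padded to its longest sequence.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # assumed label padding value; ignored by PyTorch cross-entropy by default


def collate_right_pad(batch: List[Dict[str, List[int]]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad every example to the longest sequence in this batch (hypothetical helper)."""
    max_len = max(len(example["input_ids"]) for example in batch)
    out: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for example in batch:
        length = len(example["input_ids"])
        n_pad = max_len - length
        out["input_ids"].append(example["input_ids"] + [pad_token_id] * n_pad)
        # Real tokens get mask 1, padding gets mask 0.
        out["attention_mask"].append([1] * length + [0] * n_pad)
        out["labels"].append(example["labels"] + [IGNORE_INDEX] * n_pad)
    return out
```

With sequences of length 3 and 1, the batch is padded to length 3 rather than to the cutoff length, which is where the saved compute comes from.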
oobabooga da2d4f1a6a Training: replace raw text file with JSONL text dataset, re-add stride overlap
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
2026-03-05 12:33:12 -03:00
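The chunking behavior described in da2d4f1a6a (stride overlap, padded remainder) might look roughly like the sketch below. It assumes that the stride is the number of tokens shared between consecutive chunks, so each window advances by `cutoff_len - stride`; the function name and signature are hypothetical.

```python
from typing import List


def chunk_with_overlap(token_ids: List[int], cutoff_len: int, stride: int, pad_token_id: int) -> List[List[int]]:
    """Split tokens into fixed-size chunks that overlap by `stride` tokens (hypothetical helper)."""
    if not 0 <= stride < cutoff_len:
        raise ValueError("stride must satisfy 0 <= stride < cutoff_len")

    step = cutoff_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunk = token_ids[start:start + cutoff_len]
        # Pad the final, shorter chunk instead of dropping it.
        chunk = chunk + [pad_token_id] * (cutoff_len - len(chunk))
        chunks.append(chunk)
        # Stop once a window has reached the end of the token list.
        if start + cutoff_len >= len(token_ids):
            break
    return chunks
```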
oobabooga d278bb46a2 Add apply_chat_template() support for LoRA training
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
2026-03-05 11:47:25 -03:00
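One way to picture the assistant-only label masking via incremental tokenization mentioned in d278bb46a2 is the sketch below. It is an illustration under stated assumptions, not the commit's implementation: `encode_prefix` is a hypothetical callable that returns token IDs for the first k messages (in practice it could wrap `tokenizer.apply_chat_template(..., tokenize=True)`), the encoding of a prefix is assumed to be a prefix of the encoding of the longer conversation, and `toy_encode` is a stand-in tokenizer used only to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple

IGNORE_INDEX = -100  # conventional "do not train on this token" label

Message = Dict[str, str]


def build_masked_labels(
    messages: List[Message],
    encode_prefix: Callable[[List[Message]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Tokenize the conversation one message at a time and mask non-assistant tokens."""
    input_ids: List[int] = []
    labels: List[int] = []
    prev_len = 0
    for k, message in enumerate(messages, start=1):
        ids = encode_prefix(messages[:k])
        new_tokens = ids[prev_len:]  # tokens contributed by message k
        input_ids.extend(new_tokens)
        if message["role"] == "assistant":
            labels.extend(new_tokens)
        else:
            labels.extend([IGNORE_INDEX] * len(new_tokens))
        prev_len = len(ids)
    return input_ids, labels


def toy_encode(messages: List[Message]) -> List[int]:
    """Stand-in tokenizer: one role marker followed by one ID per character."""
    role_ids = {"user": 1, "assistant": 2}
    out: List[int] = []
    for message in messages:
        out.append(role_ids[message["role"]])
        out.extend(ord(ch) for ch in message["content"])
    return out
```

Only the tokens added by assistant turns keep their IDs as labels, so the loss is computed on the model's replies and not on the user prompts.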
oobabooga b16a1a874a Update TensorRT-LLM Dockerfile for v1.1.0 2026-03-05 06:23:56 -08:00
oobabooga 45188eccef Overhaul LoRA training tab
- Use peft's "all-linear" for target modules instead of the old
  model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs
2026-03-05 10:52:59 -03:00
oobabooga 268cc3f100 Update TensorRT-LLM to v1.1.0 2026-03-05 09:32:28 -03:00
oobabooga 69fa4dd0b1 llama.cpp: allow ctx_size=0 for auto context via --fit 2026-03-04 19:33:20 -08:00
oobabooga fbfcd59fe0 llama.cpp: Use -1 instead of 0 for auto gpu_layers 2026-03-04 19:21:45 -08:00
oobabooga d45aa6606a Fix blank prompt dropdown in Notebook/Default tabs on first startup 2026-03-04 19:07:55 -08:00
oobabooga 0804296f4d Revert "UI: Remove unnecessary server round-trips from button click chains"
This reverts commit ff48956cb0.
2026-03-04 18:41:30 -08:00
oobabooga 6a08e79fa5 Update the custom gradio wheels 2026-03-04 18:22:50 -08:00
oobabooga ff48956cb0 UI: Remove unnecessary server round-trips from button click chains 2026-03-04 18:19:56 -08:00
oobabooga 5a22970ba8 Docker: fix and clean up configs, update docs 2026-03-04 23:13:47 -03:00
oobabooga 387cf9d8df Remove obsolete DeepSpeed inference code (2023 relic) 2026-03-04 17:20:34 -08:00
oobabooga 942ff8fcb4 Remove obsolete stuff after custom gradio updates 2026-03-04 16:43:32 -08:00
oobabooga da3010c3ed tiny improvements to llama_cpp_server.py 2026-03-04 15:54:37 -08:00
oobabooga 83cc207ef7 Update the custom gradio wheels 2026-03-04 14:31:18 -08:00
thecaptain789 2ac4eb33c8 fix: correct typo 'occured' to 'occurred' (#7389) 2026-03-04 18:09:28 -03:00
Sense_wang 7bf15ad933 fix: replace bare except clauses with except Exception (#7400) 2026-03-04 18:06:17 -03:00
mamei16 1d1f4dfc88 Disable uncommonly used indented codeblocks (#7401) 2026-03-04 17:51:00 -03:00
mamei16 abb7cc02e9 Re-introduce inline LaTeX rendering with more robust exception handling (#7402) 2026-03-04 17:44:19 -03:00
mamei16 68109bc5da Improve process_markdown_content (#7403) 2026-03-04 17:26:13 -03:00
weiguang li 952e2c404a Bump sentence-transformers from 2.2.2 to 3.3.1 in superbooga (#7406) 2026-03-04 17:08:08 -03:00