Commit graph

2090 commits

Author SHA1 Message Date
oobabooga
93ebfa2b7e Fix llama-server output filter for new log format 2026-03-06 02:38:13 -03:00
oobabooga
eba262d47a Security: prevent path traversal in character/user/file save and delete 2026-03-06 02:00:10 -03:00
oobabooga
66fb79fe15 llama.cpp: Add --fit-target param 2026-03-06 01:55:48 -03:00
oobabooga
e81a47f708 Improve the API generation defaults --help message 2026-03-05 20:41:45 -08:00
oobabooga
27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga
8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga
ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga
8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
oobabooga
e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
oobabooga
4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga
249bd6eea2 UI: Update the parallel info message 2026-03-05 18:11:55 -08:00
oobabooga
f52d9336e5 TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support 2026-03-05 18:09:45 -08:00
oobabooga
9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga
2f08dce7b0 Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga
86d8291e58 Training: UI cleanup and better defaults 2026-03-05 11:20:55 -08:00
oobabooga
33ff3773a0 Clean up LoRA loading parameter handling 2026-03-05 16:00:13 -03:00
oobabooga
7a1fa8c9ea Training: fix checkpoint resume and surface training errors to UI 2026-03-05 15:50:39 -03:00
oobabooga
275810c843 Training: wire up HF Trainer checkpoint resumption for full state recovery 2026-03-05 15:32:49 -03:00
oobabooga
63f28cb4a2 Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on) 2026-03-05 15:12:32 -03:00
oobabooga
33a38d7ece Training: drop conversations exceeding cutoff length instead of truncating 2026-03-05 14:56:27 -03:00
oobabooga
c2e494963f Training: fix silent error on model reload failure, minor cleanups 2026-03-05 14:41:44 -03:00
oobabooga
5b18be8582 Training: unify instruction training through apply_chat_template()
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
2026-03-05 14:39:37 -03:00
oobabooga
d337ba0390 Training: fix apply_chat_template returning BatchEncoding instead of list 2026-03-05 13:45:28 -03:00
oobabooga
1c2548fd89 Training: use dynamic padding (pad to batch max instead of cutoff_len)
- Remove pre-padding from tokenize() and tokenize_conversation()
- Collate function now right-pads each batch to the longest sequence
- Set tokenizer padding_side to "right" (standard for training)
- Remove dead natural_keys import
- Reduces wasted compute on batches with short sequences
- Aligns with axolotl/unsloth approach
2026-03-05 12:45:32 -03:00
oobabooga
da2d4f1a6a Training: replace raw text file with JSONL text dataset, re-add stride overlap
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
2026-03-05 12:33:12 -03:00
oobabooga
d278bb46a2 Add apply_chat_template() support for LoRA training
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
2026-03-05 11:47:25 -03:00
oobabooga
45188eccef Overhaul LoRA training tab
- Use peft's "all-linear" for target modules instead of the old
  model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs
2026-03-05 10:52:59 -03:00
oobabooga
268cc3f100 Update TensorRT-LLM to v1.1.0 2026-03-05 09:32:28 -03:00
oobabooga
69fa4dd0b1 llama.cpp: allow ctx_size=0 for auto context via --fit 2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0 llama.cpp: Use -1 instead of 0 for auto gpu_layers 2026-03-04 19:21:45 -08:00
oobabooga
d45aa6606a Fix blank prompt dropdown in Notebook/Default tabs on first startup 2026-03-04 19:07:55 -08:00
oobabooga
0804296f4d Revert "UI: Remove unnecessary server round-trips from button click chains"
This reverts commit ff48956cb0.
2026-03-04 18:41:30 -08:00
oobabooga
ff48956cb0 UI: Remove unnecessary server round-trips from button click chains 2026-03-04 18:19:56 -08:00
oobabooga
387cf9d8df Remove obsolete DeepSpeed inference code (2023 relic) 2026-03-04 17:20:34 -08:00
oobabooga
da3010c3ed tiny improvements to llama_cpp_server.py 2026-03-04 15:54:37 -08:00
Sense_wang
7bf15ad933
fix: replace bare except clauses with except Exception (#7400) 2026-03-04 18:06:17 -03:00
mamei16
1d1f4dfc88
Disable uncommonly used indented codeblocks (#7401) 2026-03-04 17:51:00 -03:00
mamei16
68109bc5da
Improve process_markdown_content (#7403) 2026-03-04 17:26:13 -03:00
oobabooga
cdf0e392e6 llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults 2026-03-04 12:05:08 -08:00
oobabooga
eb90daf098 ExLlamaV2: Don't expose unused seed parameter 2026-03-04 11:14:50 -08:00
oobabooga
d8af0505a8 ExLlamav3_HF: Optimize prefill and fix CFG cache initialization 2026-03-04 11:09:58 -08:00
oobabooga
9b916f02cd ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed 2026-03-04 10:51:15 -08:00
oobabooga
5d93f4e800 Fix requires_grad warning in logits API 2026-03-04 10:43:23 -08:00
oobabooga
64eb77e782 Fix the logits API endpoint with transformers 2026-03-04 10:41:47 -08:00
oobabooga
65de4c30c8 Add adaptive-p sampler and n-gram speculative decoding support 2026-03-04 09:41:29 -08:00
oobabooga
f010aa1612 Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
oobabooga
f4d787ab8d Delegate GPU layer allocation to llama.cpp's --fit 2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401 Fix temperature_last having no effect in llama.cpp server sampler order 2026-03-04 06:10:51 -08:00
oobabooga
b3fd0d16e0 Use a new gr.Headless component for efficient chat streaming 2026-03-03 18:12:03 -08:00
oobabooga
2260e530c9 Remove gradio monkey-patches (moved to gradio fork) 2026-03-03 17:17:36 -08:00