Commit graph

2086 commits

Author SHA1 Message Date
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
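The two non-JSON formats quoted in the commit body can be sketched with plain regex parsing. This is a minimal illustration only, assuming the format examples above; `parse_mistral_style` and `parse_qwen_style` are hypothetical names, not the repository's actual parser functions.

```python
import json
import re

def parse_mistral_style(text):
    """Parse calls shaped like: functionName{"arg": "value"}"""
    calls = []
    for m in re.finditer(r'(\w+)(\{.*?\})', text):
        try:
            args = json.loads(m.group(2))
        except json.JSONDecodeError:
            continue  # not a tool call, just prose with braces
        calls.append({"name": m.group(1), "arguments": args})
    return calls

def parse_qwen_style(text):
    """Parse: <tool_call><function=name><parameter=key>value</parameter>"""
    calls = []
    for fn in re.finditer(r'<function=(\w+)>(.*?)(?=<function=|$)', text, re.S):
        params = dict(re.findall(r'<parameter=(\w+)>(.*?)</parameter>',
                                 fn.group(2), re.S))
        calls.append({"name": fn.group(1), "arguments": params})
    return calls
```

Note the lazy `\{.*?\}` would not survive nested JSON objects in the arguments; a real parser needs balanced-brace matching for that case.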
oobabooga e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
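The auto-detection rule described above can be sketched as follows; `resolve_user_data_dir` is an illustrative helper, not the repository's actual function.

```python
from pathlib import Path

def resolve_user_data_dir(flag_value, base_dir):
    """Pick the user_data directory per the rule: explicit flag wins,
    then ./user_data if it exists, then ../user_data if it exists."""
    base = Path(base_dir)
    if flag_value:                       # --user-data-dir always wins
        return Path(flag_value)
    local = base / "user_data"
    if local.exists():                   # prefer ./user_data when present
        return local
    shared = base.parent / "user_data"   # shared across portable builds
    if shared.exists():
        return shared
    return local                         # default location otherwise
```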
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga 249bd6eea2 UI: Update the parallel info message 2026-03-05 18:11:55 -08:00
oobabooga f52d9336e5 TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support 2026-03-05 18:09:45 -08:00
oobabooga 9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga 2f08dce7b0 Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga 86d8291e58 Training: UI cleanup and better defaults 2026-03-05 11:20:55 -08:00
oobabooga 33ff3773a0 Clean up LoRA loading parameter handling 2026-03-05 16:00:13 -03:00
oobabooga 7a1fa8c9ea Training: fix checkpoint resume and surface training errors to UI 2026-03-05 15:50:39 -03:00
oobabooga 275810c843 Training: wire up HF Trainer checkpoint resumption for full state recovery 2026-03-05 15:32:49 -03:00
oobabooga 63f28cb4a2 Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on) 2026-03-05 15:12:32 -03:00
oobabooga 33a38d7ece Training: drop conversations exceeding cutoff length instead of truncating 2026-03-05 14:56:27 -03:00
oobabooga c2e494963f Training: fix silent error on model reload failure, minor cleanups 2026-03-05 14:41:44 -03:00
oobabooga 5b18be8582 Training: unify instruction training through apply_chat_template()
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
2026-03-05 14:39:37 -03:00
oobabooga d337ba0390 Training: fix apply_chat_template returning BatchEncoding instead of list 2026-03-05 13:45:28 -03:00
oobabooga 1c2548fd89 Training: use dynamic padding (pad to batch max instead of cutoff_len)
- Remove pre-padding from tokenize() and tokenize_conversation()
- Collate function now right-pads each batch to the longest sequence
- Set tokenizer padding_side to "right" (standard for training)
- Remove dead natural_keys import
- Reduces wasted compute on batches with short sequences
- Aligns with axolotl/unsloth approach
2026-03-05 12:45:32 -03:00
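The batch-max right-padding described above can be sketched as below, assuming examples arrive as plain token-ID lists; `collate_right_pad` and the `-100` label-padding value (the conventional ignore index) are illustrative, not the repository's exact collate function.

```python
def collate_right_pad(batch, pad_id, label_pad=-100):
    """Right-pad each sequence to the longest one in this batch,
    instead of always padding to cutoff_len."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    out = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        out["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        out["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * n_pad)
        out["labels"].append(ex["labels"] + [label_pad] * n_pad)
    return out
```

A batch of short sequences then costs only as much compute as its longest member, which is the saving the commit refers to.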
oobabooga da2d4f1a6a Training: replace raw text file with JSONL text dataset, re-add stride overlap
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
2026-03-05 12:33:12 -03:00
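The stride-overlap chunking with padded remainders can be sketched like this; `chunk_with_stride` is an illustrative helper, not the repository's actual chunking code.

```python
def chunk_with_stride(token_ids, chunk_len, stride, pad_id):
    """Split token_ids into chunk_len windows where consecutive windows
    overlap by `stride` tokens; the final short chunk is padded, not dropped."""
    step = max(chunk_len - stride, 1)
    chunks = []
    for start in range(0, len(token_ids), step):
        chunk = token_ids[start:start + chunk_len]
        if len(chunk) < chunk_len:
            chunk = chunk + [pad_id] * (chunk_len - len(chunk))  # pad remainder
        chunks.append(chunk)
        if start + chunk_len >= len(token_ids):
            break  # everything is covered; stop before pure-padding chunks
    return chunks
```

With stride 0 this degenerates to plain non-overlapping chunks; larger strides let each token be seen with more left context in the following window.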
oobabooga d278bb46a2 Add apply_chat_template() support for LoRA training
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
2026-03-05 11:47:25 -03:00
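The assistant-only masking via incremental tokenization can be sketched with a toy tokenizer standing in for `tokenizer.apply_chat_template()`; only the masking logic mirrors the approach described above, and both function names are illustrative.

```python
def toy_tokenize(messages):
    """Stand-in tokenizer: one integer token per whitespace word.
    Tokenizing a message prefix yields a prefix of the full token list,
    which is the property the incremental approach relies on."""
    text = " ".join(f"{m['role']}: {m['content']}" for m in messages)
    return [hash(w) % 1000 for w in text.split()]

def build_labels(messages):
    """Tokenize the conversation one turn at a time; tokens contributed
    by assistant turns keep their IDs as labels, all others get -100."""
    input_ids, labels, prev_len = [], [], 0
    for i in range(len(messages)):
        ids = toy_tokenize(messages[:i + 1])
        new = ids[prev_len:]                      # tokens added by this turn
        input_ids.extend(new)
        if messages[i]["role"] == "assistant":
            labels.extend(new)                    # train on assistant tokens
        else:
            labels.extend([-100] * len(new))      # mask user/system tokens
        prev_len = len(ids)
    return input_ids, labels
```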
oobabooga 45188eccef Overhaul LoRA training tab
- Use peft's "all-linear" for target modules instead of the old
  model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs
2026-03-05 10:52:59 -03:00
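The "all-linear" change can be illustrated with a peft configuration sketch; this is a configuration fragment only, using the defaults from the commits above (rank 8, alpha 16, dropout 0), not the repository's exact setup code.

```python
from peft import LoraConfig

# "all-linear" asks peft to target every linear layer of the model,
# replacing the old per-architecture model_to_lora_modules mapping
# that only knew ~39 model types.
lora_config = LoraConfig(
    r=8,                          # rank, per the new defaults
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules="all-linear",  # supported by peft >= 0.8
    task_type="CAUSAL_LM",
)
```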
oobabooga 268cc3f100 Update TensorRT-LLM to v1.1.0 2026-03-05 09:32:28 -03:00
oobabooga 69fa4dd0b1 llama.cpp: allow ctx_size=0 for auto context via --fit 2026-03-04 19:33:20 -08:00
oobabooga fbfcd59fe0 llama.cpp: Use -1 instead of 0 for auto gpu_layers 2026-03-04 19:21:45 -08:00
oobabooga d45aa6606a Fix blank prompt dropdown in Notebook/Default tabs on first startup 2026-03-04 19:07:55 -08:00
oobabooga 0804296f4d Revert "UI: Remove unnecessary server round-trips from button click chains"
This reverts commit ff48956cb0.
2026-03-04 18:41:30 -08:00
oobabooga ff48956cb0 UI: Remove unnecessary server round-trips from button click chains 2026-03-04 18:19:56 -08:00
oobabooga 387cf9d8df Remove obsolete DeepSpeed inference code (2023 relic) 2026-03-04 17:20:34 -08:00
oobabooga da3010c3ed tiny improvements to llama_cpp_server.py 2026-03-04 15:54:37 -08:00
Sense_wang 7bf15ad933 fix: replace bare except clauses with except Exception (#7400) 2026-03-04 18:06:17 -03:00
mamei16 1d1f4dfc88 Disable uncommonly used indented codeblocks (#7401) 2026-03-04 17:51:00 -03:00
mamei16 68109bc5da Improve process_markdown_content (#7403) 2026-03-04 17:26:13 -03:00
oobabooga cdf0e392e6 llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults 2026-03-04 12:05:08 -08:00
oobabooga eb90daf098 ExLlamaV2: Don't expose unused seed parameter 2026-03-04 11:14:50 -08:00
oobabooga d8af0505a8 ExLlamav3_HF: Optimize prefill and fix CFG cache initialization 2026-03-04 11:09:58 -08:00
oobabooga 9b916f02cd ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed 2026-03-04 10:51:15 -08:00
oobabooga 5d93f4e800 Fix requires_grad warning in logits API 2026-03-04 10:43:23 -08:00
oobabooga 64eb77e782 Fix the logits API endpoint with transformers 2026-03-04 10:41:47 -08:00
oobabooga 65de4c30c8 Add adaptive-p sampler and n-gram speculative decoding support 2026-03-04 09:41:29 -08:00
oobabooga f010aa1612 Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
oobabooga f4d787ab8d Delegate GPU layer allocation to llama.cpp's --fit 2026-03-04 06:37:50 -08:00
oobabooga 8a3d866401 Fix temperature_last having no effect in llama.cpp server sampler order 2026-03-04 06:10:51 -08:00
oobabooga b3fd0d16e0 Use a new gr.Headless component for efficient chat streaming 2026-03-03 18:12:03 -08:00
oobabooga 2260e530c9 Remove gradio monkey-patches (moved to gradio fork) 2026-03-03 17:17:36 -08:00
oobabooga c54e8a2b3d Try to spawn llama.cpp on port 5001 instead of random port 2026-01-28 08:23:55 -08:00
oobabooga dc2bbf1861 Refactor thinking block detection and add Solar Open support 2026-01-28 08:21:34 -08:00
q5sys (JT) 7493fe7841 feat: Add a dropdown to save/load user personas (#7367) 2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan 6e2c4e9c23 Fix loading models which have their eos token disabled (#7363) 2026-01-06 11:31:10 -03:00
oobabooga e7c8b51fec Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga b758059e95 Revert "Clear the torch cache between sequential image generations"
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga 1ec9f708e5 Clear the torch cache between sequential image generations 2025-12-07 11:49:22 -08:00
oobabooga 85f2df92e9 Use flash_attention_2 by default for Transformers models 2025-12-07 06:56:58 -08:00
oobabooga 1762312fb4 Use random instead of np.random for image seeds (makes it work on Windows) 2025-12-06 20:10:32 -08:00
oobabooga 02518a96a9 Lint 2025-12-06 06:55:06 -08:00
oobabooga 455dc06db0 Serve the original PNG images in the UI instead of webp 2025-12-06 05:43:00 -08:00
oobabooga 6ca99910ba Image: Quantize the text encoder for lower VRAM 2025-12-05 13:08:46 -08:00
oobabooga 11937de517 Use flash attention for image generation by default 2025-12-05 12:13:24 -08:00
oobabooga c11c14590a Image: Better LLM variation default prompt 2025-12-05 08:08:11 -08:00
oobabooga 0dd468245c Image: Add back the gallery cache (for performance) 2025-12-05 07:11:38 -08:00
oobabooga b63d57158d Image: Add TGW as a prefix to output images 2025-12-05 05:59:54 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 8eac99599a Image: Better LLM variation default prompt 2025-12-04 19:58:06 -08:00
oobabooga b4f06a50b0 fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga b451bac082 Image: Improve a log message 2025-12-04 16:33:46 -08:00
oobabooga 47a0fcd614 Image: PNG metadata improvements 2025-12-04 16:25:48 -08:00
oobabooga ac31a7c008 Image: Organize the UI 2025-12-04 15:45:04 -08:00
oobabooga a90739f498 Image: Better LLM variation default prompt 2025-12-04 10:50:40 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 2793153717 Image: Add LLM-generated prompt variations 2025-12-04 08:10:24 -08:00
oobabooga 7fb9f19bd8 Progress bar style improvements 2025-12-04 06:20:45 -08:00
oobabooga a838223d18 Image: Add a progress bar during generation 2025-12-04 05:49:57 -08:00
oobabooga 14dbc3488e Image: Clear the torch cache after generation, not before 2025-12-04 05:32:58 -08:00
oobabooga c357eed4c7 Image: Remove the flash_attention_3 option (no idea how to get it working) 2025-12-03 18:40:34 -08:00
oobabooga fbca54957e Image generation: Yield partial results for batch count > 1 2025-12-03 16:13:07 -08:00
oobabooga 49c60882bf Image generation: Safer image uploading 2025-12-03 16:07:51 -08:00
oobabooga 59285d501d Image generation: Small UI improvements 2025-12-03 16:03:31 -08:00
oobabooga 373baa5c9c UI: Minor image gallery improvements 2025-12-03 14:45:02 -08:00
oobabooga 9448bf1caa Image generation: add torchao quantization (supports torch.compile) 2025-12-02 14:22:51 -08:00
oobabooga 97281ff831 UI: Fix an index error in the new image gallery 2025-12-02 11:20:52 -08:00
oobabooga 9d07d3a229 Make portable builds functional again after b3666e140d 2025-12-02 10:06:57 -08:00
oobabooga 6291e72129 Remove quanto for now (requires messy compilation) 2025-12-02 09:57:18 -08:00
oobabooga b3666e140d Add image generation support (#7328) 2025-12-02 14:55:38 -03:00
oobabooga 5327bc9397 Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785 400bb0694b Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316) 2025-11-21 16:56:02 -03:00
oobabooga 8f0048663d More modular HTML generator 2025-11-21 07:09:16 -08:00
oobabooga 0d4eff284c Add a --cpu-moe option for llama.cpp 2025-11-19 05:23:43 -08:00
Trenten Miller 6871484398 fix: Rename 'evaluation_strategy' to 'eval_strategy' in training 2025-10-28 16:48:04 -03:00
oobabooga a156ebbf76 Lint 2025-10-15 13:15:01 -07:00
oobabooga c871d9cdbd Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga b5a6904c4a Make --trust-remote-code immutable from the UI/API 2025-10-14 20:47:01 -07:00
mamei16 308e726e11 log error when llama-server request exceeds context size (#7263) 2025-10-12 23:00:11 -03:00
oobabooga 655c3e86e3 Fix "continue" missing an initial space in chat-instruct/chat modes 2025-10-11 17:00:25 -07:00
oobabooga c7dd920dc8 Fix metadata leaking into branched chats 2025-10-11 14:12:05 -07:00
oobabooga 78ff21d512 Organize the --help message 2025-10-10 15:21:08 -07:00
oobabooga 0d03813e98 Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00