oobabooga
27bcc45c18
API: Add command-line flags to override default generation parameters
2026-03-06 01:36:45 -03:00
oobabooga
8a9afcbec6
Allow extensions to skip output post-processing
2026-03-06 01:19:46 -03:00
oobabooga
ddcad3cc51
Follow-up to e2548f69: add missing paths module, fix gallery extension
2026-03-06 00:58:03 -03:00
oobabooga
8d43123f73
API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
...
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.
Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}
Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
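The Mistral/Devstral format above can be handled with a small regex-plus-JSON parser. A minimal sketch of the idea (function name and pattern are assumptions for illustration, not the project's actual code):

```python
import json
import re

# Illustrative parser for the Mistral/Devstral-style format:
#   functionName{"arg": "value"}
MISTRAL_CALL = re.compile(r'([A-Za-z_][\w.]*)\s*(\{.*\})', re.DOTALL)

def parse_mistral_tool_call(text):
    """Return an OpenAI-style tool_calls entry, or None when no call is found."""
    match = MISTRAL_CALL.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError:
        return None  # malformed arguments: treat as plain text
    return {
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }
```

The Qwen and GPT-OSS formats would get analogous parsers keyed on their own delimiters.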
2026-03-06 00:55:33 -03:00
oobabooga
e2548f69a9
Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
...
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
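The fallback described above amounts to a few lines; a minimal sketch, with the function name and arguments being assumptions:

```python
from pathlib import Path

# Sketch of the described auto-detection: an explicit --user-data-dir
# always wins; otherwise prefer ./user_data, falling back to
# ../user_data when the local folder does not exist.
def resolve_user_data_dir(cli_value=None, base=None):
    if cli_value is not None:
        return Path(cli_value)
    base = Path.cwd() if base is None else Path(base)
    local = base / "user_data"
    shared = base.parent / "user_data"
    if not local.exists() and shared.exists():
        return shared
    return local
```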
2026-03-05 19:31:10 -08:00
oobabooga
4c406e024f
API: Speed up chat completions by ~85ms per request
2026-03-05 18:36:07 -08:00
oobabooga
249bd6eea2
UI: Update the parallel info message
2026-03-05 18:11:55 -08:00
oobabooga
f52d9336e5
TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support
2026-03-05 18:09:45 -08:00
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
2f08dce7b0
Remove ExLlamaV2 backend
...
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga
86d8291e58
Training: UI cleanup and better defaults
2026-03-05 11:20:55 -08:00
oobabooga
33ff3773a0
Clean up LoRA loading parameter handling
2026-03-05 16:00:13 -03:00
oobabooga
7a1fa8c9ea
Training: fix checkpoint resume and surface training errors to UI
2026-03-05 15:50:39 -03:00
oobabooga
275810c843
Training: wire up HF Trainer checkpoint resumption for full state recovery
2026-03-05 15:32:49 -03:00
oobabooga
63f28cb4a2
Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on)
2026-03-05 15:12:32 -03:00
oobabooga
33a38d7ece
Training: drop conversations exceeding cutoff length instead of truncating
2026-03-05 14:56:27 -03:00
oobabooga
c2e494963f
Training: fix silent error on model reload failure, minor cleanups
2026-03-05 14:41:44 -03:00
oobabooga
5b18be8582
Training: unify instruction training through apply_chat_template()
...
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
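Assistant-only label masking via incremental tokenization can be sketched as follows; this illustrates the idea only, and the argument names are assumptions rather than the project's implementation:

```python
# Render/tokenize the chat template incrementally, record the token
# count after each turn, and keep labels only for the spans contributed
# by assistant turns; everything else gets -100 so the loss ignores it.
def mask_labels(input_ids, prefix_lengths, assistant_turns):
    """
    input_ids: tokens of the fully rendered conversation
    prefix_lengths: cumulative token count after each turn
    assistant_turns: set of turn indices authored by the assistant
    """
    labels = [-100] * len(input_ids)
    start = 0
    for turn, end in enumerate(prefix_lengths):
        if turn in assistant_turns:
            labels[start:end] = input_ids[start:end]
        start = end
    return labels
```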
2026-03-05 14:39:37 -03:00
oobabooga
d337ba0390
Training: fix apply_chat_template returning BatchEncoding instead of list
2026-03-05 13:45:28 -03:00
oobabooga
1c2548fd89
Training: use dynamic padding (pad to batch max instead of cutoff_len)
...
- Remove pre-padding from tokenize() and tokenize_conversation()
- Collate function now right-pads each batch to the longest sequence
- Set tokenizer padding_side to "right" (standard for training)
- Remove dead natural_keys import
- Reduces wasted compute on batches with short sequences
- Aligns with axolotl/unsloth approach
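The collate step described above might look like this hedged sketch (dict keys and the pad_token_id default are assumptions):

```python
# Right-pad each batch only to its own longest sequence (dynamic
# padding), instead of always padding to cutoff_len.
def collate_fn(batch, pad_token_id=0):
    max_len = max(len(ex["input_ids"]) for ex in batch)
    out = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        pad = max_len - len(ex["input_ids"])
        out["input_ids"].append(ex["input_ids"] + [pad_token_id] * pad)
        out["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * pad)
        out["labels"].append(ex["labels"] + [-100] * pad)  # pad ignored by loss
    return out
```

A real collator would convert these lists to tensors; plain lists keep the sketch dependency-free.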
2026-03-05 12:45:32 -03:00
oobabooga
da2d4f1a6a
Training: replace raw text file with JSONL text dataset, re-add stride overlap
...
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
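Stride-overlap chunking with remainder padding, as described above, can be sketched like this (the function name and pad token are assumptions):

```python
# Slide a cutoff_len-token window over the text, stepping by
# cutoff_len - stride so consecutive chunks overlap by `stride` tokens;
# the final remainder chunk is padded instead of dropped.
def chunk_with_stride(tokens, cutoff_len, stride, pad_id=0):
    step = max(cutoff_len - stride, 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + cutoff_len]
        if len(chunk) < cutoff_len:
            chunk = chunk + [pad_id] * (cutoff_len - len(chunk))
        chunks.append(chunk)
        if start + cutoff_len >= len(tokens):
            break
    return chunks
```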
2026-03-05 12:33:12 -03:00
oobabooga
d278bb46a2
Add apply_chat_template() support for LoRA training
...
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
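The auto-conversion of ShareGPT and instruction/output rows into OpenAI-style messages might look like the following; field names are assumptions based on the common schemas, not necessarily the project's exact keys:

```python
# Hypothetical converters to the OpenAI "messages" layout.
def sharegpt_to_messages(example):
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    return [{"role": role_map[turn["from"]], "content": turn["value"]}
            for turn in example["conversations"]]

def instruction_to_messages(example):
    return [{"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]}]
```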
2026-03-05 11:47:25 -03:00
oobabooga
45188eccef
Overhaul LoRA training tab
...
- Use peft's "all-linear" for target modules instead of the old
model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs
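peft's "all-linear" shortcut targets every Linear layer except the output head. A dependency-free sketch of that selection (the helper and its input shape are assumptions for illustration):

```python
# Mimic what selecting "all-linear" target modules does: collect the
# short names of all Linear layers, skipping the LM head.
def find_all_linear(named_modules, head_name="lm_head"):
    """named_modules: iterable of (dotted_name, class_name) pairs."""
    return sorted({name.split(".")[-1]
                   for name, cls in named_modules
                   if cls == "Linear" and head_name not in name})
```

Because the selection is structural rather than a per-architecture name table, it works for model families the old mapping never listed.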
2026-03-05 10:52:59 -03:00
oobabooga
268cc3f100
Update TensorRT-LLM to v1.1.0
2026-03-05 09:32:28 -03:00
oobabooga
69fa4dd0b1
llama.cpp: allow ctx_size=0 for auto context via --fit
2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0
llama.cpp: Use -1 instead of 0 for auto gpu_layers
2026-03-04 19:21:45 -08:00
oobabooga
d45aa6606a
Fix blank prompt dropdown in Notebook/Default tabs on first startup
2026-03-04 19:07:55 -08:00
oobabooga
0804296f4d
Revert "UI: Remove unnecessary server round-trips from button click chains"
...
This reverts commit ff48956cb0.
2026-03-04 18:41:30 -08:00
oobabooga
ff48956cb0
UI: Remove unnecessary server round-trips from button click chains
2026-03-04 18:19:56 -08:00
oobabooga
387cf9d8df
Remove obsolete DeepSpeed inference code (2023 relic)
2026-03-04 17:20:34 -08:00
oobabooga
da3010c3ed
Tiny improvements to llama_cpp_server.py
2026-03-04 15:54:37 -08:00
Sense_wang
7bf15ad933
fix: replace bare except clauses with except Exception (#7400)
2026-03-04 18:06:17 -03:00
mamei16
1d1f4dfc88
Disable uncommonly used indented codeblocks (#7401)
2026-03-04 17:51:00 -03:00
mamei16
68109bc5da
Improve process_markdown_content (#7403)
2026-03-04 17:26:13 -03:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
eb90daf098
ExLlamaV2: Don't expose unused seed parameter
2026-03-04 11:14:50 -08:00
oobabooga
d8af0505a8
ExLlamav3_HF: Optimize prefill and fix CFG cache initialization
2026-03-04 11:09:58 -08:00
oobabooga
9b916f02cd
ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed
2026-03-04 10:51:15 -08:00
oobabooga
5d93f4e800
Fix requires_grad warning in logits API
2026-03-04 10:43:23 -08:00
oobabooga
64eb77e782
Fix the logits API endpoint with transformers
2026-03-04 10:41:47 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
oobabooga
f010aa1612
Replace PyPDF2 with pymupdf for PDF text extraction
...
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401
Fix temperature_last having no effect in llama.cpp server sampler order
2026-03-04 06:10:51 -08:00
oobabooga
b3fd0d16e0
Use a new gr.Headless component for efficient chat streaming
2026-03-03 18:12:03 -08:00
oobabooga
2260e530c9
Remove gradio monkey-patches (moved to gradio fork)
2026-03-03 17:17:36 -08:00
oobabooga
c54e8a2b3d
Try to spawn llama.cpp on port 5001 instead of random port
2026-01-28 08:23:55 -08:00
oobabooga
dc2bbf1861
Refactor thinking block detection and add Solar Open support
2026-01-28 08:21:34 -08:00
q5sys (JT)
7493fe7841
feat: Add a dropdown to save/load user personas (#7367)
2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan
6e2c4e9c23
Fix loading models which have their eos token disabled (#7363)
2026-01-06 11:31:10 -03:00
oobabooga
e7c8b51fec
Revert "Use flash_attention_2 by default for Transformers models"
...
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga
b758059e95
Revert "Clear the torch cache between sequential image generations"
...
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga
1ec9f708e5
Clear the torch cache between sequential image generations
2025-12-07 11:49:22 -08:00
oobabooga
85f2df92e9
Use flash_attention_2 by default for Transformers models
2025-12-07 06:56:58 -08:00
oobabooga
1762312fb4
Use random instead of np.random for image seeds (makes it work on Windows)
2025-12-06 20:10:32 -08:00
oobabooga
02518a96a9
Lint
2025-12-06 06:55:06 -08:00
oobabooga
455dc06db0
Serve the original PNG images in the UI instead of webp
2025-12-06 05:43:00 -08:00
oobabooga
6ca99910ba
Image: Quantize the text encoder for lower VRAM
2025-12-05 13:08:46 -08:00
oobabooga
11937de517
Use flash attention for image generation by default
2025-12-05 12:13:24 -08:00
oobabooga
c11c14590a
Image: Better LLM variation default prompt
2025-12-05 08:08:11 -08:00
oobabooga
0dd468245c
Image: Add back the gallery cache (for performance)
2025-12-05 07:11:38 -08:00
oobabooga
b63d57158d
Image: Add TGW as a prefix to output images
2025-12-05 05:59:54 -08:00
oobabooga
afa29b9554
Image: Several fixes
2025-12-05 05:58:57 -08:00
oobabooga
8eac99599a
Image: Better LLM variation default prompt
2025-12-04 19:58:06 -08:00
oobabooga
b4f06a50b0
fix: Pass bos_token and eos_token from metadata to jinja2
...
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga
56f2a9512f
Revert "Image: Add the LLM-generated prompt to the API result"
...
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga
c7ad28a4cd
Image: Add the LLM-generated prompt to the API result
2025-12-04 17:22:08 -08:00
oobabooga
b451bac082
Image: Improve a log message
2025-12-04 16:33:46 -08:00
oobabooga
47a0fcd614
Image: PNG metadata improvements
2025-12-04 16:25:48 -08:00
oobabooga
ac31a7c008
Image: Organize the UI
2025-12-04 15:45:04 -08:00
oobabooga
a90739f498
Image: Better LLM variation default prompt
2025-12-04 10:50:40 -08:00
oobabooga
ffef3c7b1d
Image: Make the LLM Variations prompt configurable
2025-12-04 10:44:35 -08:00
oobabooga
5763947c37
Image: Simplify the API code, add the llm_variations option
2025-12-04 10:23:00 -08:00
oobabooga
2793153717
Image: Add LLM-generated prompt variations
2025-12-04 08:10:24 -08:00
oobabooga
7fb9f19bd8
Progress bar style improvements
2025-12-04 06:20:45 -08:00
oobabooga
a838223d18
Image: Add a progress bar during generation
2025-12-04 05:49:57 -08:00
oobabooga
14dbc3488e
Image: Clear the torch cache after generation, not before
2025-12-04 05:32:58 -08:00
oobabooga
c357eed4c7
Image: Remove the flash_attention_3 option (no idea how to get it working)
2025-12-03 18:40:34 -08:00
oobabooga
fbca54957e
Image generation: Yield partial results for batch count > 1
2025-12-03 16:13:07 -08:00
oobabooga
49c60882bf
Image generation: Safer image uploading
2025-12-03 16:07:51 -08:00
oobabooga
59285d501d
Image generation: Small UI improvements
2025-12-03 16:03:31 -08:00
oobabooga
373baa5c9c
UI: Minor image gallery improvements
2025-12-03 14:45:02 -08:00
oobabooga
9448bf1caa
Image generation: add torchao quantization (supports torch.compile)
2025-12-02 14:22:51 -08:00
oobabooga
97281ff831
UI: Fix an index error in the new image gallery
2025-12-02 11:20:52 -08:00
oobabooga
9d07d3a229
Make portable builds functional again after b3666e140d
2025-12-02 10:06:57 -08:00
oobabooga
6291e72129
Remove quanto for now (requires messy compilation)
2025-12-02 09:57:18 -08:00
oobabooga
b3666e140d
Add image generation support (#7328)
2025-12-02 14:55:38 -03:00
oobabooga
5327bc9397
Update modules/shared.py
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
8f0048663d
More modular HTML generator
2025-11-21 07:09:16 -08:00
oobabooga
0d4eff284c
Add a --cpu-moe option for llama.cpp
2025-11-19 05:23:43 -08:00
Trenten Miller
6871484398
fix: Rename 'evaluation_strategy' to 'eval_strategy' in training
2025-10-28 16:48:04 -03:00
oobabooga
a156ebbf76
Lint
2025-10-15 13:15:01 -07:00
oobabooga
c871d9cdbd
Revert "Same as 7f06aec3a1 but for exllamav3_hf"
...
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga
b5a6904c4a
Make --trust-remote-code immutable from the UI/API
2025-10-14 20:47:01 -07:00
mamei16
308e726e11
Log error when llama-server request exceeds context size (#7263)
2025-10-12 23:00:11 -03:00
oobabooga
655c3e86e3
Fix "continue" missing an initial space in chat-instruct/chat modes
2025-10-11 17:00:25 -07:00
oobabooga
c7dd920dc8
Fix metadata leaking into branched chats
2025-10-11 14:12:05 -07:00
oobabooga
78ff21d512
Organize the --help message
2025-10-10 15:21:08 -07:00
oobabooga
0d03813e98
Update modules/chat.py
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00