oobabooga
66fb79fe15
llama.cpp: Add --fit-target param
2026-03-06 01:55:48 -03:00
oobabooga
e81a47f708
Improve the --help message for the API generation defaults
2026-03-05 20:41:45 -08:00
oobabooga
27bcc45c18
API: Add command-line flags to override default generation parameters
2026-03-06 01:36:45 -03:00
oobabooga
e2548f69a9
Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
...
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
oobabooga
f52d9336e5
TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support
2026-03-05 18:09:45 -08:00
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
2f08dce7b0
Remove ExLlamaV2 backend
...
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga
268cc3f100
Update TensorRT-LLM to v1.1.0
2026-03-05 09:32:28 -03:00
oobabooga
69fa4dd0b1
llama.cpp: allow ctx_size=0 for auto context via --fit
2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0
llama.cpp: Use -1 instead of 0 for auto gpu_layers
2026-03-04 19:21:45 -08:00
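The two commits above establish a sentinel convention: ctx_size=0 means "auto" (defer to --fit), while gpu_layers uses -1 for "auto" so that 0 remains available as an explicit "offload nothing" setting. A minimal sketch of how such values might translate into command-line arguments (the helper name `build_llama_args` is an assumption, not the project's code):

```python
def build_llama_args(ctx_size, gpu_layers):
    """Translate UI values into llama.cpp-style arguments.

    ctx_size == 0 means "auto": omit the flag and let --fit choose.
    gpu_layers == -1 means "auto": omit the flag; 0 is still a valid
    explicit value meaning "offload no layers to the GPU".
    """
    args = []
    if ctx_size > 0:
        args += ["--ctx-size", str(ctx_size)]
    if gpu_layers >= 0:
        args += ["--gpu-layers", str(gpu_layers)]
    return args
```

Using -1 rather than 0 as the gpu_layers sentinel avoids conflating "auto" with "CPU-only", which is why the second commit changed it.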
oobabooga
387cf9d8df
Remove obsolete DeepSpeed inference code (2023 relic)
2026-03-04 17:20:34 -08:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
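N-gram speculative decoding, added in the commit above, drafts tokens without a separate draft model: if the most recent (n-1)-gram has occurred earlier in the context, the tokens that followed it then are proposed as the draft and verified by the main model. A toy sketch of the matching step (details of the real ngram-mod implementation will differ):

```python
def ngram_draft(tokens, n=3, max_draft=4):
    """Propose draft tokens by matching the trailing (n-1)-gram earlier in context."""
    if len(tokens) < n:
        return []
    key = tuple(tokens[-(n - 1):])
    # Scan backwards for the most recent earlier occurrence of the same (n-1)-gram.
    for i in range(len(tokens) - n, -1, -1):
        if tuple(tokens[i:i + n - 1]) == key:
            # Reuse the continuation that followed it last time.
            return tokens[i + n - 1:i + n - 1 + max_draft]
    return []
```

This is cheap because it only does string matching on the existing context; the main model then accepts or rejects the drafted tokens as in ordinary speculative decoding, so output quality is unchanged.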
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
q5sys (JT)
7493fe7841
feat: Add a dropdown to save/load user personas (#7367)
2026-01-14 20:35:08 -03:00
oobabooga
e7c8b51fec
Revert "Use flash_attention_2 by default for Transformers models"
...
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga
85f2df92e9
Use flash_attention_2 by default for Transformers models
2025-12-07 06:56:58 -08:00
oobabooga
11937de517
Use flash attention for image generation by default
2025-12-05 12:13:24 -08:00
oobabooga
c11c14590a
Image: Better LLM variation default prompt
2025-12-05 08:08:11 -08:00
oobabooga
8eac99599a
Image: Better LLM variation default prompt
2025-12-04 19:58:06 -08:00
oobabooga
b4f06a50b0
fix: Pass bos_token and eos_token from metadata to jinja2
...
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga
a90739f498
Image: Better LLM variation default prompt
2025-12-04 10:50:40 -08:00
oobabooga
ffef3c7b1d
Image: Make the LLM Variations prompt configurable
2025-12-04 10:44:35 -08:00
oobabooga
2793153717
Image: Add LLM-generated prompt variations
2025-12-04 08:10:24 -08:00
oobabooga
c357eed4c7
Image: Remove the flash_attention_3 option (no idea how to get it working)
2025-12-03 18:40:34 -08:00
oobabooga
9448bf1caa
Image generation: add torchao quantization (supports torch.compile)
2025-12-02 14:22:51 -08:00
oobabooga
6291e72129
Remove quanto for now (requires messy compilation)
2025-12-02 09:57:18 -08:00
oobabooga
b3666e140d
Add image generation support (#7328)
2025-12-02 14:55:38 -03:00
oobabooga
5327bc9397
Update modules/shared.py
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
0d4eff284c
Add a --cpu-moe option for llama.cpp
2025-11-19 05:23:43 -08:00
oobabooga
b5a6904c4a
Make --trust-remote-code immutable from the UI/API
2025-10-14 20:47:01 -07:00
oobabooga
78ff21d512
Organize the --help message
2025-10-10 15:21:08 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
0b4518e61c
"Text generation web UI" -> "Text Generation Web UI"
2025-08-27 05:53:09 -07:00
oobabooga
02ca96fa44
Multiple fixes
2025-08-25 22:17:22 -07:00
oobabooga
6c165d2e55
Fix the chat template
2025-08-25 18:28:43 -07:00
oobabooga
dbabe67e77
ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option
2025-08-17 13:19:11 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
d86b0ec010
Add multimodal support (llama.cpp) (#7027)
2025-08-10 01:27:25 -03:00
Katehuuh
88127f46c1
Add multimodal support (ExLlamaV3) (#7174)
2025-08-08 23:31:16 -03:00
oobabooga
498778b8ac
Add a new 'Reasoning effort' UI element
2025-08-05 15:19:11 -07:00
oobabooga
1d1b20bd77
Remove the --torch-compile option (it doesn't do anything currently)
2025-07-11 10:51:23 -07:00
oobabooga
273888f218
Revert "Use eager attention by default instead of sdpa"
...
This reverts commit bd4881c4dc.
2025-07-10 18:56:46 -07:00
oobabooga
e015355e4a
Update README
2025-07-09 20:03:53 -07:00
oobabooga
bd4881c4dc
Use eager attention by default instead of sdpa
2025-07-09 19:57:37 -07:00
oobabooga
6c2bdda0f0
Transformers loader: replace use_flash_attention_2/use_eager_attention with a unified attn_implementation
...
Closes #7107
2025-07-09 18:39:37 -07:00
oobabooga
faae4dc1b0
Autosave generated text in the Notebook tab (#7079)
2025-06-16 17:36:05 -03:00
oobabooga
de24b3bb31
Merge the Default and Notebook tabs into a single Notebook tab (#7078)
2025-06-16 13:19:29 -03:00
oobabooga
2dee3a66ff
Add an option to include/exclude attachments from previous messages in the chat prompt
2025-06-12 21:37:18 -07:00