oobabooga
ff48956cb0
UI: Remove unnecessary server round-trips from button click chains
2026-03-04 18:19:56 -08:00
oobabooga
387cf9d8df
Remove obsolete DeepSpeed inference code (2023 relic)
2026-03-04 17:20:34 -08:00
oobabooga
da3010c3ed
Tiny improvements to llama_cpp_server.py
2026-03-04 15:54:37 -08:00
Sense_wang
7bf15ad933
fix: replace bare except clauses with except Exception (#7400)
2026-03-04 18:06:17 -03:00
mamei16
1d1f4dfc88
Disable uncommonly used indented codeblocks (#7401)
2026-03-04 17:51:00 -03:00
mamei16
68109bc5da
Improve process_markdown_content (#7403)
2026-03-04 17:26:13 -03:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
eb90daf098
ExLlamaV2: Don't expose unused seed parameter
2026-03-04 11:14:50 -08:00
oobabooga
d8af0505a8
ExLlamav3_HF: Optimize prefill and fix CFG cache initialization
2026-03-04 11:09:58 -08:00
oobabooga
9b916f02cd
ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed
2026-03-04 10:51:15 -08:00
oobabooga
5d93f4e800
Fix requires_grad warning in logits API
2026-03-04 10:43:23 -08:00
oobabooga
64eb77e782
Fix the logits API endpoint with transformers
2026-03-04 10:41:47 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
oobabooga
f010aa1612
Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401
Fix temperature_last having no effect in llama.cpp server sampler order
2026-03-04 06:10:51 -08:00
oobabooga
b3fd0d16e0
Use a new gr.Headless component for efficient chat streaming
2026-03-03 18:12:03 -08:00
oobabooga
2260e530c9
Remove gradio monkey-patches (moved to gradio fork)
2026-03-03 17:17:36 -08:00
oobabooga
c54e8a2b3d
Try to spawn llama.cpp on port 5001 instead of random port
2026-01-28 08:23:55 -08:00
oobabooga
dc2bbf1861
Refactor thinking block detection and add Solar Open support
2026-01-28 08:21:34 -08:00
q5sys (JT)
7493fe7841
feat: Add a dropdown to save/load user personas (#7367)
2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan
6e2c4e9c23
Fix loading models which have their eos token disabled (#7363)
2026-01-06 11:31:10 -03:00
oobabooga
e7c8b51fec
Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga
b758059e95
Revert "Clear the torch cache between sequential image generations"
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga
1ec9f708e5
Clear the torch cache between sequential image generations
2025-12-07 11:49:22 -08:00
oobabooga
85f2df92e9
Use flash_attention_2 by default for Transformers models
2025-12-07 06:56:58 -08:00
oobabooga
1762312fb4
Use random instead of np.random for image seeds (makes it work on Windows)
2025-12-06 20:10:32 -08:00
oobabooga
02518a96a9
Lint
2025-12-06 06:55:06 -08:00
oobabooga
455dc06db0
Serve the original PNG images in the UI instead of webp
2025-12-06 05:43:00 -08:00
oobabooga
6ca99910ba
Image: Quantize the text encoder for lower VRAM
2025-12-05 13:08:46 -08:00
oobabooga
11937de517
Use flash attention for image generation by default
2025-12-05 12:13:24 -08:00
oobabooga
c11c14590a
Image: Better LLM variation default prompt
2025-12-05 08:08:11 -08:00
oobabooga
0dd468245c
Image: Add back the gallery cache (for performance)
2025-12-05 07:11:38 -08:00
oobabooga
b63d57158d
Image: Add TGW as a prefix to output images
2025-12-05 05:59:54 -08:00
oobabooga
afa29b9554
Image: Several fixes
2025-12-05 05:58:57 -08:00
oobabooga
8eac99599a
Image: Better LLM variation default prompt
2025-12-04 19:58:06 -08:00
oobabooga
b4f06a50b0
fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga
56f2a9512f
Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga
c7ad28a4cd
Image: Add the LLM-generated prompt to the API result
2025-12-04 17:22:08 -08:00
oobabooga
b451bac082
Image: Improve a log message
2025-12-04 16:33:46 -08:00
oobabooga
47a0fcd614
Image: PNG metadata improvements
2025-12-04 16:25:48 -08:00
oobabooga
ac31a7c008
Image: Organize the UI
2025-12-04 15:45:04 -08:00
oobabooga
a90739f498
Image: Better LLM variation default prompt
2025-12-04 10:50:40 -08:00
oobabooga
ffef3c7b1d
Image: Make the LLM Variations prompt configurable
2025-12-04 10:44:35 -08:00
oobabooga
5763947c37
Image: Simplify the API code, add the llm_variations option
2025-12-04 10:23:00 -08:00
oobabooga
2793153717
Image: Add LLM-generated prompt variations
2025-12-04 08:10:24 -08:00
oobabooga
7fb9f19bd8
Progress bar style improvements
2025-12-04 06:20:45 -08:00
oobabooga
a838223d18
Image: Add a progress bar during generation
2025-12-04 05:49:57 -08:00
oobabooga
14dbc3488e
Image: Clear the torch cache after generation, not before
2025-12-04 05:32:58 -08:00
oobabooga
c357eed4c7
Image: Remove the flash_attention_3 option (no idea how to get it working)
2025-12-03 18:40:34 -08:00
oobabooga
fbca54957e
Image generation: Yield partial results for batch count > 1
2025-12-03 16:13:07 -08:00
oobabooga
49c60882bf
Image generation: Safer image uploading
2025-12-03 16:07:51 -08:00
oobabooga
59285d501d
Image generation: Small UI improvements
2025-12-03 16:03:31 -08:00
oobabooga
373baa5c9c
UI: Minor image gallery improvements
2025-12-03 14:45:02 -08:00
oobabooga
9448bf1caa
Image generation: add torchao quantization (supports torch.compile)
2025-12-02 14:22:51 -08:00
oobabooga
97281ff831
UI: Fix an index error in the new image gallery
2025-12-02 11:20:52 -08:00
oobabooga
9d07d3a229
Make portable builds functional again after b3666e140d
2025-12-02 10:06:57 -08:00
oobabooga
6291e72129
Remove quanto for now (requires messy compilation)
2025-12-02 09:57:18 -08:00
oobabooga
b3666e140d
Add image generation support (#7328)
2025-12-02 14:55:38 -03:00
oobabooga
5327bc9397
Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
8f0048663d
More modular HTML generator
2025-11-21 07:09:16 -08:00
oobabooga
0d4eff284c
Add a --cpu-moe model for llama.cpp
2025-11-19 05:23:43 -08:00
Trenten Miller
6871484398
fix: Rename 'evaluation_strategy' to 'eval_strategy' in training
2025-10-28 16:48:04 -03:00
oobabooga
a156ebbf76
Lint
2025-10-15 13:15:01 -07:00
oobabooga
c871d9cdbd
Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga
b5a6904c4a
Make --trust-remote-code immutable from the UI/API
2025-10-14 20:47:01 -07:00
mamei16
308e726e11
log error when llama-server request exceeds context size (#7263)
2025-10-12 23:00:11 -03:00
oobabooga
655c3e86e3
Fix "continue" missing an initial space in chat-instruct/chat modes
2025-10-11 17:00:25 -07:00
oobabooga
c7dd920dc8
Fix metadata leaking into branched chats
2025-10-11 14:12:05 -07:00
oobabooga
78ff21d512
Organize the --help message
2025-10-10 15:21:08 -07:00
oobabooga
0d03813e98
Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00
oobabooga
deb37b821b
Same as 7f06aec3a1 but for exllamav3_hf
2025-10-09 13:02:38 -07:00
oobabooga
7f06aec3a1
exllamav3: Implement the logits function for /v1/internal/logits
2025-10-09 11:24:25 -07:00
oobabooga
218dc01b51
Add fallbacks after 93aa7b3ed3
2025-10-09 10:59:34 -07:00
oobabooga
282aa19189
Safer profile picture uploading
2025-10-09 09:26:35 -07:00
oobabooga
93aa7b3ed3
Better handle multigpu setups with transformers + bitsandbytes
2025-10-09 08:49:44 -07:00
Remowylliams
38a7fd685d
chat.py: Fix Instruct mode history
2025-10-05 11:34:47 -03:00
oobabooga
1e863a7113
Fix exllamav3 ignoring the stop button
2025-09-19 16:12:50 -07:00
stevenxdavis
dd6d2223a5
Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217)
2025-09-17 16:39:04 -03:00
oobabooga
9e9ab39892
Make exllamav3_hf and exllamav2_hf functional again
2025-09-17 12:29:22 -07:00
oobabooga
f3829b268a
llama.cpp: Always pass --flash-attn on
2025-09-02 12:12:17 -07:00
oobabooga
c6ea67bbdb
Lint
2025-09-02 10:22:03 -07:00
oobabooga
00ed878b05
Slightly more robust model loading
2025-09-02 10:16:26 -07:00
oobabooga
387e249dec
Change an info message
2025-08-31 16:27:10 -07:00
oobabooga
8028d88541
Lint
2025-08-30 21:29:20 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
3a3e247f3c
Even better way to handle continue for thinking blocks
2025-08-30 12:36:35 -07:00
oobabooga
cf1aad2a68
Fix "continue" for Byte-OSS for partial thinking blocks
2025-08-30 12:16:45 -07:00
oobabooga
96136ea760
Fix LaTeX rendering for equations with asterisks
2025-08-30 10:13:32 -07:00
oobabooga
a3eb67e466
Fix the UI failing to launch if the Notebook prompt is too long
2025-08-30 08:42:26 -07:00
oobabooga
a2b37adb26
UI: Preload the correct fonts for chat mode
2025-08-29 09:25:44 -07:00
oobabooga
cb8780a4ce
Safer check for is_multimodal when loading models
Avoids unrelated multimodal error when a model fails to load due
to lack of memory.
2025-08-28 11:13:19 -07:00
oobabooga
cfc83745ec
UI: Improve right sidebar borders in light mode
2025-08-28 08:34:48 -07:00
oobabooga
ba6041251d
UI: Minor change
2025-08-28 06:20:00 -07:00
oobabooga
a92758a144
llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS
2025-08-27 16:15:40 -07:00
oobabooga
030ba7bfeb
UI: Mention that Seed-OSS uses enable_thinking
2025-08-27 07:44:35 -07:00
oobabooga
0b4518e61c
"Text generation web UI" -> "Text Generation Web UI"
2025-08-27 05:53:09 -07:00
oobabooga
02ca96fa44
Multiple fixes
2025-08-25 22:17:22 -07:00
oobabooga
6a7166fffa
Add support for the Seed-OSS template
2025-08-25 19:46:48 -07:00
oobabooga
8fcb4b3102
Make bot_prefix extensions functional again
2025-08-25 19:10:46 -07:00
oobabooga
8f660aefe3
Fix chat-instruct replies leaking the bot name sometimes
2025-08-25 18:50:16 -07:00
oobabooga
a531328f7e
Fix the GPT-OSS stopping string
2025-08-25 18:41:58 -07:00
oobabooga
6c165d2e55
Fix the chat template
2025-08-25 18:28:43 -07:00
oobabooga
b657be7381
Obtain stopping strings in chat mode
2025-08-25 18:22:08 -07:00
oobabooga
ded6c41cf8
Fix impersonate for chat-instruct
2025-08-25 18:16:17 -07:00
oobabooga
c1aa4590ea
Code simplifications, fix impersonate
2025-08-25 18:05:40 -07:00
oobabooga
b330ec3517
Simplifications
2025-08-25 17:54:15 -07:00
oobabooga
3ad5970374
Make the llama.cpp --verbose output less verbose
2025-08-25 17:43:21 -07:00
oobabooga
adeca8a658
Remove changes to the jinja2 templates
2025-08-25 17:36:01 -07:00
oobabooga
aad0104c1b
Remove a function
2025-08-25 17:33:13 -07:00
oobabooga
f919cdf881
chat.py code simplifications
2025-08-25 17:20:51 -07:00
oobabooga
d08800c359
chat.py improvements
2025-08-25 17:03:37 -07:00
oobabooga
3bc48014a5
chat.py code simplifications
2025-08-25 16:48:21 -07:00
oobabooga
2478294c06
UI: Preload the instruct and chat fonts
2025-08-24 12:37:41 -07:00
oobabooga
8be798e15f
llama.cpp: Fix stderr deadlock while loading some multimodal models
2025-08-24 12:20:05 -07:00
oobabooga
7fe8da8944
Minor simplification after f247c2ae62
2025-08-22 14:42:56 -07:00
oobabooga
f247c2ae62
Make --model work with absolute paths, eg --model /tmp/gemma-3-270m-it-IQ4_NL.gguf
2025-08-22 11:47:33 -07:00
oobabooga
9e7b326e34
Lint
2025-08-19 06:50:40 -07:00
oobabooga
1972479610
Add the TP option to exllamav3_HF
2025-08-19 06:48:22 -07:00
oobabooga
e0f5905a97
Code formatting
2025-08-19 06:34:05 -07:00
oobabooga
5b06284a8a
UI: Keep ExLlamav3_HF selected if already selected for EXL3 models
2025-08-19 06:23:21 -07:00
oobabooga
cbba58bef9
UI: Fix code blocks having an extra empty line
2025-08-18 15:50:09 -07:00
oobabooga
7d23a55901
Fix model unloading when switching loaders (closes #7203)
2025-08-18 09:05:47 -07:00
oobabooga
64eba9576c
mtmd: Fix a bug when "include past attachments" is unchecked
2025-08-17 14:08:40 -07:00
oobabooga
dbabe67e77
ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option
2025-08-17 13:19:11 -07:00
oobabooga
d771ca4a13
Fix web search (attempt)
2025-08-14 12:05:14 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
41b95e9ec3
Lint
2025-08-12 13:37:37 -07:00
oobabooga
7301452b41
UI: Minor info message change
2025-08-12 13:23:24 -07:00
oobabooga
8d7b88106a
Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)"
This reverts commit d8fcc71616.
2025-08-12 13:20:16 -07:00
oobabooga
2238302b49
ExLlamaV3: Add speculative decoding
2025-08-12 08:50:45 -07:00
oobabooga
d8fcc71616
mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)
2025-08-11 18:02:33 -07:00
oobabooga
e6447cd24a
mtmd: Update the llama-server request
2025-08-11 17:42:35 -07:00
oobabooga
0e3def449a
llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked
2025-08-11 15:17:25 -07:00
oobabooga
0e88a621fd
UI: Better organize the right sidebar
2025-08-11 15:16:03 -07:00
oobabooga
a78ca6ffcd
Remove a comment
2025-08-11 12:33:38 -07:00
oobabooga
999471256c
Lint
2025-08-11 12:32:17 -07:00
oobabooga
b62c8845f3
mtmd: Fix /chat/completions for llama.cpp
2025-08-11 12:01:59 -07:00
oobabooga
38c0b4a1ad
Default ctx-size to 8192 when not found in the metadata
2025-08-11 07:39:53 -07:00
oobabooga
52d1cbbbe9
Fix an import
2025-08-11 07:38:39 -07:00
oobabooga
4809ddfeb8
Exllamav3: small sampler fixes
2025-08-11 07:35:22 -07:00
oobabooga
4d8dbbab64
API: Fix sampler_priority usage for ExLlamaV3
2025-08-11 07:26:11 -07:00
oobabooga
0ea62d88f6
mtmd: Fix "continue" when an image is present
2025-08-09 21:47:02 -07:00
oobabooga
2f90ac9880
Move the new image_utils.py file to modules/
2025-08-09 21:41:38 -07:00
oobabooga
c6b4d1e87f
Fix the exllamav2 loader ignoring add_bos
2025-08-09 21:34:35 -07:00
oobabooga
d86b0ec010
Add multimodal support (llama.cpp) (#7027)
2025-08-10 01:27:25 -03:00
oobabooga
a289a92b94
Fix exllamav3 token count
2025-08-09 17:10:58 -07:00
oobabooga
d489eb589a
Attempt at fixing new exllamav3 loader undefined behavior when switching conversations
2025-08-09 14:11:31 -07:00
oobabooga
a6d6bee88c
Change a comment
2025-08-09 07:51:03 -07:00