oobabooga
eb90daf098
ExLlamaV2: Don't expose unused seed parameter
2026-03-04 11:14:50 -08:00
oobabooga
d8af0505a8
ExLlamav3_HF: Optimize prefill and fix CFG cache initialization
2026-03-04 11:09:58 -08:00
oobabooga
9b916f02cd
ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed
2026-03-04 10:51:15 -08:00
oobabooga
5d93f4e800
Fix requires_grad warning in logits API
2026-03-04 10:43:23 -08:00
oobabooga
64eb77e782
Fix the logits API endpoint with transformers
2026-03-04 10:41:47 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
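The n-gram speculative decoding mentioned in the commit above is commonly implemented as prompt lookup: match the trailing n-gram of the context against an earlier occurrence and propose the tokens that followed it as a cheap draft. A minimal sketch of that idea (illustrative only, not this project's actual implementation; `ngram_draft` and its parameters are hypothetical names):

```python
def ngram_draft(tokens, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram of the
    context against an earlier occurrence (prompt-lookup drafting)."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan candidate start positions from most recent to oldest,
    # skipping the trailing n-gram's own position.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            continuation = tokens[start + ngram_size:start + ngram_size + num_draft]
            if continuation:
                return continuation
    return []

# The target model then verifies the drafted tokens in a single
# forward pass, accepting the longest matching prefix.
```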
oobabooga
f010aa1612
Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401
Fix temperature_last having no effect in llama.cpp server sampler order
2026-03-04 06:10:51 -08:00
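The `temperature_last` fix above concerns sampler ordering: when temperature is applied after truncation samplers such as top-p, truncation sees the unscaled distribution, so temperature only reshapes probabilities among the survivors instead of changing which tokens survive. A toy illustration of why the order matters (hypothetical helpers, not llama.cpp's sampler code):

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_keep(probs, top_p):
    """Indices surviving nucleus (top-p) truncation."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)

logits = [3.0, 1.0, 0.0]

# Temperature first: a high temperature flattens the distribution,
# so more tokens survive top-p truncation.
temp_first = top_p_keep(softmax(logits, temperature=2.0), top_p=0.8)

# Temperature last: top-p sees the unscaled distribution, so the
# kept set is smaller here regardless of the temperature value.
temp_last = top_p_keep(softmax(logits), top_p=0.8)

print(temp_first, temp_last)  # the two orders keep different token sets
```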
oobabooga
b3fd0d16e0
Use a new gr.Headless component for efficient chat streaming
2026-03-03 18:12:03 -08:00
oobabooga
2260e530c9
Remove gradio monkey-patches (moved to gradio fork)
2026-03-03 17:17:36 -08:00
oobabooga
c54e8a2b3d
Try to spawn llama.cpp on port 5001 instead of random port
2026-01-28 08:23:55 -08:00
oobabooga
dc2bbf1861
Refactor thinking block detection and add Solar Open support
2026-01-28 08:21:34 -08:00
q5sys (JT)
7493fe7841
feat: Add a dropdown to save/load user personas (#7367)
2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan
6e2c4e9c23
Fix loading models which have their eos token disabled (#7363)
2026-01-06 11:31:10 -03:00
oobabooga
e7c8b51fec
Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga
b758059e95
Revert "Clear the torch cache between sequential image generations"
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga
1ec9f708e5
Clear the torch cache between sequential image generations
2025-12-07 11:49:22 -08:00
oobabooga
85f2df92e9
Use flash_attention_2 by default for Transformers models
2025-12-07 06:56:58 -08:00
oobabooga
1762312fb4
Use random instead of np.random for image seeds (makes it work on Windows)
2025-12-06 20:10:32 -08:00
oobabooga
02518a96a9
Lint
2025-12-06 06:55:06 -08:00
oobabooga
455dc06db0
Serve the original PNG images in the UI instead of webp
2025-12-06 05:43:00 -08:00
oobabooga
6ca99910ba
Image: Quantize the text encoder for lower VRAM
2025-12-05 13:08:46 -08:00
oobabooga
11937de517
Use flash attention for image generation by default
2025-12-05 12:13:24 -08:00
oobabooga
c11c14590a
Image: Better LLM variation default prompt
2025-12-05 08:08:11 -08:00
oobabooga
0dd468245c
Image: Add back the gallery cache (for performance)
2025-12-05 07:11:38 -08:00
oobabooga
b63d57158d
Image: Add TGW as a prefix to output images
2025-12-05 05:59:54 -08:00
oobabooga
afa29b9554
Image: Several fixes
2025-12-05 05:58:57 -08:00
oobabooga
8eac99599a
Image: Better LLM variation default prompt
2025-12-04 19:58:06 -08:00
oobabooga
b4f06a50b0
fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga
56f2a9512f
Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga
c7ad28a4cd
Image: Add the LLM-generated prompt to the API result
2025-12-04 17:22:08 -08:00
oobabooga
b451bac082
Image: Improve a log message
2025-12-04 16:33:46 -08:00
oobabooga
47a0fcd614
Image: PNG metadata improvements
2025-12-04 16:25:48 -08:00
oobabooga
ac31a7c008
Image: Organize the UI
2025-12-04 15:45:04 -08:00
oobabooga
a90739f498
Image: Better LLM variation default prompt
2025-12-04 10:50:40 -08:00
oobabooga
ffef3c7b1d
Image: Make the LLM Variations prompt configurable
2025-12-04 10:44:35 -08:00
oobabooga
5763947c37
Image: Simplify the API code, add the llm_variations option
2025-12-04 10:23:00 -08:00
oobabooga
2793153717
Image: Add LLM-generated prompt variations
2025-12-04 08:10:24 -08:00
oobabooga
7fb9f19bd8
Progress bar style improvements
2025-12-04 06:20:45 -08:00
oobabooga
a838223d18
Image: Add a progress bar during generation
2025-12-04 05:49:57 -08:00
oobabooga
14dbc3488e
Image: Clear the torch cache after generation, not before
2025-12-04 05:32:58 -08:00
oobabooga
c357eed4c7
Image: Remove the flash_attention_3 option (no idea how to get it working)
2025-12-03 18:40:34 -08:00
oobabooga
fbca54957e
Image generation: Yield partial results for batch count > 1
2025-12-03 16:13:07 -08:00
oobabooga
49c60882bf
Image generation: Safer image uploading
2025-12-03 16:07:51 -08:00
oobabooga
59285d501d
Image generation: Small UI improvements
2025-12-03 16:03:31 -08:00
oobabooga
373baa5c9c
UI: Minor image gallery improvements
2025-12-03 14:45:02 -08:00
oobabooga
9448bf1caa
Image generation: add torchao quantization (supports torch.compile)
2025-12-02 14:22:51 -08:00
oobabooga
97281ff831
UI: Fix an index error in the new image gallery
2025-12-02 11:20:52 -08:00
oobabooga
9d07d3a229
Make portable builds functional again after b3666e140d
2025-12-02 10:06:57 -08:00
oobabooga
6291e72129
Remove quanto for now (requires messy compilation)
2025-12-02 09:57:18 -08:00
oobabooga
b3666e140d
Add image generation support (#7328)
2025-12-02 14:55:38 -03:00
oobabooga
5327bc9397
Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
8f0048663d
More modular HTML generator
2025-11-21 07:09:16 -08:00
oobabooga
0d4eff284c
Add a --cpu-moe model for llama.cpp
2025-11-19 05:23:43 -08:00
Trenten Miller
6871484398
fix: Rename 'evaluation_strategy' to 'eval_strategy' in training
2025-10-28 16:48:04 -03:00
oobabooga
a156ebbf76
Lint
2025-10-15 13:15:01 -07:00
oobabooga
c871d9cdbd
Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga
b5a6904c4a
Make --trust-remote-code immutable from the UI/API
2025-10-14 20:47:01 -07:00
mamei16
308e726e11
log error when llama-server request exceeds context size (#7263)
2025-10-12 23:00:11 -03:00
oobabooga
655c3e86e3
Fix "continue" missing an initial space in chat-instruct/chat modes
2025-10-11 17:00:25 -07:00
oobabooga
c7dd920dc8
Fix metadata leaking into branched chats
2025-10-11 14:12:05 -07:00
oobabooga
78ff21d512
Organize the --help message
2025-10-10 15:21:08 -07:00
oobabooga
0d03813e98
Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00
oobabooga
deb37b821b
Same as 7f06aec3a1 but for exllamav3_hf
2025-10-09 13:02:38 -07:00
oobabooga
7f06aec3a1
exllamav3: Implement the logits function for /v1/internal/logits
2025-10-09 11:24:25 -07:00
oobabooga
218dc01b51
Add fallbacks after 93aa7b3ed3
2025-10-09 10:59:34 -07:00
oobabooga
282aa19189
Safer profile picture uploading
2025-10-09 09:26:35 -07:00
oobabooga
93aa7b3ed3
Better handle multigpu setups with transformers + bitsandbytes
2025-10-09 08:49:44 -07:00
Remowylliams
38a7fd685d
chat.py fixes Instruct mode History
2025-10-05 11:34:47 -03:00
oobabooga
1e863a7113
Fix exllamav3 ignoring the stop button
2025-09-19 16:12:50 -07:00
stevenxdavis
dd6d2223a5
Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217)
2025-09-17 16:39:04 -03:00
oobabooga
9e9ab39892
Make exllamav3_hf and exllamav2_hf functional again
2025-09-17 12:29:22 -07:00
oobabooga
f3829b268a
llama.cpp: Always pass --flash-attn on
2025-09-02 12:12:17 -07:00
oobabooga
c6ea67bbdb
Lint
2025-09-02 10:22:03 -07:00
oobabooga
00ed878b05
Slightly more robust model loading
2025-09-02 10:16:26 -07:00
oobabooga
387e249dec
Change an info message
2025-08-31 16:27:10 -07:00
oobabooga
8028d88541
Lint
2025-08-30 21:29:20 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
3a3e247f3c
Even better way to handle continue for thinking blocks
2025-08-30 12:36:35 -07:00
oobabooga
cf1aad2a68
Fix "continue" for Byte-OSS for partial thinking blocks
2025-08-30 12:16:45 -07:00
oobabooga
96136ea760
Fix LaTeX rendering for equations with asterisks
2025-08-30 10:13:32 -07:00
oobabooga
a3eb67e466
Fix the UI failing to launch if the Notebook prompt is too long
2025-08-30 08:42:26 -07:00
oobabooga
a2b37adb26
UI: Preload the correct fonts for chat mode
2025-08-29 09:25:44 -07:00
oobabooga
cb8780a4ce
Safer check for is_multimodal when loading models
Avoids unrelated multimodal error when a model fails to load due
to lack of memory.
2025-08-28 11:13:19 -07:00
oobabooga
cfc83745ec
UI: Improve right sidebar borders in light mode
2025-08-28 08:34:48 -07:00
oobabooga
ba6041251d
UI: Minor change
2025-08-28 06:20:00 -07:00
oobabooga
a92758a144
llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS
2025-08-27 16:15:40 -07:00
oobabooga
030ba7bfeb
UI: Mention that Seed-OSS uses enable_thinking
2025-08-27 07:44:35 -07:00
oobabooga
0b4518e61c
"Text generation web UI" -> "Text Generation Web UI"
2025-08-27 05:53:09 -07:00
oobabooga
02ca96fa44
Multiple fixes
2025-08-25 22:17:22 -07:00
oobabooga
6a7166fffa
Add support for the Seed-OSS template
2025-08-25 19:46:48 -07:00
oobabooga
8fcb4b3102
Make bot_prefix extensions functional again
2025-08-25 19:10:46 -07:00
oobabooga
8f660aefe3
Fix chat-instruct replies leaking the bot name sometimes
2025-08-25 18:50:16 -07:00
oobabooga
a531328f7e
Fix the GPT-OSS stopping string
2025-08-25 18:41:58 -07:00
oobabooga
6c165d2e55
Fix the chat template
2025-08-25 18:28:43 -07:00
oobabooga
b657be7381
Obtain stopping strings in chat mode
2025-08-25 18:22:08 -07:00
oobabooga
ded6c41cf8
Fix impersonate for chat-instruct
2025-08-25 18:16:17 -07:00
oobabooga
c1aa4590ea
Code simplifications, fix impersonate
2025-08-25 18:05:40 -07:00