Commit graph

2043 commits

Author SHA1 Message Date
oobabooga 8a3d866401 Fix temperature_last having no effect in llama.cpp server sampler order 2026-03-04 06:10:51 -08:00
oobabooga b3fd0d16e0 Use a new gr.Headless component for efficient chat streaming 2026-03-03 18:12:03 -08:00
oobabooga 2260e530c9 Remove gradio monkey-patches (moved to gradio fork) 2026-03-03 17:17:36 -08:00
oobabooga c54e8a2b3d Try to spawn llama.cpp on port 5001 instead of random port 2026-01-28 08:23:55 -08:00
oobabooga dc2bbf1861 Refactor thinking block detection and add Solar Open support 2026-01-28 08:21:34 -08:00
q5sys (JT) 7493fe7841 feat: Add a dropdown to save/load user personas (#7367) 2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan 6e2c4e9c23 Fix loading models which have their eos token disabled (#7363) 2026-01-06 11:31:10 -03:00
oobabooga e7c8b51fec Revert "Use flash_attention_2 by default for Transformers models" (This reverts commit 85f2df92e9.) 2025-12-07 18:48:41 -08:00
oobabooga b758059e95 Revert "Clear the torch cache between sequential image generations" (This reverts commit 1ec9f708e5.) 2025-12-07 12:23:19 -08:00
oobabooga 1ec9f708e5 Clear the torch cache between sequential image generations 2025-12-07 11:49:22 -08:00
oobabooga 85f2df92e9 Use flash_attention_2 by default for Transformers models 2025-12-07 06:56:58 -08:00
oobabooga 1762312fb4 Use random instead of np.random for image seeds (makes it work on Windows) 2025-12-06 20:10:32 -08:00
oobabooga 02518a96a9 Lint 2025-12-06 06:55:06 -08:00
oobabooga 455dc06db0 Serve the original PNG images in the UI instead of webp 2025-12-06 05:43:00 -08:00
oobabooga 6ca99910ba Image: Quantize the text encoder for lower VRAM 2025-12-05 13:08:46 -08:00
oobabooga 11937de517 Use flash attention for image generation by default 2025-12-05 12:13:24 -08:00
oobabooga c11c14590a Image: Better LLM variation default prompt 2025-12-05 08:08:11 -08:00
oobabooga 0dd468245c Image: Add back the gallery cache (for performance) 2025-12-05 07:11:38 -08:00
oobabooga b63d57158d Image: Add TGW as a prefix to output images 2025-12-05 05:59:54 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 8eac99599a Image: Better LLM variation default prompt 2025-12-04 19:58:06 -08:00
oobabooga b4f06a50b0 fix: Pass bos_token and eos_token from metadata to jinja2 (Fixes loading Seed-Instruct-36B) 2025-12-04 19:11:31 -08:00
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result" (This reverts commit c7ad28a4cd.) 2025-12-04 17:34:27 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga b451bac082 Image: Improve a log message 2025-12-04 16:33:46 -08:00
oobabooga 47a0fcd614 Image: PNG metadata improvements 2025-12-04 16:25:48 -08:00
oobabooga ac31a7c008 Image: Organize the UI 2025-12-04 15:45:04 -08:00
oobabooga a90739f498 Image: Better LLM variation default prompt 2025-12-04 10:50:40 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 2793153717 Image: Add LLM-generated prompt variations 2025-12-04 08:10:24 -08:00
oobabooga 7fb9f19bd8 Progress bar style improvements 2025-12-04 06:20:45 -08:00
oobabooga a838223d18 Image: Add a progress bar during generation 2025-12-04 05:49:57 -08:00
oobabooga 14dbc3488e Image: Clear the torch cache after generation, not before 2025-12-04 05:32:58 -08:00
oobabooga c357eed4c7 Image: Remove the flash_attention_3 option (no idea how to get it working) 2025-12-03 18:40:34 -08:00
oobabooga fbca54957e Image generation: Yield partial results for batch count > 1 2025-12-03 16:13:07 -08:00
oobabooga 49c60882bf Image generation: Safer image uploading 2025-12-03 16:07:51 -08:00
oobabooga 59285d501d Image generation: Small UI improvements 2025-12-03 16:03:31 -08:00
oobabooga 373baa5c9c UI: Minor image gallery improvements 2025-12-03 14:45:02 -08:00
oobabooga 9448bf1caa Image generation: add torchao quantization (supports torch.compile) 2025-12-02 14:22:51 -08:00
oobabooga 97281ff831 UI: Fix an index error in the new image gallery 2025-12-02 11:20:52 -08:00
oobabooga 9d07d3a229 Make portable builds functional again after b3666e140d 2025-12-02 10:06:57 -08:00
oobabooga 6291e72129 Remove quanto for now (requires messy compilation) 2025-12-02 09:57:18 -08:00
oobabooga b3666e140d Add image generation support (#7328) 2025-12-02 14:55:38 -03:00
oobabooga 5327bc9397 Update modules/shared.py (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>) 2025-11-28 22:48:05 -03:00
GodEmperor785 400bb0694b Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316) 2025-11-21 16:56:02 -03:00
oobabooga 8f0048663d More modular HTML generator 2025-11-21 07:09:16 -08:00
oobabooga 0d4eff284c Add a --cpu-moe model for llama.cpp 2025-11-19 05:23:43 -08:00
Trenten Miller 6871484398 fix: Rename 'evaluation_strategy' to 'eval_strategy' in training 2025-10-28 16:48:04 -03:00
oobabooga a156ebbf76 Lint 2025-10-15 13:15:01 -07:00
oobabooga c871d9cdbd Revert "Same as 7f06aec3a1 but for exllamav3_hf" (This reverts commit deb37b821b.) 2025-10-15 13:05:41 -07:00
oobabooga b5a6904c4a Make --trust-remote-code immutable from the UI/API 2025-10-14 20:47:01 -07:00
mamei16 308e726e11 log error when llama-server request exceeds context size (#7263) 2025-10-12 23:00:11 -03:00
oobabooga 655c3e86e3 Fix "continue" missing an initial space in chat-instruct/chat modes 2025-10-11 17:00:25 -07:00
oobabooga c7dd920dc8 Fix metadata leaking into branched chats 2025-10-11 14:12:05 -07:00
oobabooga 78ff21d512 Organize the --help message 2025-10-10 15:21:08 -07:00
oobabooga 0d03813e98 Update modules/chat.py (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>) 2025-10-09 21:01:13 -03:00
oobabooga deb37b821b Same as 7f06aec3a1 but for exllamav3_hf 2025-10-09 13:02:38 -07:00
oobabooga 7f06aec3a1 exllamav3: Implement the logits function for /v1/internal/logits 2025-10-09 11:24:25 -07:00
oobabooga 218dc01b51 Add fallbacks after 93aa7b3ed3 2025-10-09 10:59:34 -07:00
oobabooga 282aa19189 Safer profile picture uploading 2025-10-09 09:26:35 -07:00
oobabooga 93aa7b3ed3 Better handle multigpu setups with transformers + bitsandbytes 2025-10-09 08:49:44 -07:00
Remowylliams 38a7fd685d chat.py fixes Instruct mode History 2025-10-05 11:34:47 -03:00
oobabooga 1e863a7113 Fix exllamav3 ignoring the stop button 2025-09-19 16:12:50 -07:00
stevenxdavis dd6d2223a5 Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217) 2025-09-17 16:39:04 -03:00
oobabooga 9e9ab39892 Make exllamav3_hf and exllamav2_hf functional again 2025-09-17 12:29:22 -07:00
oobabooga f3829b268a llama.cpp: Always pass --flash-attn on 2025-09-02 12:12:17 -07:00
oobabooga c6ea67bbdb Lint 2025-09-02 10:22:03 -07:00
oobabooga 00ed878b05 Slightly more robust model loading 2025-09-02 10:16:26 -07:00
oobabooga 387e249dec Change an info message 2025-08-31 16:27:10 -07:00
oobabooga 8028d88541 Lint 2025-08-30 21:29:20 -07:00
oobabooga 13876a1ee8 llama.cpp: Remove the --flash-attn flag (it's always on now) 2025-08-30 20:28:26 -07:00
oobabooga 3a3e247f3c Even better way to handle continue for thinking blocks 2025-08-30 12:36:35 -07:00
oobabooga cf1aad2a68 Fix "continue" for Byte-OSS for partial thinking blocks 2025-08-30 12:16:45 -07:00
oobabooga 96136ea760 Fix LaTeX rendering for equations with asterisks 2025-08-30 10:13:32 -07:00
oobabooga a3eb67e466 Fix the UI failing to launch if the Notebook prompt is too long 2025-08-30 08:42:26 -07:00
oobabooga a2b37adb26 UI: Preload the correct fonts for chat mode 2025-08-29 09:25:44 -07:00
oobabooga cb8780a4ce Safer check for is_multimodal when loading models (Avoids unrelated multimodal error when a model fails to load due to lack of memory.) 2025-08-28 11:13:19 -07:00
oobabooga cfc83745ec UI: Improve right sidebar borders in light mode 2025-08-28 08:34:48 -07:00
oobabooga ba6041251d UI: Minor change 2025-08-28 06:20:00 -07:00
oobabooga a92758a144 llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS 2025-08-27 16:15:40 -07:00
oobabooga 030ba7bfeb UI: Mention that Seed-OSS uses enable_thinking 2025-08-27 07:44:35 -07:00
oobabooga 0b4518e61c "Text generation web UI" -> "Text Generation Web UI" 2025-08-27 05:53:09 -07:00
oobabooga 02ca96fa44 Multiple fixes 2025-08-25 22:17:22 -07:00
oobabooga 6a7166fffa Add support for the Seed-OSS template 2025-08-25 19:46:48 -07:00
oobabooga 8fcb4b3102 Make bot_prefix extensions functional again 2025-08-25 19:10:46 -07:00
oobabooga 8f660aefe3 Fix chat-instruct replies leaking the bot name sometimes 2025-08-25 18:50:16 -07:00
oobabooga a531328f7e Fix the GPT-OSS stopping string 2025-08-25 18:41:58 -07:00
oobabooga 6c165d2e55 Fix the chat template 2025-08-25 18:28:43 -07:00
oobabooga b657be7381 Obtain stopping strings in chat mode 2025-08-25 18:22:08 -07:00
oobabooga ded6c41cf8 Fix impersonate for chat-instruct 2025-08-25 18:16:17 -07:00
oobabooga c1aa4590ea Code simplifications, fix impersonate 2025-08-25 18:05:40 -07:00
oobabooga b330ec3517 Simplifications 2025-08-25 17:54:15 -07:00
oobabooga 3ad5970374 Make the llama.cpp --verbose output less verbose 2025-08-25 17:43:21 -07:00
oobabooga adeca8a658 Remove changes to the jinja2 templates 2025-08-25 17:36:01 -07:00
oobabooga aad0104c1b Remove a function 2025-08-25 17:33:13 -07:00
oobabooga f919cdf881 chat.py code simplifications 2025-08-25 17:20:51 -07:00
oobabooga d08800c359 chat.py improvements 2025-08-25 17:03:37 -07:00
oobabooga 3bc48014a5 chat.py code simplifications 2025-08-25 16:48:21 -07:00
oobabooga 2478294c06 UI: Preload the instruct and chat fonts 2025-08-24 12:37:41 -07:00