Commit graph

2051 commits

Author SHA1 Message Date
oobabooga eb90daf098 ExLlamaV2: Don't expose unused seed parameter 2026-03-04 11:14:50 -08:00
oobabooga d8af0505a8 ExLlamav3_HF: Optimize prefill and fix CFG cache initialization 2026-03-04 11:09:58 -08:00
oobabooga 9b916f02cd ExLlamaV3: Add AdaptiveP, fix speculative decoding parameter, add seed 2026-03-04 10:51:15 -08:00
oobabooga 5d93f4e800 Fix requires_grad warning in logits API 2026-03-04 10:43:23 -08:00
oobabooga 64eb77e782 Fix the logits API endpoint with transformers 2026-03-04 10:41:47 -08:00
oobabooga 65de4c30c8 Add adaptive-p sampler and n-gram speculative decoding support 2026-03-04 09:41:29 -08:00
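The n-gram half of this change follows the usual "prompt lookup" idea: reuse the tokens that previously followed the current tail n-gram as a cheap draft for the model to verify. A minimal sketch of that lookup, not the repository's implementation (all names here are illustrative):

```python
def propose_ngram_draft(token_ids, ngram_size=3, num_draft=8):
    """Propose draft tokens by matching the trailing n-gram earlier
    in the sequence (prompt-lookup decoding). Returns [] on no match."""
    if len(token_ids) <= ngram_size:
        return []
    tail = token_ids[-ngram_size:]
    # Scan backwards for the most recent earlier occurrence of the tail.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == tail:
            # The tokens that followed that occurrence become the draft,
            # which the model then verifies in a single forward pass.
            return token_ids[start + ngram_size:start + ngram_size + num_draft]
    return []
```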
oobabooga f010aa1612 Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
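For reference, the kind of pymupdf extraction this describes is short; a hedged sketch (the helper name is made up, not the repository's API):

```python
import fitz  # pymupdf

def extract_pdf_text(path: str) -> str:
    # pymupdf copes with encrypted/malformed files more gracefully than
    # PyPDF2 and returns cleaner text for headers and non-Latin scripts.
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)
```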
oobabooga f4d787ab8d Delegate GPU layer allocation to llama.cpp's --fit 2026-03-04 06:37:50 -08:00
oobabooga 8a3d866401 Fix temperature_last having no effect in llama.cpp server sampler order 2026-03-04 06:10:51 -08:00
oobabooga b3fd0d16e0 Use a new gr.Headless component for efficient chat streaming 2026-03-03 18:12:03 -08:00
oobabooga 2260e530c9 Remove gradio monkey-patches (moved to gradio fork) 2026-03-03 17:17:36 -08:00
oobabooga c54e8a2b3d Try to spawn llama.cpp on port 5001 instead of random port 2026-01-28 08:23:55 -08:00
oobabooga dc2bbf1861 Refactor thinking block detection and add Solar Open support 2026-01-28 08:21:34 -08:00
q5sys (JT) 7493fe7841 feat: Add a dropdown to save/load user personas (#7367) 2026-01-14 20:35:08 -03:00
Sergey 'Jin' Bostandzhyan 6e2c4e9c23 Fix loading models which have their eos token disabled (#7363) 2026-01-06 11:31:10 -03:00
oobabooga e7c8b51fec Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga b758059e95 Revert "Clear the torch cache between sequential image generations"
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga 1ec9f708e5 Clear the torch cache between sequential image generations 2025-12-07 11:49:22 -08:00
oobabooga 85f2df92e9 Use flash_attention_2 by default for Transformers models 2025-12-07 06:56:58 -08:00
oobabooga 1762312fb4 Use random instead of np.random for image seeds (makes it work on Windows) 2025-12-06 20:10:32 -08:00
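The Windows note refers to numpy's legacy random API being bounded by the platform C long, which is 32-bit on Windows, so wide seed ranges raise there; the stdlib random module has no such limit. A minimal sketch of the working pattern:

```python
import random

# random.randint accepts arbitrary Python ints, so a full 32-bit seed
# range works on every platform; numpy's legacy np.random API is capped
# by the platform C long, which is only 32 bits on Windows.
seed = random.randint(0, 2**32 - 1)
```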
oobabooga 02518a96a9 Lint 2025-12-06 06:55:06 -08:00
oobabooga 455dc06db0 Serve the original PNG images in the UI instead of webp 2025-12-06 05:43:00 -08:00
oobabooga 6ca99910ba Image: Quantize the text encoder for lower VRAM 2025-12-05 13:08:46 -08:00
oobabooga 11937de517 Use flash attention for image generation by default 2025-12-05 12:13:24 -08:00
oobabooga c11c14590a Image: Better LLM variation default prompt 2025-12-05 08:08:11 -08:00
oobabooga 0dd468245c Image: Add back the gallery cache (for performance) 2025-12-05 07:11:38 -08:00
oobabooga b63d57158d Image: Add TGW as a prefix to output images 2025-12-05 05:59:54 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 8eac99599a Image: Better LLM variation default prompt 2025-12-04 19:58:06 -08:00
oobabooga b4f06a50b0 fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-OSS-36B-Instruct
2025-12-04 19:11:31 -08:00
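The underlying pattern: many chat templates reference bos_token and eos_token, so those values have to be supplied in the jinja2 render context. A self-contained sketch with illustrative values (the real ones come from the model's metadata):

```python
from jinja2 import Template

chat_template = (
    "{{ bos_token }}"
    "{% for m in messages %}{{ m['content'] }}{{ eos_token }}{% endfor %}"
)
# Templates that reference these names render wrongly (or fail under
# StrictUndefined) unless the values are passed in explicitly.
prompt = Template(chat_template).render(
    messages=[{"role": "user", "content": "Hi"}],
    bos_token="<s>",   # illustrative; read from model metadata in practice
    eos_token="</s>",
)
```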
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga b451bac082 Image: Improve a log message 2025-12-04 16:33:46 -08:00
oobabooga 47a0fcd614 Image: PNG metadata improvements 2025-12-04 16:25:48 -08:00
oobabooga ac31a7c008 Image: Organize the UI 2025-12-04 15:45:04 -08:00
oobabooga a90739f498 Image: Better LLM variation default prompt 2025-12-04 10:50:40 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 2793153717 Image: Add LLM-generated prompt variations 2025-12-04 08:10:24 -08:00
oobabooga 7fb9f19bd8 Progress bar style improvements 2025-12-04 06:20:45 -08:00
oobabooga a838223d18 Image: Add a progress bar during generation 2025-12-04 05:49:57 -08:00
oobabooga 14dbc3488e Image: Clear the torch cache after generation, not before 2025-12-04 05:32:58 -08:00
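A hedged sketch of the cleanup order this commit describes, freeing cached allocator blocks only after the outputs exist (the wrapper is hypothetical, not the repository's code):

```python
import gc
import torch

def generate_then_cleanup(pipeline, prompt):
    # Run generation first, then release cached GPU memory afterwards,
    # so the cache is warm while the pipeline actually needs it.
    images = pipeline(prompt).images
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return images
```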
oobabooga c357eed4c7 Image: Remove the flash_attention_3 option (no idea how to get it working) 2025-12-03 18:40:34 -08:00
oobabooga fbca54957e Image generation: Yield partial results for batch count > 1 2025-12-03 16:13:07 -08:00
oobabooga 49c60882bf Image generation: Safer image uploading 2025-12-03 16:07:51 -08:00
oobabooga 59285d501d Image generation: Small UI improvements 2025-12-03 16:03:31 -08:00
oobabooga 373baa5c9c UI: Minor image gallery improvements 2025-12-03 14:45:02 -08:00
oobabooga 9448bf1caa Image generation: add torchao quantization (supports torch.compile) 2025-12-02 14:22:51 -08:00
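The torchao route is attractive because weight-only quantization composes with torch.compile; a minimal sketch using a stand-in module (not the actual text encoder or transformer):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Stand-in module; in practice this would be a diffusion transformer
# or text encoder. quantize_ swaps the weights in-place for int8
# variants that remain traceable, so torch.compile still applies.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).to("cuda")
quantize_(model, int8_weight_only())
model = torch.compile(model)
```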
oobabooga 97281ff831 UI: Fix an index error in the new image gallery 2025-12-02 11:20:52 -08:00
oobabooga 9d07d3a229 Make portable builds functional again after b3666e140d 2025-12-02 10:06:57 -08:00
oobabooga 6291e72129 Remove quanto for now (requires messy compilation) 2025-12-02 09:57:18 -08:00
oobabooga b3666e140d Add image generation support (#7328) 2025-12-02 14:55:38 -03:00
oobabooga 5327bc9397 Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
GodEmperor785 400bb0694b Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316) 2025-11-21 16:56:02 -03:00
oobabooga 8f0048663d More modular HTML generator 2025-11-21 07:09:16 -08:00
oobabooga 0d4eff284c Add a --cpu-moe flag for llama.cpp 2025-11-19 05:23:43 -08:00
Trenten Miller 6871484398 fix: Rename 'evaluation_strategy' to 'eval_strategy' in training 2025-10-28 16:48:04 -03:00
oobabooga a156ebbf76 Lint 2025-10-15 13:15:01 -07:00
oobabooga c871d9cdbd Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga b5a6904c4a Make --trust-remote-code immutable from the UI/API 2025-10-14 20:47:01 -07:00
mamei16 308e726e11 log error when llama-server request exceeds context size (#7263) 2025-10-12 23:00:11 -03:00
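The check itself is simple; a hedged sketch with illustrative names (not the code this commit ships):

```python
import logging

logger = logging.getLogger(__name__)

def warn_if_context_exceeded(prompt_tokens: int, ctx_size: int) -> None:
    # Hypothetical helper: surface a clear error instead of letting
    # the server silently truncate the prompt.
    if prompt_tokens > ctx_size:
        logger.error(
            "Request is %d tokens but the model was loaded with a "
            "%d-token context window.", prompt_tokens, ctx_size
        )
```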
oobabooga 655c3e86e3 Fix "continue" missing an initial space in chat-instruct/chat modes 2025-10-11 17:00:25 -07:00
oobabooga c7dd920dc8 Fix metadata leaking into branched chats 2025-10-11 14:12:05 -07:00
oobabooga 78ff21d512 Organize the --help message 2025-10-10 15:21:08 -07:00
oobabooga 0d03813e98 Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00
oobabooga deb37b821b Same as 7f06aec3a1 but for exllamav3_hf 2025-10-09 13:02:38 -07:00
oobabooga 7f06aec3a1 exllamav3: Implement the logits function for /v1/internal/logits 2025-10-09 11:24:25 -07:00
oobabooga 218dc01b51 Add fallbacks after 93aa7b3ed3 2025-10-09 10:59:34 -07:00
oobabooga 282aa19189 Safer profile picture uploading 2025-10-09 09:26:35 -07:00
oobabooga 93aa7b3ed3 Better handle multigpu setups with transformers + bitsandbytes 2025-10-09 08:49:44 -07:00
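The standard pattern for this combination is to let accelerate place the quantized weights via device_map; a hedged sketch of that general technique (the model id is illustrative, and this is not necessarily what the commit does):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",  # shards layers across all visible GPUs
)
```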
Remowylliams 38a7fd685d chat.py fixes Instruct mode History 2025-10-05 11:34:47 -03:00
oobabooga 1e863a7113 Fix exllamav3 ignoring the stop button 2025-09-19 16:12:50 -07:00
stevenxdavis dd6d2223a5 Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217) 2025-09-17 16:39:04 -03:00
oobabooga 9e9ab39892 Make exllamav3_hf and exllamav2_hf functional again 2025-09-17 12:29:22 -07:00
oobabooga f3829b268a llama.cpp: Always pass --flash-attn on 2025-09-02 12:12:17 -07:00
oobabooga c6ea67bbdb Lint 2025-09-02 10:22:03 -07:00
oobabooga 00ed878b05 Slightly more robust model loading 2025-09-02 10:16:26 -07:00
oobabooga 387e249dec Change an info message 2025-08-31 16:27:10 -07:00
oobabooga 8028d88541 Lint 2025-08-30 21:29:20 -07:00
oobabooga 13876a1ee8 llama.cpp: Remove the --flash-attn flag (it's always on now) 2025-08-30 20:28:26 -07:00
oobabooga 3a3e247f3c Even better way to handle continue for thinking blocks 2025-08-30 12:36:35 -07:00
oobabooga cf1aad2a68 Fix "continue" for GPT-OSS for partial thinking blocks 2025-08-30 12:16:45 -07:00
oobabooga 96136ea760 Fix LaTeX rendering for equations with asterisks 2025-08-30 10:13:32 -07:00
oobabooga a3eb67e466 Fix the UI failing to launch if the Notebook prompt is too long 2025-08-30 08:42:26 -07:00
oobabooga a2b37adb26 UI: Preload the correct fonts for chat mode 2025-08-29 09:25:44 -07:00
oobabooga cb8780a4ce Safer check for is_multimodal when loading models
Avoids unrelated multimodal error when a model fails to load due
to lack of memory.
2025-08-28 11:13:19 -07:00
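The defensive pattern amounts to probing the attribute only when a model object actually exists; a minimal sketch (the function name is hypothetical):

```python
def model_is_multimodal(model) -> bool:
    # If loading failed (e.g. out of memory), `model` may be None;
    # getattr with a default avoids raising an unrelated error here.
    return model is not None and bool(getattr(model, "is_multimodal", False))
```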
oobabooga cfc83745ec UI: Improve right sidebar borders in light mode 2025-08-28 08:34:48 -07:00
oobabooga ba6041251d UI: Minor change 2025-08-28 06:20:00 -07:00
oobabooga a92758a144 llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS 2025-08-27 16:15:40 -07:00
oobabooga 030ba7bfeb UI: Mention that Seed-OSS uses enable_thinking 2025-08-27 07:44:35 -07:00
oobabooga 0b4518e61c "Text generation web UI" -> "Text Generation Web UI" 2025-08-27 05:53:09 -07:00
oobabooga 02ca96fa44 Multiple fixes 2025-08-25 22:17:22 -07:00
oobabooga 6a7166fffa Add support for the Seed-OSS template 2025-08-25 19:46:48 -07:00
oobabooga 8fcb4b3102 Make bot_prefix extensions functional again 2025-08-25 19:10:46 -07:00
oobabooga 8f660aefe3 Fix chat-instruct replies leaking the bot name sometimes 2025-08-25 18:50:16 -07:00
oobabooga a531328f7e Fix the GPT-OSS stopping string 2025-08-25 18:41:58 -07:00
oobabooga 6c165d2e55 Fix the chat template 2025-08-25 18:28:43 -07:00
oobabooga b657be7381 Obtain stopping strings in chat mode 2025-08-25 18:22:08 -07:00
oobabooga ded6c41cf8 Fix impersonate for chat-instruct 2025-08-25 18:16:17 -07:00
oobabooga c1aa4590ea Code simplifications, fix impersonate 2025-08-25 18:05:40 -07:00