Commit graph

2000 commits

Author SHA1 Message Date
oobabooga 164c6fcdbf Add the UI structure 2025-11-27 13:44:07 -08:00
oobabooga 4ad2ad468e Add basic structure 2025-11-27 10:10:11 -08:00
GodEmperor785 400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316) 2025-11-21 16:56:02 -03:00
oobabooga 8f0048663d More modular HTML generator 2025-11-21 07:09:16 -08:00
oobabooga 0d4eff284c Add a --cpu-moe model for llama.cpp 2025-11-19 05:23:43 -08:00
Trenten Miller 6871484398
fix: Rename 'evaluation_strategy' to 'eval_strategy' in training 2025-10-28 16:48:04 -03:00
oobabooga a156ebbf76 Lint 2025-10-15 13:15:01 -07:00
oobabooga c871d9cdbd Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga b5a6904c4a Make --trust-remote-code immutable from the UI/API 2025-10-14 20:47:01 -07:00
mamei16 308e726e11
log error when llama-server request exceeds context size (#7263) 2025-10-12 23:00:11 -03:00
oobabooga 655c3e86e3 Fix "continue" missing an initial space in chat-instruct/chat modes 2025-10-11 17:00:25 -07:00
oobabooga c7dd920dc8 Fix metadata leaking into branched chats 2025-10-11 14:12:05 -07:00
oobabooga 78ff21d512 Organize the --help message 2025-10-10 15:21:08 -07:00
oobabooga 0d03813e98
Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00
oobabooga deb37b821b Same as 7f06aec3a1 but for exllamav3_hf 2025-10-09 13:02:38 -07:00
oobabooga 7f06aec3a1 exllamav3: Implement the logits function for /v1/internal/logits 2025-10-09 11:24:25 -07:00
oobabooga 218dc01b51 Add fallbacks after 93aa7b3ed3 2025-10-09 10:59:34 -07:00
oobabooga 282aa19189 Safer profile picture uploading 2025-10-09 09:26:35 -07:00
oobabooga 93aa7b3ed3 Better handle multigpu setups with transformers + bitsandbytes 2025-10-09 08:49:44 -07:00
Remowylliams 38a7fd685d
chat.py fixes Instruct mode History 2025-10-05 11:34:47 -03:00
oobabooga 1e863a7113 Fix exllamav3 ignoring the stop button 2025-09-19 16:12:50 -07:00
stevenxdavis dd6d2223a5
Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217) 2025-09-17 16:39:04 -03:00
oobabooga 9e9ab39892 Make exllamav3_hf and exllamav2_hf functional again 2025-09-17 12:29:22 -07:00
oobabooga f3829b268a llama.cpp: Always pass --flash-attn on 2025-09-02 12:12:17 -07:00
oobabooga c6ea67bbdb Lint 2025-09-02 10:22:03 -07:00
oobabooga 00ed878b05 Slightly more robust model loading 2025-09-02 10:16:26 -07:00
oobabooga 387e249dec Change an info message 2025-08-31 16:27:10 -07:00
oobabooga 8028d88541 Lint 2025-08-30 21:29:20 -07:00
oobabooga 13876a1ee8 llama.cpp: Remove the --flash-attn flag (it's always on now) 2025-08-30 20:28:26 -07:00
oobabooga 3a3e247f3c Even better way to handle continue for thinking blocks 2025-08-30 12:36:35 -07:00
oobabooga cf1aad2a68 Fix "continue" for Byte-OSS for partial thinking blocks 2025-08-30 12:16:45 -07:00
oobabooga 96136ea760 Fix LaTeX rendering for equations with asterisks 2025-08-30 10:13:32 -07:00
oobabooga a3eb67e466 Fix the UI failing to launch if the Notebook prompt is too long 2025-08-30 08:42:26 -07:00
oobabooga a2b37adb26 UI: Preload the correct fonts for chat mode 2025-08-29 09:25:44 -07:00
oobabooga cb8780a4ce Safer check for is_multimodal when loading models
Avoids unrelated multimodal error when a model fails to load due
to lack of memory.
2025-08-28 11:13:19 -07:00
oobabooga cfc83745ec UI: Improve right sidebar borders in light mode 2025-08-28 08:34:48 -07:00
oobabooga ba6041251d UI: Minor change 2025-08-28 06:20:00 -07:00
oobabooga a92758a144 llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS 2025-08-27 16:15:40 -07:00
oobabooga 030ba7bfeb UI: Mention that Seed-OSS uses enable_thinking 2025-08-27 07:44:35 -07:00
oobabooga 0b4518e61c "Text generation web UI" -> "Text Generation Web UI" 2025-08-27 05:53:09 -07:00
oobabooga 02ca96fa44 Multiple fixes 2025-08-25 22:17:22 -07:00
oobabooga 6a7166fffa Add support for the Seed-OSS template 2025-08-25 19:46:48 -07:00
oobabooga 8fcb4b3102 Make bot_prefix extensions functional again 2025-08-25 19:10:46 -07:00
oobabooga 8f660aefe3 Fix chat-instruct replies leaking the bot name sometimes 2025-08-25 18:50:16 -07:00
oobabooga a531328f7e Fix the GPT-OSS stopping string 2025-08-25 18:41:58 -07:00
oobabooga 6c165d2e55 Fix the chat template 2025-08-25 18:28:43 -07:00
oobabooga b657be7381 Obtain stopping strings in chat mode 2025-08-25 18:22:08 -07:00
oobabooga ded6c41cf8 Fix impersonate for chat-instruct 2025-08-25 18:16:17 -07:00
oobabooga c1aa4590ea Code simplifications, fix impersonate 2025-08-25 18:05:40 -07:00
oobabooga b330ec3517 Simplifications 2025-08-25 17:54:15 -07:00
oobabooga 3ad5970374 Make the llama.cpp --verbose output less verbose 2025-08-25 17:43:21 -07:00
oobabooga adeca8a658 Remove changes to the jinja2 templates 2025-08-25 17:36:01 -07:00
oobabooga aad0104c1b Remove a function 2025-08-25 17:33:13 -07:00
oobabooga f919cdf881 chat.py code simplifications 2025-08-25 17:20:51 -07:00
oobabooga d08800c359 chat.py improvements 2025-08-25 17:03:37 -07:00
oobabooga 3bc48014a5 chat.py code simplifications 2025-08-25 16:48:21 -07:00
oobabooga 2478294c06 UI: Preload the instruct and chat fonts 2025-08-24 12:37:41 -07:00
oobabooga 8be798e15f llama.cpp: Fix stderr deadlock while loading some multimodal models 2025-08-24 12:20:05 -07:00
oobabooga 7fe8da8944 Minor simplification after f247c2ae62 2025-08-22 14:42:56 -07:00
oobabooga f247c2ae62 Make --model work with absolute paths, eg --model /tmp/gemma-3-270m-it-IQ4_NL.gguf 2025-08-22 11:47:33 -07:00
oobabooga 9e7b326e34 Lint 2025-08-19 06:50:40 -07:00
oobabooga 1972479610 Add the TP option to exllamav3_HF 2025-08-19 06:48:22 -07:00
oobabooga e0f5905a97 Code formatting 2025-08-19 06:34:05 -07:00
oobabooga 5b06284a8a UI: Keep ExLlamav3_HF selected if already selected for EXL3 models 2025-08-19 06:23:21 -07:00
oobabooga cbba58bef9 UI: Fix code blocks having an extra empty line 2025-08-18 15:50:09 -07:00
oobabooga 7d23a55901 Fix model unloading when switching loaders (closes #7203) 2025-08-18 09:05:47 -07:00
oobabooga 64eba9576c mtmd: Fix a bug when "include past attachments" is unchecked 2025-08-17 14:08:40 -07:00
oobabooga dbabe67e77 ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option 2025-08-17 13:19:11 -07:00
oobabooga d771ca4a13 Fix web search (attempt) 2025-08-14 12:05:14 -07:00
altoiddealer 57f6e9af5a
Set multimodal status during Model Loading (#7199) 2025-08-13 16:47:27 -03:00
oobabooga 41b95e9ec3 Lint 2025-08-12 13:37:37 -07:00
oobabooga 7301452b41 UI: Minor info message change 2025-08-12 13:23:24 -07:00
oobabooga 8d7b88106a Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)"
This reverts commit d8fcc71616.
2025-08-12 13:20:16 -07:00
oobabooga 2238302b49 ExLlamaV3: Add speculative decoding 2025-08-12 08:50:45 -07:00
oobabooga d8fcc71616 mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp) 2025-08-11 18:02:33 -07:00
oobabooga e6447cd24a mtmd: Update the llama-server request 2025-08-11 17:42:35 -07:00
oobabooga 0e3def449a llama.cpp: --swa-full to llama-server when streaming-llm is checked 2025-08-11 15:17:25 -07:00
oobabooga 0e88a621fd UI: Better organize the right sidebar 2025-08-11 15:16:03 -07:00
oobabooga a78ca6ffcd Remove a comment 2025-08-11 12:33:38 -07:00
oobabooga 999471256c Lint 2025-08-11 12:32:17 -07:00
oobabooga b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
oobabooga 38c0b4a1ad Default ctx-size to 8192 when not found in the metadata 2025-08-11 07:39:53 -07:00
oobabooga 52d1cbbbe9 Fix an import 2025-08-11 07:38:39 -07:00
oobabooga 4809ddfeb8 Exllamav3: small sampler fixes 2025-08-11 07:35:22 -07:00
oobabooga 4d8dbbab64 API: Fix sampler_priority usage for ExLlamaV3 2025-08-11 07:26:11 -07:00
oobabooga 0ea62d88f6 mtmd: Fix "continue" when an image is present 2025-08-09 21:47:02 -07:00
oobabooga 2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga c6b4d1e87f Fix the exllamav2 loader ignoring add_bos 2025-08-09 21:34:35 -07:00
oobabooga d86b0ec010
Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga a289a92b94 Fix exllamav3 token count 2025-08-09 17:10:58 -07:00
oobabooga d489eb589a Attempt at fixing new exllamav3 loader undefined behavior when switching conversations 2025-08-09 14:11:31 -07:00
oobabooga a6d6bee88c Change a comment 2025-08-09 07:51:03 -07:00
oobabooga 2fe79a93cc mtmd: Handle another case after 3f5ec9644f 2025-08-09 07:50:24 -07:00
oobabooga 59c6138e98 Remove a log message 2025-08-09 07:32:15 -07:00
oobabooga f396b82a4f mtmd: Better way to detect if an EXL3 model is multimodal 2025-08-09 07:31:36 -07:00
oobabooga fa9be444fa Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models 2025-08-09 07:26:59 -07:00
oobabooga 3f5ec9644f mtmd: Place the image <__media__> at the top of the prompt 2025-08-09 07:06:07 -07:00
oobabooga 1168004067 Minor change 2025-08-09 07:01:55 -07:00
oobabooga 9e260332cc Remove some unnecessary code 2025-08-08 21:22:47 -07:00
oobabooga 544c3a7c9f Polish the new exllamav3 loader 2025-08-08 21:15:53 -07:00