Commit graph

192 commits

Author SHA1 Message Date
oobabooga 7301452b41 UI: Minor info message change 2025-08-12 13:23:24 -07:00
oobabooga d86b0ec010 Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga 0c667de7a7 UI: Add a None option for the speculative decoding model (closes #7145) 2025-07-19 12:14:41 -07:00
oobabooga 1d1b20bd77 Remove the --torch-compile option (it doesn't do anything currently) 2025-07-11 10:51:23 -07:00
oobabooga 273888f218 Revert "Use eager attention by default instead of sdpa" (reverts commit bd4881c4dc) 2025-07-10 18:56:46 -07:00
oobabooga bd4881c4dc Use eager attention by default instead of sdpa 2025-07-09 19:57:37 -07:00
oobabooga 6c2bdda0f0 Transformers loader: replace use_flash_attention_2/use_eager_attention with a unified attn_implementation (closes #7107) 2025-07-09 18:39:37 -07:00
Alidr79 e5767d4fc5 Update ui_model_menu.py blocking the --multi-user access in backend (#7098) 2025-07-06 21:48:53 -03:00
oobabooga acd57b6a85 Minor UI change 2025-06-19 15:39:43 -07:00
oobabooga f08db63fbc Change some comments 2025-06-19 15:26:45 -07:00
oobabooga 9c6913ad61 Show file sizes on "Get file list" 2025-06-18 21:35:07 -07:00
Miriam f4f621b215 ensure estimated vram is updated when switching between different models (#7071) 2025-06-13 02:56:33 -03:00
oobabooga f337767f36 Add error handling for non-llama.cpp models in portable mode 2025-06-12 22:17:39 -07:00
oobabooga 889153952f Lint 2025-06-10 09:02:52 -07:00
oobabooga 92adceb7b5 UI: Fix the model downloader progress bar 2025-06-01 19:22:21 -07:00
oobabooga 5d00574a56 Minor UI fixes 2025-05-20 16:20:49 -07:00
oobabooga 9ec46b8c44 Remove the HQQ loader (HQQ models can be loaded through Transformers) 2025-05-19 09:23:24 -07:00
oobabooga 2faaf18f1f Add back the "Common values" to the ctx-size slider 2025-05-18 09:06:20 -07:00
oobabooga 1c549d176b Fix GPU layers slider: honor saved settings and show true maximum 2025-05-16 17:26:13 -07:00
oobabooga adb975a380 Prevent fractional gpu-layers in the UI 2025-05-16 12:52:43 -07:00
oobabooga fc483650b5 Set the maximum gpu_layers value automatically when the model is loaded with --model 2025-05-16 11:58:17 -07:00
oobabooga 9ec9b1bf83 Auto-adjust GPU layers after model unload to utilize freed VRAM 2025-05-16 09:56:23 -07:00
oobabooga 4925c307cf Auto-adjust GPU layers on context size and cache type changes + many fixes 2025-05-16 09:07:38 -07:00
oobabooga cbf4daf1c8 Hide the LoRA menu in portable mode 2025-05-15 21:21:54 -07:00
oobabooga 5534d01da0 Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) 2025-05-16 00:07:37 -03:00
oobabooga c4a715fd1e UI: Move the LoRA menu under "Other options" 2025-05-13 20:14:09 -07:00
oobabooga 3fa1a899ae UI: Fix gpu-layers being ignored (closes #6973) 2025-05-13 12:07:59 -07:00
oobabooga 512bc2d0e0 UI: Update some labels 2025-05-08 23:43:55 -07:00
oobabooga f8ef6e09af UI: Make ctx-size a slider 2025-05-08 18:19:04 -07:00
oobabooga a2ab42d390 UI: Remove the exllamav2 info message 2025-05-08 08:00:38 -07:00
oobabooga 348d4860c2 UI: Create a "Main options" section in the Model tab 2025-05-08 07:58:59 -07:00
oobabooga d2bae7694c UI: Change the ctx-size description 2025-05-08 07:26:23 -07:00
oobabooga b817bb33fd Minor fix after df7bb0db1f 2025-05-05 05:00:20 -07:00
oobabooga df7bb0db1f Rename --n-gpu-layers to --gpu-layers 2025-05-04 20:03:55 -07:00
oobabooga ea60f14674 UI: Show the list of files if the user tries to download a GGUF repository 2025-05-03 06:06:50 -07:00
oobabooga 4cea720da8 UI: Remove the "Autoload the model" feature 2025-05-02 16:38:28 -07:00
oobabooga 905afced1c Add a --portable flag to hide things in portable mode 2025-05-02 16:34:29 -07:00
oobabooga 307d13b540 UI: Minor label change 2025-04-30 18:58:14 -07:00
oobabooga 15a29e99f8 Lint 2025-04-27 21:41:34 -07:00
oobabooga be13f5199b UI: Add an info message about how to use Speculative Decoding 2025-04-27 21:40:38 -07:00
oobabooga 7b80acd524 Fix parsing --extra-flags 2025-04-26 18:40:03 -07:00
oobabooga 4ff91b6588 Better default settings for Speculative Decoding 2025-04-26 17:24:40 -07:00
oobabooga d9de14d1f7 Restructure the repository (#6904) 2025-04-26 08:56:54 -03:00
oobabooga d4017fbb6d ExLlamaV3: Add kv cache quantization (#6903) 2025-04-25 21:32:00 -03:00
oobabooga d4b1e31c49 Use --ctx-size to specify the context size for all loaders (old flags are still recognized as alternatives) 2025-04-25 16:59:03 -07:00
oobabooga 877cf44c08 llama.cpp: Add StreamingLLM (--streaming-llm) 2025-04-25 16:21:41 -07:00
oobabooga 98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
oobabooga 93fd4ad25d llama.cpp: Document the --device-draft syntax 2025-04-24 09:20:11 -07:00
oobabooga e99c20bcb0 llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
oobabooga ae02ffc605 Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
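
Several commits above introduce or rename launcher flags (--ctx-size in d4b1e31c49, --gpu-layers in df7bb0db1f, --extra-flags in 98f4c694b9, --streaming-llm in 877cf44c08). The snippet below is a minimal illustrative sketch, not part of the repository: it assumes the server.py entry point at the repository root, a placeholder GGUF model path, and placeholder flag values; the --extra-flags value syntax is an assumption.

```python
# Illustrative sketch only: launch the web UI with flags named in the commits above.
# The model path and flag values are placeholder assumptions.
import subprocess

cmd = [
    "python", "server.py",
    "--model", "models/example-model.gguf",  # hypothetical GGUF model path
    "--ctx-size", "8192",                    # unified context-size flag (d4b1e31c49)
    "--gpu-layers", "35",                    # renamed from --n-gpu-layers (df7bb0db1f)
    "--streaming-llm",                       # llama.cpp StreamingLLM (877cf44c08)
    "--extra-flags", "no-mmap",              # extra flags passed to llama-server (98f4c694b9); value syntax assumed
]
subprocess.run(cmd, check=True)
```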