oobabooga | 0c667de7a7 | UI: Add a None option for the speculative decoding model (closes #7145) | 2025-07-19 12:14:41 -07:00
oobabooga | 1d1b20bd77 | Remove the --torch-compile option (it doesn't do anything currently) | 2025-07-11 10:51:23 -07:00
oobabooga | 273888f218 | Revert "Use eager attention by default instead of sdpa" (reverts commit bd4881c4dc) | 2025-07-10 18:56:46 -07:00
oobabooga | bd4881c4dc | Use eager attention by default instead of sdpa | 2025-07-09 19:57:37 -07:00
oobabooga | 6c2bdda0f0 | Transformers loader: replace use_flash_attention_2/use_eager_attention with a unified attn_implementation (closes #7107) | 2025-07-09 18:39:37 -07:00
Alidr79 | e5767d4fc5 | Update ui_model_menu.py blocking the --multi-user access in backend (#7098) | 2025-07-06 21:48:53 -03:00
oobabooga | acd57b6a85 | Minor UI change | 2025-06-19 15:39:43 -07:00
oobabooga | f08db63fbc | Change some comments | 2025-06-19 15:26:45 -07:00
oobabooga | 9c6913ad61 | Show file sizes on "Get file list" | 2025-06-18 21:35:07 -07:00
Miriam | f4f621b215 | ensure estimated vram is updated when switching between different models (#7071) | 2025-06-13 02:56:33 -03:00
oobabooga | f337767f36 | Add error handling for non-llama.cpp models in portable mode | 2025-06-12 22:17:39 -07:00
oobabooga | 889153952f | Lint | 2025-06-10 09:02:52 -07:00
oobabooga | 92adceb7b5 | UI: Fix the model downloader progress bar | 2025-06-01 19:22:21 -07:00
oobabooga | 5d00574a56 | Minor UI fixes | 2025-05-20 16:20:49 -07:00
oobabooga | 9ec46b8c44 | Remove the HQQ loader (HQQ models can be loaded through Transformers) | 2025-05-19 09:23:24 -07:00
oobabooga | 2faaf18f1f | Add back the "Common values" to the ctx-size slider | 2025-05-18 09:06:20 -07:00
oobabooga | 1c549d176b | Fix GPU layers slider: honor saved settings and show true maximum | 2025-05-16 17:26:13 -07:00
oobabooga | adb975a380 | Prevent fractional gpu-layers in the UI | 2025-05-16 12:52:43 -07:00
oobabooga | fc483650b5 | Set the maximum gpu_layers value automatically when the model is loaded with --model | 2025-05-16 11:58:17 -07:00
oobabooga | 9ec9b1bf83 | Auto-adjust GPU layers after model unload to utilize freed VRAM | 2025-05-16 09:56:23 -07:00
oobabooga | 4925c307cf | Auto-adjust GPU layers on context size and cache type changes + many fixes | 2025-05-16 09:07:38 -07:00
oobabooga | cbf4daf1c8 | Hide the LoRA menu in portable mode | 2025-05-15 21:21:54 -07:00
oobabooga | 5534d01da0 | Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) | 2025-05-16 00:07:37 -03:00
oobabooga | c4a715fd1e | UI: Move the LoRA menu under "Other options" | 2025-05-13 20:14:09 -07:00
oobabooga | 3fa1a899ae | UI: Fix gpu-layers being ignored (closes #6973) | 2025-05-13 12:07:59 -07:00
oobabooga | 512bc2d0e0 | UI: Update some labels | 2025-05-08 23:43:55 -07:00
oobabooga | f8ef6e09af | UI: Make ctx-size a slider | 2025-05-08 18:19:04 -07:00
oobabooga | a2ab42d390 | UI: Remove the exllamav2 info message | 2025-05-08 08:00:38 -07:00
oobabooga | 348d4860c2 | UI: Create a "Main options" section in the Model tab | 2025-05-08 07:58:59 -07:00
oobabooga | d2bae7694c | UI: Change the ctx-size description | 2025-05-08 07:26:23 -07:00
oobabooga | b817bb33fd | Minor fix after df7bb0db1f | 2025-05-05 05:00:20 -07:00
oobabooga | df7bb0db1f | Rename --n-gpu-layers to --gpu-layers | 2025-05-04 20:03:55 -07:00
oobabooga | ea60f14674 | UI: Show the list of files if the user tries to download a GGUF repository | 2025-05-03 06:06:50 -07:00
oobabooga | 4cea720da8 | UI: Remove the "Autoload the model" feature | 2025-05-02 16:38:28 -07:00
oobabooga | 905afced1c | Add a --portable flag to hide things in portable mode | 2025-05-02 16:34:29 -07:00
oobabooga | 307d13b540 | UI: Minor label change | 2025-04-30 18:58:14 -07:00
oobabooga | 15a29e99f8 | Lint | 2025-04-27 21:41:34 -07:00
oobabooga | be13f5199b | UI: Add an info message about how to use Speculative Decoding | 2025-04-27 21:40:38 -07:00
oobabooga | 7b80acd524 | Fix parsing --extra-flags | 2025-04-26 18:40:03 -07:00
oobabooga | 4ff91b6588 | Better default settings for Speculative Decoding | 2025-04-26 17:24:40 -07:00
oobabooga | d9de14d1f7 | Restructure the repository (#6904) | 2025-04-26 08:56:54 -03:00
oobabooga | d4017fbb6d | ExLlamaV3: Add kv cache quantization (#6903) | 2025-04-25 21:32:00 -03:00
oobabooga | d4b1e31c49 | Use --ctx-size to specify the context size for all loaders (old flags are still recognized as alternatives) | 2025-04-25 16:59:03 -07:00
oobabooga | 877cf44c08 | llama.cpp: Add StreamingLLM (--streaming-llm) | 2025-04-25 16:21:41 -07:00
oobabooga | 98f4c694b9 | llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server | 2025-04-25 07:32:51 -07:00
oobabooga | 93fd4ad25d | llama.cpp: Document the --device-draft syntax | 2025-04-24 09:20:11 -07:00
oobabooga | e99c20bcb0 | llama.cpp: Add speculative decoding (#6891) | 2025-04-23 20:10:16 -03:00
oobabooga | ae02ffc605 | Refactor the transformers loader (#6859) | 2025-04-20 13:33:47 -03:00
oobabooga | d68f0fbdf7 | Remove obsolete references to llamacpp_HF | 2025-04-18 07:46:04 -07:00
oobabooga | 8144e1031e | Remove deprecated command-line flags | 2025-04-18 06:02:28 -07:00