Commit graph

59 commits

Author SHA1 Message Date
oobabooga 8be798e15f llama.cpp: Fix stderr deadlock while loading some multimodal models 2025-08-24 12:20:05 -07:00
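
The stderr deadlock fixed in 8be798e15f is the classic subprocess pitfall: if the server's stderr pipe is never drained, the OS buffer fills and the child blocks. A minimal sketch of the usual remedy, draining llama-server's stderr on a background thread; the command line here is only illustrative, not the one the webui actually builds:

```python
import subprocess
import threading

def _drain_stderr(proc):
    # Keep reading stderr so the child never blocks on a full pipe buffer.
    for line in iter(proc.stderr.readline, b''):
        print(line.decode('utf-8', errors='replace'), end='')

# Hypothetical launch command, for illustration only.
proc = subprocess.Popen(
    ['llama-server', '--model', 'model.gguf', '--port', '8080'],
    stderr=subprocess.PIPE,
)
threading.Thread(target=_drain_stderr, args=(proc,), daemon=True).start()
```
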
oobabooga 7fe8da8944 Minor simplification after f247c2ae62 2025-08-22 14:42:56 -07:00
altoiddealer 57f6e9af5a Set multimodal status during Model Loading (#7199) 2025-08-13 16:47:27 -03:00
oobabooga 8d7b88106a Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)" 2025-08-12 13:20:16 -07:00
    This reverts commit d8fcc71616.
oobabooga d8fcc71616 mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp) 2025-08-11 18:02:33 -07:00
oobabooga e6447cd24a mtmd: Update the llama-server request 2025-08-11 17:42:35 -07:00
oobabooga 0e3def449a llama.cpp: --swa-full to llama-server when streaming-llm is checked 2025-08-11 15:17:25 -07:00
oobabooga b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
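
The mtmd commits above (e6447cd24a, b62c8845f3) concern how images are attached to the request sent to llama-server. A hedged sketch of a multimodal request in the OpenAI-compatible chat format, assuming llama-server is running with a multimodal projector and accepts images as base64 data URIs; the exact payload the webui sends may differ:

```python
import base64
import requests

with open('example.png', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('ascii')

payload = {
    'messages': [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            # The image travels inline as a data URI rather than a remote URL.
            {'type': 'image_url', 'image_url': {'url': f'data:image/png;base64,{image_b64}'}},
        ],
    }],
    'max_tokens': 200,
}
r = requests.post('http://127.0.0.1:8080/v1/chat/completions', json=payload)
print(r.json()['choices'][0]['message']['content'])
```
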
oobabooga 2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga d86b0ec010 Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga 609c3ac893 Optimize the end of generation with llama.cpp 2025-06-15 08:03:27 -07:00
oobabooga 18bd78f1f0 Make the llama.cpp prompt processing messages shorter 2025-06-10 14:03:25 -07:00
oobabooga f8f23b5489 Simplify the llama.cpp stderr filter code 2025-06-06 22:25:13 -07:00
oobabooga 45f823ddf6 Print \n after the llama.cpp progress bar reaches 1.0 2025-06-06 22:23:34 -07:00
oobabooga 2db7745cbd Show llama.cpp prompt processing on one line instead of many lines 2025-06-01 22:12:24 -07:00
oobabooga e4d3f4449d API: Fix a regression 2025-05-16 13:02:27 -07:00
oobabooga 5534d01da0 Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) 2025-05-16 00:07:37 -03:00
oobabooga 62c774bf24 Revert "New attempt" 2025-05-13 06:42:25 -07:00
    This reverts commit e7ac06c169.
oobabooga e7ac06c169 New attempt 2025-05-10 19:20:04 -07:00
oobabooga 9ea2a69210 llama.cpp: Add --no-webui to the llama-server command 2025-05-08 10:41:25 -07:00
oobabooga c4f36db0d8 llama.cpp: remove tfs (it doesn't get used) 2025-05-06 08:41:13 -07:00
oobabooga d1c0154d66 llama.cpp: Add top_n_sigma, fix typical_p in sampler priority 2025-05-06 06:38:39 -07:00
oobabooga b817bb33fd Minor fix after df7bb0db1f 2025-05-05 05:00:20 -07:00
oobabooga b7a5c7db8d llama.cpp: Handle short arguments in --extra-flags 2025-05-04 07:14:42 -07:00
oobabooga 4c2e3b168b llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails) 2025-05-03 06:51:20 -07:00
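
4c2e3b168b adds retries around a request that occasionally fails. A generic sketch of that pattern, assuming the logits are fetched from llama-server over HTTP; the URL, payload, and retry counts are placeholders, not the webui's actual values:

```python
import time
import requests

def post_with_retries(url, payload, attempts=5, delay=1.0):
    last_exc = None
    for _ in range(attempts):
        try:
            response = requests.post(url, json=payload, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as exc:
            last_exc = exc
            time.sleep(delay)  # back off briefly before trying again
    raise last_exc
```
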
oobabooga b950a0c6db Lint 2025-04-30 20:02:10 -07:00
oobabooga a6c3ec2299 llama.cpp: Explicitly send cache_prompt = True 2025-04-30 15:24:07 -07:00
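
a6c3ec2299 makes the request opt in to prompt caching explicitly. A minimal /completion payload showing the field, assuming a local llama-server on the default port; everything besides cache_prompt is illustrative:

```python
import requests

payload = {
    'prompt': 'Once upon a time',
    'n_predict': 64,
    'cache_prompt': True,  # reuse KV-cache entries for the shared prompt prefix
}
response = requests.post('http://127.0.0.1:8080/completion', json=payload)
print(response.json().get('content', ''))
```
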
oobabooga 1ee0acc852 llama.cpp: Make --verbose print the llama-server command 2025-04-28 15:56:25 -07:00
oobabooga c6c2855c80 llama.cpp: Remove the timeout while loading models (closes #6907) 2025-04-27 21:22:21 -07:00
oobabooga 7b80acd524 Fix parsing --extra-flags 2025-04-26 18:40:03 -07:00
oobabooga 234aba1c50 llama.cpp: Simplify the prompt processing progress indicator 2025-04-26 17:33:47 -07:00
    The progress bar was unreliable
oobabooga d4b1e31c49 Use --ctx-size to specify the context size for all loaders 2025-04-25 16:59:03 -07:00
    Old flags are still recognized as alternatives.
oobabooga faababc4ea llama.cpp: Add a prompt processing progress bar 2025-04-25 16:42:30 -07:00
oobabooga 877cf44c08 llama.cpp: Add StreamingLLM (--streaming-llm) 2025-04-25 16:21:41 -07:00
oobabooga 98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
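
98f4c694b9, together with the later b7a5c7db8d and 7b80acd524, deals with turning a user-supplied --extra-flags string into llama-server arguments. A sketch under the assumption of a comma-separated name=value syntax, with single-character names getting one dash and longer names two; the webui's real parser may differ:

```python
def parse_extra_flags(extra_flags: str) -> list[str]:
    args = []
    for item in extra_flags.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition('=')
        prefix = '-' if len(name) == 1 else '--'  # short vs. long argument
        args.append(prefix + name)
        if value:
            args.append(value)
    return args

# e.g. parse_extra_flags('ctk=q8_0,fa') -> ['--ctk', 'q8_0', '--fa']
```
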
oobabooga e99c20bcb0 llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
Matthew Jenkins d3e7c655e5 Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862) 2025-04-20 23:06:24 -03:00
oobabooga 5ab069786b llama.cpp: add back the two encode calls (they are harmless now) 2025-04-19 17:38:36 -07:00
oobabooga b9da5c7e3a Use 127.0.0.1 instead of localhost for faster llama.cpp on Windows 2025-04-19 17:36:04 -07:00
oobabooga 9c9df2063f llama.cpp: fix unicode decoding (closes #6856) 2025-04-19 16:38:15 -07:00
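
The unicode fix in 9c9df2063f is about multi-byte UTF-8 characters being split across streamed chunks. A small sketch of the standard remedy, an incremental decoder that buffers incomplete sequences between chunks (the chunk source here is hypothetical):

```python
import codecs

decoder = codecs.getincrementaldecoder('utf-8')(errors='ignore')

def decode_chunks(chunks):
    # chunks is any iterable of bytes objects, e.g. a streamed HTTP response.
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text

# ''.join(decode_chunks([b'caf\xc3', b'\xa9'])) == 'café'
```
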
oobabooga ba976d1390 llama.cpp: avoid two 'encode' calls 2025-04-19 16:35:01 -07:00
oobabooga ed42154c78 Revert "llama.cpp: close the connection immediately on 'Stop'" 2025-04-19 05:32:36 -07:00
    This reverts commit 5fdebc554b.
oobabooga 5fdebc554b llama.cpp: close the connection immediately on 'Stop' 2025-04-19 04:59:24 -07:00
oobabooga 6589ebeca8 Revert "llama.cpp: new optimization attempt" 2025-04-18 21:16:21 -07:00
    This reverts commit e2e73ed22f.
oobabooga e2e73ed22f llama.cpp: new optimization attempt 2025-04-18 21:05:08 -07:00
oobabooga e2e90af6cd llama.cpp: don't include --rope-freq-base in the launch command if null 2025-04-18 20:51:18 -07:00
oobabooga 9f07a1f5d7 llama.cpp: new attempt at optimizing the llama-server connection 2025-04-18 19:30:53 -07:00
oobabooga f727b4a2cc llama.cpp: close the connection properly when generation is cancelled 2025-04-18 19:01:39 -07:00
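
f727b4a2cc and the surrounding commits revolve around tearing down the streaming connection when the user presses Stop. A rough sketch assuming a requests-based stream and a threading.Event as the stop signal; the endpoint and names are illustrative:

```python
import requests

def stream_completion(url, payload, stop_event):
    with requests.post(url, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if stop_event.is_set():
                response.close()  # drop the connection so the server stops generating
                break
            if line:
                yield line.decode('utf-8')
```
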
oobabooga b3342b8dd8 llama.cpp: optimize the llama-server connection 2025-04-18 18:46:36 -07:00
oobabooga 2002590536 Revert "Attempt at making the llama-server streaming more efficient." 2025-04-18 18:13:54 -07:00
    This reverts commit 5ad080ff25.