oobabooga
5fa709a3f4
llama.cpp server: use port+5 offset and suppress "No parser definition detected" logs
2026-03-06 18:52:34 -08:00
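The port-handling commits above (port+5 offset here, fixed port 5001 further down instead of a random port) suggest probing a preferred port and walking forward until a free one is found. A minimal sketch of that idea; the function name, probe count, and bind-based check are assumptions, not the actual implementation:

```python
import socket

def find_llama_port(preferred=5001, max_tries=10):
    # Probe the preferred port first, then the ones after it,
    # returning the first that can actually be bound.
    for offset in range(max_tries):
        port = preferred + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # in use; try the next offset
    raise RuntimeError("no free port found for llama-server")
```

Binding (rather than connecting) avoids a race where a port looks closed but cannot be claimed; the socket is released immediately so llama-server can take it.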
oobabooga
93ebfa2b7e
Fix llama-server output filter for new log format
2026-03-06 02:38:13 -03:00
oobabooga
66fb79fe15
llama.cpp: Add --fit-target param
2026-03-06 01:55:48 -03:00
oobabooga
e2548f69a9
Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
...
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
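The fallback described in the commit body above can be sketched as follows; the function name and exact checks are hypothetical, only the decision rule (explicit flag wins, else prefer ./user_data, else fall back to ../user_data when it exists) comes from the commit message:

```python
from pathlib import Path

def resolve_user_data_dir(cli_value=None):
    # An explicit --user-data-dir value is used as-is.
    if cli_value:
        return Path(cli_value)
    local = Path("user_data")
    shared = Path("..") / "user_data"
    # Auto-detect: fall back to ../user_data only when ./user_data
    # is absent, so several portable builds placed side by side can
    # share one user_data folder one level up.
    if not local.exists() and shared.exists():
        return shared
    return local
```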
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
69fa4dd0b1
llama.cpp: allow ctx_size=0 for auto context via --fit
2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0
llama.cpp: Use -1 instead of 0 for auto gpu_layers
2026-03-04 19:21:45 -08:00
oobabooga
da3010c3ed
tiny improvements to llama_cpp_server.py
2026-03-04 15:54:37 -08:00
Sense_wang
7bf15ad933
fix: replace bare except clauses with except Exception ( #7400 )
2026-03-04 18:06:17 -03:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401
Fix temperature_last having no effect in llama.cpp server sampler order
2026-03-04 06:10:51 -08:00
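The fix above concerns sampler ordering: with temperature_last set, temperature must be applied after truncation samplers rather than wherever it sits in the priority list. A sketch of that reordering, assuming samplers are represented as a list of names (the representation and function name are guesses):

```python
def order_samplers(priority, temperature_last):
    # priority: sampler names in their configured order,
    # e.g. ["temperature", "top_k", "top_p"].
    if not temperature_last:
        return list(priority)
    # Move temperature to the end so it runs after
    # truncation samplers like top_k/top_p.
    reordered = [s for s in priority if s != "temperature"]
    reordered.append("temperature")
    return reordered
```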
oobabooga
c54e8a2b3d
Try to spawn llama.cpp on port 5001 instead of random port
2026-01-28 08:23:55 -08:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
0d4eff284c
Add a --cpu-moe option for llama.cpp
2025-11-19 05:23:43 -08:00
mamei16
308e726e11
log error when llama-server request exceeds context size (#7263)
2025-10-12 23:00:11 -03:00
oobabooga
f3829b268a
llama.cpp: Always pass --flash-attn on
2025-09-02 12:12:17 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
3ad5970374
Make the llama.cpp --verbose output less verbose
2025-08-25 17:43:21 -07:00
oobabooga
8be798e15f
llama.cpp: Fix stderr deadlock while loading some multimodal models
2025-08-24 12:20:05 -07:00
oobabooga
7fe8da8944
Minor simplification after f247c2ae62
2025-08-22 14:42:56 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
8d7b88106a
Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)"
...
This reverts commit d8fcc71616.
2025-08-12 13:20:16 -07:00
oobabooga
d8fcc71616
mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)
2025-08-11 18:02:33 -07:00
oobabooga
e6447cd24a
mtmd: Update the llama-server request
2025-08-11 17:42:35 -07:00
oobabooga
0e3def449a
llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked
2025-08-11 15:17:25 -07:00
oobabooga
b62c8845f3
mtmd: Fix /chat/completions for llama.cpp
2025-08-11 12:01:59 -07:00
oobabooga
2f90ac9880
Move the new image_utils.py file to modules/
2025-08-09 21:41:38 -07:00
oobabooga
d86b0ec010
Add multimodal support (llama.cpp) (#7027)
2025-08-10 01:27:25 -03:00
oobabooga
609c3ac893
Optimize the end of generation with llama.cpp
2025-06-15 08:03:27 -07:00
oobabooga
18bd78f1f0
Make the llama.cpp prompt processing messages shorter
2025-06-10 14:03:25 -07:00
oobabooga
f8f23b5489
Simplify the llama.cpp stderr filter code
2025-06-06 22:25:13 -07:00
oobabooga
45f823ddf6
Print \n after the llama.cpp progress bar reaches 1.0
2025-06-06 22:23:34 -07:00
oobabooga
2db7745cbd
Show llama.cpp prompt processing on one line instead of many lines
2025-06-01 22:12:24 -07:00
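The progress-display commits above (single-line prompt processing, printing \n once the bar reaches 1.0) amount to redrawing one line with a carriage return and terminating it only at completion. A minimal sketch of that pattern; the message text and function name are illustrative, not the project's actual code:

```python
import sys

def show_progress(fraction):
    # Redraw the prompt-processing progress on one line: \r returns
    # the cursor to the start instead of emitting a new line.
    sys.stdout.write(f"\rPrompt processing: {fraction:.0%}")
    # Emit the newline only once the bar reaches 1.0, so later
    # output starts on a fresh line.
    if fraction >= 1.0:
        sys.stdout.write("\n")
    sys.stdout.flush()
```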
oobabooga
e4d3f4449d
API: Fix a regression
2025-05-16 13:02:27 -07:00
oobabooga
5534d01da0
Estimate the VRAM for GGUF models + autoset gpu-layers (#6980)
2025-05-16 00:07:37 -03:00
oobabooga
62c774bf24
Revert "New attempt"
...
This reverts commit e7ac06c169.
2025-05-13 06:42:25 -07:00
oobabooga
e7ac06c169
New attempt
2025-05-10 19:20:04 -07:00
oobabooga
9ea2a69210
llama.cpp: Add --no-webui to the llama-server command
2025-05-08 10:41:25 -07:00
oobabooga
c4f36db0d8
llama.cpp: remove tfs (it doesn't get used)
2025-05-06 08:41:13 -07:00
oobabooga
d1c0154d66
llama.cpp: Add top_n_sigma, fix typical_p in sampler priority
2025-05-06 06:38:39 -07:00
oobabooga
b817bb33fd
Minor fix after df7bb0db1f
2025-05-05 05:00:20 -07:00
oobabooga
b7a5c7db8d
llama.cpp: Handle short arguments in --extra-flags
2025-05-04 07:14:42 -07:00
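The commit above makes --extra-flags accept llama-server's short flags (like -ngl) alongside long ones. One plausible way to do that, assuming a comma-separated key=value format; the length cutoff for "short" and the function name are guesses, not the actual implementation:

```python
def build_extra_flags(extra_flags):
    # Assumed input format: "ngl=99,no-mmap" -> ["-ngl", "99", "--no-mmap"]
    args = []
    for item in extra_flags.split(","):
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
        else:
            key, value = item, None
        # Heuristic: short keys (3 chars or fewer, matching llama.cpp
        # short options like -m, -fa, -ngl) get one dash, long keys two.
        args.append(("-" if len(key) <= 3 else "--") + key)
        if value is not None:
            args.append(value)
    return args
```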
oobabooga
4c2e3b168b
llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails)
2025-05-03 06:51:20 -07:00
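A retry loop like the one the commit above describes can be sketched generically; the function name, attempt count, and delay are assumptions, and the actual fetch is abstracted as a callable:

```python
import time

def get_logits_with_retry(fetch, attempts=5, delay=0.05):
    # The logits fetch can fail intermittently, so retry a few
    # times with a short pause before giving up.
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the real error
            time.sleep(delay)
```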
oobabooga
b950a0c6db
Lint
2025-04-30 20:02:10 -07:00
oobabooga
a6c3ec2299
llama.cpp: Explicitly send cache_prompt = True
2025-04-30 15:24:07 -07:00
oobabooga
1ee0acc852
llama.cpp: Make --verbose print the llama-server command
2025-04-28 15:56:25 -07:00
oobabooga
c6c2855c80
llama.cpp: Remove the timeout while loading models (closes #6907)
2025-04-27 21:22:21 -07:00
oobabooga
7b80acd524
Fix parsing --extra-flags
2025-04-26 18:40:03 -07:00