llama.cpp: Use --fit-ctx 8192 when --fit on is used

This sets the minimum acceptable context length, which by default is 4096.
oobabooga 2026-03-15 09:22:38 -07:00
parent 5763cab3c4
commit 9119ce0680

@@ -378,6 +378,7 @@ class LlamaServer:
             cmd += ["--gpu-layers", str(shared.args.gpu_layers), "--fit", "off"]
         else:
             cmd += ["--fit", "on"]
+            cmd += ["--fit-ctx", "8192"]
         if shared.args.fit_target:
             cmd += ["--fit-target", shared.args.fit_target]
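The flag logic in the hunk above can be sketched as a standalone function. This is a minimal illustration, not the webui's actual code: `Args` and `build_fit_flags` are hypothetical names standing in for `shared.args` and the surrounding `LlamaServer` method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Args:
    # Stand-in for shared.args (hypothetical simplification).
    gpu_layers: int = 0               # user-pinned GPU layer count (0 = unset)
    fit_target: Optional[str] = None  # optional --fit-target value


def build_fit_flags(args: Args) -> list:
    cmd = []
    if args.gpu_layers:
        # The user chose an explicit layer split: turn automatic fitting off.
        cmd += ["--gpu-layers", str(args.gpu_layers), "--fit", "off"]
    else:
        # Let llama-server fit the model automatically, but don't let it
        # shrink the context below 8192 tokens (the built-in floor is 4096).
        cmd += ["--fit", "on"]
        cmd += ["--fit-ctx", "8192"]
    if args.fit_target:
        cmd += ["--fit-target", args.fit_target]
    return cmd


print(build_fit_flags(Args()))
# → ['--fit', 'on', '--fit-ctx', '8192']
```

With default arguments the automatic-fit branch is taken, so `--fit-ctx 8192` is always emitted alongside `--fit on`; a nonzero `gpu_layers` skips both.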