Several small fixes

- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
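As a concrete illustration of the division-by-zero fix: training code commonly derives the gradient-accumulation step count as `batch_size // micro_batch_size`, and when `micro_batch_size > batch_size` that integer division yields 0, so any later division by the step count crashes. A minimal sketch of a guard, assuming these parameter names (the `max(1, ...)` clamp is an assumption, not the exact patch):

```python
# Hedged sketch: guard against a zero gradient-accumulation step count.
# batch_size / micro_batch_size names follow the commit message; the
# clamping strategy is an assumption for illustration.
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    # If micro_batch_size > batch_size, batch_size // micro_batch_size == 0,
    # and dividing the loss by that step count would raise ZeroDivisionError.
    return max(1, batch_size // micro_batch_size)

print(gradient_accumulation_steps(128, 4))  # 32
print(gradient_accumulation_steps(4, 128))  # clamped to 1 instead of 0
```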
oobabooga 2026-03-06 16:52:02 -03:00
parent 044566d42d
commit d03923924a
4 changed files with 16 additions and 4 deletions

@@ -126,6 +126,8 @@ def unload_model(keep_model_name=False):
     if model_class_name in ['Exllamav3Model', 'Exllamav3HF', 'TensorRTLLMModel']:
         shared.model.unload()
+    elif model_class_name == 'LlamaServer':
+        shared.model.stop()
     shared.model = shared.tokenizer = None
     shared.lora_names = []
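The hunk above replaces reliance on garbage collection with an explicit `stop()` call. A rough sketch of what such a method might look like for a subprocess-backed server wrapper (the `self.process` attribute, the constructor signature, and the terminate-then-kill escalation are assumptions, not the actual LlamaServer implementation):

```python
# Hedged sketch: explicitly stopping a server subprocess on unload,
# rather than hoping the object's finalizer runs during GC.
import subprocess

class LlamaServer:
    def __init__(self, cmd):
        # cmd: argv list for the server binary (hypothetical constructor).
        self.process = subprocess.Popen(cmd)

    def stop(self):
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()  # ask the process to exit cleanly
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()   # escalate if it ignores SIGTERM
                self.process.wait()
        self.process = None
```

Explicit teardown matters here because CPython gives no guarantee about when (or whether) a finalizer runs, so an unloaded model could otherwise leave a llama-server process holding VRAM.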