Several small fixes

- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
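As a concrete illustration of the division-by-zero fix: training code commonly derives the gradient-accumulation step count as `batch_size // micro_batch_size`, and when `micro_batch_size > batch_size` that integer division yields 0, so any later division by the step count crashes. A minimal sketch of a guard, assuming these parameter names (the `max(1, ...)` clamp is an assumption, not the exact patch):

```python
# Hedged sketch: guard against a zero gradient-accumulation step count.
# batch_size / micro_batch_size names follow the commit message; the
# clamping strategy is an assumption for illustration.
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    # If micro_batch_size > batch_size, batch_size // micro_batch_size == 0,
    # and dividing the loss by that step count would raise ZeroDivisionError.
    return max(1, batch_size // micro_batch_size)

print(gradient_accumulation_steps(128, 4))  # 32
print(gradient_accumulation_steps(4, 128))  # clamped to 1 instead of 0
```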
oobabooga 2026-03-06 16:52:02 -03:00
parent 044566d42d
commit d03923924a
4 changed files with 16 additions and 4 deletions

@@ -126,6 +126,8 @@ def unload_model(keep_model_name=False):
     if model_class_name in ['Exllamav3Model', 'Exllamav3HF', 'TensorRTLLMModel']:
         shared.model.unload()
+    elif model_class_name == 'LlamaServer':
+        shared.model.stop()
     shared.model = shared.tokenizer = None
     shared.lora_names = []
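The hunk above replaces reliance on garbage collection with an explicit `stop()` call. A rough sketch of what such a method might look like for a subprocess-backed server wrapper (the `self.process` attribute, the constructor signature, and the terminate-then-kill escalation are assumptions, not the actual LlamaServer implementation):

```python
# Hedged sketch: explicitly stopping a server subprocess on unload,
# rather than hoping the object's finalizer runs during GC.
import subprocess

class LlamaServer:
    def __init__(self, cmd):
        # cmd: argv list for the server binary (hypothetical constructor).
        self.process = subprocess.Popen(cmd)

    def stop(self):
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()  # ask the process to exit cleanly
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()   # escalate if it ignores SIGTERM
                self.process.wait()
        self.process = None
```

Explicit teardown matters here because CPython gives no guarantee about when (or whether) a finalizer runs, so an unloaded model could otherwise leave a llama-server process holding VRAM.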