oobabooga
d03923924a
Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
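The division-by-zero fix above likely concerns the gradient-accumulation calculation common to training loops, where accumulation steps are derived as batch_size // micro_batch_size. A minimal sketch of such a guard (function name and structure are hypothetical, not the project's actual code):

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Derive accumulation steps from the two batch sizes.

    If micro_batch_size > batch_size, integer division yields 0, and any
    later division by this value raises ZeroDivisionError -- the failure
    mode the commit describes. Clamping to at least 1 keeps downstream
    arithmetic safe.
    """
    return max(1, batch_size // micro_batch_size)
```

The sampler_priority fix in the same commit follows the same defensive pattern: copying a caller-supplied list (e.g. `priority = list(sampler_priority)`) before sorting or reordering it, so shared state is not mutated as a side effect.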
2026-03-06 16:52:13 -03:00
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
2f08dce7b0
Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga
9b916f02cd
ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed
2026-03-04 10:51:15 -08:00
oobabooga
7f06aec3a1
exllamav3: Implement the logits function for /v1/internal/logits
2025-10-09 11:24:25 -07:00
oobabooga
1e863a7113
Fix exllamav3 ignoring the stop button
2025-09-19 16:12:50 -07:00
oobabooga
e0f5905a97
Code formatting
2025-08-19 06:34:05 -07:00
oobabooga
dbabe67e77
ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option
2025-08-17 13:19:11 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
41b95e9ec3
Lint
2025-08-12 13:37:37 -07:00
oobabooga
2238302b49
ExLlamaV3: Add speculative decoding
2025-08-12 08:50:45 -07:00
oobabooga
999471256c
Lint
2025-08-11 12:32:17 -07:00
oobabooga
52d1cbbbe9
Fix an import
2025-08-11 07:38:39 -07:00
oobabooga
4809ddfeb8
Exllamav3: small sampler fixes
2025-08-11 07:35:22 -07:00
oobabooga
4d8dbbab64
API: Fix sampler_priority usage for ExLlamaV3
2025-08-11 07:26:11 -07:00
oobabooga
2f90ac9880
Move the new image_utils.py file to modules/
2025-08-09 21:41:38 -07:00
oobabooga
c6b4d1e87f
Fix the exllamav2 loader ignoring add_bos
2025-08-09 21:34:35 -07:00
oobabooga
a289a92b94
Fix exllamav3 token count
2025-08-09 17:10:58 -07:00
oobabooga
d489eb589a
Attempt at fixing new exllamav3 loader undefined behavior when switching conversations
2025-08-09 14:11:31 -07:00
oobabooga
59c6138e98
Remove a log message
2025-08-09 07:32:15 -07:00
oobabooga
f396b82a4f
mtmd: Better way to detect if an EXL3 model is multimodal
2025-08-09 07:31:36 -07:00
oobabooga
1168004067
Minor change
2025-08-09 07:01:55 -07:00
oobabooga
9e260332cc
Remove some unnecessary code
2025-08-08 21:22:47 -07:00
oobabooga
544c3a7c9f
Polish the new exllamav3 loader
2025-08-08 21:15:53 -07:00
Katehuuh
88127f46c1
Add multimodal support (ExLlamaV3) (#7174)
2025-08-08 23:31:16 -03:00