UI: Set max_updates_second to 12 by default

When the tokens/second is at ~50 and the model is a thinking model,
the markdown rendering for the streaming message becomes a CPU
bottleneck.
oobabooga 2025-04-30 14:53:15 -07:00
parent a4bf339724
commit b46ca01340


@@ -47,7 +47,7 @@ settings = {
     'max_new_tokens_max': 4096,
     'prompt_lookup_num_tokens': 0,
     'max_tokens_second': 0,
-    'max_updates_second': 0,
+    'max_updates_second': 12,
     'auto_max_new_tokens': True,
     'ban_eos_token': False,
     'add_bos_token': True,
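The idea behind `max_updates_second` can be sketched as a simple rate limiter over the token stream: accumulate tokens as fast as they arrive, but only re-render (yield) the message at most N times per second, with a final flush so no text is lost. This is a hypothetical illustration, not the repository's actual implementation; the function and parameter names here are invented for the example.

```python
import time

def throttle_updates(token_stream, max_updates_second=12):
    """Yield the accumulated text at most max_updates_second times
    per second, plus one final flush with the complete text.

    Hypothetical sketch of update throttling; not the webui's API.
    With max_updates_second=0 (the old default), every token
    triggers a re-render.
    """
    min_interval = 1 / max_updates_second if max_updates_second > 0 else 0.0
    last_update = 0.0
    text = ""
    for token in token_stream:
        text += token
        now = time.monotonic()
        if now - last_update >= min_interval:
            last_update = now
            yield text  # the expensive markdown re-render happens here
    yield text  # always flush the final, complete message
```

At ~50 tokens/second, this caps markdown re-renders at 12/second instead of 50/second, which is what relieves the CPU bottleneck described above.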