oobabooga | 468cb5cb87 | Update accelerate | 2026-04-02 10:59:28 -07:00
oobabooga | 6a1f720c7b | Update transformers | 2026-04-02 10:58:20 -07:00
oobabooga | 8f8b57a029 | Update exllamav3 | 2026-04-02 10:54:20 -07:00
oobabooga | c50e17bdbe | Add dedicated ik portable requirements files and remove macOS ik builds | 2026-04-02 14:50:52 -03:00
oobabooga | ea1f8c71f2 | API: Optimize prompt logprobs and refactor ExLlamav3 forward pass | 2026-04-02 14:31:11 -03:00
oobabooga | c10c6e87ae | API: Add token ids to logprobs output | 2026-04-02 07:17:27 -07:00
oobabooga | a32ce254f2 | Don't pass torch_dtype to transformers, autodetect from model config | 2026-04-02 00:44:14 -03:00
oobabooga | 4073164be0 | Fix ExLlamav3 OOM on prompt logprobs and qwen3_5_moe HF compat | 2026-04-01 19:44:55 -07:00
oobabooga | 328534b762 | Update llama.cpp | 2026-04-01 12:51:07 -07:00
oobabooga | 71c1a52afe | API: Implement echo + logprobs for /v1/completions endpoint | 2026-03-31 07:43:11 -07:00
oobabooga | 6382fbef83 | Several small code simplifications | 2026-03-30 19:36:03 -07:00
oobabooga | 0466b6e271 | ik_llama.cpp: Auto-enable Hadamard KV cache rotation with quantized cache | 2026-03-29 15:52:36 -07:00
oobabooga | be6fc0663a | Update the custom gradio wheels | 2026-03-28 08:11:28 -07:00
oobabooga | 4979e87e48 | Add ik_llama.cpp support via ik_llama_cpp_binaries package | 2026-03-28 12:09:00 -03:00
oobabooga | 9dd04b86ce | Suppress EOS token at logit level for ExLlamav3 when ban_eos_token is set | 2026-03-28 06:17:57 -07:00
oobabooga | bda95172bd | Fix stopping string detection for chromadb/context-1 | 2026-03-28 06:09:53 -07:00
oobabooga | 4cbea02ed4 | Add ik_llama.cpp support via --ik flag | 2026-03-26 06:54:47 -07:00
oobabooga | e154140021 | Rename "truncation length" to "context length" in logs | 2026-03-25 07:21:02 -07:00
oobabooga | 368f37335f | Fix --idle-timeout issues with encode/decode and parallel generation | 2026-03-25 06:37:45 -07:00
oobabooga | d6f1485dd1 | UI: Update the enable_thinking info message | 2026-03-24 21:45:11 -07:00
oobabooga | 807be11832 | Remove obsolete models/config.yaml and related code | 2026-03-24 18:48:50 -07:00
oobabooga | f48a2b79d0 | UI: Minor polish | 2026-03-24 11:45:33 -07:00
oobabooga | 750502695c | Fix GPT-OSS tool-calling after 9ec20d97 | 2026-03-24 11:39:24 -07:00
oobabooga | 5814e745be | UI: Minor polish | 2026-03-24 11:14:22 -07:00
oobabooga | 5b8da154b7 | Update llama.cpp | 2026-03-24 09:34:59 -07:00
oobabooga | c9d2240f50 | Update README | 2026-03-24 06:45:39 -07:00
oobabooga | a7ef430b38 | Revert "llama.cpp: Don't suppress llama-server logs" (this reverts commit 9488df3e48) | 2026-03-23 20:22:51 -07:00
oobabooga | 286bbb685d | Revert "Follow-up to previous commit" (this reverts commit 1dda5e4711) | 2026-03-23 20:22:46 -07:00
oobabooga | 02f18a1d65 | API: Add thinking block signature field, fix error codes, clean up logging | 2026-03-23 07:06:38 -07:00
oobabooga | 307d0c92be | UI polish | 2026-03-23 06:35:14 -07:00
oobabooga | 9ec20d9730 | Strip thinking blocks before tool-call parsing | 2026-03-22 19:19:14 -07:00
Phrosty1 | bde496ea5d | Fix prompt corruption when continuing with context truncation (#7439) | 2026-03-22 21:48:56 -03:00
oobabooga | 1dda5e4711 | Follow-up to previous commit | 2026-03-21 20:58:45 -07:00
oobabooga | 9488df3e48 | llama.cpp: Don't suppress llama-server logs | 2026-03-21 20:47:26 -07:00
oobabooga | 2c4f364339 | Update API docs to mention Anthropic support | 2026-03-21 18:38:11 -07:00
oobabooga | f2c909725e | API: Use top_p=0.95 by default | 2026-03-21 11:11:09 -07:00
oobabooga | 0216893475 | API: Add Anthropic-compatible /v1/messages endpoint | 2026-03-20 20:38:55 -07:00
oobabooga | f0e3997f37 | Add missing __init__.py to modules/grammar | 2026-03-20 16:04:57 -03:00
oobabooga | 7c79143a14 | API: Fix _start_cloudflared raising after first attempt instead of exhausting retries | 2026-03-20 15:03:49 -03:00
oobabooga | 855141967c | API: Handle --extensions openai as alias for --api | 2026-03-20 15:03:17 -03:00
oobabooga | 1a910574c3 | API: Fix debug_msg truthy check for OPENEDAI_DEBUG=0 | 2026-03-20 14:57:01 -03:00
oobabooga | bf6fbc019d | API: Move OpenAI-compatible API from extensions/openai to modules/api | 2026-03-20 14:46:00 -03:00
oobabooga | 2e4232e02b | Minor cleanup | 2026-03-20 07:20:26 -07:00
oobabooga | 843de8b8a8 | Update exllamav3 to 0.0.26 | 2026-03-19 18:49:36 -07:00
oobabooga | b3eb0e313d | Reduce the size of portable builds by using stripped Python | 2026-03-19 11:53:12 -07:00
oobabooga | b9922f71ba | Merge branch 'main' into dev | 2026-03-19 08:05:01 -07:00
oobabooga | e0e20ab9e7 | Minor cleanup across multiple modules | 2026-03-19 08:02:23 -07:00
oobabooga | 779e7611ff | Use logger.exception() instead of traceback.print_exc() for error messages | 2026-03-18 20:42:20 -07:00