Commit graph

2186 commits

Author  SHA1  Message  Date

oobabooga  05e4842033  Fix image generation: default to SDPA attention backend  2026-04-05 20:03:06 -07:00
oobabooga  b1d06dcf96  UI: Add MCP server support  2026-04-05 19:46:01 -07:00
oobabooga  abc3487f4d  UI: Move cpu-moe checkbox to extra flags (no longer useful now that --fit exists)  2026-04-05 18:24:26 -07:00
oobabooga  d78fc46114  Fix "address already in use" on server restart (Linux/macOS)  2026-04-05 16:42:27 -07:00
oobabooga  422f42ca7f  Pre-compile LaTeX regex in html_generator.py  2026-04-04 23:51:15 -07:00
oobabooga  544fcb0b7f  Simplify modules/image_models.py  2026-04-04 23:29:57 -07:00
oobabooga  c63a79ee48  Image generation: Embed generation metadata in API image responses  2026-04-04 23:15:14 -07:00
oobabooga  dfd8ec9c49  UI: Make accordion outline styling global  2026-04-04 20:13:20 -07:00
oobabooga  1b403a4ffa  UI: Fix inline LaTeX rendering by protecting $...$ from markdown (closes #7423)  2026-04-04 19:33:05 -07:00
oobabooga  ffea8f282e  UI: Improve message text contrast  2026-04-04 18:53:13 -07:00
oobabooga  7fed60f90a  UI: Improve the hover menu looks  2026-04-04 18:29:36 -07:00
oobabooga  2eef90a323  API: Remove deprecated "settings" parameter from model load endpoint  2026-04-04 11:00:14 -07:00
oobabooga  9183dc444e  API: Fix loader args leaking between sequential model loads  2026-04-04 10:48:53 -07:00
oobabooga  e0ad4e60df  UI: Fix tool buffer check truncating visible text at end of generation  2026-04-04 09:57:07 -07:00
oobabooga  54b2f39c78  Cleanup modules/chat.py  2026-04-03 22:07:21 -07:00
oobabooga  fc35acab9b  API: Fix tool call parser crash on non-dict JSON output  2026-04-03 16:56:15 -07:00
oobabooga  8ecdb41078  fix(security): sanitize filenames in all prompt file operations (CWE-22) (#7462)  2026-04-03 19:36:50 -03:00
    Co-authored-by: Alex Chen <ffulbtech@gmail.com>
oobabooga  95d6c53e13  Revert "API: Add warning about vanilla llama-server not supporting prompt logprobs + instructions"  2026-04-03 07:30:48 -07:00
    This reverts commit 42dfcdfc5b.
oobabooga  66d1a22c73  Fix crash when no model is selected (None passed to resolve_model_path)  2026-04-03 05:56:36 -07:00
oobabooga  000d776967  Revert "llama.cpp: Disable jinja by default (we use Python jinja, not cpp jinja)"  2026-04-03 05:49:03 -07:00
    This reverts commit a1cb5b5dc0.
oobabooga  a1cb5b5dc0  llama.cpp: Disable jinja by default (we use Python jinja, not cpp jinja)  2026-04-02 21:56:40 -07:00
    This was causing template compilation issues with qwen models.
oobabooga  42dfcdfc5b  API: Add warning about vanilla llama-server not supporting prompt logprobs + instructions  2026-04-02 20:46:27 -07:00
oobabooga  6e2b70bde6  Add Gemma 4 tool-calling support  2026-04-02 20:26:27 -07:00
oobabooga  b108c55353  Fix portable builds not starting due to missing ik element  2026-04-02 19:14:50 -07:00
oobabooga  7aab2fdf9a  API: Improve cache clearing in logprobs  2026-04-02 17:50:42 -07:00
oobabooga  091037ec20  Fix top_logprobs_ids missing for llama.cpp loader  2026-04-02 16:13:45 -03:00
oobabooga  ea1f8c71f2  API: Optimize prompt logprobs and refactor ExLlamav3 forward pass  2026-04-02 14:31:11 -03:00
oobabooga  c10c6e87ae  API: Add token ids to logprobs output  2026-04-02 07:17:27 -07:00
oobabooga  a32ce254f2  Don't pass torch_dtype to transformers, autodetect from model config  2026-04-02 00:44:14 -03:00
oobabooga  4073164be0  Fix ExLlamav3 OOM on prompt logprobs and qwen3_5_moe HF compat  2026-04-01 19:44:55 -07:00
oobabooga  71c1a52afe  API: Implement echo + logprobs for /v1/completions endpoint  2026-03-31 07:43:11 -07:00
oobabooga  6382fbef83  Several small code simplifications  2026-03-30 19:36:03 -07:00
oobabooga  0466b6e271  ik_llama.cpp: Auto-enable Hadamard KV cache rotation with quantized cache  2026-03-29 15:52:36 -07:00
oobabooga  4979e87e48  Add ik_llama.cpp support via ik_llama_cpp_binaries package  2026-03-28 12:09:00 -03:00
oobabooga  9dd04b86ce  Suppress EOS token at logit level for ExLlamav3 when ban_eos_token is set  2026-03-28 06:17:57 -07:00
oobabooga  bda95172bd  Fix stopping string detection for chromadb/context-1  2026-03-28 06:09:53 -07:00
oobabooga  4cbea02ed4  Add ik_llama.cpp support via --ik flag  2026-03-26 06:54:47 -07:00
oobabooga  e154140021  Rename "truncation length" to "context length" in logs  2026-03-25 07:21:02 -07:00
oobabooga  368f37335f  Fix --idle-timeout issues with encode/decode and parallel generation  2026-03-25 06:37:45 -07:00
oobabooga  d6f1485dd1  UI: Update the enable_thinking info message  2026-03-24 21:45:11 -07:00
oobabooga  807be11832  Remove obsolete models/config.yaml and related code  2026-03-24 18:48:50 -07:00
oobabooga  750502695c  Fix GPT-OSS tool-calling after 9ec20d97  2026-03-24 11:39:24 -07:00
oobabooga  a7ef430b38  Revert "llama.cpp: Don't suppress llama-server logs"  2026-03-23 20:22:51 -07:00
    This reverts commit 9488df3e48.
oobabooga  286bbb685d  Revert "Follow-up to previous commit"  2026-03-23 20:22:46 -07:00
    This reverts commit 1dda5e4711.
oobabooga  02f18a1d65  API: Add thinking block signature field, fix error codes, clean up logging  2026-03-23 07:06:38 -07:00
oobabooga  307d0c92be  UI polish  2026-03-23 06:35:14 -07:00
oobabooga  9ec20d9730  Strip thinking blocks before tool-call parsing  2026-03-22 19:19:14 -07:00
Phrosty1  bde496ea5d  Fix prompt corruption when continuing with context truncation (#7439)  2026-03-22 21:48:56 -03:00
oobabooga  1dda5e4711  Follow-up to previous commit  2026-03-21 20:58:45 -07:00
oobabooga  9488df3e48  llama.cpp: Don't suppress llama-server logs  2026-03-21 20:47:26 -07:00