Compare commits


53 commits
v4.3...main

Author SHA1 Message Date
oobabooga
9dcf574160
Merge pull request #7471 from oobabooga/dev
Merge dev branch
2026-04-06 21:54:02 -03:00
oobabooga
e18f32cba7 Remove hardcoded trust_remote_code=True in embedding loader 2026-04-06 17:47:50 -07:00
oobabooga
778e1c4d52 Update llama.cpp/ik_llama.cpp 2026-04-06 17:04:49 -07:00
oobabooga
775c913de2 Fix crash when truncating prompts with tool call messages 2026-04-06 14:13:01 -07:00
oobabooga
cb511928e2 Fix GPT-OSS tag leak during streaming between thinking and tool calls 2026-04-06 12:06:28 -07:00
oobabooga
193424cc93 API: Fix IPv6 address formatting 2026-04-06 10:07:52 -07:00
oobabooga
c26ffdd24c API: add instruction_template support to the model load endpoint 2026-04-06 07:02:53 -07:00
oobabooga
4d6230a944 Follow-up to d78fc46114 2026-04-06 06:48:48 -07:00
oobabooga
7b2f15e34a Minor change after b1d06dcf96 2026-04-05 21:16:32 -07:00
oobabooga
05e4842033 Fix image generation: default to SDPA attention backend 2026-04-05 20:03:06 -07:00
oobabooga
b1d06dcf96 UI: Add MCP server support 2026-04-05 19:46:01 -07:00
oobabooga
abc3487f4d UI: Move cpu-moe checkbox to extra flags (no longer useful now that --fit exists) 2026-04-05 18:24:26 -07:00
oobabooga
223dd4b801 UI: Hide spin buttons on number inputs 2026-04-05 18:22:50 -07:00
oobabooga
f8db23b362 Call ik portable build folders text-generation-webui-ik-version 2026-04-05 17:12:28 -07:00
oobabooga
d78fc46114 Fix "address already in use" on server restart (Linux/macOS) 2026-04-05 16:42:27 -07:00
oobabooga
422f42ca7f Pre-compile LaTeX regex in html_generator.py 2026-04-04 23:51:15 -07:00
oobabooga
544fcb0b7f Simplify modules/image_models.py 2026-04-04 23:29:57 -07:00
oobabooga
c63a79ee48 Image generation: Embed generation metadata in API image responses 2026-04-04 23:15:14 -07:00
oobabooga
9805ddcde9 Update the custom gradio wheels 2026-04-04 21:34:09 -07:00
oobabooga
91f9b01516 UI: Minor change 2026-04-04 21:13:20 -07:00
oobabooga
1f49a64e1a UI: Improve blockquote border width and color 2026-04-04 20:44:37 -07:00
oobabooga
e8b31c063a UI: Soften message action icons in light mode 2026-04-04 20:38:31 -07:00
oobabooga
ee917cd5ed UI: Make table and hr borders more subtle 2026-04-04 20:35:27 -07:00
oobabooga
dfd8ec9c49 UI: Make accordion outline styling global 2026-04-04 20:13:20 -07:00
oobabooga
0c033caf0e UI: Reduce spacing above chat input 2026-04-04 20:09:28 -07:00
oobabooga
1b403a4ffa UI: Fix inline LaTeX rendering by protecting $...$ from markdown (closes #7423) 2026-04-04 19:33:05 -07:00
oobabooga
8cb7fe9c47 UI: Improve message action icon visibility in light mode 2026-04-04 19:14:17 -07:00
oobabooga
41bce3f4de UI: Improve scrollbars style 2026-04-04 19:07:36 -07:00
oobabooga
ffea8f282e UI: Improve message text contrast 2026-04-04 18:53:13 -07:00
oobabooga
7fed60f90a UI: Improve the hover menu looks 2026-04-04 18:29:36 -07:00
oobabooga
2eef90a323 API: Remove deprecated "settings" parameter from model load endpoint 2026-04-04 11:00:14 -07:00
oobabooga
9183dc444e API: Fix loader args leaking between sequential model loads 2026-04-04 10:48:53 -07:00
oobabooga
e0ad4e60df UI: Fix tool buffer check truncating visible text at end of generation 2026-04-04 09:57:07 -07:00
oobabooga
16af11f868 Update README 2026-04-04 04:22:37 -07:00
oobabooga
54b2f39c78 Cleanup modules/chat.py 2026-04-03 22:07:21 -07:00
oobabooga
b5afecc63b
Merge pull request #7464 from oobabooga/dev
Merge dev branch
2026-04-04 00:56:02 -03:00
oobabooga
2fbaee58cd Add Windows + ROCm portable builds 2026-04-03 20:54:28 -07:00
oobabooga
62e67adb55
Merge pull request #7463 from oobabooga/dev
Merge dev branch
2026-04-03 20:58:32 -03:00
oobabooga
fc35acab9b API: Fix tool call parser crash on non-dict JSON output 2026-04-03 16:56:15 -07:00
oobabooga
8ecdb41078
fix(security): sanitize filenames in all prompt file operations (CWE-22) (#7462)

Co-authored-by: Alex Chen <ffulbtech@gmail.com>
2026-04-03 19:36:50 -03:00
oobabooga
5fb8c4fbd6 Update the custom gradio wheels 2026-04-03 11:02:00 -07:00
oobabooga
0050a33f37
Merge pull request #7461 from oobabooga/dev
Merge dev branch
2026-04-03 14:07:42 -03:00
oobabooga
6b66da84d2 Update the custom gradio wheels 2026-04-03 10:01:51 -07:00
oobabooga
8e8e1ba898 Update the custom gradio wheels 2026-04-03 09:50:15 -07:00
oobabooga
131a9a0140 Update llama.cpp 2026-04-03 09:15:03 -07:00
oobabooga
95d6c53e13 Revert "API: Add warning about vanilla llama-server not supporting prompt logprobs + instructions"
This reverts commit 42dfcdfc5b.
2026-04-03 07:30:48 -07:00
oobabooga
8bba9ecc3f Update the custom gradio wheels 2026-04-03 05:58:05 -07:00
oobabooga
66d1a22c73 Fix crash when no model is selected (None passed to resolve_model_path) 2026-04-03 05:56:36 -07:00
oobabooga
000d776967 Revert "llama.cpp: Disable jinja by default (we use Python jinja, not cpp jinja)"
This reverts commit a1cb5b5dc0.
2026-04-03 05:49:03 -07:00
oobabooga
a1cb5b5dc0 llama.cpp: Disable jinja by default (we use Python jinja, not cpp jinja)
This was causing template compilation issues with qwen models.
2026-04-02 21:56:40 -07:00
oobabooga
b11379f328
Merge pull request #7455 from oobabooga/dev
Merge dev branch
2026-04-03 00:50:06 -03:00
oobabooga
42dfcdfc5b API: Add warning about vanilla llama-server not supporting prompt logprobs + instructions 2026-04-02 20:46:27 -07:00
oobabooga
6e2b70bde6 Add Gemma 4 tool-calling support 2026-04-02 20:26:27 -07:00
49 changed files with 749 additions and 325 deletions

View file

@@ -41,6 +41,13 @@ jobs:
       version: ${{ inputs.version }}
       config: 'os:ubuntu-22.04'

+  build_release_rocm_windows:
+    name: ROCm Windows
+    uses: ./.github/workflows/build-portable-release-rocm.yml
+    with:
+      version: ${{ inputs.version }}
+      config: 'os:windows-2022'
+
   build_release_rocm_linux:
     name: ROCm Linux
     uses: ./.github/workflows/build-portable-release-rocm.yml

View file

@@ -102,8 +102,8 @@ jobs:
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"
          cd ..
-         cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
-         cd "text-generation-webui-${VERSION_CLEAN}"
+         cp -r text-generation-webui "text-generation-webui-ik-${VERSION_CLEAN}"
+         cd "text-generation-webui-ik-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "sd_api_pictures")
@@ -133,10 +133,10 @@ jobs:
          echo "Downloading Python for $PLATFORM..."
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
-         mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"
+         mv python "text-generation-webui-ik-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file based on CUDA version
-         cd "text-generation-webui-${VERSION_CLEAN}"
+         cd "text-generation-webui-ik-${VERSION_CLEAN}"
          if [[ "$CUDA_VERSION" == "13.1" ]]; then
            REQ_FILE="requirements/portable/requirements_ik_cuda131.txt"
          else
@@ -158,11 +158,11 @@ jobs:
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-ik-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.zip"
            echo "Creating archive: $ARCHIVE_NAME"
-           powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
+           powershell -Command "Compress-Archive -Path text-generation-webui-ik-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-ik-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
-           tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
+           tar czf "$ARCHIVE_NAME" "text-generation-webui-ik-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release

View file

@@ -101,8 +101,8 @@ jobs:
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"
          cd ..
-         cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
-         cd "text-generation-webui-${VERSION_CLEAN}"
+         cp -r text-generation-webui "text-generation-webui-ik-${VERSION_CLEAN}"
+         cd "text-generation-webui-ik-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "sd_api_pictures")
@@ -131,10 +131,10 @@ jobs:
          cd ..
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
-         mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"
+         mv python "text-generation-webui-ik-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file
-         cd "text-generation-webui-${VERSION_CLEAN}"
+         cd "text-generation-webui-ik-${VERSION_CLEAN}"
          REQ_FILE="requirements/portable/requirements_ik_cpu_only.txt"
          echo "Using requirements file: $REQ_FILE"
@@ -153,11 +153,11 @@ jobs:
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-ik-${VERSION_CLEAN}-${PLATFORM}.zip"
            echo "Creating archive: $ARCHIVE_NAME"
-           powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
+           powershell -Command "Compress-Archive -Path text-generation-webui-ik-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-ik-${VERSION_CLEAN}-${PLATFORM}.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
-           tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
+           tar czf "$ARCHIVE_NAME" "text-generation-webui-ik-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release

View file

@@ -24,9 +24,9 @@ A Gradio web UI for running Large Language Models locally. 100% private and offl
 ## Features

 - **Easy setup**: [Portable builds](https://github.com/oobabooga/text-generation-webui/releases) (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or a one-click installer for the full feature set.
-- **Multiple backends**: [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Switch between backends and models without restarting.
+- **Multiple backends**: [llama.cpp](https://github.com/ggerganov/llama.cpp), [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Switch between backends and models without restarting.
 - **OpenAI/Anthropic-compatible API**: Chat, Completions, and Messages endpoints with tool-calling support. Use as a local drop-in replacement for the OpenAI/Anthropic APIs ([examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples)).
-- **Tool-calling**: Models can call custom functions during chat — web search, page fetching, math, and more. Each tool is a single `.py` file, easy to create and extend ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Tool-Calling-Tutorial)).
+- **Tool-calling**: Models can call custom functions during chat — web search, page fetching, math, and more. Each tool is a single `.py` file. MCP servers are also supported ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Tool-Calling-Tutorial)).
 - **Vision (multimodal)**: Attach images to messages for visual understanding ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Multimodal-Tutorial)).
 - **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
 - **Training**: Fine-tune LoRAs on multi-turn chat or raw text datasets. Supports resuming interrupted runs ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/05-%E2%80%90-Training-Tab)).

View file

@@ -13,21 +13,12 @@
   line-height: 28px !important;
 }

-.dark .chat .message-body :is(p,li,h1,h2,h3,h4,h5,h6),
+.dark .chat .message-body :is(p,li),
 .dark .chat .message-body em:not(:is(h1,h2,h3,h4,h5,h6,b,strong) em),
 .dark .chat .message-body q:not(:is(h1,h2,h3,h4,h5,h6,b,strong) q) {
   color: #d1d5db !important;
 }

-.chat .message-body :is(th, td),
-.prose hr {
-  border-color: #40404096 !important;
-}
-
-.dark .chat .message-body :is(th, td),
-.dark .prose hr {
-  border-color: rgb(255 255 255 / 30%) !important;
-}
-
 .chat .message-body :is(p, ul, ol) {
   margin: 1.25em 0 !important;

View file

@@ -22,6 +22,17 @@
   font-style: italic;
 }

+/* Hide spin buttons on number inputs (look bad on Windows) */
+input[type="number"]::-webkit-outer-spin-button,
+input[type="number"]::-webkit-inner-spin-button {
+  -webkit-appearance: none;
+  margin: 0;
+}
+
+input[type="number"] {
+  -moz-appearance: textfield;
+}
+
 .padded.svelte-12cmxck {
   padding: 3px 0;
 }
@@ -246,8 +257,8 @@ button {
 .pretty_scrollbar::-webkit-scrollbar,
 #image-history-gallery > :nth-child(2)::-webkit-scrollbar {
-  width: 8px;
-  height: 8px;
+  width: 7px;
+  height: 7px;
 }

 .pretty_scrollbar::-webkit-scrollbar-track,
@@ -260,7 +271,7 @@ button {
 #image-history-gallery > :nth-child(2)::-webkit-scrollbar-thumb,
 #image-history-gallery > :nth-child(2)::-webkit-scrollbar-thumb:hover {
   background: var(--neutral-300);
-  border-radius: 30px;
+  border-radius: 9999px;
 }

 .dark .pretty_scrollbar::-webkit-scrollbar-thumb,
@@ -268,18 +279,17 @@ button {
 .dark #image-history-gallery > :nth-child(2)::-webkit-scrollbar-thumb,
 .dark #image-history-gallery > :nth-child(2)::-webkit-scrollbar-thumb:hover {
   background: rgb(255 255 255 / 6.25%);
-  border-radius: 30px;
+  border-radius: 9999px;
 }

 .pretty_scrollbar::-webkit-resizer,
 #image-history-gallery > :nth-child(2)::-webkit-resizer {
-  background: #d2d2d8;
+  background: transparent;
 }

 .dark .pretty_scrollbar::-webkit-resizer,
 .dark #image-history-gallery > :nth-child(2)::-webkit-resizer {
-  background: rgb(255 255 255 / 10%);
-  border-radius: 10px;
+  background: transparent;
 }

 .pretty_scrollbar::-webkit-scrollbar-corner,
@@ -436,15 +446,25 @@ audio {
 .dark .message-body h4,
 .dark .message-body h5,
 .dark .message-body h6 {
-  color: white !important;
+  color: #e8e8e8 !important;
 }

-.dark .message-body blockquote {
-  border-left-color: rgb(255 255 255 / 30%);
+.message-body blockquote {
+  border-left-width: 4px;
+  border-left-color: var(--border-color-primary);
+}
+
+.message-body h1,
+.message-body h2,
+.message-body h3,
+.message-body h4,
+.message-body h5,
+.message-body h6 {
+  color: #1a1a1a;
 }

 .message-body h1 {
-  font-weight: 800;
+  font-weight: 700;
   font-size: 2.25em;
   margin-top: 0;
   margin-bottom: 0.8888889em;
@@ -476,13 +496,13 @@ audio {
 }

 .message-body h5 {
-  font-weight: normal;
+  font-weight: 600;
   font-size: 1em;
   margin: 0;
 }

 .message-body h6 {
-  font-weight: normal;
+  font-weight: 600;
   font-size: 1em;
   margin: 0;
 }
@@ -590,7 +610,7 @@ audio {
 }

 #chat-input textarea::-webkit-scrollbar {
-  width: 8px;
+  width: 7px;
 }

 #chat-input textarea::-webkit-scrollbar-track {
@@ -599,7 +619,7 @@ audio {
 #chat-input textarea::-webkit-scrollbar-thumb {
   background: var(--neutral-300);
-  border-radius: 30px;
+  border-radius: 9999px;
 }

 .dark #chat-input textarea::-webkit-scrollbar-thumb {
@@ -633,6 +653,10 @@ audio {
   background: transparent;
 }

+#chat-input .thumbnails {
+  padding-top: 3px;
+}
+
 .chat-input-positioned {
   max-width: 54rem;
   left: 50%;
@@ -735,7 +759,30 @@ audio {
 .hover-element {
   position: relative;
-  font-size: 24px;
+  padding-top: 4px;
+}
+
+#hover-element-button {
+  display: flex;
+  align-items: center;
+  justify-content: center;
+  width: 32px;
+  height: 32px;
+  border-radius: 0.5rem;
+  cursor: pointer;
+  color: gray;
+}
+
+#hover-element-button:hover {
+  background-color: var(--background-fill-secondary);
+}
+
+#hover-element-button svg {
+  color: inherit;
+}
+
+.dark #hover-element-button:hover {
+  background-color: var(--selected-item-color-dark);
 }

 .hover-menu {
@@ -743,27 +790,40 @@ audio {
   position: absolute;
   bottom: 100%;
   left: 0;
-  box-shadow: 0 2px 12px rgb(0 0 0 / 15%);
-  border-radius: 0.5rem;
+  background: white;
+  border: 1px solid rgba(0, 0, 0, 0.1);
+  box-shadow: 0 4px 16px rgb(0 0 0 / 12%), 0 1px 3px rgb(0 0 0 / 8%);
+  border-radius: 0.75rem;
   z-index: 10000;
   min-width: 330px;
   flex-direction: column;
-  overflow: hidden;
+  padding: 4px;
+}
+
+.hover-menu::before {
+  content: '';
+  position: absolute;
+  top: 100%;
+  left: 0;
+  width: 100%;
+  height: 8px;
+}
+
+.hover-menu > * {
+  border: none !important;
+  box-shadow: none !important;
 }

 .hover-menu button {
   width: 100%;
-  background: white !important;
-  border-radius: 0 !important;
+  background: transparent !important;
+  border: none !important;
+  border-radius: 0.5rem !important;
   justify-content: space-between;
   margin: 0 !important;
   height: 36px;
-  border-color: transparent !important;
-  transition: background-color 0.15s ease;
-}
-
-.hover-menu button:not(#clear-history-confirm) {
-  border-bottom: 0 !important;
+  font-weight: 500;
+  box-shadow: none !important;
 }

 .hover-menu button:hover {
@@ -775,19 +835,26 @@ audio {
 }

 #show-controls {
-  background-color: white;
-  border-color: transparent !important;
+  background-color: transparent;
+  border: none !important;
   height: 36px;
-  border-radius: 0;
-  border-bottom: 0 !important;
+  border-radius: 0.5rem;
   padding-top: 3px;
   padding-left: 4px;
   display: flex;
   font-weight: normal;
 }

+#show-controls:hover {
+  background-color: #dbeafe;
+}
+
 .dark #show-controls {
-  background-color: var(--darker-gray);
+  background-color: transparent;
+}
+
+.dark #show-controls:hover {
+  background-color: var(--selected-item-color-dark);
 }

 #show-controls label {
@@ -797,12 +864,12 @@ audio {
   width: 100%;
   padding-right: 12px;
   gap: 10px;
-  font-weight: 600;
+  font-weight: 500;
   color: var(--button-secondary-text-color);
 }

 #show-controls label input {
-  margin-top: 4px;
+  margin-top: 5px;
 }

 .transparent-substring {
@@ -842,7 +909,7 @@ audio {
 }

 #chat-input-row {
-  padding: 1rem;
+  padding: 0.5rem 1rem 1rem;
 }

 #chat-col {
@@ -1208,9 +1275,14 @@ audio {
   color: #9ca3af;
 }

+.dark .hover-menu {
+  background: var(--darker-gray);
+  border-color: transparent;
+  box-shadow: 0 4px 16px rgb(0 0 0 / 40%);
+}
+
 .dark .hover-menu button {
-  border-color: var(--border-color-primary);
-  background-color: var(--darker-gray) !important;
+  background-color: transparent !important;
 }

 .dark #chat-controls,
@@ -1372,8 +1444,7 @@ audio {
 }

 .footer-button svg {
-  stroke: rgb(156 163 175);
-  transition: stroke 0.2s;
+  stroke: rgb(140 140 148);
 }

 .footer-button:hover svg {
@@ -1388,12 +1459,12 @@ audio {
   stroke: rgb(209 213 219);
 }

-.tgw-accordion {
+.block:has(> .label-wrap) {
   padding: 10px 12px !important;
   border: 1px solid #d2d2d8;
 }

-.dark .tgw-accordion {
+.dark .block:has(> .label-wrap) {
   border: 1px solid var(--border-color-dark);
 }
@@ -1903,14 +1974,24 @@ table, tr, td, th, thead {
   border: 0;
 }

+.prose hr {
+  border-color: var(--border-color-primary);
+}
+
 td + td,
-th + th { border-left: 1px solid; }
+th + th {
+  border-left: 1px solid var(--border-color-primary) !important;
+}

 tr + tr td,
-tr + tr th { border-top: 1px solid; }
+tr + tr th {
+  border-top: 1px solid var(--border-color-primary) !important;
+}

 thead + tbody tr:first-child td,
-thead + tbody tr:first-child th { border-top: 1px solid; }
+thead + tbody tr:first-child th {
+  border-top: 1px solid var(--border-color-primary) !important;
+}

 /* ------------------------------------------------
   Tools CheckboxGroup - vertical DragDrop-like style
@@ -1942,8 +2023,8 @@ thead + tbody tr:first-child th { border-top: 1px solid; }
 /* Pretty scrollbar for the tools list */
 #tools-group .wrap::-webkit-scrollbar {
-  width: 8px;
-  height: 8px;
+  width: 7px;
+  height: 7px;
 }

 #tools-group .wrap::-webkit-scrollbar-track {
@@ -1953,13 +2034,13 @@ thead + tbody tr:first-child th { border-top: 1px solid; }
 #tools-group .wrap::-webkit-scrollbar-thumb,
 #tools-group .wrap::-webkit-scrollbar-thumb:hover {
   background: var(--neutral-300);
-  border-radius: 30px;
+  border-radius: 9999px;
 }

 .dark #tools-group .wrap::-webkit-scrollbar-thumb,
 .dark #tools-group .wrap::-webkit-scrollbar-thumb:hover {
   background: rgb(255 255 255 / 6.25%);
-  border-radius: 30px;
+  border-radius: 9999px;
 }

 #tools-group .wrap::-webkit-scrollbar-corner {

View file

@@ -232,6 +232,17 @@ curl -k http://127.0.0.1:5000/v1/internal/model/load \
   }'
 ```

+You can also set a default instruction template for all subsequent API requests by passing `instruction_template` (a template name from `user_data/instruction-templates/`) or `instruction_template_str` (a raw Jinja2 string):
+
+```shell
+curl -k http://127.0.0.1:5000/v1/internal/model/load \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model_name": "Qwen_Qwen3-0.6B-Q4_K_M.gguf",
+    "instruction_template": "Alpaca"
+  }'
+```
+
 #### Python chat example

 ```python
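
For reference, the same template-setting call as a Python sketch. This mirrors the curl example above; the model filename is the same placeholder, and it assumes the `requests` package and the API server on its default port:

```python
import requests

# Load a model and set "Alpaca" as the default instruction template.
# The model filename is a placeholder; substitute one from user_data/models.
resp = requests.post(
    "http://127.0.0.1:5000/v1/internal/model/load",
    json={
        "model_name": "Qwen_Qwen3-0.6B-Q4_K_M.gguf",
        "instruction_template": "Alpaca",
    },
)
resp.raise_for_status()
```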

View file

@@ -80,6 +80,19 @@ def execute(arguments):

 You can open the built-in tools in `user_data/tools/` for more examples.

+## MCP servers
+
+You can connect to remote [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) servers to use their tools alongside local ones.
+
+In the chat sidebar, open the **MCP servers** accordion and enter one server URL per line. For servers that require authentication, append headers after the URL separated by commas:
+
+```
+https://example.com/mcp
+https://other.com/mcp,Authorization: Bearer sk-xxx
+```
+
+All tools from the configured servers are automatically discovered and made available to the model during generation. If an MCP tool has the same name as a selected local tool, the local tool takes priority.
+
 ## Tool calling over the API

 Tool calling over the API follows the [OpenAI API](https://platform.openai.com/docs/guides/function-calling) convention. Define your tools, send them with your messages, and handle tool calls in a loop until the model gives a final answer.
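
For context on the local-versus-MCP priority rule above, a hypothetical single-file local tool. The `def execute(arguments)` entry point comes from the hunk context above; the `spec` dict follows the OpenAI function-calling shape used in this changeset, but its exact field names here are an assumption, not confirmed repository API:

```python
# user_data/tools/add_numbers.py (hypothetical example tool)
# Illustrative only: the "spec"/"execute" structure is assumed from the
# `def execute(arguments):` convention visible in the diff context.

spec = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two numbers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    },
}


def execute(arguments):
    # The model supplies arguments matching the JSON schema above.
    return arguments["a"] + arguments["b"]
```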

View file

@@ -309,18 +309,19 @@ for (let i = 0; i < slimDropdownElements.length; i++) {
 // https://github.com/SillyTavern/SillyTavern/blob/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/script.js
 //------------------------------------------------
 var buttonsInChat = document.querySelectorAll("#chat-tab #chat-buttons button, #chat-tab #chat-buttons #show-controls");
+var hoverContainer = document.getElementById("gr-hover-container");
 var button = document.getElementById("hover-element-button");
 var menu = document.getElementById("hover-menu");
 var istouchscreen = (navigator.maxTouchPoints > 0) || "ontouchstart" in document.documentElement;

 function showMenu() {
-  menu.style.display = "flex"; // Show the menu
+  menu.style.display = "flex";
 }

 function hideMenu() {
-  menu.style.display = "none"; // Hide the menu
+  menu.style.display = "none";
   if (!istouchscreen) {
-    document.querySelector("#chat-input textarea").focus(); // Focus on the chat input
+    document.querySelector("#chat-input textarea").focus();
   }
 }
@@ -329,7 +330,6 @@ if (buttonsInChat.length > 0) {
     const thisButton = buttonsInChat[i];
     menu.appendChild(thisButton);

-    // Only apply transformations to button elements
     if (thisButton.tagName.toLowerCase() === "button") {
       thisButton.addEventListener("click", () => {
         hideMenu();
@@ -339,7 +339,6 @@ if (buttonsInChat.length > 0) {
       const matches = buttonText.match(/(\(.*?\))/);

       if (matches && matches.length > 1) {
-        // Apply the transparent-substring class to the matched substring
         const substring = matches[1];
         const newText = buttonText.replace(substring, `&nbsp;<span class="transparent-substring">${substring.slice(1, -1)}</span>`);
         thisButton.innerHTML = newText;
@@ -348,16 +347,19 @@ if (buttonsInChat.length > 0) {
   }
 }

-function isMouseOverButtonOrMenu() {
-  return menu.matches(":hover") || button.matches(":hover");
-}
+var menuInteracting = false;

-button.addEventListener("mouseenter", function () {
+hoverContainer.addEventListener("mouseenter", function () {
   if (!istouchscreen) {
     showMenu();
   }
 });

+hoverContainer.addEventListener("mousedown", function () {
+  menuInteracting = true;
+  setTimeout(function () { menuInteracting = false; }, 300);
+});
+
 button.addEventListener("click", function () {
   if (menu.style.display === "flex") {
     hideMenu();
@@ -367,24 +369,20 @@ button.addEventListener("click", function () {
   }
 });

-// Delay to prevent menu hiding when the mouse leaves the button or menu
-function delayedHideMenu() {
-  setTimeout(function () {
-    if (!isMouseOverButtonOrMenu()) {
-      hideMenu();
-    }
-  }, 100);
-}
-
-// Add event listener for mouseleave on the button
-button.addEventListener("mouseleave", delayedHideMenu);
-
-// Add event listener for mouseleave on the menu
-menu.addEventListener("mouseleave", delayedHideMenu);
+hoverContainer.addEventListener("mouseleave", function () {
+  if (!istouchscreen) {
+    setTimeout(function () {
+      if (!hoverContainer.matches(":hover") && !menu.matches(":hover")) {
+        hideMenu();
+      }
+    }, 50);
+  }
+});

 // Add event listener for click anywhere in the document
 document.addEventListener("click", function (event) {
   // Check if the click is outside the button/menu and the menu is visible
-  if (!isMouseOverButtonOrMenu() && menu.style.display === "flex") {
+  if (!menuInteracting && !event.target.closest("#gr-hover-container") && menu.style.display === "flex") {
     hideMenu();
   }

View file

@@ -6,6 +6,7 @@ from transformers import AutoModel
 from .errors import ServiceUnavailableError
 from .utils import debug_msg, float_list_to_base64
 from modules.logging_colors import logger
+from modules import shared

 embeddings_params_initialized = False
@@ -41,7 +42,7 @@ def load_embedding_model(model: str):
     try:
         logger.info(f"Try embedding model: {model} on {embeddings_device}")
         if 'jina-embeddings' in model:
-            embeddings_model = AutoModel.from_pretrained(model, trust_remote_code=True)  # trust_remote_code is needed to use the encode method
+            embeddings_model = AutoModel.from_pretrained(model, trust_remote_code=shared.args.trust_remote_code)
             embeddings_model = embeddings_model.to(embeddings_device)
         else:
             embeddings_model = SentenceTransformer(model, device=embeddings_device)

View file

@@ -4,8 +4,11 @@ OpenAI-compatible image generation using local diffusion models.
 import base64
 import io
+import json
 import time

+from PIL.PngImagePlugin import PngInfo
+
 from .errors import ServiceUnavailableError
 from modules import shared
@@ -15,7 +18,7 @@ def generations(request):
     Generate images using the loaded diffusion model.
     Returns dict with 'created' timestamp and 'data' list of images.
     """
-    from modules.ui_image_generation import generate
+    from modules.ui_image_generation import build_generation_metadata, generate

     if shared.image_model is None:
         raise ServiceUnavailableError("No image model loaded. Load a model via the UI first.")
@@ -46,10 +49,18 @@ def generations(request):
     if not images:
         raise ServiceUnavailableError("Image generation failed or produced no images.")

-    # Build response
+    # Build response with per-batch metadata (seed increments per batch)
+    base_seed = state.get('image_seed_resolved', state['image_seed'])
+    batch_size = int(state['image_batch_size'])
+
     resp = {'created': int(time.time()), 'data': []}
-    for img in images:
-        b64 = _image_to_base64(img)
+    for idx, img in enumerate(images):
+        batch_seed = base_seed + idx // batch_size
+        metadata = build_generation_metadata(state, batch_seed)
+        metadata_json = json.dumps(metadata, ensure_ascii=False)
+        png_info = PngInfo()
+        png_info.add_text("image_gen_settings", metadata_json)
+        b64 = _image_to_base64(img, png_info)

         image_obj = {'revised_prompt': request.prompt}
@@ -63,7 +74,7 @@ def generations(request):
     return resp

-def _image_to_base64(image) -> str:
+def _image_to_base64(image, png_info=None) -> str:
     buffered = io.BytesIO()
-    image.save(buffered, format="PNG")
+    image.save(buffered, format="PNG", pnginfo=png_info)
     return base64.b64encode(buffered.getvalue()).decode('utf-8')
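
As a usage note, the metadata embedded above can be read back client-side with Pillow, since PNG text chunks are exposed via the image's `text` mapping. A minimal sketch, assuming a base64 PNG from the endpoint and the `image_gen_settings` key shown in the diff:

```python
import base64
import io
import json

from PIL import Image


def read_generation_metadata(b64_png: str) -> dict:
    """Recover the generation settings embedded by the endpoint above."""
    img = Image.open(io.BytesIO(base64.b64decode(b64_png)))
    # PNG tEXt chunks are available as a dict on PNG images in Pillow.
    return json.loads(img.text["image_gen_settings"])
```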

View file

@@ -2,7 +2,7 @@ from modules import loaders, shared
 from modules.logging_colors import logger
 from modules.LoRA import add_lora_to_model
 from modules.models import load_model, unload_model
-from modules.models_settings import get_model_metadata, update_model_parameters
+from modules.models_settings import get_model_metadata, load_instruction_template, update_model_parameters
 from modules.utils import get_available_loras, get_available_models
@@ -42,12 +42,10 @@ def model_info_dict(model_name: str) -> dict:

 def _load_model(data):
     model_name = data["model_name"]
-    args = data["args"]
-    settings = data["settings"]
+    args = data.get("args")

     unload_model()
     model_settings = get_model_metadata(model_name)
-    update_model_parameters(model_settings)

     # Update shared.args with custom model loading settings
     # Security: only allow keys that correspond to model loading
@@ -55,6 +53,16 @@ def _load_model(data):
     # flags like trust_remote_code or extra_flags to be set via the API.
     blocked_keys = {'extra_flags'}
     allowed_keys = set(loaders.list_model_elements()) - blocked_keys
+
+    # Reset all loader args to their startup values before applying new ones,
+    # so settings from a previous API load don't leak into this one.
+    # Include blocked keys in the reset (safe: restores startup value, not API-controlled).
+    for k in allowed_keys | blocked_keys:
+        if hasattr(shared.args, k) and hasattr(shared.original_args, k):
+            setattr(shared.args, k, getattr(shared.original_args, k))
+
+    update_model_parameters(model_settings)
+
     if args:
         for k in args:
             if k in allowed_keys and hasattr(shared.args, k):
@@ -62,15 +70,12 @@ def _load_model(data):

     shared.model, shared.tokenizer = load_model(model_name)

-    # Update shared.settings with custom generation defaults
-    if settings:
-        for k in settings:
-            if k in shared.settings:
-                shared.settings[k] = settings[k]
-                if k == 'truncation_length':
-                    logger.info(f"CONTEXT LENGTH (UPDATED): {shared.settings['truncation_length']}")
-                elif k == 'instruction_template':
-                    logger.info(f"INSTRUCTION TEMPLATE (UPDATED): {shared.settings['instruction_template']}")
+    if data.get("instruction_template_str") is not None:
+        shared.settings['instruction_template_str'] = data["instruction_template_str"]
+        logger.info("INSTRUCTION TEMPLATE: set to custom Jinja2 string")
+    elif data.get("instruction_template") is not None:
+        shared.settings['instruction_template_str'] = load_instruction_template(data["instruction_template"])
+        logger.info(f"INSTRUCTION TEMPLATE: {data['instruction_template']}")

 def list_loras():
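
A standalone sketch of the reset-then-apply pattern introduced above, with `argparse.Namespace` standing in for `shared.args`/`shared.original_args` (the flag names are illustrative, not the module's actual whitelist):

```python
from argparse import Namespace

# Startup values vs. values mutated by a previous API load.
startup_args = Namespace(n_gpu_layers=0, ctx_size=4096)
current_args = Namespace(n_gpu_layers=99, ctx_size=8192)

allowed_keys = {"n_gpu_layers", "ctx_size"}
request_args = {"ctx_size": 16384}  # args from this load request

# 1. Restore every whitelisted flag to its startup value.
for k in allowed_keys:
    setattr(current_args, k, getattr(startup_args, k))

# 2. Apply only this request's args on top of the clean slate.
for k, v in request_args.items():
    if k in allowed_keys:
        setattr(current_args, k, v)

assert current_args.n_gpu_layers == 0   # reset, not leaked from last load
assert current_args.ctx_size == 16384   # applied from this request
```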

View file

@@ -475,10 +475,8 @@ async def handle_list_models():
 @app.post("/v1/internal/model/load", dependencies=check_admin_key)
 async def handle_load_model(request_data: LoadModelRequest):
     '''
-    This endpoint is experimental and may change in the future.
-
-    The "args" parameter can be used to modify flags like "--load-in-4bit"
-    or "--n-gpu-layers" before loading a model. Example:
+    The "args" parameter can be used to modify loader flags before loading
+    a model. Example:

     ```
     "args": {
@@ -487,18 +485,13 @@ async def handle_load_model(request_data: LoadModelRequest):
     }
     ```

-    Note that those settings will remain after loading the model. So you
-    may need to change them back to load a second model.
+    Loader args are reset to their startup defaults between loads, so
+    settings from a previous load do not leak into the next one.

-    The "settings" parameter is also a dict but with keys for the
-    shared.settings object. It can be used to modify the default instruction
-    template like this:
-
-    ```
-    "settings": {
-        "instruction_template": "Alpaca"
-    }
-    ```
+    The "instruction_template" parameter sets the default instruction
+    template by name (from user_data/instruction-templates/). The
+    "instruction_template_str" parameter sets it as a raw Jinja2 string
+    and takes precedence over "instruction_template".
     '''

     try:
@@ -544,8 +537,8 @@ async def handle_unload_loras():

 def find_available_port(starting_port):
     """Try the starting port, then find an available one if it's taken."""
     try:
-        # Try to create a socket with the starting port
         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
             s.bind(('', starting_port))
             return starting_port
     except OSError:
@@ -570,7 +563,7 @@ def run_server():
         server_addrs.append(shared.args.listen_host)
     else:
         if os.environ.get('OPENEDAI_ENABLE_IPV6', shared.args.api_enable_ipv6):
-            server_addrs.append('[::]' if shared.args.listen else '[::1]')
+            server_addrs.append('::' if shared.args.listen else '::1')
         if not os.environ.get('OPENEDAI_DISABLE_IPV4', shared.args.api_disable_ipv4):
             server_addrs.append('0.0.0.0' if shared.args.listen else '127.0.0.1')
@@ -587,7 +580,7 @@ def run_server():
         )
     else:
         url_proto = 'https://' if (ssl_certfile and ssl_keyfile) else 'http://'
-        urls = [f'{url_proto}{addr}:{port}/v1' for addr in server_addrs]
+        urls = [f'{url_proto}[{addr}]:{port}/v1' if ':' in addr else f'{url_proto}{addr}:{port}/v1' for addr in server_addrs]
         if len(urls) > 1:
             logger.info('OpenAI/Anthropic-compatible API URLs:\n\n' + '\n'.join(urls) + '\n')
         else:
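
The URL change above applies the RFC 3986 rule that IPv6 literals must be bracketed in URLs, now that bind addresses are stored unbracketed. A standalone illustration (a hypothetical helper, not the module's actual code):

```python
def format_api_url(addr: str, port: int, proto: str = "http://") -> str:
    """Bracket IPv6 literals in URLs, mirroring the list comprehension above."""
    host = f"[{addr}]" if ":" in addr else addr
    return f"{proto}{host}:{port}/v1"


assert format_api_url("::1", 5000) == "http://[::1]:5000/v1"
assert format_api_url("127.0.0.1", 5000) == "http://127.0.0.1:5000/v1"
```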

View file

@@ -271,7 +271,8 @@ class ModelListResponse(BaseModel):

 class LoadModelRequest(BaseModel):
     model_name: str
     args: dict | None = None
-    settings: dict | None = None
+    instruction_template: str | None = Field(default=None, description="An instruction template defined under text-generation-webui/user_data/instruction-templates. Sets the default template for all subsequent API requests.")
+    instruction_template_str: str | None = Field(default=None, description="A Jinja2 instruction template string. If set, takes precedence over instruction_template.")

 class LoraListResponse(BaseModel):
class LoraListResponse(BaseModel): class LoraListResponse(BaseModel):

View file

@@ -210,6 +210,57 @@ def _expand_tool_sequence(tool_seq):
     return messages

+def _convert_to_tool_responses(messages):
+    """Convert role:'tool' messages to tool_responses format.
+
+    Templates like Gemma 4 expect tool results as a ``tool_responses``
+    attribute on a message rather than separate ``role: 'tool'`` messages.
+    This function groups consecutive tool messages and rewrites them.
+    """
+    result = []
+    tc_id_to_name = {}
+    i = 0
+    while i < len(messages):
+        msg = messages[i]
+        if msg.get('tool_calls'):
+            for tc in msg['tool_calls']:
+                tc_id = tc.get('id', '')
+                func_name = tc.get('function', {}).get('name', 'unknown')
+                if tc_id:
+                    tc_id_to_name[tc_id] = func_name
+
+        if msg.get('role') == 'tool':
+            tool_responses = []
+            while i < len(messages) and messages[i].get('role') == 'tool':
+                tool_msg = messages[i]
+                tc_id = tool_msg.get('tool_call_id', '')
+                func_name = tc_id_to_name.get(tc_id, 'unknown')
+                content = tool_msg.get('content', '')
+                try:
+                    response = json.loads(content)
+                except (json.JSONDecodeError, ValueError, TypeError):
+                    response = content
+
+                tool_responses.append({
+                    'name': func_name,
+                    'response': response,
+                })
+                i += 1
+
+            result.append({
+                'role': 'tool',
+                'tool_responses': tool_responses,
+            })
+        else:
+            result.append(msg)
+            i += 1
+
+    return result
+
+
 def _format_attachments(attachments, include_text=True):
     """Build image ref and text attachment strings from a list of attachments."""
     attachments_text = ""
@@ -267,6 +318,9 @@ def generate_chat_prompt(user_input, state, **kwargs):
         tools=state['tools'] if 'tools' in state else None,
     )

+    active_template_str = state['instruction_template_str'] if state['mode'] == 'instruct' else chat_template_str
+    uses_tool_responses = 'tool_responses' in active_template_str
+
     messages = []
     if state['mode'] == 'instruct':
@@ -503,6 +557,9 @@ def generate_chat_prompt(user_input, state, **kwargs):
         return prompt

+    if uses_tool_responses:
+        messages = _convert_to_tool_responses(messages)
+
     prompt = make_prompt(messages)

     # Handle truncation
@@ -511,13 +568,24 @@ def generate_chat_prompt(user_input, state, **kwargs):
         encoded_length = get_encoded_length(prompt)

         while len(messages) > 0 and encoded_length > max_length:
-            # Remove old message, save system message
             if len(messages) > 2 and messages[0]['role'] == 'system':
-                messages.pop(1)
+                pop_idx = 1
-            # Remove old message when no system message is present
             elif len(messages) > 1 and messages[0]['role'] != 'system':
-                messages.pop(0)
+                pop_idx = 0
+            else:
+                pop_idx = None
+
+            if pop_idx is not None:
+                messages.pop(pop_idx)
+
+                # Remove orphaned tool-call/tool-result messages that
+                # would be invalid without their partner.
+                while pop_idx < len(messages):
+                    msg = messages[pop_idx]
+                    if msg.get('role') == 'tool' or (msg.get('role') == 'assistant' and msg.get('tool_calls')):
+                        messages.pop(pop_idx)
+                    else:
+                        break

             # Resort to truncating the user input
             else:
@@ -637,7 +705,7 @@ def get_stopping_strings(state):
     # Find positions of each message content
     first_user_end = prompt.find("first user message") + len("first user message")
     first_assistant_start = prompt.find("first assistant message")
-    first_assistant_end = prompt.find("first assistant message") + len("first assistant message")
+    first_assistant_end = first_assistant_start + len("first assistant message")
     second_user_start = prompt.find("second user message")
     second_assistant_end = prompt.find("second assistant message") + len("second assistant message")
@@ -1126,7 +1194,7 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
     # visible text from before buffering started so raw markup doesn't flash
     # in the UI. The internal text is left intact so the caller can still
     # parse tool calls from it.
-    if is_stream and _check_tool_markers and streaming_tool_buffer_check(output['internal'][-1][1], markers=_streaming_markers, tool_names=_tool_names, check_bare_names=_check_bare_names):
+    if is_stream and _check_tool_markers and streaming_tool_buffer_check(output['internal'][-1][1], markers=_streaming_markers, tool_names=_tool_names, check_bare_names=_check_bare_names, partial_match=False):
         output['visible'][-1][1] = _last_visible_before_tool_buffer or ''

     yield output
@@ -1207,14 +1275,23 @@ def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
     # Load tools if any are selected
     selected = state.get('selected_tools', [])
+    mcp_servers = state.get('mcp_servers', '')
     parse_tool_call = None
     _tool_parsers = None
-    if selected:
-        from modules.tool_use import load_tools, execute_tool
+    if selected or mcp_servers:
+        from modules.tool_use import load_tools, load_mcp_tools, execute_tool
         from modules.tool_parsing import parse_tool_call, get_tool_call_id, detect_tool_call_format

-        tool_defs, tool_executors = load_tools(selected)
+        if selected:
+            tool_defs, tool_executors = load_tools(selected)
+        if mcp_servers:
+            mcp_defs, mcp_executors = load_mcp_tools(mcp_servers)
+            for td in mcp_defs:
+                fn = td['function']['name']
+                if fn in tool_executors:
+                    logger.warning(f'MCP tool "{fn}" conflicts with a local tool. Skipping.')
+                    continue
+                tool_defs.append(td)
+                tool_executors[fn] = mcp_executors[fn]
         state['tools'] = tool_defs
         tool_func_names = [t['function']['name'] for t in tool_defs]
         _template_str = state.get('instruction_template_str', '') if state.get('mode') == 'instruct' else state.get('chat_template_str', '')
@@ -1762,7 +1839,8 @@ def load_history(unique_id, character, mode):
     if not p.exists():
         return {'internal': [], 'visible': [], 'metadata': {}}

-    f = json.loads(open(p, 'rb').read())
+    with open(p, 'rb') as fh:
+        f = json.loads(fh.read())
     if 'internal' in f and 'visible' in f:
         history = f
     else:
@@ -1826,19 +1904,17 @@ def generate_pfp_cache(character):
     if not cache_folder.exists():
         cache_folder.mkdir()

-    for path in [shared.user_data_dir / 'characters' / f"{character}.{extension}" for extension in ['png', 'jpg', 'jpeg']]:
+    for extension in ['png', 'jpg', 'jpeg']:
+        path = shared.user_data_dir / 'characters' / f"{character}.{extension}"
         if path.exists():
             original_img = Image.open(path)

-            # Define file paths
-            pfp_path = Path(f'{cache_folder}/pfp_character.png')
-            thumb_path = Path(f'{cache_folder}/pfp_character_thumb.png')
+            pfp_path = cache_folder / 'pfp_character.png'
+            thumb_path = cache_folder / 'pfp_character_thumb.png'

-            # Save main picture and thumbnail
             original_img.save(pfp_path, format='PNG')
             thumb = make_thumbnail(original_img)
             thumb.save(thumb_path, format='PNG')

-            # Return the path to the thumbnail, not the in-memory PIL Image object.
             return str(thumb_path)

     return None
@@ -1859,13 +1935,13 @@ def load_character(character, name1, name2):
         logger.error(f"Could not find the character \"{character}\" inside {shared.user_data_dir}/characters. No character has been loaded.")
         raise ValueError

-    file_contents = open(filepath, 'r', encoding='utf-8').read()
+    with open(filepath, 'r', encoding='utf-8') as fh:
+        file_contents = fh.read()
     data = json.loads(file_contents) if extension == "json" else yaml.safe_load(file_contents)

     cache_folder = Path(shared.args.disk_cache_dir)
-    for path in [Path(f"{cache_folder}/pfp_character.png"), Path(f"{cache_folder}/pfp_character_thumb.png")]:
-        if path.exists():
-            path.unlink()
+    for path in [cache_folder / "pfp_character.png", cache_folder / "pfp_character_thumb.png"]:
+        path.unlink(missing_ok=True)

     picture = generate_pfp_cache(character)
@@ -1921,9 +1997,7 @@ def clear_character_for_ui(state):
     # Clear the cache files
     cache_folder = Path(shared.args.disk_cache_dir)
     for cache_file in ['pfp_character.png', 'pfp_character_thumb.png']:
-        cache_path = Path(f'{cache_folder}/{cache_file}')
-        if cache_path.exists():
-            cache_path.unlink()
+        (cache_folder / cache_file).unlink(missing_ok=True)

     return state, state['name2'], state['context'], state['greeting'], None
@@ -2018,11 +2092,10 @@ def upload_your_profile_picture(img_path):
         cache_folder.mkdir()

     if img is None:
-        if Path(f"{cache_folder}/pfp_me.png").exists():
-            Path(f"{cache_folder}/pfp_me.png").unlink()
+        (cache_folder / "pfp_me.png").unlink(missing_ok=True)
     else:
         img = make_thumbnail(img)
-        img.save(Path(f'{cache_folder}/pfp_me.png'))
+        img.save(cache_folder / 'pfp_me.png')
         logger.info(f'Profile picture saved to "{cache_folder}/pfp_me.png"')
@@ -2078,13 +2151,12 @@ def generate_user_pfp_cache(user):
     if not cache_folder.exists():
         cache_folder.mkdir()

-    for path in [shared.user_data_dir / 'users' / f"{user}.{extension}" for extension in ['png', 'jpg', 'jpeg']]:
+    for extension in ['png', 'jpg', 'jpeg']:
+        path = shared.user_data_dir / 'users' / f"{user}.{extension}"
         if path.exists():
             original_img = Image.open(path)

-            # Define file paths
-            pfp_path = Path(f'{cache_folder}/pfp_me.png')
+            pfp_path = cache_folder / 'pfp_me.png'

-            # Save thumbnail
             thumb = make_thumbnail(original_img)
             thumb.save(pfp_path, format='PNG')
             logger.info(f'User profile picture cached to "{pfp_path}"')
@@ -2116,9 +2188,7 @@ def load_user(user_name, name1, user_bio):
     # Clear existing user picture cache
     cache_folder = Path(shared.args.disk_cache_dir)
-    pfp_path = Path(f"{cache_folder}/pfp_me.png")
-    if pfp_path.exists():
-        pfp_path.unlink()
+    (cache_folder / "pfp_me.png").unlink(missing_ok=True)

     # Generate new picture cache
     picture = generate_user_pfp_cache(user_name)
@@ -2542,15 +2612,13 @@ def handle_character_picture_change(picture_path):
     if picture is not None:
         # Save to cache
-        picture.save(Path(f'{cache_folder}/pfp_character.png'), format='PNG')
+        picture.save(cache_folder / 'pfp_character.png', format='PNG')
         thumb = make_thumbnail(picture)
-        thumb.save(Path(f'{cache_folder}/pfp_character_thumb.png'), format='PNG')
+        thumb.save(cache_folder / 'pfp_character_thumb.png', format='PNG')
     else:
         # Remove cache files when picture is cleared
         for cache_file in ['pfp_character.png', 'pfp_character_thumb.png']:
-            cache_path = Path(f'{cache_folder}/{cache_file}')
-            if cache_path.exists():
-                cache_path.unlink()
+            (cache_folder / cache_file).unlink(missing_ok=True)

 def handle_mode_change(state):

View file

@@ -14,6 +14,13 @@ from modules.reasoning import extract_reasoning
 from modules.sane_markdown_lists import SaneListExtension
 from modules.utils import get_available_chat_styles

+# Pre-compiled regex for protecting markdown-sensitive characters inside LaTeX.
+# Covers $$...$$, \[...\], \(...\), and inline $...$ (when content contains \\).
+_LATEX_PATTERN = re.compile(
+    r'((?:^|[\r\n\s])\$\$[^`]*?\$\$)|\\\[(.*?)\\\]|\\\((.*?)\\\)|(?<!\$)\$(?!\$)([^\$\n]*\\\\[^\$\n]*?)\$(?!\$)',
+    re.DOTALL
+)
+
 # This is to store the paths to the thumbnails of the profile pictures
 image_cache = {}

@@ -185,28 +192,29 @@ def process_markdown_content(string):
     if not string:
         return ""

-    # Define unique placeholders for LaTeX asterisks and underscores
+    # Define unique placeholders for LaTeX characters that conflict with markdown
     LATEX_ASTERISK_PLACEHOLDER = "LATEXASTERISKPLACEHOLDER"
     LATEX_UNDERSCORE_PLACEHOLDER = "LATEXUNDERSCOREPLACEHOLDER"
+    LATEX_PIPE_PLACEHOLDER = "LATEXPIPEPLACEHOLDER"
+
+    def protect_latex_content(content):
+        """Protect markdown-sensitive characters inside LaTeX."""
+        content = content.replace('*', LATEX_ASTERISK_PLACEHOLDER)
+        content = content.replace('_', LATEX_UNDERSCORE_PLACEHOLDER)
+        content = content.replace('|', LATEX_PIPE_PLACEHOLDER)
+        return content

     def protect_asterisks_underscores_in_latex(match):
-        """A replacer function for re.sub to protect asterisks and underscores in multiple LaTeX formats."""
+        """A replacer function for re.sub to protect markdown-sensitive characters in multiple LaTeX formats."""
         # Check which delimiter group was captured
         if match.group(1) is not None:  # Content from $$...$$
-            content = match.group(1)
-            modified_content = content.replace('*', LATEX_ASTERISK_PLACEHOLDER)
-            modified_content = modified_content.replace('_', LATEX_UNDERSCORE_PLACEHOLDER)
-            return f'{modified_content}'
+            return protect_latex_content(match.group(1))
         elif match.group(2) is not None:  # Content from \[...\]
-            content = match.group(2)
-            modified_content = content.replace('*', LATEX_ASTERISK_PLACEHOLDER)
-            modified_content = modified_content.replace('_', LATEX_UNDERSCORE_PLACEHOLDER)
-            return f'\\[{modified_content}\\]'
+            return f'\\[{protect_latex_content(match.group(2))}\\]'
         elif match.group(3) is not None:  # Content from \(...\)
-            content = match.group(3)
-            modified_content = content.replace('*', LATEX_ASTERISK_PLACEHOLDER)
-            modified_content = modified_content.replace('_', LATEX_UNDERSCORE_PLACEHOLDER)
-            return f'\\({modified_content}\\)'
+            return f'\\({protect_latex_content(match.group(3))}\\)'
+        elif match.group(4) is not None:  # Content from $...$
+            return f'${protect_latex_content(match.group(4).strip())}$'

         return match.group(0)  # Fallback

@@ -240,9 +248,7 @@ def process_markdown_content(string):
     string = re.sub(r"(.)```", r"\1\n```", string)

     # Protect asterisks and underscores within all LaTeX blocks before markdown conversion
-    latex_pattern = re.compile(r'((?:^|[\r\n\s])\$\$[^`]*?\$\$)|\\\[(.*?)\\\]|\\\((.*?)\\\)',
-                               re.DOTALL)
-    string = latex_pattern.sub(protect_asterisks_underscores_in_latex, string)
+    string = _LATEX_PATTERN.sub(protect_asterisks_underscores_in_latex, string)

     result = ''
     is_code = False

@@ -306,6 +312,7 @@ def process_markdown_content(string):
     # Restore the LaTeX asterisks and underscores after markdown conversion
     html_output = html_output.replace(LATEX_ASTERISK_PLACEHOLDER, '*')
     html_output = html_output.replace(LATEX_UNDERSCORE_PLACEHOLDER, '_')
+    html_output = html_output.replace(LATEX_PIPE_PLACEHOLDER, '|')

     # Remove extra newlines before </code>
     html_output = re.sub(r'\s*</code>', '</code>', html_output)
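
The new fourth group is the behavioral change behind "Fix inline LaTeX rendering": inline $...$ is protected only when its content contains a \\ (a LaTeX line break), which keeps ordinary dollar amounts out of the placeholder path. A standalone sketch of the match behavior, reusing the exact pattern from the diff (the replacer here is simplified to underscores only):

import re

_LATEX_PATTERN = re.compile(
    r'((?:^|[\r\n\s])\$\$[^`]*?\$\$)|\\\[(.*?)\\\]|\\\((.*?)\\\)|(?<!\$)\$(?!\$)([^\$\n]*\\\\[^\$\n]*?)\$(?!\$)',
    re.DOTALL
)

def protect(match):
    # Whichever group matched, swap '_' for a placeholder inside the block.
    return match.group(0).replace('_', 'LATEXUNDERSCOREPLACEHOLDER')

text = r'It costs $5, and $10 later. Cases: $x_1 \\ x_2$'
print(_LATEX_PATTERN.sub(protect, text))
# The dollar amounts are left alone; only the inline math block containing \\ is rewritten.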

View file

@@ -10,72 +10,49 @@ def get_quantization_config(quant_method):
     Get the appropriate quantization config based on the selected method.
     Applies quantization to both the transformer and the text_encoder.
     """
+    if quant_method == 'none' or not quant_method:
+        return None
+
     import torch
-
-    # Import BitsAndBytesConfig from BOTH libraries to be safe
     from diffusers import BitsAndBytesConfig as DiffusersBnBConfig
     from diffusers import TorchAoConfig
     from diffusers.quantizers import PipelineQuantizationConfig
     from transformers import BitsAndBytesConfig as TransformersBnBConfig

-    if quant_method == 'none' or not quant_method:
-        return None
+    torchao_methods = {
+        'torchao-int8wo': 'int8wo',
+        'torchao-fp4': 'fp4_e2m1',
+        'torchao-float8wo': 'float8wo',
+    }

-    # Bitsandbytes 8-bit quantization
-    elif quant_method == 'bnb-8bit':
+    if quant_method == 'bnb-8bit':
         return PipelineQuantizationConfig(
             quant_mapping={
-                "transformer": DiffusersBnBConfig(
-                    load_in_8bit=True
-                ),
-                "text_encoder": TransformersBnBConfig(
-                    load_in_8bit=True
-                )
+                "transformer": DiffusersBnBConfig(load_in_8bit=True),
+                "text_encoder": TransformersBnBConfig(load_in_8bit=True)
             }
         )

-    # Bitsandbytes 4-bit quantization
     elif quant_method == 'bnb-4bit':
+        bnb_4bit_kwargs = dict(
+            load_in_4bit=True,
+            bnb_4bit_quant_type="nf4",
+            bnb_4bit_compute_dtype=torch.bfloat16,
+            bnb_4bit_use_double_quant=True
+        )
         return PipelineQuantizationConfig(
             quant_mapping={
-                "transformer": DiffusersBnBConfig(
-                    load_in_4bit=True,
-                    bnb_4bit_quant_type="nf4",
-                    bnb_4bit_compute_dtype=torch.bfloat16,
-                    bnb_4bit_use_double_quant=True
-                ),
-                "text_encoder": TransformersBnBConfig(
-                    load_in_4bit=True,
-                    bnb_4bit_quant_type="nf4",
-                    bnb_4bit_compute_dtype=torch.bfloat16,
-                    bnb_4bit_use_double_quant=True
-                )
+                "transformer": DiffusersBnBConfig(**bnb_4bit_kwargs),
+                "text_encoder": TransformersBnBConfig(**bnb_4bit_kwargs)
             }
         )

-    # torchao int8 weight-only
-    elif quant_method == 'torchao-int8wo':
+    elif quant_method in torchao_methods:
+        ao_type = torchao_methods[quant_method]
         return PipelineQuantizationConfig(
             quant_mapping={
-                "transformer": TorchAoConfig("int8wo"),
-                "text_encoder": TorchAoConfig("int8wo")
-            }
-        )
-
-    # torchao fp4 (e2m1)
-    elif quant_method == 'torchao-fp4':
-        return PipelineQuantizationConfig(
-            quant_mapping={
-                "transformer": TorchAoConfig("fp4_e2m1"),
-                "text_encoder": TorchAoConfig("fp4_e2m1")
-            }
-        )
-
-    # torchao float8 weight-only
-    elif quant_method == 'torchao-float8wo':
-        return PipelineQuantizationConfig(
-            quant_mapping={
-                "transformer": TorchAoConfig("float8wo"),
-                "text_encoder": TorchAoConfig("float8wo")
+                "transformer": TorchAoConfig(ao_type),
+                "text_encoder": TorchAoConfig(ao_type)
             }
         )
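
For context, the returned PipelineQuantizationConfig is consumed at pipeline load time; the table-driven rewrite changes no behavior, it just collapses three identical torchao branches. A hedged usage sketch, where the model id is illustrative and quantization_config is the pipeline-level keyword that recent diffusers releases accept:

from diffusers import DiffusionPipeline

quant_config = get_quantization_config('torchao-int8wo')  # or 'bnb-4bit', 'none', ...
pipe = DiffusionPipeline.from_pretrained(
    'some-org/some-image-model',      # illustrative model id
    quantization_config=quant_config  # None simply disables quantization
)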
@@ -152,7 +129,7 @@ def load_image_model(model_name, dtype='bfloat16', attn_backend='sdpa', cpu_offl
     modules = ["transformer", "unet"]

-    # Set attention backend
+    # Set attention backend (diffusers defaults to native/SDPA)
     if attn_backend == 'flash_attention_2':
         for name in modules:
             mod = getattr(pipe, name, None)

View file

@@ -373,6 +373,7 @@ class LlamaServer:
         """Check if a port is available for use."""
         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
             try:
+                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                 s.bind(('', port))
                 return True
             except OSError:
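
This pairs with the 'Fix "address already in use" on server restart (Linux/macOS)' commit: after the previous server exits, its listening socket can linger in TIME_WAIT, and a plain bind() probe then reports the port as busy even though a restarted server could take it. Setting SO_REUSEADDR on the probe socket makes the availability check agree with what the new server can actually bind. A standalone version of the probe:

import socket

def port_is_available(port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # Without SO_REUSEADDR, a port lingering in TIME_WAIT from a
            # previous run makes this bind fail on Linux/macOS.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(('', port))
            return True
        except OSError:
            return False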

View file

@@ -400,14 +400,19 @@ def load_instruction_template(template):
     if template == 'None':
         return ''

-    for filepath in [shared.user_data_dir / 'instruction-templates' / f'{template}.yaml', shared.user_data_dir / 'instruction-templates' / 'Alpaca.yaml']:
-        if filepath.exists():
-            break
+    for name in (template, 'Alpaca'):
+        path = shared.user_data_dir / 'instruction-templates' / f'{name}.yaml'
+        try:
+            with open(path, 'r', encoding='utf-8') as f:
+                file_contents = f.read()
+        except FileNotFoundError:
+            if name == template:
+                logger.warning(f"Instruction template '{template}' not found, falling back to Alpaca")
+            continue
+        break
     else:
         return ''

-    with open(filepath, 'r', encoding='utf-8') as f:
-        file_contents = f.read()
-
     data = yaml.safe_load(file_contents)
     if 'instruction_template' in data:
         return data['instruction_template']
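
For reference, the files being read are YAML documents whose instruction_template key holds a Jinja2 chat template. A minimal, hypothetical template file, round-tripped the same way the loader does it (the template body is illustrative, not from the repo):

import yaml

# Hypothetical contents of user_data/instruction-templates/MyTemplate.yaml
file_contents = """
instruction_template: |-
  {%- for message in messages %}
  {{- message['role'] + ': ' + message['content'] + '\\n' }}
  {%- endfor %}
"""

data = yaml.safe_load(file_contents)
print(data['instruction_template'])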

View file

@@ -1,6 +1,7 @@
 from pathlib import Path

 from modules import shared, utils
+from modules.utils import sanitize_filename
 from modules.text_generation import get_encoded_length

@@ -18,6 +19,7 @@ def load_prompt(fname):
             return initial_content

+    fname = sanitize_filename(fname)
     file_path = shared.user_data_dir / 'logs' / 'notebook' / f'{fname}.txt'
    if file_path.exists():
         with open(file_path, 'r', encoding='utf-8') as f:
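
The diff does not show sanitize_filename itself, only that every user-supplied prompt name now passes through it before being joined into a path. The point of the guard is that a name like ../../etc/passwd must not escape logs/notebook/. A purely hypothetical sketch of such a function, not the actual modules/utils.py implementation:

import re

def sanitize_filename(fname: str) -> str:
    # Hypothetical: drop path separators and traversal sequences so the
    # resulting name stays inside the intended directory.
    fname = str(fname).replace('/', '_').replace('\\', '_')
    fname = re.sub(r'\.\.+', '_', fname)
    return fname.strip() or 'untitled'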

View file

@@ -7,6 +7,7 @@ THINKING_FORMATS = [
     ('<|channel|>analysis<|message|>', '<|end|>', '<|channel|>final<|message|>'),
     ('<|channel|>commentary<|message|>', '<|end|>', '<|channel|>final<|message|>'),
     ('<seed:think>', '</seed:think>', None),
+    ('<|channel>thought', '<channel|>', None),  # Gemma 4
     ('<|think|>', '<|end|>', '<|content|>'),  # Solar Open
     # ('Thinking Process:', '</think>', None),  # Qwen3.5 verbose thinking outside tags -- removed: too prone to false positives in streaming
     (None, '</think>', None),  # End-only variant (e.g., Qwen3-next)

@@ -72,9 +73,16 @@ def extract_reasoning(text, html_escaped=False):
             if content_pos != -1:
                 content_start = content_pos + len(content_esc)
             else:
-                # Content tag not present — fall back to content after
-                # end_tag (e.g. GPT-OSS tool calls skip the final channel).
-                content_start = end_pos + len(end_esc)
+                # Content tag not present yet. In GPT-OSS the region
+                # between <|end|> and the content tag contains internal
+                # markup (<|start|>assistant…) that must not be shown.
+                # Suppress it to prevent tag leaks during streaming.
+                remainder = text[end_pos + len(end_esc):].lstrip()
+                framing_token = esc('<|start|>')
+                if not remainder or remainder.startswith(framing_token) or framing_token.startswith(remainder):
+                    content_start = len(text)
+                else:
+                    content_start = end_pos + len(end_esc)
         else:
             content_start = end_pos + len(end_esc)
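
The two-directional startswith check is what makes this safe mid-stream: while the <|start|> framing token is still arriving character by character, the remainder is a prefix of the token rather than the other way around. A toy illustration of the condition:

framing_token = '<|start|>'

for remainder in ('', '<|st', '<|start|>assistant', 'Hello there'):
    suppress = (not remainder
                or remainder.startswith(framing_token)
                or framing_token.startswith(remainder))
    print(repr(remainder), '->', 'suppress' if suppress else 'show')
# '', '<|st', and '<|start|>assistant' are suppressed; 'Hello there' is shown.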

View file

@@ -259,6 +259,7 @@ settings = {
     'enable_web_search': False,
     'web_search_pages': 3,
     'selected_tools': [],
+    'mcp_servers': '',
     'prompt-notebook': '',
     'preset': 'Top-P' if (user_data_dir / 'presets/Top-P.yaml').exists() else None,
     'max_new_tokens': 512,

@@ -363,7 +364,7 @@ settings = {
     'image_llm_variations_prompt': 'Write a variation of the image generation prompt above. Consider the intent of the user with that prompt and write something that will likely please them, with added details. Output only the new prompt. Do not add any explanations, prefixes, or additional text.',
     'image_model_menu': 'None',
     'image_dtype': 'bfloat16',
-    'image_attn_backend': 'flash_attention_2',
+    'image_attn_backend': 'sdpa',
     'image_cpu_offload': False,
     'image_compile': False,
     'image_quant': 'none',
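
Together with the load_image_model hunk above and the "Fix image generation: default to SDPA attention backend" commit, this makes the default work without the optional flash-attn wheel: 'sdpa' refers to PyTorch's built-in kernel. For reference:

import torch
import torch.nn.functional as F

# The 'sdpa' backend name maps to this built-in op, which dispatches to the
# fastest available kernel (flash, memory-efficient, or math) at runtime.
q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v)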

View file

@ -27,10 +27,11 @@ TOOL_CALL_OPENING_MARKERS = [
'[TOOL_CALLS]', '[TOOL_CALLS]',
'to=functions.', 'to=functions.',
'<|channel|>commentary', '<|channel|>commentary',
'<|tool_call>call:',
] ]
def streaming_tool_buffer_check(text, markers=None, tool_names=None, check_bare_names=False): def streaming_tool_buffer_check(text, markers=None, tool_names=None, check_bare_names=False, partial_match=True):
''' '''
Check whether streaming output should be withheld because it may Check whether streaming output should be withheld because it may
contain tool-call markup. contain tool-call markup.
@ -42,6 +43,10 @@ def streaming_tool_buffer_check(text, markers=None, tool_names=None, check_bare_
tool_names: List of tool function names. tool_names: List of tool function names.
check_bare_names: Whether to do partial-prefix matching on tool check_bare_names: Whether to do partial-prefix matching on tool
names (for models with unknown template format). names (for models with unknown template format).
partial_match: Whether to check partial prefixes of markers/names.
Set to False for end-of-generation checks where a
partial prefix is just normal text, not an incomplete
tool call.
''' '''
# Strip thinking blocks so tool-call syntax inside <think> doesn't # Strip thinking blocks so tool-call syntax inside <think> doesn't
# trigger false positives. # trigger false positives.
@ -59,6 +64,9 @@ def streaming_tool_buffer_check(text, markers=None, tool_names=None, check_bare_
if name + '{' in text or name + ' {' in text: if name + '{' in text or name + ' {' in text:
return True return True
if not partial_match:
return False
# Partial-prefix matching: only for template-specific markers. # Partial-prefix matching: only for template-specific markers.
for marker in (markers if markers is not None else TOOL_CALL_OPENING_MARKERS): for marker in (markers if markers is not None else TOOL_CALL_OPENING_MARKERS):
for prefix_len in range(min(len(marker) - 1, len(text)), 0, -1): for prefix_len in range(min(len(marker) - 1, len(text)), 0, -1):
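To make the partial_match split concrete: mid-stream, text ending in the first characters of a marker may be an incomplete tool call and must be buffered; at end of generation nothing more is coming, so the same tail is just text. Expected behavior under the markers listed above:

# Mid-stream: '<|tool' could still become '<|tool_call>call:', so withhold it.
streaming_tool_buffer_check('Let me check.<|tool', partial_match=True)   # True

# End of generation: the same trailing characters are plain text.
streaming_tool_buffer_check('Let me check.<|tool', partial_match=False)  # False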
@@ -400,6 +408,78 @@ def _parse_glm_tool_calls(answer: str, tool_names: list[str]):
     return matches, start_pos

+def _extract_gemma4_balanced(text, start):
+    """Extract balanced braces from Gemma 4 format, using <|"|> as string delimiters."""
+    if start >= len(text) or text[start] != '{':
+        return None
+
+    depth = 0
+    in_string = False
+    quote_token = '<|"|>'
+    quote_len = len(quote_token)
+    i = start
+    while i < len(text):
+        if text[i:i + quote_len] == quote_token:
+            in_string = not in_string
+            i += quote_len
+            continue
+        if in_string:
+            i += 1
+            continue
+        c = text[i]
+        if c == '{':
+            depth += 1
+        elif c == '}':
+            depth -= 1
+            if depth == 0:
+                return text[start:i + 1]
+        i += 1
+
+    return None
+
+
+def _parse_gemma4_tool_calls(answer: str, tool_names: list[str]):
+    """Parse Gemma 4-style tool calls.
+
+    Format:
+        <|tool_call>call:func_name{key:<|"|>value<|"|>,...}<tool_call|>
+
+    Values use <|"|> tokens instead of standard JSON quotes, and keys are
+    bare identifiers.
+    """
+    matches = []
+    start_pos = None
+    for m in re.finditer(r'<\|tool_call>call:([^\s{]+)\s*', answer):
+        func_name = m.group(1).strip()
+        if func_name not in tool_names:
+            continue
+
+        brace_start = m.end()
+        if brace_start >= len(answer) or answer[brace_start] != '{':
+            continue
+
+        content = _extract_gemma4_balanced(answer, brace_start)
+        if content is None:
+            continue
+
+        # Convert to JSON: split on <|"|> tokens so that key quoting
+        # only applies outside string values (even-indexed parts),
+        # then rejoin with real quotes.
+        parts = content.split('<|"|>')
+        for idx in range(0, len(parts), 2):
+            parts[idx] = re.sub(r'(^|[{,\[])\s*(\w+)\s*:', r'\1"\2":', parts[idx])
+
+        json_str = '"'.join(parts)
+        try:
+            arguments = json.loads(json_str)
+            if start_pos is None:
+                start_pos = m.start()
+            matches.append(_make_tool_call(func_name, arguments))
+        except (json.JSONDecodeError, ValueError):
+            pass
+
+    return matches, start_pos
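
A worked example of the quoting pass, assuming the parser above: splitting on <|"|> puts string values at odd indexes, so the key-quoting regex only ever touches structural text, and rejoining with a real quote yields valid JSON even when a value contains commas or braces.

answer = 'Sure.<|tool_call>call:get_weather{city:<|"|>Porto, Portugal<|"|>,days:3}<tool_call|>'
matches, start_pos = _parse_gemma4_tool_calls(answer, ['get_weather'])

# Intermediate json_str: {"city":"Porto, Portugal","days":3}
# matches[0] wraps get_weather with arguments {'city': 'Porto, Portugal', 'days': 3}
# (via _make_tool_call, which is not shown in this diff); start_pos == len('Sure.')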
 def _parse_pythonic_tool_calls(answer: str, tool_names: list[str]):
     """Parse pythonic-style tool calls used by Llama 4 and similar models.

@@ -472,6 +552,11 @@ TOOL_CALL_FORMATS = [
         'parser': _parse_channel_tool_calls,
         'markers': ['to=functions.', '<|channel|>commentary'],
     },
+    {
+        'template_hints': ['<|tool_call>call:'],
+        'parser': _parse_gemma4_tool_calls,
+        'markers': ['<|tool_call>call:'],
+    },
     {
         'template_hints': ['minimax:tool_call'],
         'parser': _parse_minimax_tool_calls,

@@ -504,6 +589,7 @@ ALL_PARSERS = [
     _parse_deep_seek_tool_calls,
     _parse_kimi_tool_calls,
     _parse_channel_tool_calls,
+    _parse_gemma4_tool_calls,
     _parse_minimax_tool_calls,
     _parse_glm_tool_calls,
     _parse_xml_param_tool_calls,

@@ -552,9 +638,15 @@ def parse_tool_call(answer: str, tool_names: list[str], return_prefix: bool = Fa
     # Strip thinking blocks so tool-call syntax inside <think> is ignored.
     original_answer = answer
     _, answer = extract_reasoning(answer)
-    # Offset between original and stripped text, used to map start_pos
-    # back to the original string when returning a prefix.
-    reasoning_offset = len(original_answer) - len(answer)
+    # Reasoning extraction returns empty content when GPT-OSS internal
+    # markup (<|start|>assistant…) follows the thinking block without a
+    # content tag. Fall back to the full text so tool-call markers can
+    # be found.
+    if not answer.strip():
+        answer = original_answer
+        reasoning_offset = 0
+    else:
+        reasoning_offset = len(original_answer) - len(answer)

     matches = []
     start_pos = None

@@ -620,6 +712,8 @@ def parse_tool_call(answer: str, tool_names: list[str], return_prefix: bool = Fa
         if not isinstance(candidates, list):
             candidates = [candidates]
         for candidate_dict in candidates:
+            if not isinstance(candidate_dict, dict):
+                continue
             checked_candidate = check_and_sanitize_tool_call_candidate(candidate_dict, tool_names)
             if checked_candidate is not None:
                 matches.append(checked_candidate)

View file

@@ -1,3 +1,4 @@
+import asyncio
 import importlib.util
 import json

@@ -55,6 +56,119 @@ def load_tools(selected_names):
     return tool_defs, executors

+def _parse_mcp_servers(servers_str):
+    """Parse MCP servers textbox: one server per line, format 'url' or 'url,Header: value,Header2: value2'."""
+    servers = []
+    for line in servers_str.strip().splitlines():
+        line = line.strip()
+        if not line:
+            continue
+
+        parts = line.split(',')
+        url = parts[0].strip()
+        headers = {}
+        for part in parts[1:]:
+            part = part.strip()
+            if ':' in part:
+                key, val = part.split(':', 1)
+                headers[key.strip()] = val.strip()
+
+        servers.append((url, headers))
+
+    return servers
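
The textbox format maps one line to one (url, headers) tuple; everything after the first comma is read as Header: value pairs, and a port colon in the URL is unaffected because only parts[1:] are parsed as headers. For example (URLs illustrative):

servers = _parse_mcp_servers(
    'http://localhost:8000/mcp\n'
    'https://mcp.example.com/mcp,Authorization: Bearer abc123,X-Org: acme\n'
)
# [('http://localhost:8000/mcp', {}),
#  ('https://mcp.example.com/mcp', {'Authorization': 'Bearer abc123', 'X-Org': 'acme'})]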
+def _mcp_tool_to_openai(tool):
+    """Convert an MCP Tool object to OpenAI-format tool dict."""
+    return {
+        "type": "function",
+        "function": {
+            "name": tool.name,
+            "description": tool.description or "",
+            "parameters": tool.inputSchema or {"type": "object", "properties": {}}
+        }
+    }
+
+
+async def _mcp_session(url, headers, callback):
+    """Open an MCP session and pass it to the callback."""
+    from mcp.client.streamable_http import streamablehttp_client
+    from mcp import ClientSession
+
+    async with streamablehttp_client(url, headers=headers or None) as (read_stream, write_stream, _):
+        async with ClientSession(read_stream, write_stream) as session:
+            await session.initialize()
+            return await callback(session)
+
+
+def _make_mcp_executor(name, url, headers):
+    def executor(arguments):
+        return asyncio.run(_call_mcp_tool(name, arguments, url, headers))
+
+    return executor
+
+
+async def _connect_mcp_server(url, headers):
+    """Connect to one MCP server and return (tool_defs, executors)."""
+    async def _discover(session):
+        result = await session.list_tools()
+        tool_defs = []
+        executors = {}
+        for tool in result.tools:
+            tool_defs.append(_mcp_tool_to_openai(tool))
+            executors[tool.name] = _make_mcp_executor(tool.name, url, headers)
+
+        return tool_defs, executors
+
+    return await _mcp_session(url, headers, _discover)
+
+
+async def _call_mcp_tool(name, arguments, url, headers):
+    """Connect to an MCP server and call a single tool."""
+    async def _invoke(session):
+        result = await session.call_tool(name, arguments)
+        parts = []
+        for content in result.content:
+            if hasattr(content, 'text'):
+                parts.append(content.text)
+            else:
+                parts.append(str(content))
+
+        return '\n'.join(parts) if parts else ''
+
+    return await _mcp_session(url, headers, _invoke)
+
+
+async def _connect_all_mcp_servers(servers):
+    """Connect to all MCP servers concurrently."""
+    results = await asyncio.gather(
+        *(_connect_mcp_server(url, headers) for url, headers in servers),
+        return_exceptions=True
+    )
+
+    all_defs = []
+    all_executors = {}
+    for (url, _), result in zip(servers, results):
+        if isinstance(result, Exception):
+            logger.exception(f'Failed to connect to MCP server "{url}"', exc_info=result)
+            continue
+
+        defs, execs = result
+        for td, (fn, ex) in zip(defs, execs.items()):
+            if fn in all_executors:
+                logger.warning(f'MCP tool "{fn}" from {url} conflicts with an already loaded tool. Skipping.')
+                continue
+
+            all_defs.append(td)
+            all_executors[fn] = ex
+
+    return all_defs, all_executors
+
+
+def load_mcp_tools(servers_str):
+    """
+    Parse MCP servers string and discover tools from each server.
+    Returns (tool_defs, executors) in the same format as load_tools.
+    """
+    servers = _parse_mcp_servers(servers_str)
+    if not servers:
+        return [], {}
+
+    return asyncio.run(_connect_all_mcp_servers(servers))
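
Discovery is synchronous from the caller's side, matching load_tools. A hedged usage sketch (server URL and tool name are illustrative):

tool_defs, executors = load_mcp_tools('http://localhost:8000/mcp')
for td in tool_defs:
    print(td['function']['name'], '-', td['function']['description'])

if 'echo' in executors:  # assuming the server exposes an 'echo' tool
    print(executors['echo']({'text': 'hello'}))

Note the design choice: each executor opens a fresh session per call via asyncio.run, so no long-lived connection has to survive between Gradio events; the cost is one reconnect per tool invocation.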
 def execute_tool(func_name, arguments, executors):
     """Execute a tool by function name. Returns result as a JSON string."""
     fn = executors.get(func_name)

View file

@@ -52,7 +52,7 @@ def create_ui():
                 with gr.Column():
                     always_override = gr.Checkbox(label='Override Existing Files', value=False, info='If the name is the same, checking will replace the existing file, and unchecking will load and continue from it (the rank must be the same).', elem_classes=['no-background'])

-            with gr.Accordion(label='Target Modules', open=False, elem_classes='tgw-accordion'):
+            with gr.Accordion(label='Target Modules', open=False):
                 gr.Markdown("Selects which modules to target in training. Targeting more modules is closer to a full fine-tune at the cost of increased VRAM and adapter size.")
                 all_linear = gr.Checkbox(label='Target all linear layers', value=True, info='Targets every nn.Linear layer except lm_head. Works for any model architecture. When checked, the individual module checkboxes below are ignored.', elem_classes=['no-background'])
                 with gr.Row():

@@ -87,7 +87,7 @@ def create_ui():
             with gr.Row():
                 lr_scheduler_type = gr.Dropdown(label='LR Scheduler', value='cosine', choices=['linear', 'constant', 'constant_with_warmup', 'cosine', 'cosine_with_restarts', 'polynomial', 'inverse_sqrt'], info='Learning rate scheduler - defines how the learning rate changes over time. "Constant" means never change, "linear" means to go in a straight line from the learning rate down to 0, cosine follows a curve, etc.', elem_classes=['slim-dropdown'])

-            with gr.Accordion(label='Advanced Options', open=False, elem_classes='tgw-accordion'):
+            with gr.Accordion(label='Advanced Options', open=False):
                 with gr.Row():
                     with gr.Column():
                         optimizer = gr.Dropdown(label='Optimizer', value='adamw_torch', choices=['adamw_hf', 'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_apex_fused', 'adafactor', 'adamw_bnb_8bit', 'adamw_anyprecision', 'sgd', 'adagrad'], info='Optimizer algorithm. adamw_torch is the standard choice. adamw_bnb_8bit uses less VRAM. adafactor is memory-efficient for large models.', elem_classes=['slim-dropdown'])

View file

@@ -75,7 +75,7 @@ if not shared.args.old_colors:
         background_fill_primary_dark='var(--darker-gray, #1C1C1D)',
         body_background_fill="white",
         block_background_fill="transparent",
-        body_text_color='rgb(64, 64, 64)',
+        body_text_color='#1a1a1a',
         button_secondary_background_fill="white",
         button_secondary_border_color="var(--border-color-primary)",
         block_title_text_color='*body_text_color',

@@ -209,6 +209,7 @@ def list_interface_input_elements():
         'textbox',
         'start_with',
         'selected_tools',
+        'mcp_servers',
         'mode',
         'chat_style',
         'chat-instruct_command',

@@ -434,6 +435,7 @@ def setup_auto_save():
         'custom_system_message',
         'chat_template_str',
         'selected_tools',
+        'mcp_servers',

         # Parameters tab (ui_parameters.py) - Generation parameters
         'preset_menu',

View file

@@ -52,7 +52,7 @@ def create_ui():
     shared.gradio['html_display'] = gr.HTML(value=chat_html_wrapper({'internal': [], 'visible': [], 'metadata': {}}, '', '', 'chat', 'cai-chat', '')['html'], visible=True)

     with gr.Row(elem_id="chat-input-row"):
         with gr.Column(scale=1, elem_id='gr-hover-container'):
-            gr.HTML(value='<div class="hover-element" onclick="void(0)"><span style="width: 100px; display: block" id="hover-element-button">&#9776;</span><div class="hover-menu" id="hover-menu"></div>', elem_id='gr-hover')
+            gr.HTML(value='<div class="hover-element" onclick="void(0)"><span id="hover-element-button"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><line x1="4" y1="6" x2="20" y2="6"></line><line x1="4" y1="12" x2="20" y2="12"></line><line x1="4" y1="18" x2="20" y2="18"></line></svg></span><div class="hover-menu" id="hover-menu"></div></div>', elem_id='gr-hover')

         with gr.Column(scale=10, elem_id='chat-input-container'):
             shared.gradio['textbox'] = gr.MultimodalTextbox(label='', placeholder='Send a message', file_types=['text', '.pdf', 'image'], file_count="multiple", elem_id='chat-input', elem_classes=['add_scrollbar'])

@@ -105,6 +105,9 @@ def create_ui():
             shared.gradio['selected_tools'].change(fn=sync_web_tools, inputs=[shared.gradio['selected_tools']], outputs=[shared.gradio['selected_tools']], show_progress=False)

+            with gr.Accordion('MCP servers', open=False):
+                shared.gradio['mcp_servers'] = gr.Textbox(value=shared.settings.get('mcp_servers', ''), lines=3, max_lines=3, label='', info='One url per line. For headers, write url,Header: value,Header2: value2', elem_classes=['add_scrollbar'])
+
             gr.HTML("<div class='sidebar-vertical-separator'></div>")

             with gr.Row():

View file

@@ -10,7 +10,7 @@ from modules.text_generation import (
     stop_everything_event
 )
 from modules.ui_notebook import store_notebook_state_and_debounce
-from modules.utils import gradio
+from modules.utils import gradio, sanitize_filename

 inputs = ('textbox-default', 'interface_state')
 outputs = ('output_textbox', 'html-default')

@@ -167,6 +167,7 @@ def handle_new_prompt():
 def handle_delete_prompt_confirm_default(prompt_name):
+    prompt_name = sanitize_filename(prompt_name)
     available_prompts = utils.get_available_prompts()
     current_index = available_prompts.index(prompt_name) if prompt_name in available_prompts else 0

@@ -199,6 +200,8 @@ def handle_rename_prompt_click_default(current_name):
 def handle_rename_prompt_confirm_default(new_name, current_name):
+    new_name = sanitize_filename(new_name)
+    current_name = sanitize_filename(current_name)
     old_path = shared.user_data_dir / "logs" / "notebook" / f"{current_name}.txt"
     new_path = shared.user_data_dir / "logs" / "notebook" / f"{new_name}.txt"

View file

@@ -798,6 +798,9 @@ def generate(state, save_images=True):
     if seed == -1:
         seed = random.randint(0, 2**32 - 1)

+    # Store resolved seed back so callers (e.g. API) can access it
+    state['image_seed_resolved'] = seed
+
     device = get_device()
     if device is None:
         device = "cpu"

View file

@@ -54,7 +54,6 @@ def create_ui():
             if not shared.args.portable:
                 shared.gradio['ik'] = gr.Checkbox(label="ik", value=shared.args.ik, info='Use ik_llama.cpp instead of upstream llama.cpp.')

-            shared.gradio['cpu_moe'] = gr.Checkbox(label="cpu-moe", value=shared.args.cpu_moe, info='Move the experts to the CPU. Saves VRAM on MoE models.')
             shared.gradio['streaming_llm'] = gr.Checkbox(label="streaming-llm", value=shared.args.streaming_llm, info='Activate StreamingLLM to avoid re-evaluating the entire prompt when old messages are removed.')
             shared.gradio['load_in_8bit'] = gr.Checkbox(label="load-in-8bit", value=shared.args.load_in_8bit)
             shared.gradio['load_in_4bit'] = gr.Checkbox(label="load-in-4bit", value=shared.args.load_in_4bit)

@@ -67,13 +66,13 @@ def create_ui():
             )

             # Multimodal
-            with gr.Accordion("Multimodal (vision)", open=False, elem_classes='tgw-accordion') as shared.gradio['mmproj_accordion']:
+            with gr.Accordion("Multimodal (vision)", open=False) as shared.gradio['mmproj_accordion']:
                 with gr.Row():
                     shared.gradio['mmproj'] = gr.Dropdown(label="mmproj file", choices=utils.get_available_mmproj(), value=lambda: shared.args.mmproj or 'None', elem_classes='slim-dropdown', info=f'Select a file that matches your model. Must be placed in {shared.user_data_dir}/mmproj/', interactive=not mu)
                     ui.create_refresh_button(shared.gradio['mmproj'], lambda: None, lambda: {'choices': utils.get_available_mmproj()}, 'refresh-button', interactive=not mu)

             # Speculative decoding
-            with gr.Accordion("Speculative decoding", open=False, elem_classes='tgw-accordion') as shared.gradio['speculative_decoding_accordion']:
+            with gr.Accordion("Speculative decoding", open=False) as shared.gradio['speculative_decoding_accordion']:
                 shared.gradio['draft_max'] = gr.Number(label="draft-max", precision=0, step=1, value=shared.args.draft_max, info='Maximum number of tokens to draft for speculative decoding. Recommended: 4 for draft model, 64 for n-gram.')
                 gr.Markdown('#### Draft model')

@@ -92,7 +91,7 @@ def create_ui():
                 shared.gradio['spec_ngram_min_hits'] = gr.Number(label="spec-ngram-min-hits", precision=0, step=1, value=shared.args.spec_ngram_min_hits, info='Minimum n-gram hits for ngram-map speculative decoding.', visible=shared.args.spec_type != 'none')

             gr.Markdown("## Other options")
-            with gr.Accordion("See more options", open=False, elem_classes='tgw-accordion'):
+            with gr.Accordion("See more options", open=False):
                 with gr.Row():
                     with gr.Column():
                         shared.gradio['parallel'] = gr.Slider(label="parallel", minimum=1, step=1, maximum=64, value=shared.args.parallel, info='Number of parallel request slots for the API. The context size is divided equally among slots. For example, to have 4 slots with 8192 context each, set ctx_size to 32768.')

@@ -109,6 +108,7 @@ def create_ui():
                 with gr.Column():
                     shared.gradio['cpu'] = gr.Checkbox(label="cpu", value=shared.args.cpu, info='Use PyTorch in CPU mode.')
                     shared.gradio['disk'] = gr.Checkbox(label="disk", value=shared.args.disk)
+                    shared.gradio['cpu_moe'] = gr.Checkbox(label="cpu-moe", value=shared.args.cpu_moe, info='Move the experts to the CPU. Saves VRAM on MoE models.')
                     shared.gradio['row_split'] = gr.Checkbox(label="row_split", value=shared.args.row_split, info='Split the model by rows across GPUs. This may improve multi-gpu performance.')
                     shared.gradio['no_kv_offload'] = gr.Checkbox(label="no_kv_offload", value=shared.args.no_kv_offload, info='Do not offload the K, Q, V to the GPU. This saves VRAM but reduces performance.')
                     shared.gradio['no_mmap'] = gr.Checkbox(label="no-mmap", value=shared.args.no_mmap)

View file

@@ -11,7 +11,7 @@ from modules.text_generation import (
     get_token_ids,
     stop_everything_event
 )
-from modules.utils import gradio
+from modules.utils import gradio, sanitize_filename

 _notebook_file_lock = threading.Lock()
 _notebook_auto_save_timer = None

@@ -202,6 +202,7 @@ def handle_new_prompt():
 def handle_delete_prompt_confirm_notebook(prompt_name):
+    prompt_name = sanitize_filename(prompt_name)
     available_prompts = utils.get_available_prompts()
     current_index = available_prompts.index(prompt_name) if prompt_name in available_prompts else 0

@@ -233,6 +234,8 @@ def handle_rename_prompt_click_notebook(current_name):
 def handle_rename_prompt_confirm_notebook(new_name, current_name):
+    new_name = sanitize_filename(new_name)
+    current_name = sanitize_filename(current_name)
     old_path = shared.user_data_dir / "logs" / "notebook" / f"{current_name}.txt"
     new_path = shared.user_data_dir / "logs" / "notebook" / f"{new_name}.txt"

@@ -249,6 +252,7 @@ def handle_rename_prompt_confirm_notebook(new_name, current_name):
 def autosave_prompt(text, prompt_name):
     """Automatically save the text to the selected prompt file"""
+    prompt_name = sanitize_filename(prompt_name)
     if prompt_name and text.strip():
         prompt_path = shared.user_data_dir / "logs" / "notebook" / f"{prompt_name}.txt"
         prompt_path.parent.mkdir(parents=True, exist_ok=True)

View file

@@ -105,6 +105,9 @@ def resolve_model_path(model_name_or_path, image_model=False):
     before the default models directory.
     """
+    if model_name_or_path is None:
+        raise FileNotFoundError("No model specified.")
+
     path_candidate = Path(model_name_or_path)
     if path_candidate.exists():
         return path_candidate

View file

@@ -9,6 +9,7 @@ flash-linear-attention==0.4.*
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -31,8 +32,8 @@ tqdm
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -40,10 +41,10 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
 https://github.com/turboderp-org/exllamav3/releases/download/v0.0.28/exllamav3-0.0.28+cu128.torch2.9.0-cp313-cp313-win_amd64.whl; platform_system == "Windows" and python_version == "3.13"
 https://github.com/turboderp-org/exllamav3/releases/download/v0.0.28/exllamav3-0.0.28+cu128.torch2.9.0-cp313-cp313-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.13"
 https://github.com/kingbri1/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu128torch2.9.0cxx11abiFALSE-cp313-cp313-win_amd64.whl; platform_system == "Windows" and python_version == "3.13"

View file

@@ -7,6 +7,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -28,8 +29,8 @@ trafilatura==2.0.0
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -37,5 +38,5 @@ sse-starlette==1.6.5
 tiktoken

 # AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+rocm7.2-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+rocm7.2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+rocm7.2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+rocm7.2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -7,6 +7,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -28,8 +29,8 @@ trafilatura==2.0.0
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -37,4 +38,4 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0-py3-none-macosx_13_0_x86_64.whl; platform_system == "Darwin"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0-py3-none-macosx_13_0_x86_64.whl; platform_system == "Darwin"

View file

@@ -7,6 +7,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -28,8 +29,8 @@ trafilatura==2.0.0
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -37,4 +38,4 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin"

View file

@@ -7,6 +7,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -28,8 +29,8 @@ trafilatura==2.0.0
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -37,7 +38,7 @@ sse-starlette==1.6.5
 tiktoken

 # llama.cpp (CPU only)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@@ -7,6 +7,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pandas
 peft==0.18.*

@@ -28,8 +29,8 @@ trafilatura==2.0.0
 wandb

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

View file

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*

@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*

@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken

 # AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+rocm7.2-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+rocm7.2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+rocm7.2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+rocm7.2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*

@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm

 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl

 # API
 flask_cloudflared==0.0.15

@@ -23,4 +24,4 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0-py3-none-macosx_13_0_x86_64.whl; platform_system == "Darwin"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0-py3-none-macosx_13_0_x86_64.whl; platform_system == "Darwin"

View file

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,4 +24,4 @@ sse-starlette==1.6.5
 tiktoken
 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin"
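
A side note on the wheel lines themselves: pip accepts a bare URL as a requirement line, but the same line can also be written as a PEP 508 direct reference that names the project explicitly, which some tooling handles better. A hedged rewrite of the arm64 line above in that form, assuming the distribution name matches the wheel filename (pip treats `-` and `_` as equivalent):

llama_cpp_binaries @ https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0-py3-none-macosx_13_0_arm64.whl ; platform_system == "Darwin"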

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # llama.cpp (CPU only)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu131-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+cu131-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu131-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+cu131-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
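
The `+cu131`, `+cu124`, `+rocm7.2`, `+cpu`, and `+vulkan` suffixes on these wheels are PEP 440 local version identifiers: every backend variant shares the base version 0.110.0, and the suffix only records which acceleration backend the binary was built against. A small illustration of those version semantics with `packaging` (not project code):

from packaging.specifiers import SpecifierSet
from packaging.version import Version

v = Version("0.110.0+cu131")
print(v.base_version)  # 0.110.0
print(v.local)         # cu131

# Per PEP 440, a specifier with no local label ignores the candidate's
# local label, so every backend variant satisfies a plain ==0.110.0 pin.
print(v in SpecifierSet("==0.110.0"))  # True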

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # ik_llama.cpp (CPU only)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cpu-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cpu-py3-none-win_amd64.whl; platform_system == "Windows"

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu131-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/ik_llama_cpp_binaries-0.102.0+cu131-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu131-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/ik_llama_cpp_binaries-0.110.0+cu131-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15

@@ -3,6 +3,7 @@ fastapi==0.112.4
 huggingface-hub==1.5.*
 jinja2==3.1.6
 markdown
+mcp==1.27.0
 numpy==2.2.*
 pydantic==2.11.0
 pymupdf==1.27.*
@@ -14,8 +15,8 @@ trafilatura==2.0.0
 tqdm
 # Gradio
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio-4.37.2+custom.14-py3-none-any.whl
-https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.14/gradio_client-1.0.2+custom.14-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio-4.37.2+custom.19-py3-none-any.whl
+https://github.com/oobabooga/gradio/releases/download/4.37.2-custom.19/gradio_client-1.0.2+custom.19-py3-none-any.whl
 # API
 flask_cloudflared==0.0.15
@@ -23,5 +24,5 @@ sse-starlette==1.6.5
 tiktoken
 # Vulkan wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.102.0/llama_cpp_binaries-0.102.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.110.0/llama_cpp_binaries-0.110.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
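
Taken together, these diffs make the same two changes in every variant file: add the new `mcp==1.27.0` dependency and bump the llama.cpp/ik_llama.cpp binaries from 0.102.0 to 0.110.0 (plus the custom Gradio wheels from custom.14 to custom.19). After installing from any of the files, the pins are easy to spot-check from the standard library; a hedged sketch, assuming the distribution names match the wheel filenames above:

# Spot-check sketch using only the stdlib; the distribution names are an
# assumption taken from the wheel filenames (pip treats - and _ as equivalent).
from importlib.metadata import version

assert version("mcp") == "1.27.0"
assert version("llama-cpp-binaries").startswith("0.110.0")  # e.g. "0.110.0+vulkan"
print("installed versions match this changeset")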