Compare commits

...

38 commits

Author SHA1 Message Date
dependabot[bot] 084a1d346b Merge 0b5399612c into 5848c7884d 2025-12-05 16:36:10 -05:00
oobabooga 5848c7884d Increase the height of the image output gallery 2025-12-05 10:24:51 -08:00
oobabooga c11c14590a Image: Better LLM variation default prompt 2025-12-05 08:08:11 -08:00
oobabooga 0dd468245c Image: Add back the gallery cache (for performance) 2025-12-05 07:11:38 -08:00
oobabooga b63d57158d Image: Add TGW as a prefix to output images 2025-12-05 05:59:54 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 8eac99599a Image: Better LLM variation default prompt 2025-12-04 19:58:06 -08:00
oobabooga b4f06a50b0 fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga 15c6e43597 Image: Add a revised_prompt field to API results for OpenAI compatibility 2025-12-04 17:41:09 -08:00
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga 3ef428efaa Image: Remove llm_variations from the API 2025-12-04 17:34:17 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga b451bac082 Image: Improve a log message 2025-12-04 16:33:46 -08:00
oobabooga 47a0fcd614 Image: PNG metadata improvements 2025-12-04 16:25:48 -08:00
oobabooga ac31a7c008 Image: Organize the UI 2025-12-04 15:45:04 -08:00
oobabooga a90739f498 Image: Better LLM variation default prompt 2025-12-04 10:50:40 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 2793153717 Image: Add LLM-generated prompt variations 2025-12-04 08:10:24 -08:00
oobabooga 7fb9f19bd8 Progress bar style improvements 2025-12-04 06:20:45 -08:00
oobabooga a838223d18 Image: Add a progress bar during generation 2025-12-04 05:49:57 -08:00
oobabooga 14dbc3488e Image: Clear the torch cache after generation, not before 2025-12-04 05:32:58 -08:00
oobabooga 235b94f097 Image: Add placeholder file for user_data/image_models 2025-12-03 18:43:30 -08:00
oobabooga c357eed4c7 Image: Remove the flash_attention_3 option (no idea how to get it working) 2025-12-03 18:40:34 -08:00
oobabooga c93d27add3 Update llama.cpp 2025-12-03 18:29:43 -08:00
oobabooga fbca54957e Image generation: Yield partial results for batch count > 1 2025-12-03 16:13:07 -08:00
oobabooga 49c60882bf Image generation: Safer image uploading 2025-12-03 16:07:51 -08:00
oobabooga 59285d501d Image generation: Small UI improvements 2025-12-03 16:03:31 -08:00
oobabooga 373baa5c9c UI: Minor image gallery improvements 2025-12-03 14:45:02 -08:00
oobabooga 906dc54969 Load --image-model before --model 2025-12-03 12:15:38 -08:00
oobabooga 4468c49439 Add semaphore to image generation API endpoint 2025-12-03 12:02:47 -08:00
oobabooga 5ad174fad2 docs: Add an image generation API example 2025-12-03 11:58:54 -08:00
oobabooga 5433ef3333 Add an API endpoint for generating images 2025-12-03 11:50:56 -08:00
oobabooga 9448bf1caa Image generation: add torchao quantization (supports torch.compile) 2025-12-02 14:22:51 -08:00
oobabooga 97281ff831 UI: Fix an index error in the new image gallery 2025-12-02 11:20:52 -08:00
oobabooga 9d07d3a229 Make portable builds functional again after b3666e140d 2025-12-02 10:06:57 -08:00
oobabooga 6291e72129 Remove quanto for now (requires messy compilation) 2025-12-02 09:57:18 -08:00
dependabot[bot] 0b5399612c Update gradio requirement in /requirements/portable
Updates the requirements on [gradio](https://github.com/gradio-app/gradio) to permit the latest version.
- [Release notes](https://github.com/gradio-app/gradio/releases)
- [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md)
- [Commits](https://github.com/gradio-app/gradio/compare/gradio@4.37.1...gradio@6.0.0)

---
updated-dependencies:
- dependency-name: gradio
  dependency-version: 6.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 20:39:28 +00:00
36 changed files with 577 additions and 301 deletions

View file

@ -28,8 +28,7 @@ A Gradio web UI for Large Language Models.
- 100% offline and private, with zero telemetry, external resources, or remote update requests.
- **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
- **Vision (multimodal models)**: Attach images to messages for visual understanding ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Multimodal-Tutorial)).
Image generation: A dedicated tab for diffusers models like Z-Image-Turbo and Qwen-Image. Features 4-bit/8-bit quantization and a persistent gallery with metadata (tutorial).
- **Image generation**: A dedicated tab for `diffusers` models like **Z-Image-Turbo** and **Qwen-Image**. Features 4-bit/8-bit quantization and a persistent gallery with metadata ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Image-Generation-Tutorial)).
- **Image generation**: A dedicated tab for `diffusers` models like **Z-Image-Turbo**. Features 4-bit/8-bit quantization and a persistent gallery with metadata ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Image-Generation-Tutorial)).
- **Web search**: Optionally search the internet with LLM-generated queries to add context to the conversation.
- Aesthetic UI with dark and light themes.
- Syntax highlighting for code blocks and LaTeX rendering for mathematical expressions.

View file

@ -1692,8 +1692,8 @@ button#swap-height-width {
}
#image-output-gallery, #image-output-gallery > :nth-child(2) {
height: calc(100vh - 83px);
max-height: calc(100vh - 83px);
height: calc(100vh - 66px);
max-height: calc(100vh - 66px);
}
#image-history-gallery, #image-history-gallery > :nth-child(2) {
@ -1752,3 +1752,48 @@ button#swap-height-width {
.min.svelte-1yrv54 {
min-height: 0;
}
/* Image Generation Progress Bar */
#image-progress .image-ai-separator {
height: 24px;
margin: 20px 0;
border-top: 1px solid var(--input-border-color);
}
#image-progress .image-ai-progress-wrapper {
height: 24px;
margin: 20px 0;
}
#image-progress .image-ai-progress-track {
background: #e5e7eb;
border-radius: 4px;
overflow: hidden;
height: 8px;
}
.dark #image-progress .image-ai-progress-track {
background: #333;
}
#image-progress .image-ai-progress-fill {
background: #4a9eff;
height: 100%;
}
#image-progress .image-ai-progress-text {
text-align: center;
font-size: 12px;
color: #666;
margin-top: 4px;
}
.dark #image-progress .image-ai-progress-text {
color: #888;
}
#llm-prompt-variations {
position: absolute;
top: 0;
left: calc(100% - 174px);
}

View file

@ -139,6 +139,35 @@ curl http://127.0.0.1:5000/v1/completions \
For base64-encoded images, just replace the inner "url" values with this format: `data:image/FORMAT;base64,BASE64_STRING` where FORMAT is the file type (png, jpeg, gif, etc.) and BASE64_STRING is your base64-encoded image data.
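For example, a minimal Python sketch of that format (the file name and the exact message layout here are assumptions, mirroring the OpenAI-style multimodal example referenced above):
```python
import base64
from pathlib import Path

# Build a data: URL from a local image file ("photo.png" is a placeholder path).
image_path = Path("photo.png")
b64_string = base64.b64encode(image_path.read_bytes()).decode("utf-8")
data_url = f"data:image/png;base64,{b64_string}"

# Assumed OpenAI-style content layout; the inner "url" value is what gets replaced.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
```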
#### Image generation
```shell
curl http://127.0.0.1:5000/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"prompt": "an orange tree",
"steps": 9,
"cfg_scale": 0,
"batch_size": 1,
"batch_count": 1
}'
```
You need to load an image model first. You can do this via the UI, or by adding `--image-model your_model_name` when launching the server.
The output is a JSON object containing a `data` array. Each element has a `b64_json` field with the base64-encoded PNG image:
```json
{
"created": 1764791227,
"data": [
{
"b64_json": "iVBORw0KGgo..."
}
]
}
```
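As a rough illustration, a minimal Python client for this endpoint could look like the sketch below. It assumes the `requests` package is installed, that the server is reachable at the default address used in these examples, and that an image model is already loaded.
```python
import base64

import requests

# Request one image and save each result as a PNG file.
payload = {
    "prompt": "an orange tree",
    "steps": 9,
    "cfg_scale": 0,
    "batch_size": 1,
    "batch_count": 1,
}

response = requests.post(
    "http://127.0.0.1:5000/v1/images/generations",
    json=payload,
    timeout=600,  # diffusion can take a while on slower hardware
)
response.raise_for_status()

for i, item in enumerate(response.json()["data"]):
    with open(f"image_{i}.png", "wb") as f:
        f.write(base64.b64decode(item["b64_json"]))
```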
#### SSE streaming
```shell
@ -419,7 +448,6 @@ The following environment variables can be used (they take precedence over every
| `OPENEDAI_CERT_PATH` | SSL certificate file path | cert.pem |
| `OPENEDAI_KEY_PATH` | SSL key file path | key.pem |
| `OPENEDAI_DEBUG` | Enable debugging (set to 1) | 1 |
| `SD_WEBUI_URL` | WebUI URL (used by endpoint) | http://127.0.0.1:7861 |
| `OPENEDAI_EMBEDDING_MODEL` | Embedding model (if applicable) | sentence-transformers/all-mpnet-base-v2 |
| `OPENEDAI_EMBEDDING_DEVICE` | Embedding device (if applicable) | cuda |
@ -430,7 +458,6 @@ You can also set the following variables in your `settings.yaml` file:
```
openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-mpnet-base-v2"
openai-sd_webui_url: http://127.0.0.1:7861
openai-debug: 1
```

View file

@ -1,70 +1,69 @@
import os
"""
OpenAI-compatible image generation using local diffusion models.
"""
import base64
import io
import time
import requests
from extensions.openai.errors import ServiceUnavailableError
from modules import shared
def generations(prompt: str, size: str, response_format: str, n: int):
# Stable Diffusion callout wrapper for txt2img
# Low effort implementation for compatibility. With only "prompt" being passed and assuming DALL-E
# the results will be limited and likely poor. SD has hundreds of models and dozens of settings.
# If you want high quality tailored results you should just use the Stable Diffusion API directly.
# it's too general an API to try and shape the result with specific tags like negative prompts
# or "masterpiece", etc. SD configuration is beyond the scope of this API.
# At this point I will not add the edits and variations endpoints (ie. img2img) because they
# require changing the form data handling to accept multipart form data, also to properly support
# url return types will require file management and a web serving files... Perhaps later!
base_model_size = 512 if 'SD_BASE_MODEL_SIZE' not in os.environ else int(os.environ.get('SD_BASE_MODEL_SIZE', 512))
sd_defaults = {
'sampler_name': 'DPM++ 2M Karras', # vast improvement
'steps': 30,
}
def generations(request):
"""
Generate images using the loaded diffusion model.
Returns dict with 'created' timestamp and 'data' list of images.
"""
from modules.ui_image_generation import generate
width, height = [int(x) for x in size.split('x')] # ignore the restrictions on size
if shared.image_model is None:
raise ServiceUnavailableError("No image model loaded. Load a model via the UI first.")
# to hack on better generation, edit default payload.
payload = {
'prompt': prompt, # ignore prompt limit of 1000 characters
'width': width,
'height': height,
'batch_size': n,
}
payload.update(sd_defaults)
width, height = request.get_width_height()
scale = min(width, height) / base_model_size
if scale >= 1.2:
# for better performance with the default size (1024), and larger res.
scaler = {
'width': width // scale,
'height': height // scale,
'hr_scale': scale,
'enable_hr': True,
'hr_upscaler': 'Latent',
'denoising_strength': 0.68,
}
payload.update(scaler)
# Build state dict: GenerationOptions fields + image-specific keys
state = request.model_dump()
state.update({
'image_model_menu': shared.image_model_name,
'image_prompt': request.prompt,
'image_neg_prompt': request.negative_prompt,
'image_width': width,
'image_height': height,
'image_steps': request.steps,
'image_seed': request.image_seed,
'image_batch_size': request.batch_size,
'image_batch_count': request.batch_count,
'image_cfg_scale': request.cfg_scale,
'image_llm_variations': False,
})
resp = {
'created': int(time.time()),
'data': []
}
from extensions.openai.script import params
# Exhaust generator, keep final result
images = []
for images, _ in generate(state, save_images=False):
pass
# TODO: support SD_WEBUI_AUTH username:password pair.
sd_url = f"{os.environ.get('SD_WEBUI_URL', params.get('sd_webui_url', ''))}/sdapi/v1/txt2img"
if not images:
raise ServiceUnavailableError("Image generation failed or produced no images.")
response = requests.post(url=sd_url, json=payload)
r = response.json()
if response.status_code != 200 or 'images' not in r:
print(r)
raise ServiceUnavailableError(r.get('error', 'Unknown error calling Stable Diffusion'), code=response.status_code, internal_message=r.get('errors', None))
# r['parameters']...
for b64_json in r['images']:
if response_format == 'b64_json':
resp['data'].extend([{'b64_json': b64_json}])
# Build response
resp = {'created': int(time.time()), 'data': []}
for img in images:
b64 = _image_to_base64(img)
image_obj = {'revised_prompt': request.prompt}
if request.response_format == 'b64_json':
image_obj['b64_json'] = b64
else:
resp['data'].extend([{'url': f'data:image/png;base64,{b64_json}'}]) # yeah it's lazy. requests.get() will not work with this
image_obj['url'] = f'data:image/png;base64,{b64}'
resp['data'].append(image_obj)
return resp
def _image_to_base64(image) -> str:
buffered = io.BytesIO()
image.save(buffered, format="PNG")
return base64.b64encode(buffered.getvalue()).decode('utf-8')

View file

@ -17,10 +17,8 @@ from sse_starlette import EventSourceResponse
from starlette.concurrency import iterate_in_threadpool
import extensions.openai.completions as OAIcompletions
import extensions.openai.images as OAIimages
import extensions.openai.logits as OAIlogits
import extensions.openai.models as OAImodels
from extensions.openai.errors import ServiceUnavailableError
from extensions.openai.tokens import token_count, token_decode, token_encode
from extensions.openai.utils import _start_cloudflared
from modules import shared
@ -40,6 +38,8 @@ from .typing import (
EmbeddingsResponse,
EncodeRequest,
EncodeResponse,
ImageGenerationRequest,
ImageGenerationResponse,
LoadLorasRequest,
LoadModelRequest,
LogitsRequest,
@ -54,12 +54,12 @@ from .typing import (
params = {
'embedding_device': 'cpu',
'embedding_model': 'sentence-transformers/all-mpnet-base-v2',
'sd_webui_url': '',
'debug': 0
}
streaming_semaphore = asyncio.Semaphore(1)
image_generation_semaphore = asyncio.Semaphore(1)
def verify_api_key(authorization: str = Header(None)) -> None:
@ -228,20 +228,13 @@ async def handle_audio_transcription(request: Request):
return JSONResponse(content=transcription)
@app.post('/v1/images/generations', dependencies=check_key)
async def handle_image_generation(request: Request):
@app.post('/v1/images/generations', response_model=ImageGenerationResponse, dependencies=check_key)
async def handle_image_generation(request_data: ImageGenerationRequest):
import extensions.openai.images as OAIimages
if not os.environ.get('SD_WEBUI_URL', params.get('sd_webui_url', '')):
raise ServiceUnavailableError("Stable Diffusion not available. SD_WEBUI_URL not set.")
body = await request.json()
prompt = body['prompt']
size = body.get('size', '1024x1024')
response_format = body.get('response_format', 'url') # or b64_json
n = body.get('n', 1) # ignore the batch limits of max 10
response = await OAIimages.generations(prompt=prompt, size=size, response_format=response_format, n=n)
return JSONResponse(response)
async with image_generation_semaphore:
response = await asyncio.to_thread(OAIimages.generations, request_data)
return JSONResponse(response)
@app.post("/v1/embeddings", response_model=EmbeddingsResponse, dependencies=check_key)

View file

@ -264,6 +264,42 @@ class LoadLorasRequest(BaseModel):
lora_names: List[str]
class ImageGenerationRequest(BaseModel):
"""Image-specific parameters for generation."""
prompt: str
negative_prompt: str = ""
size: str = Field(default="1024x1024", description="'WIDTHxHEIGHT'")
steps: int = Field(default=9, ge=1)
cfg_scale: float = Field(default=0.0, ge=0.0)
image_seed: int = Field(default=-1, description="-1 for random")
batch_size: int | None = Field(default=None, ge=1, description="Parallel batch size (VRAM heavy)")
n: int = Field(default=1, ge=1, description="Alias for batch_size (OpenAI compatibility)")
batch_count: int = Field(default=1, ge=1, description="Sequential batch count")
# OpenAI compatibility (unused)
model: str | None = None
response_format: str = "b64_json"
user: str | None = None
@model_validator(mode='after')
def resolve_batch_size(self):
if self.batch_size is None:
self.batch_size = self.n
return self
def get_width_height(self) -> tuple[int, int]:
try:
parts = self.size.lower().split('x')
return int(parts[0]), int(parts[1])
except (ValueError, IndexError):
return 1024, 1024
class ImageGenerationResponse(BaseModel):
created: int = int(time.time())
data: List[dict]
def to_json(obj):
return json.dumps(obj.__dict__, indent=4)

View file

@ -36,3 +36,17 @@ function switch_to_character() {
document.getElementById("character-tab-button").click();
scrollToTop();
}
function switch_to_image_ai_generate() {
const container = document.querySelector("#image-ai-tab");
const buttons = container.getElementsByTagName("button");
for (let i = 0; i < buttons.length; i++) {
if (buttons[i].textContent.trim() === "Generate") {
buttons[i].click();
break;
}
}
scrollToTop();
}

View file

@ -3,7 +3,6 @@ import copy
import functools
import html
import json
import os
import pprint
import re
import shutil
@ -26,6 +25,7 @@ from modules.html_generator import (
convert_to_markdown,
make_thumbnail
)
from modules.image_utils import open_image_safely
from modules.logging_colors import logger
from modules.text_generation import (
generate_reply,
@ -112,7 +112,9 @@ def generate_chat_prompt(user_input, state, **kwargs):
add_generation_prompt=False,
enable_thinking=state['enable_thinking'],
reasoning_effort=state['reasoning_effort'],
thinking_budget=-1 if state.get('enable_thinking', True) else 0
thinking_budget=-1 if state.get('enable_thinking', True) else 0,
bos_token=shared.bos_token,
eos_token=shared.eos_token,
)
chat_renderer = partial(
@ -475,7 +477,7 @@ def get_stopping_strings(state):
if state['mode'] in ['instruct', 'chat-instruct']:
template = jinja_env.from_string(state['instruction_template_str'])
renderer = partial(template.render, add_generation_prompt=False)
renderer = partial(template.render, add_generation_prompt=False, bos_token=shared.bos_token, eos_token=shared.eos_token)
renderers.append(renderer)
if state['mode'] in ['chat']:
@ -1516,20 +1518,6 @@ def load_instruction_template_memoized(template):
return load_instruction_template(template)
def open_image_safely(path):
if path is None or not isinstance(path, str) or not Path(path).exists():
return None
if os.path.islink(path):
return None
try:
return Image.open(path)
except Exception as e:
logger.error(f"Failed to open image file: {path}. Reason: {e}")
return None
def upload_character(file, img_path, tavern=False):
img = open_image_safely(img_path)
decoded_file = file if isinstance(file, str) else file.decode('utf-8')

View file

@ -2,7 +2,6 @@ import time
import modules.shared as shared
from modules.logging_colors import logger
from modules.torch_utils import get_device
from modules.utils import resolve_model_path
@ -11,13 +10,14 @@ def get_quantization_config(quant_method):
Get the appropriate quantization config based on the selected method.
Args:
quant_method: One of 'none', 'bnb-8bit', 'bnb-4bit', 'quanto-8bit', 'quanto-4bit', 'quanto-2bit'
quant_method: One of 'none', 'bnb-8bit', 'bnb-4bit',
'torchao-int8wo', 'torchao-fp4', 'torchao-float8wo'
Returns:
PipelineQuantizationConfig or None
"""
import torch
from diffusers import BitsAndBytesConfig, QuantoConfig
from diffusers import BitsAndBytesConfig, TorchAoConfig
from diffusers.quantizers import PipelineQuantizationConfig
if quant_method == 'none' or not quant_method:
@ -46,27 +46,27 @@ def get_quantization_config(quant_method):
}
)
# Quanto 8-bit quantization
elif quant_method == 'quanto-8bit':
# torchao int8 weight-only
elif quant_method == 'torchao-int8wo':
return PipelineQuantizationConfig(
quant_mapping={
"transformer": QuantoConfig(weights_dtype="int8")
"transformer": TorchAoConfig("int8wo")
}
)
# Quanto 4-bit quantization
elif quant_method == 'quanto-4bit':
# torchao fp4 (e2m1)
elif quant_method == 'torchao-fp4':
return PipelineQuantizationConfig(
quant_mapping={
"transformer": QuantoConfig(weights_dtype="int4")
"transformer": TorchAoConfig("fp4_e2m1")
}
)
# Quanto 2-bit quantization
elif quant_method == 'quanto-2bit':
# torchao float8 weight-only
elif quant_method == 'torchao-float8wo':
return PipelineQuantizationConfig(
quant_mapping={
"transformer": QuantoConfig(weights_dtype="int2")
"transformer": TorchAoConfig("float8wo")
}
)
@ -98,14 +98,16 @@ def load_image_model(model_name, dtype='bfloat16', attn_backend='sdpa', cpu_offl
Args:
model_name: Name of the model directory
dtype: 'bfloat16' or 'float16'
attn_backend: 'sdpa', 'flash_attention_2', or 'flash_attention_3'
attn_backend: 'sdpa' or 'flash_attention_2'
cpu_offload: Enable CPU offloading for low VRAM
compile_model: Compile the model for faster inference (slow first run)
quant_method: Quantization method - 'none', 'bnb-8bit', 'bnb-4bit', 'quanto-8bit', 'quanto-4bit', 'quanto-2bit'
quant_method: 'none', 'bnb-8bit', 'bnb-4bit', or torchao options (int8wo, fp4, float8wo)
"""
import torch
from diffusers import DiffusionPipeline
from modules.torch_utils import get_device
logger.info(f"Loading image model \"{model_name}\" with quantization: {quant_method}")
t0 = time.time()
@ -139,18 +141,24 @@ def load_image_model(model_name, dtype='bfloat16', attn_backend='sdpa', cpu_offl
if not cpu_offload:
pipe.to(get_device())
# Set attention backend (if supported by the pipeline)
if hasattr(pipe, 'transformer') and hasattr(pipe.transformer, 'set_attention_backend'):
if attn_backend == 'flash_attention_2':
pipe.transformer.set_attention_backend("flash")
elif attn_backend == 'flash_attention_3':
pipe.transformer.set_attention_backend("_flash_3")
# sdpa is the default, no action needed
modules = ["transformer", "unet"]
# Set attention backend
if attn_backend == 'flash_attention_2':
for name in modules:
mod = getattr(pipe, name, None)
if hasattr(mod, "set_attention_backend"):
mod.set_attention_backend("flash")
break
# Compile model
if compile_model:
if hasattr(pipe, 'transformer') and hasattr(pipe.transformer, 'compile'):
logger.info("Compiling model (first run will be slow)...")
pipe.transformer.compile()
for name in modules:
mod = getattr(pipe, name, None)
if hasattr(mod, "compile"):
logger.info("Compiling model (first run will be slow)...")
mod.compile()
break
if cpu_offload:
pipe.enable_model_cpu_offload()

View file

@ -1,9 +1,7 @@
"""
Shared image processing utilities for multimodal support.
Used by both ExLlamaV3 and llama.cpp implementations.
"""
import base64
import io
import os
from pathlib import Path
from typing import Any, List, Tuple
from PIL import Image
@ -11,6 +9,20 @@ from PIL import Image
from modules.logging_colors import logger
def open_image_safely(path):
if path is None or not isinstance(path, str) or not Path(path).exists():
return None
if os.path.islink(path):
return None
try:
return Image.open(path)
except Exception as e:
logger.error(f"Failed to open image file: {path}. Reason: {e}")
return None
def convert_pil_to_base64(image: Image.Image) -> str:
"""Converts a PIL Image to a base64 encoded string."""
buffered = io.BytesIO()

View file

@ -89,8 +89,9 @@ def get_model_metadata(model):
else:
bos_token = ""
template = template.replace('eos_token', "'{}'".format(eos_token))
template = template.replace('bos_token', "'{}'".format(bos_token))
shared.bos_token = bos_token
shared.eos_token = eos_token
template = re.sub(r"\{\{-?\s*raise_exception\(.*?\)\s*-?\}\}", "", template, flags=re.DOTALL)
template = re.sub(r'raise_exception\([^)]*\)', "''", template)
@ -160,13 +161,16 @@ def get_model_metadata(model):
# 4. If a template was found from any source, process it
if template:
shared.bos_token = '<s>'
shared.eos_token = '</s>'
for k in ['eos_token', 'bos_token']:
if k in metadata:
value = metadata[k]
if isinstance(value, dict):
value = value['content']
template = template.replace(k, "'{}'".format(value))
setattr(shared, k, value)
template = re.sub(r"\{\{-?\s*raise_exception\(.*?\)\s*-?\}\}", "", template, flags=re.DOTALL)
template = re.sub(r'raise_exception\([^)]*\)', "''", template)

View file

@ -19,6 +19,8 @@ is_seq2seq = False
is_multimodal = False
model_dirty_from_training = False
lora_names = []
bos_token = '<s>'
eos_token = '</s>'
# Image model variables
image_model = None
@ -56,11 +58,11 @@ group = parser.add_argument_group('Image model')
group.add_argument('--image-model', type=str, help='Name of the image model to select on startup (overrides saved setting).')
group.add_argument('--image-model-dir', type=str, default='user_data/image_models', help='Path to directory with all the image models.')
group.add_argument('--image-dtype', type=str, default=None, choices=['bfloat16', 'float16'], help='Data type for image model.')
group.add_argument('--image-attn-backend', type=str, default=None, choices=['sdpa', 'flash_attention_2', 'flash_attention_3'], help='Attention backend for image model.')
group.add_argument('--image-attn-backend', type=str, default=None, choices=['sdpa', 'flash_attention_2'], help='Attention backend for image model.')
group.add_argument('--image-cpu-offload', action='store_true', help='Enable CPU offloading for image model.')
group.add_argument('--image-compile', action='store_true', help='Compile the image model for faster inference.')
group.add_argument('--image-quant', type=str, default=None,
choices=['none', 'bnb-8bit', 'bnb-4bit', 'quanto-8bit', 'quanto-4bit', 'quanto-2bit'],
choices=['none', 'bnb-8bit', 'bnb-4bit', 'torchao-int8wo', 'torchao-fp4', 'torchao-float8wo'],
help='Quantization method for image model.')
# Model loader
@ -319,6 +321,8 @@ settings = {
'image_seed': -1,
'image_batch_size': 1,
'image_batch_count': 1,
'image_llm_variations': False,
'image_llm_variations_prompt': 'Write a variation of the image generation prompt above. Consider the intent of the user with that prompt and write something that will likely please them, with added details. Output only the new prompt. Do not add any explanations, prefixes, or additional text.',
'image_model_menu': 'None',
'image_dtype': 'bfloat16',
'image_attn_backend': 'sdpa',

View file

@ -280,25 +280,28 @@ def list_interface_input_elements():
'include_past_attachments',
]
# Image generation elements
elements += [
'image_prompt',
'image_neg_prompt',
'image_width',
'image_height',
'image_aspect_ratio',
'image_steps',
'image_cfg_scale',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_model_menu',
'image_dtype',
'image_attn_backend',
'image_compile',
'image_cpu_offload',
'image_quant',
]
if not shared.args.portable:
# Image generation elements
elements += [
'image_prompt',
'image_neg_prompt',
'image_width',
'image_height',
'image_aspect_ratio',
'image_steps',
'image_cfg_scale',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_llm_variations',
'image_llm_variations_prompt',
'image_model_menu',
'image_dtype',
'image_attn_backend',
'image_compile',
'image_cpu_offload',
'image_quant',
]
return elements
@ -531,25 +534,31 @@ def setup_auto_save():
'paste_to_attachment',
'include_past_attachments',
# Image generation tab (ui_image_generation.py)
'image_prompt',
'image_neg_prompt',
'image_width',
'image_height',
'image_aspect_ratio',
'image_steps',
'image_cfg_scale',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_model_menu',
'image_dtype',
'image_attn_backend',
'image_compile',
'image_cpu_offload',
'image_quant',
]
if not shared.args.portable:
# Image generation tab (ui_image_generation.py)
change_elements += [
'image_prompt',
'image_neg_prompt',
'image_width',
'image_height',
'image_aspect_ratio',
'image_steps',
'image_cfg_scale',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_llm_variations',
'image_llm_variations_prompt',
'image_model_menu',
'image_dtype',
'image_attn_backend',
'image_compile',
'image_cpu_offload',
'image_quant',
]
for element_name in change_elements:
if element_name in shared.gradio:
shared.gradio[element_name].change(

View file

@ -7,7 +7,6 @@ from pathlib import Path
import gradio as gr
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo
from modules import shared, ui, utils
@ -16,10 +15,10 @@ from modules.image_models import (
load_image_model,
unload_image_model
)
from modules.image_utils import open_image_safely
from modules.logging_colors import logger
from modules.text_generation import stop_everything_event
from modules.torch_utils import get_device
from modules.utils import gradio
from modules.utils import check_model_loaded, gradio
ASPECT_RATIOS = {
"1:1 Square": (1, 1),
@ -30,7 +29,7 @@ ASPECT_RATIOS = {
}
STEP = 16
IMAGES_PER_PAGE = 64
IMAGES_PER_PAGE = 32
# Settings keys to save in PNG metadata (Generate tab only)
METADATA_SETTINGS_KEYS = [
@ -41,8 +40,6 @@ METADATA_SETTINGS_KEYS = [
'image_aspect_ratio',
'image_steps',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_cfg_scale',
]
@ -137,6 +134,9 @@ def build_generation_metadata(state, actual_seed):
def save_generated_images(images, state, actual_seed):
"""Save images with generation metadata embedded in PNG."""
if shared.args.multi_user:
return
date_str = datetime.now().strftime("%Y-%m-%d")
folder_path = os.path.join("user_data", "image_outputs", date_str)
os.makedirs(folder_path, exist_ok=True)
@ -146,7 +146,7 @@ def save_generated_images(images, state, actual_seed):
for idx, img in enumerate(images):
timestamp = datetime.now().strftime("%H-%M-%S")
filename = f"{timestamp}_{actual_seed:010d}_{idx:03d}.png"
filename = f"TGW_{timestamp}_{actual_seed:010d}_{idx:03d}.png"
filepath = os.path.join(folder_path, filename)
# Create PNG metadata
@ -160,9 +160,14 @@ def save_generated_images(images, state, actual_seed):
def read_image_metadata(image_path):
"""Read generation metadata from PNG file."""
try:
with Image.open(image_path) as img:
img = open_image_safely(image_path)
if img is None:
return None
try:
if hasattr(img, 'text') and 'image_gen_settings' in img.text:
return json.loads(img.text['image_gen_settings'])
finally:
img.close()
except Exception as e:
logger.debug(f"Could not read metadata from {image_path}: {e}")
return None
@ -173,7 +178,7 @@ def format_metadata_for_display(metadata):
if not metadata:
return "No generation settings found in this image."
lines = ["**Generation Settings**", ""]
lines = []
# Display in a nice order
display_order = [
@ -185,8 +190,6 @@ def format_metadata_for_display(metadata):
('image_steps', 'Steps'),
('image_cfg_scale', 'CFG Scale'),
('image_seed', 'Seed'),
('image_batch_size', 'Batch Size'),
('image_batch_count', 'Batch Count'),
('model', 'Model'),
('generated_at', 'Generated At'),
]
@ -291,8 +294,10 @@ def on_gallery_select(evt: gr.SelectData, current_page):
if evt.index is None:
return "", "Select an image to view its settings"
# Get the current page's images to find the actual file path
all_images = get_all_history_images()
if not _image_cache:
get_all_history_images()
all_images = _image_cache
total_images = len(all_images)
# Calculate the actual index in the full list
@ -312,11 +317,11 @@ def on_gallery_select(evt: gr.SelectData, current_page):
def send_to_generate(selected_image_path):
"""Load settings from selected image and return updates for all Generate tab inputs."""
if not selected_image_path or not os.path.exists(selected_image_path):
return [gr.update()] * 10 + ["No image selected"]
return [gr.update()] * 8 + ["No image selected"]
metadata = read_image_metadata(selected_image_path)
if not metadata:
return [gr.update()] * 10 + ["No settings found in this image"]
return [gr.update()] * 8 + ["No settings found in this image"]
# Return updates for each input element in order
updates = [
@ -327,8 +332,6 @@ def send_to_generate(selected_image_path):
gr.update(value=metadata.get('image_aspect_ratio', '1:1 Square')),
gr.update(value=metadata.get('image_steps', 9)),
gr.update(value=metadata.get('image_seed', -1)),
gr.update(value=metadata.get('image_batch_size', 1)),
gr.update(value=metadata.get('image_batch_count', 1)),
gr.update(value=metadata.get('image_cfg_scale', 0.0)),
]
@ -368,10 +371,26 @@ def create_ui():
lines=3,
value=shared.settings['image_neg_prompt']
)
shared.gradio['image_llm_variations'] = gr.Checkbox(
value=shared.settings['image_llm_variations'],
label='LLM Prompt Variations',
elem_id="llm-prompt-variations",
)
shared.gradio['image_llm_variations_prompt'] = gr.Textbox(
value=shared.settings['image_llm_variations_prompt'],
label='Variation Prompt',
lines=3,
placeholder='Instructions for generating prompt variations...',
visible=shared.settings['image_llm_variations'],
info='Use the loaded LLM to generate creative prompt variations for each sequential batch.'
)
shared.gradio['image_generate_btn'] = gr.Button("Generate", variant="primary", size="lg")
shared.gradio['image_stop_btn'] = gr.Button("Stop", size="lg", visible=False)
gr.HTML("<hr style='border-top: 1px solid #444; margin: 20px 0;'>")
shared.gradio['image_progress'] = gr.HTML(
value=progress_bar_html(),
elem_id="image-progress"
)
gr.Markdown("### Dimensions")
with gr.Row():
@ -401,6 +420,7 @@ def create_ui():
info="Z-Image Turbo: 0.0 | Qwen: 4.0"
)
shared.gradio['image_seed'] = gr.Number(label="Seed", value=shared.settings['image_seed'], precision=0, info="-1 = Random")
with gr.Column():
shared.gradio['image_batch_size'] = gr.Slider(1, 32, value=shared.settings['image_batch_size'], step=1, label="Batch Size (VRAM Heavy)", info="Generates N images at once.")
shared.gradio['image_batch_count'] = gr.Slider(1, 128, value=shared.settings['image_batch_count'], step=1, label="Sequential Count (Loop)", info="Repeats the generation N times.")
@ -416,9 +436,9 @@ def create_ui():
# Pagination controls
with gr.Row():
shared.gradio['image_refresh_history'] = gr.Button("🔄 Refresh", elem_classes="refresh-button")
shared.gradio['image_prev_page'] = gr.Button("◀ Prev", elem_classes="refresh-button")
shared.gradio['image_prev_page'] = gr.Button("◀ Prev Page", elem_classes="refresh-button")
shared.gradio['image_page_info'] = gr.Markdown(value=get_initial_page_info, elem_id="image-page-info")
shared.gradio['image_next_page'] = gr.Button("Next ▶", elem_classes="refresh-button")
shared.gradio['image_next_page'] = gr.Button("Next Page ▶", elem_classes="refresh-button")
shared.gradio['image_page_input'] = gr.Number(value=1, label="Page", precision=0, minimum=1, scale=0, min_width=80)
shared.gradio['image_go_to_page'] = gr.Button("Go", elem_classes="refresh-button", scale=0, min_width=50)
@ -439,7 +459,7 @@ def create_ui():
)
with gr.Column(scale=1):
gr.Markdown("### Selected Image")
gr.Markdown("### Generation Settings")
shared.gradio['image_settings_display'] = gr.Markdown("Select an image to view its settings")
shared.gradio['image_send_to_generate'] = gr.Button("Send to Generate", variant="primary")
shared.gradio['image_gallery_status'] = gr.Markdown("")
@ -471,9 +491,9 @@ def create_ui():
with gr.Column():
shared.gradio['image_quant'] = gr.Dropdown(
label='Quantization',
choices=['none', 'bnb-8bit', 'bnb-4bit', 'quanto-8bit', 'quanto-4bit', 'quanto-2bit'],
choices=['none', 'bnb-8bit', 'bnb-4bit', 'torchao-int8wo', 'torchao-fp4', 'torchao-float8wo'],
value=shared.settings['image_quant'],
info='Quantization method for reduced VRAM usage. Quanto supports lower precisions (2-bit, 4-bit, 8-bit).'
info='BnB: bitsandbytes quantization. torchao: int8wo, fp4, float8wo.'
)
shared.gradio['image_dtype'] = gr.Dropdown(
@ -483,7 +503,7 @@ def create_ui():
info='bfloat16 recommended for modern GPUs'
)
shared.gradio['image_attn_backend'] = gr.Dropdown(
choices=['sdpa', 'flash_attention_2', 'flash_attention_3'],
choices=['sdpa', 'flash_attention_2'],
value=shared.settings['image_attn_backend'],
label='Attention Backend',
info='SDPA is default. Flash Attention requires compatible GPU.'
@ -507,9 +527,7 @@ def create_ui():
info="Enter HuggingFace path. Use : for branch, e.g. user/model:main"
)
shared.gradio['image_download_btn'] = gr.Button("Download", variant='primary')
shared.gradio['image_model_status'] = gr.Markdown(
value=f"Model: **{shared.settings['image_model_menu']}** (not loaded)" if shared.settings['image_model_menu'] != 'None' else "No model selected"
)
shared.gradio['image_model_status'] = gr.Markdown(value="")
def create_event_handlers():
@ -546,19 +564,19 @@ def create_event_handlers():
shared.gradio['image_generate_btn'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('image_stop_btn', 'image_generate_btn')).then(
generate, gradio('interface_state'), gradio('image_output_gallery'), show_progress=False).then(
generate, gradio('interface_state'), gradio('image_output_gallery', 'image_progress'), show_progress=False).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('image_stop_btn', 'image_generate_btn'))
shared.gradio['image_prompt'].submit(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('image_stop_btn', 'image_generate_btn')).then(
generate, gradio('interface_state'), gradio('image_output_gallery'), show_progress=False).then(
generate, gradio('interface_state'), gradio('image_output_gallery', 'image_progress'), show_progress=False).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('image_stop_btn', 'image_generate_btn'))
shared.gradio['image_neg_prompt'].submit(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('image_stop_btn', 'image_generate_btn')).then(
generate, gradio('interface_state'), gradio('image_output_gallery'), show_progress=False).then(
generate, gradio('interface_state'), gradio('image_output_gallery', 'image_progress'), show_progress=False).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('image_stop_btn', 'image_generate_btn'))
# Stop button
@ -644,11 +662,10 @@ def create_event_handlers():
'image_aspect_ratio',
'image_steps',
'image_seed',
'image_batch_size',
'image_batch_count',
'image_cfg_scale',
'image_gallery_status'
),
js=f'() => {{{ui.switch_tabs_js}; switch_to_image_ai_generate()}}',
show_progress=False
)
@ -659,24 +676,101 @@ def create_event_handlers():
show_progress=False
)
# LLM Variations visibility toggle
shared.gradio['image_llm_variations'].change(
lambda x: gr.update(visible=x),
gradio('image_llm_variations'),
gradio('image_llm_variations_prompt'),
show_progress=False
)
def generate(state):
def generate_prompt_variation(state):
"""Generate a creative variation of the image prompt using the LLM."""
from modules.chat import generate_chat_prompt
from modules.text_generation import generate_reply
prompt = state['image_prompt']
# Check if LLM is loaded
model_loaded, _ = check_model_loaded()
if not model_loaded:
logger.warning("No LLM loaded for prompt variation. Using original prompt.")
return prompt
# Get the custom variation prompt or use default
variation_instruction = state.get('image_llm_variations_prompt', '')
if not variation_instruction:
variation_instruction = 'Write a variation of the image generation prompt above. Consider the intent of the user with that prompt and write something that will likely please them, with added details. Output only the new prompt. Do not add any explanations, prefixes, or additional text.'
augmented_message = f"{prompt}\n\n=====\n\n{variation_instruction}"
# Use minimal state for generation
var_state = state.copy()
var_state['history'] = {'internal': [], 'visible': [], 'metadata': {}}
var_state['auto_max_new_tokens'] = True
var_state['enable_thinking'] = False
var_state['reasoning_effort'] = 'low'
var_state['start_with'] = ""
formatted_prompt = generate_chat_prompt(augmented_message, var_state)
variation = ""
for reply in generate_reply(formatted_prompt, var_state, stopping_strings=[], is_chat=True):
variation = reply
# Strip thinking blocks if present
if "</think>" in variation:
variation = variation.rsplit("</think>", 1)[1]
elif "<|start|>assistant<|channel|>final<|message|>" in variation:
variation = variation.rsplit("<|start|>assistant<|channel|>final<|message|>", 1)[1]
elif "</seed:think>" in variation:
variation = variation.rsplit("</seed:think>", 1)[1]
variation = variation.strip()
if len(variation) >= 2 and variation.startswith('"') and variation.endswith('"'):
variation = variation[1:-1]
if variation:
logger.info("Prompt variation:")
print(variation)
return variation
return prompt
def progress_bar_html(progress=0, text=""):
"""Generate HTML for progress bar. Empty div when progress <= 0."""
if progress <= 0:
return '<div class="image-ai-separator"></div>'
return f'''<div class="image-ai-progress-wrapper">
<div class="image-ai-progress-track">
<div class="image-ai-progress-fill" style="width: {progress * 100:.1f}%;"></div>
</div>
<div class="image-ai-progress-text">{text}</div>
</div>'''
def generate(state, save_images=True):
"""
Generate images using the loaded model.
Automatically adjusts parameters based on pipeline type.
"""
import queue
import threading
import torch
from modules.torch_utils import clear_torch_cache
clear_torch_cache()
from modules.torch_utils import clear_torch_cache, get_device
try:
model_name = state['image_model_menu']
if not model_name or model_name == 'None':
logger.error("No image model selected. Go to the Model tab and select a model.")
return []
yield [], progress_bar_html()
return
if shared.image_model is None:
result = load_image_model(
@ -689,7 +783,8 @@ def generate(state):
)
if result is None:
logger.error(f"Failed to load model `{model_name}`.")
return []
yield [], progress_bar_html()
return
shared.image_model_name = model_name
@ -700,7 +795,7 @@ def generate(state):
device = get_device()
if device is None:
device = "cpu"
generator = torch.Generator(device).manual_seed(int(seed))
generator = torch.Generator(device)
all_images = []
@ -709,70 +804,113 @@ def generate(state):
if pipeline_type is None:
pipeline_type = get_pipeline_type(shared.image_model)
# Process Prompt
prompt = state['image_prompt']
# Apply "Positive Magic" for Qwen models only
if pipeline_type == 'qwenimage':
magic_suffix = ", Ultra HD, 4K, cinematic composition"
# Avoid duplication if user already added it
if magic_suffix.strip(", ") not in prompt:
prompt += magic_suffix
# Reset stop flag at start
shared.stop_everything = False
# Callback to check for interruption during diffusion steps
batch_count = int(state['image_batch_count'])
steps_per_batch = int(state['image_steps'])
total_steps = steps_per_batch * batch_count
# Queue for progress updates from callback
progress_queue = queue.Queue()
def interrupt_callback(pipe, step_index, timestep, callback_kwargs):
if shared.stop_everything:
pipe._interrupt = True
progress_queue.put(step_index + 1)
return callback_kwargs
# Build generation kwargs
gen_kwargs = {
"prompt": prompt,
"negative_prompt": state['image_neg_prompt'],
"height": int(state['image_height']),
"width": int(state['image_width']),
"num_inference_steps": int(state['image_steps']),
"num_inference_steps": steps_per_batch,
"num_images_per_prompt": int(state['image_batch_size']),
"generator": generator,
"callback_on_step_end": interrupt_callback,
}
# Add pipeline-specific parameters for CFG
cfg_val = state.get('image_cfg_scale', 0.0)
if pipeline_type == 'qwenimage':
# Qwen-Image uses true_cfg_scale (typically 4.0)
gen_kwargs["true_cfg_scale"] = cfg_val
else:
# Z-Image and others use guidance_scale (typically 0.0 for Turbo)
gen_kwargs["guidance_scale"] = cfg_val
t0 = time.time()
for i in range(int(state['image_batch_count'])):
for batch_idx in range(batch_count):
if shared.stop_everything:
break
generator.manual_seed(int(seed + i))
batch_results = shared.image_model(**gen_kwargs).images
all_images.extend(batch_results)
generator.manual_seed(int(seed + batch_idx))
# Generate prompt variation if enabled
if state['image_llm_variations']:
gen_kwargs["prompt"] = generate_prompt_variation(state)
# Run generation in thread so we can yield progress
result_holder = []
error_holder = []
def run_batch():
try:
# Apply magic suffix only at generation time for qwenimage
clean_prompt = gen_kwargs["prompt"]
if pipeline_type == 'qwenimage':
magic_suffix = ", Ultra HD, 4K, cinematic composition"
if magic_suffix.strip(", ") not in clean_prompt:
gen_kwargs["prompt"] = clean_prompt + magic_suffix
result_holder.extend(shared.image_model(**gen_kwargs).images)
gen_kwargs["prompt"] = clean_prompt # restore
except Exception as e:
error_holder.append(e)
thread = threading.Thread(target=run_batch)
thread.start()
# Yield progress updates while generation runs
while thread.is_alive():
try:
step = progress_queue.get(timeout=0.1)
absolute_step = batch_idx * steps_per_batch + step
pct = absolute_step / total_steps
text = f"Batch {batch_idx + 1}/{batch_count} — Step {step}/{steps_per_batch}"
yield all_images, progress_bar_html(pct, text)
except queue.Empty:
pass
thread.join()
if error_holder:
raise error_holder[0]
# Save this batch's images with the actual prompt and seed used
if save_images:
batch_seed = seed + batch_idx
original_prompt = state['image_prompt']
state['image_prompt'] = gen_kwargs["prompt"]
save_generated_images(result_holder, state, batch_seed)
state['image_prompt'] = original_prompt
all_images.extend(result_holder)
yield all_images, progress_bar_html((batch_idx + 1) / batch_count, f"Batch {batch_idx + 1}/{batch_count} complete")
t1 = time.time()
save_generated_images(all_images, state, seed)
total_images = int(state['image_batch_count']) * int(state['image_batch_size'])
total_steps = state["image_steps"] * int(state['image_batch_count'])
total_images = batch_count * int(state['image_batch_size'])
logger.info(f'Generated {total_images} {"image" if total_images == 1 else "images"} in {(t1 - t0):.2f} seconds ({total_steps / (t1 - t0):.2f} steps/s, seed {seed})')
return all_images
yield all_images, progress_bar_html()
clear_torch_cache()
except Exception as e:
logger.error(f"Image generation failed: {e}")
traceback.print_exc()
return []
yield [], progress_bar_html()
clear_torch_cache()
def load_image_model_wrapper(model_name, dtype, attn_backend, cpu_offload, compile_model, quant_method):

View file

@ -11,7 +11,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -26,6 +25,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -44,8 +44,8 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav3/releases/download/v0.0.16/exllamav3-0.0.16+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav3/releases/download/v0.0.16/exllamav3-0.0.16+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,7 +42,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+rocm6.4.4-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+rocm6.4.4-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,7 +42,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,5 +42,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,5 +42,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,5 +42,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@ -42,5 +42,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@@ -11,7 +11,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
-optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@@ -26,6 +25,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
+torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm
@@ -44,8 +44,8 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav3/releases/download/v0.0.16/exllamav3-0.0.16+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav3/releases/download/v0.0.16/exllamav3-0.0.16+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.2/exllamav2-0.3.2+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
View file

@@ -9,7 +9,6 @@ huggingface-hub==0.36.0
jinja2==3.1.6
markdown
numpy==2.2.*
-optimum-quanto==0.2.7
pandas
peft==0.18.*
Pillow>=9.5.0
@@ -24,6 +23,7 @@ safetensors==0.6.*
scipy
sentencepiece
tensorboard
+torchao==0.14.*
transformers==4.57.*
triton-windows==3.5.1.post21; platform_system == "Windows"
tqdm

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
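
The gradio pin in these portable requirements moves from 4.37.* to 6.0.*. Pins of the form ==X.Y.* are prefix-matching version specifiers: any 6.0.x release satisfies them, while a new minor version does not. An illustrative check with the packaging library (not part of the repository):

# Illustrative only: how a "==6.0.*" pin matches candidate versions.
from packaging.specifiers import SpecifierSet

gradio_pin = SpecifierSet("==6.0.*")

print("6.0.2" in gradio_pin)   # True  - any 6.0.x patch release matches
print("6.1.0" in gradio_pin)   # False - a new minor version does not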

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+rocm6.4.4-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+rocm6.4.4-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+rocm6.4.4avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+rocm6.4.4avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
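
The Darwin markers above select wheels by kernel release rather than by macOS marketing version: Darwin 24.x corresponds to macOS 15 and Darwin 23.x to macOS 14. A quick illustrative script (not part of the repository) to see which bucket a given Mac falls into:

# Illustrative only: map the local Darwin kernel release to the wheel buckets above.
import platform

if platform.system() == "Darwin":
    release = platform.release()              # e.g. "24.1.0" on macOS 15
    major = int(release.split(".")[0])
    if major >= 24:
        print(f"Darwin {release}: the macosx_15_0 wheel applies")
    elif major == 23:
        print(f"Darwin {release}: the macosx_14_0 wheel applies")
    else:
        print(f"Darwin {release}: no prebuilt wheel in this file matches")
else:
    print("Not macOS; the Darwin markers do not apply on this system.")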

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
# API

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# Vulkan wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -14,7 +14,7 @@ rich
tqdm
# Gradio
-gradio==4.37.*
+gradio==6.0.*
https://github.com/oobabooga/gradio/releases/download/custom-build/gradio_client-1.0.2+custom.1-py3-none-any.whl
# API
@@ -23,5 +23,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.62.0/llama_cpp_binaries-0.62.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.64.0/llama_cpp_binaries-0.64.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -275,6 +275,22 @@ if __name__ == "__main__":
            if extension not in shared.args.extensions:
                shared.args.extensions.append(extension)

+    # Load image model if specified via CLI
+    if shared.args.image_model:
+        logger.info(f"Loading image model: {shared.args.image_model}")
+        result = load_image_model(
+            shared.args.image_model,
+            dtype=shared.settings.get('image_dtype', 'bfloat16'),
+            attn_backend=shared.settings.get('image_attn_backend', 'sdpa'),
+            cpu_offload=shared.settings.get('image_cpu_offload', False),
+            compile_model=shared.settings.get('image_compile', False),
+            quant_method=shared.settings.get('image_quant', 'none')
+        )
+        if result is not None:
+            shared.image_model_name = shared.args.image_model
+        else:
+            logger.error(f"Failed to load image model: {shared.args.image_model}")
+
    available_models = utils.get_available_models()

    # Model defined through --model
@@ -321,22 +337,6 @@ if __name__ == "__main__":
        if shared.args.lora:
            add_lora_to_model(shared.args.lora)

-    # Load image model if specified via CLI
-    if shared.args.image_model:
-        logger.info(f"Loading image model: {shared.args.image_model}")
-        result = load_image_model(
-            shared.args.image_model,
-            dtype=shared.settings.get('image_dtype', 'bfloat16'),
-            attn_backend=shared.settings.get('image_attn_backend', 'sdpa'),
-            cpu_offload=shared.settings.get('image_cpu_offload', False),
-            compile_model=shared.settings.get('image_compile', False),
-            quant_method=shared.settings.get('image_quant', 'none')
-        )
-        if result is not None:
-            shared.image_model_name = shared.args.image_model
-        else:
-            logger.error(f"Failed to load image model: {shared.args.image_model}")
-
    shared.generation_lock = Lock()

    if shared.args.idle_timeout > 0: