From e20b2d38ff38fbd6451c8ff53c9e12fc9a327a14 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Fri, 5 Dec 2025 14:12:08 -0800
Subject: [PATCH] docs: Add VRAM measurements for Z-Image-Turbo

---
 docs/Image Generation Tutorial.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/Image Generation Tutorial.md b/docs/Image Generation Tutorial.md
index 0c9eb848..a74a4ddd 100644
--- a/docs/Image Generation Tutorial.md
+++ b/docs/Image Generation Tutorial.md
@@ -34,7 +34,15 @@ Select the quantization option in the "Quantization" menu and click "Load".
 
 The memory usage for `Z-Image-Turbo` for each option is:
 
-If you have less GPU memory than _, check the "CPU Offload" option.
+| Quantization Method | VRAM Usage |
+| :--- | :--- |
+| **None (FP16/BF16)** | 25613 MiB |
+| **bnb-8bit** | 16301 MiB |
+| **bnb-8bit + CPU Offload** | 16235 MiB |
+| **bnb-4bit** | 11533 MiB |
+| **bnb-4bit + CPU Offload** | 7677 MiB |
+
+The `torchao` options support `torch.compile` for faster image generation, with `float8wo` specifically providing native hardware acceleration for RTX 40-series and newer GPUs.
 
 Note: The next time you launch the web UI, the model will get automatically loaded with your last settings when you try to generate an image. You do not need to go to the Model tab and click "Load" each time.
 
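As a rough illustration of what the table's options correspond to outside the web UI, the sketch below loads Z-Image-Turbo through the diffusers library with 4-bit bitsandbytes quantization plus model CPU offload. The model id "Tongyi-MAI/Z-Image-Turbo", the quantized component list, and the prompt are assumptions made here for illustration; the web UI's own loading code may differ.

# Hedged sketch: an approximate diffusers-level analogue of the "bnb-4bit + CPU Offload"
# row in the table above. Model id, component list, and prompt are illustrative assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",       # roughly the "bnb-4bit" menu option
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer"],  # quantize the diffusion transformer
)

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",              # assumed Hugging Face model id
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()              # roughly the "CPU Offload" checkbox

image = pipe(
    prompt="a watercolor painting of a fox in the snow",
    num_inference_steps=8,                   # Turbo checkpoints need only a few steps
).images[0]
image.save("fox.png")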