From 2f979ce2942efc82ad90dfc28c7407c473da5169 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Tue, 12 Aug 2025 13:33:49 -0700
Subject: [PATCH] docs: Add a multimodal tutorial

---
 docs/Multimodal Tutorial.md | 66 +++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 docs/Multimodal Tutorial.md

diff --git a/docs/Multimodal Tutorial.md b/docs/Multimodal Tutorial.md
new file mode 100644
index 00000000..a30889f7
--- /dev/null
+++ b/docs/Multimodal Tutorial.md
@@ -0,0 +1,66 @@
+## Getting started
+
+### 1. Find a multimodal model
+
+GGUF models with vision capabilities are uploaded alongside an `mmproj` file on Hugging Face.
+
+For instance, [unsloth/gemma-3-4b-it-GGUF](https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/tree/main) has this:
+
+print1
+
+### 2. Download the model to `user_data/models`
+
+As an example, download
+
+https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_K_S.gguf?download=true
+
+to your `text-generation-webui/user_data/models` folder.
+
+### 3. Download the associated mmproj file to `user_data/mmproj`
+
+Then download
+
+https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/resolve/main/mmproj-F16.gguf?download=true
+
+to your `text-generation-webui/user_data/mmproj` folder. Rename it to `mmproj-gemma-3-4b-it-F16.gguf` so it is easy to recognize later.
+
+### 4. Load the model
+
+1. Launch the web UI
+2. Navigate to the Model tab
+3. Select the GGUF model in the Model dropdown:
+
+print2
+
+4. Select the mmproj file in the Multimodal (vision) menu:
+
+print3
+
+5. Click "Load"
+
+### 5. Send a message with an image
+
+Select your image by clicking on the 📎 icon and send your message:
+
+print5
+
+The model will reply with a solid understanding of the image contents:
+
+print6
+
+## Multimodal with ExLlamaV3
+
+Multimodal also works with the ExLlamaV3 loader (the non-HF one).
+
+No additional files are necessary; just load a multimodal EXL3 model and send an image.
+
+Examples of models that you can use:
+
+- https://huggingface.co/turboderp/gemma-3-27b-it-exl3
+- https://huggingface.co/turboderp/Mistral-Small-3.1-24B-Instruct-2503-exl3
+
+## Multimodal API examples
+
+On the page below you can find some ready-to-use examples:
+
+[Multimodal/vision (llama.cpp and ExLlamaV3)](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#multimodalvision-llamacpp-and-exllamav3)
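
The linked wiki examples use the standard OpenAI chat-completions format, where an image is sent as a base64 data URL inside the message content. As a minimal sketch (not the wiki's exact code), the following assumes the web UI was started with its OpenAI-compatible API enabled on the default `http://127.0.0.1:5000`; `build_payload` and `ask_about_image` are illustrative helper names, not part of the project.

```python
# Sketch of a multimodal request against an OpenAI-compatible endpoint.
# Assumptions: API enabled at http://127.0.0.1:5000; helper names are ours.
import base64
import json
import urllib.request


def build_payload(question: str, image_b64: str, max_tokens: int = 200) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def ask_about_image(image_path: str, question: str,
                    url: str = "http://127.0.0.1:5000/v1/chat/completions") -> str:
    """Encode the image, POST the request, and return the model's reply text."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(question, image_b64)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["message"]["content"]
```

The same payload shape works for both the llama.cpp and ExLlamaV3 loaders, since the image handling happens server-side once the model (and, for GGUF, its mmproj file) is loaded.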