diff --git a/.github/ISSUE_TEMPLATE/bug_report_template.yml b/.github/ISSUE_TEMPLATE/bug_report_template.yml
index bd30a0c9..ad22b656 100644
--- a/.github/ISSUE_TEMPLATE/bug_report_template.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report_template.yml
@@ -46,7 +46,7 @@ body:
id: system-info
attributes:
label: System Info
- description: "Please share your system info with us: operating system, GPU brand, and GPU model. If you are using a Google Colab notebook, mention that instead."
+ description: "Please share your operating system and GPU type (NVIDIA/AMD/Intel/Apple). If you are using a Google Colab notebook, mention that instead."
render: shell
placeholder:
validations:
diff --git a/README.md b/README.md
index 45ab48eb..6e7c05b1 100644
--- a/README.md
+++ b/README.md
@@ -24,20 +24,24 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
- Multiple sampling parameters and generation options for sophisticated text generation control.
- Switch between different models in the UI without restarting.
- Automatic GPU layers for GGUF models (on NVIDIA GPUs).
-- Free-form text generation in the Default/Notebook tabs without being limited to chat turns.
+- Free-form text generation in the Notebook tab without being limited to chat turns.
- OpenAI-compatible API with Chat and Completions endpoints, including tool-calling support – see [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples).
- Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
## How to install
-#### Option 1: Portable builds (start here)
+#### Option 1: Portable builds (get started in 1 minute)
-No installation needed – just unzip and run. Compatible with GGUF (llama.cpp) models on Windows, Linux, and macOS.
+No installation needed – just download, unzip and run. All dependencies included.
-Download from: https://github.com/oobabooga/text-generation-webui/releases
+Compatible with GGUF (llama.cpp) models on Windows, Linux, and macOS.
+
+Download from here: https://github.com/oobabooga/text-generation-webui/releases
#### Option 2: One-click installer
+For users who need additional backends (ExLlamaV3, Transformers) or extensions (TTS, voice input, translation, etc.). Requires ~10GB of disk space and downloads PyTorch.
+
1. Clone the repository, or [download its source code](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip) and extract it.
2. Run the startup script for your OS: `start_windows.bat`, `start_linux.sh`, or `start_macos.sh`.
3. When prompted, select your GPU vendor.
@@ -150,21 +154,21 @@ The `requirements*.txt` above contain various wheels precompiled through GitHub
```
For NVIDIA GPU:
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
-For AMD GPU:
+For AMD GPU:
ln -s docker/{amd/Dockerfile,amd/docker-compose.yml,.dockerignore} .
For Intel GPU:
ln -s docker/{intel/Dockerfile,amd/docker-compose.yml,.dockerignore} .
For CPU only
ln -s docker/{cpu/Dockerfile,cpu/docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
-#Create logs/cache dir :
+# Create logs/cache dirs:
mkdir -p user_data/logs user_data/cache
-# Edit .env and set:
+# Edit .env and set:
# TORCH_CUDA_ARCH_LIST based on your GPU model
# APP_RUNTIME_GID your host user's group id (run `id -g` in a terminal)
# BUILD_EXTENIONS optionally add comma separated list of extensions to build
# Edit user_data/CMD_FLAGS.txt and add in it the options you want to execute (like --listen --cpu)
-#
+#
docker compose up --build
```
@@ -188,7 +192,7 @@ List of command-line flags
```txt
-usage: server.py [-h] [--multi-user] [--character CHARACTER] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--settings SETTINGS]
+usage: server.py [-h] [--multi-user] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--settings SETTINGS]
[--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--idle-timeout IDLE_TIMEOUT] [--loader LOADER] [--cpu] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR]
[--load-in-8bit] [--bf16] [--no-cache] [--trust-remote-code] [--force-safetensors] [--no_use_fast] [--use_flash_attention_2] [--use_eager_attention] [--torch-compile] [--load-in-4bit]
[--use_double_quant] [--compute_dtype COMPUTE_DTYPE] [--quant_type QUANT_TYPE] [--flash-attn] [--threads THREADS] [--threads-batch THREADS_BATCH] [--batch-size BATCH_SIZE] [--no-mmap]
@@ -207,7 +211,6 @@ options:
Basic settings:
--multi-user Multi-user mode. Chat histories are not saved or automatically loaded. Warning: this is likely not safe for sharing publicly.
- --character CHARACTER The name of the character to load in chat mode by default.
--model MODEL Name of the model to load by default.
--lora LORA [LORA ...] The list of LoRAs to load. If you want to load more than one LoRA, write the names separated by spaces.
--model-dir MODEL_DIR Path to directory with all the models.
diff --git a/css/chat_style-wpp.css b/css/chat_style-wpp.css
index 353201c2..b2ac4d2a 100644
--- a/css/chat_style-wpp.css
+++ b/css/chat_style-wpp.css
@@ -1,57 +1,105 @@
.message {
- padding-bottom: 22px;
- padding-top: 3px;
+ display: block;
+ padding-top: 0;
+ padding-bottom: 21px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
+ grid-template-columns: none;
}
-.text-you {
+.circle-you, .circle-bot {
+ display: none;
+}
+
+.text {
+ max-width: 65%;
+ border-radius: 18px;
+ padding: 12px 16px;
+ margin-bottom: 8px;
+ clear: both;
+ box-shadow: 0 1px 2px rgb(0 0 0 / 10%);
+}
+
+.username {
+ font-weight: 600;
+ margin-bottom: 8px;
+ opacity: 0.65;
+ padding-left: 0;
+}
+
+/* User messages - right aligned, WhatsApp green */
+.circle-you + .text {
background-color: #d9fdd3;
- border-radius: 15px;
- padding: 10px;
- padding-top: 5px;
float: right;
+ margin-left: auto;
+ margin-right: 8px;
}
-.text-bot {
- background-color: #f2f2f2;
- border-radius: 15px;
- padding: 10px;
- padding-top: 5px;
+.circle-you + .text .username {
+ display: none;
}
-.dark .text-you {
- background-color: #005c4b;
- color: #111b21;
+/* Bot messages - left aligned, white */
+.circle-bot + .text {
+ background-color: #fff;
+ float: left;
+ margin-right: auto;
+ margin-left: 8px;
+ border: 1px solid #e5e5e5;
}
-.dark .text-bot {
- background-color: #1f2937;
- color: #111b21;
+.circle-bot + .text .message-actions {
+ bottom: -25px !important;
}
-.text-bot p, .text-you p {
- margin-top: 5px;
+/* Dark theme colors */
+.dark .circle-you + .text {
+ background-color: #144d37;
+ color: #e4e6ea;
+ box-shadow: 0 1px 2px rgb(0 0 0 / 30%);
+}
+
+.dark .circle-bot + .text {
+ background-color: #202c33;
+ color: #e4e6ea;
+ border: 1px solid #3c4043;
+ box-shadow: 0 1px 2px rgb(0 0 0 / 30%);
+}
+
+.dark .username {
+ opacity: 0.7;
}
.message-body img {
max-width: 300px;
max-height: 300px;
- border-radius: 20px;
+ border-radius: 12px;
}
.message-body p {
- margin-bottom: 0 !important;
font-size: 15px !important;
- line-height: 1.428571429 !important;
- font-weight: 500;
+ line-height: 1.4 !important;
+ font-weight: 400;
+}
+
+.message-body p:first-child {
+ margin-top: 0 !important;
}
.dark .message-body p em {
- color: rgb(138 138 138) !important;
+ color: rgb(170 170 170) !important;
}
.message-body p em {
- color: rgb(110 110 110) !important;
+ color: rgb(100 100 100) !important;
+}
+
+/* Message actions positioning */
+.message-actions {
+ margin-top: 8px;
+}
+
+.message-body p, .chat .message-body ul, .chat .message-body ol {
+ margin-bottom: 10px !important;
}
diff --git a/css/main.css b/css/main.css
index a22fdd95..bc59f833 100644
--- a/css/main.css
+++ b/css/main.css
@@ -97,11 +97,11 @@ ol li p, ul li p {
display: inline-block;
}
-#chat-tab, #default-tab, #notebook-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab {
+#notebook-parent-tab, #chat-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab, #character-tab {
border: 0;
}
-#default-tab, #notebook-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab {
+#notebook-parent-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab, #character-tab {
padding: 1rem;
}
@@ -167,15 +167,15 @@ gradio-app > :first-child {
}
.textbox_default textarea {
- height: calc(100dvh - 201px);
+ height: calc(100dvh - 202px);
}
.textbox_default_output textarea {
- height: calc(100dvh - 117px);
+ height: calc(100dvh - 118px);
}
.textbox textarea {
- height: calc(100dvh - 172px);
+ height: calc(100dvh - 145px);
}
.textbox_logits textarea {
@@ -307,7 +307,7 @@ audio {
}
#notebook-token-counter {
- top: calc(100dvh - 171px) !important;
+ top: calc(100dvh - 172px) !important;
}
#default-token-counter span, #notebook-token-counter span {
@@ -421,6 +421,7 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
text-align: start;
padding-left: 1rem;
padding-right: 1rem;
+ contain: layout;
}
.chat .message .timestamp {
@@ -905,6 +906,10 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
flex-shrink: 1;
}
+#search_chat {
+ padding-right: 0.5rem;
+}
+
#search_chat > :nth-child(2) > :first-child {
display: none;
}
@@ -925,7 +930,7 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
position: fixed;
bottom: 0;
left: 0;
- width: calc(100vw / 2 - 600px);
+ width: calc(0.5 * (100vw - min(100vw, 48rem) - (120px - var(--header-width))));
z-index: 10000;
}
@@ -1020,12 +1025,14 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
width: 100%;
justify-content: center;
gap: 9px;
+ padding-right: 0.5rem;
}
#past-chats-row,
#chat-controls {
width: 260px;
padding: 0.5rem;
+ padding-right: 0;
height: calc(100dvh - 16px);
flex-shrink: 0;
box-sizing: content-box;
@@ -1289,6 +1296,20 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
opacity: 1;
}
+/* Disable message action hover effects during generation */
+._generating .message:hover .message-actions,
+._generating .user-message:hover .message-actions,
+._generating .assistant-message:hover .message-actions {
+ opacity: 0 !important;
+}
+
+/* Disable message action hover effects during scrolling */
+.scrolling .message:hover .message-actions,
+.scrolling .user-message:hover .message-actions,
+.scrolling .assistant-message:hover .message-actions {
+ opacity: 0 !important;
+}
+
.footer-button svg {
stroke: rgb(156 163 175);
transition: stroke 0.2s;
@@ -1625,7 +1646,27 @@ button:focus {
display: none;
}
-/* Disable hover effects while scrolling */
-.chat-parent.scrolling * {
- pointer-events: none !important;
+#character-context textarea {
+ height: calc((100vh - 350px) * 2/3) !important;
+ min-height: 90px !important;
+}
+
+#character-greeting textarea {
+ height: calc((100vh - 350px) * 1/3) !important;
+ min-height: 90px !important;
+}
+
+#user-description textarea {
+ height: calc(100vh - 231px) !important;
+ min-height: 90px !important;
+}
+
+#instruction-template-str textarea,
+#chat-template-str textarea {
+ height: calc(100vh - 300px) !important;
+ min-height: 90px !important;
+}
+
+#textbox-notebook span {
+ display: none;
}
diff --git a/docs/12 - OpenAI API.md b/docs/12 - OpenAI API.md
index db9befed..ec999397 100644
--- a/docs/12 - OpenAI API.md
+++ b/docs/12 - OpenAI API.md
@@ -1,6 +1,6 @@
## OpenAI compatible API
-The main API for this project is meant to be a drop-in replacement to the OpenAI API, including Chat and Completions endpoints.
+The main API for this project is meant to be a drop-in replacement to the OpenAI API, including Chat and Completions endpoints.
* It is 100% offline and private.
* It doesn't create any logs.
@@ -30,10 +30,10 @@ curl http://127.0.0.1:5000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "This is a cake recipe:\n\n1.",
- "max_tokens": 200,
- "temperature": 1,
- "top_p": 0.9,
- "seed": 10
+ "max_tokens": 512,
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20
}'
```
@@ -51,7 +51,9 @@ curl http://127.0.0.1:5000/v1/chat/completions \
"content": "Hello!"
}
],
- "mode": "instruct"
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20
}'
```
@@ -67,8 +69,11 @@ curl http://127.0.0.1:5000/v1/chat/completions \
"content": "Hello! Who are you?"
}
],
- "mode": "chat",
- "character": "Example"
+ "mode": "chat-instruct",
+ "character": "Example",
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20
}'
```
@@ -84,7 +89,9 @@ curl http://127.0.0.1:5000/v1/chat/completions \
"content": "Hello!"
}
],
- "mode": "instruct",
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
"stream": true
}'
```
@@ -125,10 +132,11 @@ curl -k http://127.0.0.1:5000/v1/internal/model/list \
curl -k http://127.0.0.1:5000/v1/internal/model/load \
-H "Content-Type: application/json" \
-d '{
- "model_name": "model_name",
+ "model_name": "Qwen_Qwen3-0.6B-Q4_K_M.gguf",
"args": {
- "load_in_4bit": true,
- "n_gpu_layers": 12
+ "ctx_size": 32768,
+ "flash_attn": true,
+ "cache_type": "q8_0"
}
}'
```
@@ -150,9 +158,10 @@ while True:
user_message = input("> ")
history.append({"role": "user", "content": user_message})
data = {
- "mode": "chat",
- "character": "Example",
- "messages": history
+ "messages": history,
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20
}
response = requests.post(url, headers=headers, json=data, verify=False)
@@ -182,9 +191,11 @@ while True:
user_message = input("> ")
history.append({"role": "user", "content": user_message})
data = {
- "mode": "instruct",
"stream": True,
- "messages": history
+ "messages": history,
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20
}
stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
@@ -218,10 +229,10 @@ headers = {
data = {
"prompt": "This is a cake recipe:\n\n1.",
- "max_tokens": 200,
- "temperature": 1,
- "top_p": 0.9,
- "seed": 10,
+ "max_tokens": 512,
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
"stream": True,
}
diff --git a/extensions/openai/models.py b/extensions/openai/models.py
index a7e67df6..f8d9a1e8 100644
--- a/extensions/openai/models.py
+++ b/extensions/openai/models.py
@@ -18,19 +18,6 @@ def list_models():
return {'model_names': get_available_models()[1:]}
-def list_dummy_models():
- result = {
- "object": "list",
- "data": []
- }
-
- # these are expected by so much, so include some here as a dummy
- for model in ['gpt-3.5-turbo', 'text-embedding-ada-002']:
- result["data"].append(model_info_dict(model))
-
- return result
-
-
def model_info_dict(model_name: str) -> dict:
return {
"id": model_name,
diff --git a/extensions/openai/script.py b/extensions/openai/script.py
index 24bcd69d..3d8d5f73 100644
--- a/extensions/openai/script.py
+++ b/extensions/openai/script.py
@@ -180,7 +180,7 @@ async def handle_models(request: Request):
is_list = request.url.path.split('?')[0].split('#')[0] == '/v1/models'
if is_list:
- response = OAImodels.list_dummy_models()
+ response = OAImodels.list_models()
else:
model_name = path[len('/v1/models/'):]
response = OAImodels.model_info_dict(model_name)
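With `list_dummy_models()` removed, `GET /v1/models` now reports the models actually available locally via `list_models()` instead of the hard-coded `gpt-3.5-turbo`/`text-embedding-ada-002` placeholders. A quick way to inspect the new response (the exact shape depends on what `list_models()` returns):

```python
import requests

# Prints whatever the local server now reports for /v1/models.
response = requests.get("http://127.0.0.1:5000/v1/models")
print(response.json())
```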
diff --git a/extensions/openai/typing.py b/extensions/openai/typing.py
index b28ebb4e..6643ed16 100644
--- a/extensions/openai/typing.py
+++ b/extensions/openai/typing.py
@@ -158,7 +158,7 @@ class ChatCompletionRequestParams(BaseModel):
user_bio: str | None = Field(default=None, description="The user description/personality.")
chat_template_str: str | None = Field(default=None, description="Jinja2 template for chat.")
- chat_instruct_command: str | None = None
+ chat_instruct_command: str | None = "Continue the chat dialogue below. Write a single reply for the character \"<|character|>\".\n\n<|prompt|>"
continue_: bool = Field(default=False, description="Makes the last bot message in the history be continued instead of starting a new message.")
diff --git a/js/main.js b/js/main.js
index e970884d..3ff4bf06 100644
--- a/js/main.js
+++ b/js/main.js
@@ -170,6 +170,13 @@ targetElement.addEventListener("scroll", function() {
// Create a MutationObserver instance
const observer = new MutationObserver(function(mutations) {
+ // Check if this is just the scrolling class being toggled
+ const isScrollingClassOnly = mutations.every(mutation =>
+ mutation.type === "attributes" &&
+ mutation.attributeName === "class" &&
+ mutation.target === targetElement
+ );
+
if (targetElement.classList.contains("_generating")) {
typing.parentNode.classList.add("visible-dots");
document.getElementById("stop").style.display = "flex";
@@ -182,7 +189,7 @@ const observer = new MutationObserver(function(mutations) {
doSyntaxHighlighting();
- if (!window.isScrolled && targetElement.scrollTop !== targetElement.scrollHeight) {
+ if (!window.isScrolled && !isScrollingClassOnly && targetElement.scrollTop !== targetElement.scrollHeight) {
targetElement.scrollTop = targetElement.scrollHeight;
}
@@ -231,8 +238,15 @@ function doSyntaxHighlighting() {
if (messageBodies.length > 0) {
observer.disconnect();
- messageBodies.forEach((messageBody) => {
+ let hasSeenVisible = false;
+
+ // Go from last message to first
+ for (let i = messageBodies.length - 1; i >= 0; i--) {
+ const messageBody = messageBodies[i];
+
if (isElementVisibleOnScreen(messageBody)) {
+ hasSeenVisible = true;
+
// Handle both code and math in a single pass through each message
const codeBlocks = messageBody.querySelectorAll("pre code:not([data-highlighted])");
codeBlocks.forEach((codeBlock) => {
@@ -249,8 +263,12 @@ function doSyntaxHighlighting() {
{ left: "\\[", right: "\\]", display: true },
],
});
+ } else if (hasSeenVisible) {
+ // We've seen visible messages but this one is not visible
+ // Since we're going from last to first, we can break
+ break;
}
- });
+ }
observer.observe(targetElement, config);
}
@@ -777,11 +795,43 @@ initializeSidebars();
// Add click event listeners to toggle buttons
pastChatsToggle.addEventListener("click", () => {
+ const isCurrentlyOpen = !pastChatsRow.classList.contains("sidebar-hidden");
toggleSidebar(pastChatsRow, pastChatsToggle);
+
+ // On desktop, open/close both sidebars at the same time
+ if (!isMobile()) {
+ if (isCurrentlyOpen) {
+ // If we just closed the left sidebar, also close the right sidebar
+ if (!chatControlsRow.classList.contains("sidebar-hidden")) {
+ toggleSidebar(chatControlsRow, chatControlsToggle, true);
+ }
+ } else {
+ // If we just opened the left sidebar, also open the right sidebar
+ if (chatControlsRow.classList.contains("sidebar-hidden")) {
+ toggleSidebar(chatControlsRow, chatControlsToggle, false);
+ }
+ }
+ }
});
chatControlsToggle.addEventListener("click", () => {
+ const isCurrentlyOpen = !chatControlsRow.classList.contains("sidebar-hidden");
toggleSidebar(chatControlsRow, chatControlsToggle);
+
+ // On desktop, open/close both sidebars at the same time
+ if (!isMobile()) {
+ if (isCurrentlyOpen) {
+ // If we just closed the right sidebar, also close the left sidebar
+ if (!pastChatsRow.classList.contains("sidebar-hidden")) {
+ toggleSidebar(pastChatsRow, pastChatsToggle, true);
+ }
+ } else {
+ // If we just opened the right sidebar, also open the left sidebar
+ if (pastChatsRow.classList.contains("sidebar-hidden")) {
+ toggleSidebar(pastChatsRow, pastChatsToggle, false);
+ }
+ }
+ }
});
navigationToggle.addEventListener("click", () => {
diff --git a/js/show_controls.js b/js/show_controls.js
index 1a87b52d..f974d412 100644
--- a/js/show_controls.js
+++ b/js/show_controls.js
@@ -1,14 +1,26 @@
-const belowChatInput = document.querySelectorAll(
- "#chat-tab > div > :nth-child(1), #chat-tab > div > :nth-child(3), #chat-tab > div > :nth-child(4), #extensions"
-);
const chatParent = document.querySelector(".chat-parent");
function toggle_controls(value) {
- if (value) {
- belowChatInput.forEach(element => {
- element.style.display = "inherit";
- });
+ const extensions = document.querySelector("#extensions");
+ if (value) {
+ // SHOW MODE: Click toggles to show hidden sidebars
+ const navToggle = document.getElementById("navigation-toggle");
+ const pastChatsToggle = document.getElementById("past-chats-toggle");
+
+ if (navToggle && document.querySelector(".header_bar")?.classList.contains("sidebar-hidden")) {
+ navToggle.click();
+ }
+ if (pastChatsToggle && document.getElementById("past-chats-row")?.classList.contains("sidebar-hidden")) {
+ pastChatsToggle.click();
+ }
+
+ // Show extensions only
+ if (extensions) {
+ extensions.style.display = "inherit";
+ }
+
+ // Remove bigchat classes
chatParent.classList.remove("bigchat");
document.getElementById("chat-input-row").classList.remove("bigchat");
document.getElementById("chat-col").classList.remove("bigchat");
@@ -20,10 +32,23 @@ function toggle_controls(value) {
}
} else {
- belowChatInput.forEach(element => {
- element.style.display = "none";
- });
+ // HIDE MODE: Click toggles to hide visible sidebars
+ const navToggle = document.getElementById("navigation-toggle");
+ const pastChatsToggle = document.getElementById("past-chats-toggle");
+ if (navToggle && !document.querySelector(".header_bar")?.classList.contains("sidebar-hidden")) {
+ navToggle.click();
+ }
+ if (pastChatsToggle && !document.getElementById("past-chats-row")?.classList.contains("sidebar-hidden")) {
+ pastChatsToggle.click();
+ }
+
+ // Hide extensions only
+ if (extensions) {
+ extensions.style.display = "none";
+ }
+
+ // Add bigchat classes
chatParent.classList.add("bigchat");
document.getElementById("chat-input-row").classList.add("bigchat");
document.getElementById("chat-col").classList.add("bigchat");
diff --git a/js/switch_tabs.js b/js/switch_tabs.js
index 0564f891..7fb78aea 100644
--- a/js/switch_tabs.js
+++ b/js/switch_tabs.js
@@ -1,24 +1,14 @@
-let chat_tab = document.getElementById("chat-tab");
-let main_parent = chat_tab.parentNode;
-
function scrollToTop() {
- window.scrollTo({
- top: 0,
- // behavior: 'smooth'
- });
+ window.scrollTo({ top: 0 });
}
function findButtonsByText(buttonText) {
const buttons = document.getElementsByTagName("button");
const matchingButtons = [];
- buttonText = buttonText.trim();
for (let i = 0; i < buttons.length; i++) {
- const button = buttons[i];
- const buttonInnerText = button.textContent.trim();
-
- if (buttonInnerText === buttonText) {
- matchingButtons.push(button);
+ if (buttons[i].textContent.trim() === buttonText) {
+ matchingButtons.push(buttons[i]);
}
}
@@ -26,34 +16,23 @@ function findButtonsByText(buttonText) {
}
function switch_to_chat() {
- let chat_tab_button = main_parent.childNodes[0].childNodes[1];
- chat_tab_button.click();
- scrollToTop();
-}
-
-function switch_to_default() {
- let default_tab_button = main_parent.childNodes[0].childNodes[5];
- default_tab_button.click();
+ document.getElementById("chat-tab-button").click();
scrollToTop();
}
function switch_to_notebook() {
- let notebook_tab_button = main_parent.childNodes[0].childNodes[9];
- notebook_tab_button.click();
+ document.getElementById("notebook-parent-tab-button").click();
findButtonsByText("Raw")[1].click();
scrollToTop();
}
function switch_to_generation_parameters() {
- let parameters_tab_button = main_parent.childNodes[0].childNodes[13];
- parameters_tab_button.click();
+ document.getElementById("parameters-button").click();
findButtonsByText("Generation")[0].click();
scrollToTop();
}
function switch_to_character() {
- let parameters_tab_button = main_parent.childNodes[0].childNodes[13];
- parameters_tab_button.click();
- findButtonsByText("Character")[0].click();
+ document.getElementById("character-tab-button").click();
scrollToTop();
}
diff --git a/modules/chat.py b/modules/chat.py
index dfc301df..9290dd62 100644
--- a/modules/chat.py
+++ b/modules/chat.py
@@ -217,8 +217,8 @@ def generate_chat_prompt(user_input, state, **kwargs):
user_key = f"user_{row_idx}"
enhanced_user_msg = user_msg
- # Add attachment content if present
- if user_key in metadata and "attachments" in metadata[user_key]:
+ # Add attachment content if present AND if past attachments are enabled
+ if (state.get('include_past_attachments', True) and user_key in metadata and "attachments" in metadata[user_key]):
attachments_text = ""
for attachment in metadata[user_key]["attachments"]:
filename = attachment.get("name", "file")
@@ -332,10 +332,10 @@ def generate_chat_prompt(user_input, state, **kwargs):
user_message = messages[-1]['content']
# Bisect the truncation point
- left, right = 0, len(user_message) - 1
+ left, right = 0, len(user_message)
- while right - left > 1:
- mid = (left + right) // 2
+ while left < right:
+ mid = (left + right + 1) // 2
messages[-1]['content'] = user_message[:mid]
prompt = make_prompt(messages)
@@ -344,7 +344,7 @@ def generate_chat_prompt(user_input, state, **kwargs):
if encoded_length <= max_length:
left = mid
else:
- right = mid
+ right = mid - 1
messages[-1]['content'] = user_message[:left]
prompt = make_prompt(messages)
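The rewritten loop above is an upper-bound binary search for the longest prefix of the user message that still fits the available context. A self-contained sketch of the same invariant, with a stand-in `fits()` predicate (hypothetical; the real code rebuilds the prompt and measures it with `get_encoded_length`):

```python
def longest_fitting_prefix(text, fits):
    """Return the largest n such that fits(text[:n]) is True.

    Mirrors the bisection above: `left` always holds a length known to fit,
    `right` the largest length still possible. Assumes fits("") is True and
    that fits() is monotonic (once it fails, every longer prefix fails too).
    """
    left, right = 0, len(text)
    while left < right:
        mid = (left + right + 1) // 2  # round up so `left = mid` always makes progress
        if fits(text[:mid]):
            left = mid
        else:
            right = mid - 1
    return left


# Stand-in budget of 10 characters (hypothetical); the real code instead
# checks the encoded prompt length against max_length.
print(longest_fitting_prefix("The quick brown fox", lambda s: len(s) <= 10))  # -> 10
```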
@@ -353,7 +353,17 @@ def generate_chat_prompt(user_input, state, **kwargs):
logger.error(f"Failed to build the chat prompt. The input is too long for the available context length.\n\nTruncation length: {state['truncation_length']}\nmax_new_tokens: {state['max_new_tokens']} (is it too high?)\nAvailable context length: {max_length}\n")
raise ValueError
else:
- logger.warning(f"The input has been truncated. Context length: {state['truncation_length']}, max_new_tokens: {state['max_new_tokens']}, available context length: {max_length}.")
+ # Calculate token counts for the log message
+ original_user_tokens = get_encoded_length(user_message)
+ truncated_user_tokens = get_encoded_length(user_message[:left])
+ total_context = max_length + state['max_new_tokens']
+
+ logger.warning(
+ f"User message truncated from {original_user_tokens} to {truncated_user_tokens} tokens. "
+ f"Context full: {max_length} input tokens ({total_context} total, {state['max_new_tokens']} for output). "
+ f"Increase ctx-size while loading the model to avoid truncation."
+ )
+
break
prompt = make_prompt(messages)
@@ -604,6 +614,7 @@ def generate_search_query(user_message, state):
search_state['max_new_tokens'] = 64
search_state['auto_max_new_tokens'] = False
search_state['enable_thinking'] = False
+ search_state['start_with'] = ""
# Generate the full prompt using existing history + augmented message
formatted_prompt = generate_chat_prompt(augmented_message, search_state)
@@ -1069,16 +1080,27 @@ def load_latest_history(state):
'''
if shared.args.multi_user:
- return start_new_chat(state)
+ return start_new_chat(state), None
histories = find_all_histories(state)
if len(histories) > 0:
- history = load_history(histories[0], state['character_menu'], state['mode'])
- else:
- history = start_new_chat(state)
+ # Try to load the last visited chat for this character/mode
+ chat_state = load_last_chat_state()
+ key = get_chat_state_key(state['character_menu'], state['mode'])
+ last_chat_id = chat_state.get("last_chats", {}).get(key)
- return history
+ # If we have a stored last chat and it still exists, use it
+ if last_chat_id and last_chat_id in histories:
+ unique_id = last_chat_id
+ else:
+ # Fall back to most recent (current behavior)
+ unique_id = histories[0]
+
+ history = load_history(unique_id, state['character_menu'], state['mode'])
+ return history, unique_id
+ else:
+ return start_new_chat(state), None
def load_history_after_deletion(state, idx):
@@ -1110,6 +1132,42 @@ def update_character_menu_after_deletion(idx):
return gr.update(choices=characters, value=characters[idx])
+def get_chat_state_key(character, mode):
+ """Generate a key for storing last chat state"""
+ if mode == 'instruct':
+ return 'instruct'
+ else:
+ return f"chat_{character}"
+
+
+def load_last_chat_state():
+ """Load the last chat state from file"""
+ state_file = Path('user_data/logs/chat_state.json')
+ if state_file.exists():
+ try:
+ with open(state_file, 'r', encoding='utf-8') as f:
+ return json.loads(f.read())
+ except Exception:
+ pass
+
+ return {"last_chats": {}}
+
+
+def save_last_chat_state(character, mode, unique_id):
+ """Save the last visited chat for a character/mode"""
+ if shared.args.multi_user:
+ return
+
+ state = load_last_chat_state()
+ key = get_chat_state_key(character, mode)
+ state["last_chats"][key] = unique_id
+
+ state_file = Path('user_data/logs/chat_state.json')
+ state_file.parent.mkdir(exist_ok=True)
+ with open(state_file, 'w', encoding='utf-8') as f:
+ f.write(json.dumps(state, indent=2))
+
+
def load_history(unique_id, character, mode):
p = get_history_file_path(unique_id, character, mode)
@@ -1543,6 +1601,9 @@ def handle_unique_id_select(state):
history = load_history(state['unique_id'], state['character_menu'], state['mode'])
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
+ # Save this as the last visited chat
+ save_last_chat_state(state['character_menu'], state['mode'], state['unique_id'])
+
convert_to_markdown.cache_clear()
return [history, html]
@@ -1743,14 +1804,14 @@ def handle_character_menu_change(state):
state['greeting'] = greeting
state['context'] = context
- history = load_latest_history(state)
+ history, loaded_unique_id = load_latest_history(state)
histories = find_all_histories_with_first_prompts(state)
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
convert_to_markdown.cache_clear()
if len(histories) > 0:
- past_chats_update = gr.update(choices=histories, value=histories[0][1])
+ past_chats_update = gr.update(choices=histories, value=loaded_unique_id or histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
@@ -1762,7 +1823,7 @@ def handle_character_menu_change(state):
picture,
greeting,
context,
- past_chats_update,
+ past_chats_update
]
@@ -1786,14 +1847,19 @@ def handle_character_picture_change(picture):
def handle_mode_change(state):
- history = load_latest_history(state)
+ history, loaded_unique_id = load_latest_history(state)
histories = find_all_histories_with_first_prompts(state)
+
+ # Ensure character picture cache exists
+ if state['mode'] in ['chat', 'chat-instruct'] and state['character_menu'] and state['character_menu'] != 'None':
+ generate_pfp_cache(state['character_menu'])
+
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
convert_to_markdown.cache_clear()
if len(histories) > 0:
- past_chats_update = gr.update(choices=histories, value=histories[0][1])
+ past_chats_update = gr.update(choices=histories, value=loaded_unique_id or histories[0][1])
else:
past_chats_update = gr.update(choices=histories)
@@ -1852,10 +1918,16 @@ def handle_send_instruction_click(state):
output = generate_chat_prompt("Input", state)
- return output
+ if state["show_two_notebook_columns"]:
+ return gr.update(), output, ""
+ else:
+ return output, gr.update(), gr.update()
def handle_send_chat_click(state):
output = generate_chat_prompt("", state, _continue=True)
- return output
+ if state["show_two_notebook_columns"]:
+ return gr.update(), output, ""
+ else:
+ return output, gr.update(), gr.update()
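The new `load_last_chat_state()`/`save_last_chat_state()` helpers persist the last-visited chat per character/mode in `user_data/logs/chat_state.json`, keyed by `get_chat_state_key()`. A small sketch of the resulting on-disk structure (the character name and chat ids are made-up examples):

```python
import json


def chat_state_key(character, mode):
    # Mirrors get_chat_state_key() above: instruct mode shares a single slot,
    # while chat/chat-instruct modes get one slot per character.
    return "instruct" if mode == "instruct" else f"chat_{character}"


# Made-up chat ids; the real values are the unique_id of a saved history.
state = {"last_chats": {}}
state["last_chats"][chat_state_key("Assistant", "chat")] = "20250102-18-30-12"
state["last_chats"][chat_state_key(None, "instruct")] = "20250101-10-00-00"
print(json.dumps(state, indent=2))  # same shape as user_data/logs/chat_state.json
```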
diff --git a/modules/html_generator.py b/modules/html_generator.py
index af64894e..11572fc6 100644
--- a/modules/html_generator.py
+++ b/modules/html_generator.py
@@ -595,64 +595,6 @@ def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=
return output
-def generate_chat_html(history, name1, name2, reset_cache=False, last_message_only=False):
- if not last_message_only:
- output = f'<div class="chat" id="chat">'
- else:
- output = ""
-
- def create_message(role, content, raw_content):
- """Inner function for WPP-style messages."""
- text_class = "text-you" if role == "user" else "text-bot"
-
- # Get role-specific data
- timestamp = format_message_timestamp(history, role, i)
- attachments = format_message_attachments(history, role, i)
-
- # Create info button if timestamp exists
- info_message = ""
- if timestamp:
- tooltip_text = get_message_tooltip(history, role, i)
- info_message = info_button.replace('title="message"', f'title="{html.escape(tooltip_text)}"')
-
- return (
- f'<div class="message" data-raw="{html.escape(raw_content, quote=True)}">'
- f'<div class="text {text_class}">'
- f'<div class="message-body">{content}</div>'
- f'{attachments}'
- f'{actions_html(history, i, role, info_message)}'
- f'</div>'
- f'</div>'
- )
-
- # Determine range
- start_idx = len(history['visible']) - 1 if last_message_only else 0
- end_idx = len(history['visible'])
-
- for i in range(start_idx, end_idx):
- row_visible = history['visible'][i]
- row_internal = history['internal'][i]
-
- # Convert content
- if last_message_only:
- converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
- else:
- converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
-
- # Generate messages
- if not last_message_only and converted_visible[0]:
- output += create_message("user", converted_visible[0], row_internal[0])
-
- output += create_message("assistant", converted_visible[1], row_internal[1])
-
- if not last_message_only:
- output += "
"
-
- return output
-
-
def time_greeting():
current_hour = datetime.datetime.now().hour
if 5 <= current_hour < 12:
@@ -669,8 +611,6 @@ def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache
result = f'<div class="chat" id="chat">{greeting}</div>'
elif mode == 'instruct':
result = generate_instruct_html(history, last_message_only=last_message_only)
- elif style == 'wpp':
- result = generate_chat_html(history, name1, name2, last_message_only=last_message_only)
else:
result = generate_cai_chat_html(history, name1, name2, style, character, reset_cache=reset_cache, last_message_only=last_message_only)
diff --git a/modules/llama_cpp_server.py b/modules/llama_cpp_server.py
index a79e24e4..e64f1694 100644
--- a/modules/llama_cpp_server.py
+++ b/modules/llama_cpp_server.py
@@ -30,6 +30,7 @@ class LlamaServer:
self.session = requests.Session()
self.vocabulary_size = None
self.bos_token = ""
+ self.last_prompt_token_count = 0
# Start the server
self._start_server()
@@ -128,6 +129,7 @@ class LlamaServer:
payload = self.prepare_payload(state)
token_ids = self.encode(prompt, add_bos_token=state["add_bos_token"])
+ self.last_prompt_token_count = len(token_ids)
if state['auto_max_new_tokens']:
max_new_tokens = state['truncation_length'] - len(token_ids)
else:
diff --git a/modules/models_settings.py b/modules/models_settings.py
index 283a9744..37aa37cf 100644
--- a/modules/models_settings.py
+++ b/modules/models_settings.py
@@ -9,6 +9,7 @@ import gradio as gr
import yaml
from modules import chat, loaders, metadata_gguf, shared, ui
+from modules.logging_colors import logger
def get_fallback_settings():
@@ -56,7 +57,13 @@ def get_model_metadata(model):
if path.is_file():
model_file = path
else:
- model_file = list(path.glob('*.gguf'))[0]
+ gguf_files = list(path.glob('*.gguf'))
+ if not gguf_files:
+ error_msg = f"No .gguf models found in directory: {path}"
+ logger.error(error_msg)
+ raise FileNotFoundError(error_msg)
+
+ model_file = gguf_files[0]
metadata = load_gguf_metadata_with_cache(model_file)
@@ -171,6 +178,8 @@ def infer_loader(model_name, model_settings, hf_quant_method=None):
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
if not path_to_model.exists():
loader = None
+ elif shared.args.portable:
+ loader = 'llama.cpp'
elif len(list(path_to_model.glob('*.gguf'))) > 0:
loader = 'llama.cpp'
elif re.match(r'.*\.gguf', model_name.lower()):
@@ -450,26 +459,19 @@ def update_gpu_layers_and_vram(loader, model, gpu_layers, ctx_size, cache_type,
else:
return (0, gpu_layers) if auto_adjust else 0
+ # Get model settings including user preferences
+ model_settings = get_model_metadata(model)
+
current_layers = gpu_layers
- max_layers = gpu_layers
+ max_layers = model_settings.get('max_gpu_layers', 256)
if auto_adjust:
- # Get model settings including user preferences
- model_settings = get_model_metadata(model)
-
- # Get the true maximum layers
- max_layers = model_settings.get('max_gpu_layers', model_settings.get('gpu_layers', gpu_layers))
-
# Check if this is a user-saved setting
user_config = shared.user_config
model_regex = Path(model).name + '$'
has_user_setting = model_regex in user_config and 'gpu_layers' in user_config[model_regex]
- if has_user_setting:
- # For user settings, just use the current value (which already has user pref)
- # but ensure the slider maximum is correct
- current_layers = gpu_layers # Already has user setting
- else:
+ if not has_user_setting:
# No user setting, auto-adjust from the maximum
current_layers = max_layers # Start from max
diff --git a/modules/prompts.py b/modules/prompts.py
index 8f00cac2..79d9b56e 100644
--- a/modules/prompts.py
+++ b/modules/prompts.py
@@ -1,22 +1,33 @@
from pathlib import Path
+from modules import shared, utils
from modules.text_generation import get_encoded_length
def load_prompt(fname):
- if fname in ['None', '']:
- return ''
- else:
- file_path = Path(f'user_data/prompts/{fname}.txt')
- if not file_path.exists():
- return ''
+ if not fname:
+ # Create new file
+ new_name = utils.current_time()
+ prompt_path = Path("user_data/logs/notebook") / f"{new_name}.txt"
+ prompt_path.parent.mkdir(parents=True, exist_ok=True)
+ initial_content = "In this story,"
+ prompt_path.write_text(initial_content, encoding='utf-8')
+ # Update settings to point to new file
+ shared.settings['prompt-notebook'] = new_name
+
+ return initial_content
+
+ file_path = Path(f'user_data/logs/notebook/{fname}.txt')
+ if file_path.exists():
with open(file_path, 'r', encoding='utf-8') as f:
text = f.read()
- if text[-1] == '\n':
+ if len(text) > 0 and text[-1] == '\n':
text = text[:-1]
return text
+ else:
+ return ''
def count_tokens(text):
diff --git a/modules/shared.py b/modules/shared.py
index 83920df8..5333ec4f 100644
--- a/modules/shared.py
+++ b/modules/shared.py
@@ -202,8 +202,7 @@ settings = {
'chat-instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',
'enable_web_search': False,
'web_search_pages': 3,
- 'prompt-default': 'QA',
- 'prompt-notebook': 'QA',
+ 'prompt-notebook': '',
'preset': 'Qwen3 - Thinking' if Path('user_data/presets/Qwen3 - Thinking.yaml').exists() else None,
'max_new_tokens': 512,
'max_new_tokens_min': 1,
@@ -223,7 +222,9 @@ settings = {
'custom_token_bans': '',
'negative_prompt': '',
'dark_theme': True,
+ 'show_two_notebook_columns': False,
'paste_to_attachment': False,
+ 'include_past_attachments': True,
# Generation parameters - Curve shape
'temperature': 0.6,
diff --git a/modules/text_generation.py b/modules/text_generation.py
index 55b538b0..a75141f1 100644
--- a/modules/text_generation.py
+++ b/modules/text_generation.py
@@ -498,8 +498,14 @@ def generate_reply_custom(question, original_question, state, stopping_strings=N
traceback.print_exc()
finally:
t1 = time.time()
- original_tokens = len(encode(original_question)[0])
- new_tokens = len(encode(original_question + reply)[0]) - original_tokens
+
+ if hasattr(shared.model, 'last_prompt_token_count'):
+ original_tokens = shared.model.last_prompt_token_count
+ new_tokens = len(encode(reply)[0]) if reply else 0
+ else:
+ original_tokens = len(encode(original_question)[0])
+ new_tokens = len(encode(original_question + reply)[0]) - original_tokens
+
logger.info(f'Output generated in {(t1-t0):.2f} seconds ({new_tokens/(t1-t0):.2f} tokens/s, {new_tokens} tokens, context {original_tokens}, seed {state["seed"]})')
return
diff --git a/modules/ui.py b/modules/ui.py
index 2925faa5..0e8afa8f 100644
--- a/modules/ui.py
+++ b/modules/ui.py
@@ -6,6 +6,7 @@ import gradio as gr
import yaml
import extensions
+import modules.extensions as extensions_module
from modules import shared
from modules.chat import load_history
from modules.utils import gradio
@@ -273,7 +274,9 @@ def list_interface_input_elements():
# Other elements
elements += [
- 'paste_to_attachment'
+ 'show_two_notebook_columns',
+ 'paste_to_attachment',
+ 'include_past_attachments',
]
return elements
@@ -324,8 +327,7 @@ def save_settings(state, preset, extensions_list, show_controls, theme_state, ma
output[k] = state[k]
output['preset'] = preset
- output['prompt-default'] = state['prompt_menu-default']
- output['prompt-notebook'] = state['prompt_menu-notebook']
+ output['prompt-notebook'] = state['prompt_menu-default'] if state['show_two_notebook_columns'] else state['prompt_menu-notebook']
output['character'] = state['character_menu']
output['seed'] = int(output['seed'])
output['show_controls'] = show_controls
@@ -333,35 +335,41 @@ def save_settings(state, preset, extensions_list, show_controls, theme_state, ma
output.pop('instruction_template_str')
output.pop('truncation_length')
- # Only save extensions on manual save
+ # Handle extensions and extension parameters
if manual_save:
+ # Save current extensions and their parameter values
output['default_extensions'] = extensions_list
+
+ for extension_name in extensions_list:
+ extension = getattr(extensions, extension_name, None)
+ if extension:
+ extension = extension.script
+ if hasattr(extension, 'params'):
+ params = getattr(extension, 'params')
+ for param in params:
+ _id = f"{extension_name}-{param}"
+ # Only save if different from default value
+ if param not in shared.default_settings or params[param] != shared.default_settings[param]:
+ output[_id] = params[param]
else:
- # Preserve existing extensions from settings file during autosave
+ # Preserve existing extensions and extension parameters during autosave
settings_path = Path('user_data') / 'settings.yaml'
if settings_path.exists():
try:
with open(settings_path, 'r', encoding='utf-8') as f:
existing_settings = yaml.safe_load(f.read()) or {}
+ # Preserve default_extensions
if 'default_extensions' in existing_settings:
output['default_extensions'] = existing_settings['default_extensions']
+
+ # Preserve extension parameter values
+ for key, value in existing_settings.items():
+ if any(key.startswith(f"{ext_name}-") for ext_name in extensions_module.available_extensions):
+ output[key] = value
except Exception:
pass # If we can't read the file, just don't modify extensions
- # Save extension values in the UI
- for extension_name in extensions_list:
- extension = getattr(extensions, extension_name, None)
- if extension:
- extension = extension.script
- if hasattr(extension, 'params'):
- params = getattr(extension, 'params')
- for param in params:
- _id = f"{extension_name}-{param}"
- # Only save if different from default value
- if param not in shared.default_settings or params[param] != shared.default_settings[param]:
- output[_id] = params[param]
-
# Do not save unchanged settings
for key in list(output.keys()):
if key in shared.default_settings and output[key] == shared.default_settings[key]:
@@ -497,7 +505,9 @@ def setup_auto_save():
# Session tab (ui_session.py)
'show_controls',
'theme_state',
- 'paste_to_attachment'
+ 'show_two_notebook_columns',
+ 'paste_to_attachment',
+ 'include_past_attachments'
]
for element_name in change_elements:
diff --git a/modules/ui_chat.py b/modules/ui_chat.py
index 3b841b8b..8a90608f 100644
--- a/modules/ui_chat.py
+++ b/modules/ui_chat.py
@@ -70,7 +70,6 @@ def create_ui():
shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
shared.gradio['Send dummy message'] = gr.Button('Send dummy message')
shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
- shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')
@@ -111,9 +110,9 @@ def create_ui():
shared.gradio['edit_message'] = gr.Button(elem_id="Edit-message")
-def create_chat_settings_ui():
+def create_character_settings_ui():
mu = shared.args.multi_user
- with gr.Tab('Chat'):
+ with gr.Tab('Character', elem_id="character-tab"):
with gr.Row():
with gr.Column(scale=8):
with gr.Tab("Character"):
@@ -125,12 +124,12 @@ def create_chat_settings_ui():
shared.gradio['restore_character'] = gr.Button('Restore character', elem_classes='refresh-button', interactive=True, elem_id='restore-character')
shared.gradio['name2'] = gr.Textbox(value=shared.settings['name2'], lines=1, label='Character\'s name')
- shared.gradio['context'] = gr.Textbox(value=shared.settings['context'], lines=10, label='Context', elem_classes=['add_scrollbar'])
- shared.gradio['greeting'] = gr.Textbox(value=shared.settings['greeting'], lines=5, label='Greeting', elem_classes=['add_scrollbar'])
+ shared.gradio['context'] = gr.Textbox(value=shared.settings['context'], lines=10, label='Context', elem_classes=['add_scrollbar'], elem_id="character-context")
+ shared.gradio['greeting'] = gr.Textbox(value=shared.settings['greeting'], lines=5, label='Greeting', elem_classes=['add_scrollbar'], elem_id="character-greeting")
with gr.Tab("User"):
shared.gradio['name1'] = gr.Textbox(value=shared.settings['name1'], lines=1, label='Name')
- shared.gradio['user_bio'] = gr.Textbox(value=shared.settings['user_bio'], lines=10, label='Description', info='Here you can optionally write a description of yourself.', placeholder='{{user}}\'s personality: ...', elem_classes=['add_scrollbar'])
+ shared.gradio['user_bio'] = gr.Textbox(value=shared.settings['user_bio'], lines=10, label='Description', info='Here you can optionally write a description of yourself.', placeholder='{{user}}\'s personality: ...', elem_classes=['add_scrollbar'], elem_id="user-description")
with gr.Tab('Chat history'):
with gr.Row():
@@ -163,6 +162,9 @@ def create_chat_settings_ui():
shared.gradio['character_picture'] = gr.Image(label='Character picture', type='pil', interactive=not mu)
shared.gradio['your_picture'] = gr.Image(label='Your picture', type='pil', value=Image.open(Path('user_data/cache/pfp_me.png')) if Path('user_data/cache/pfp_me.png').exists() else None, interactive=not mu)
+
+def create_chat_settings_ui():
+ mu = shared.args.multi_user
with gr.Tab('Instruction template'):
with gr.Row():
with gr.Column():
@@ -178,15 +180,12 @@ def create_chat_settings_ui():
with gr.Row():
with gr.Column():
- shared.gradio['custom_system_message'] = gr.Textbox(value=shared.settings['custom_system_message'], lines=2, label='Custom system message', info='If not empty, will be used instead of the default one.', elem_classes=['add_scrollbar'])
- shared.gradio['instruction_template_str'] = gr.Textbox(value=shared.settings['instruction_template_str'], label='Instruction template', lines=24, info='This gets autodetected; you usually don\'t need to change it. Used in instruct and chat-instruct modes.', elem_classes=['add_scrollbar', 'monospace'])
+ shared.gradio['instruction_template_str'] = gr.Textbox(value=shared.settings['instruction_template_str'], label='Instruction template', lines=24, info='This gets autodetected; you usually don\'t need to change it. Used in instruct and chat-instruct modes.', elem_classes=['add_scrollbar', 'monospace'], elem_id='instruction-template-str')
with gr.Row():
- shared.gradio['send_instruction_to_default'] = gr.Button('Send to default', elem_classes=['small-button'])
shared.gradio['send_instruction_to_notebook'] = gr.Button('Send to notebook', elem_classes=['small-button'])
- shared.gradio['send_instruction_to_negative_prompt'] = gr.Button('Send to negative prompt', elem_classes=['small-button'])
with gr.Column():
- shared.gradio['chat_template_str'] = gr.Textbox(value=shared.settings['chat_template_str'], label='Chat template', lines=22, elem_classes=['add_scrollbar', 'monospace'])
+ shared.gradio['chat_template_str'] = gr.Textbox(value=shared.settings['chat_template_str'], label='Chat template', lines=22, elem_classes=['add_scrollbar', 'monospace'], info='Defines how the chat prompt in chat/chat-instruct modes is generated.', elem_id='chat-template-str')
def create_event_handlers():
@@ -298,7 +297,7 @@ def create_event_handlers():
shared.gradio['mode'].change(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_mode_change, gradio('interface_state'), gradio('history', 'display', 'chat_style', 'chat-instruct_command', 'unique_id'), show_progress=False).then(
- None, gradio('mode'), None, js="(mode) => {const characterContainer = document.getElementById('character-menu').parentNode.parentNode; const isInChatTab = document.querySelector('#chat-controls').contains(characterContainer); if (isInChatTab) { characterContainer.style.display = mode === 'instruct' ? 'none' : ''; }}")
+ None, gradio('mode'), None, js="(mode) => {const characterContainer = document.getElementById('character-menu').parentNode.parentNode; const isInChatTab = document.querySelector('#chat-controls').contains(characterContainer); if (isInChatTab) { characterContainer.style.display = mode === 'instruct' ? 'none' : ''; } if (mode === 'instruct') document.querySelectorAll('.bigProfilePicture').forEach(el => el.remove());}")
shared.gradio['chat_style'].change(chat.redraw_html, gradio(reload_arr), gradio('display'), show_progress=False)
@@ -343,29 +342,14 @@ def create_event_handlers():
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_your_picture_change, gradio('your_picture', 'interface_state'), gradio('display'), show_progress=False)
- shared.gradio['send_instruction_to_default'].click(
- ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- chat.handle_send_instruction_click, gradio('interface_state'), gradio('textbox-default'), show_progress=False).then(
- None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_default()}}')
-
shared.gradio['send_instruction_to_notebook'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- chat.handle_send_instruction_click, gradio('interface_state'), gradio('textbox-notebook'), show_progress=False).then(
+ chat.handle_send_instruction_click, gradio('interface_state'), gradio('textbox-notebook', 'textbox-default', 'output_textbox'), show_progress=False).then(
None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_notebook()}}')
- shared.gradio['send_instruction_to_negative_prompt'].click(
- ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- chat.handle_send_instruction_click, gradio('interface_state'), gradio('negative_prompt'), show_progress=False).then(
- None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_generation_parameters()}}')
-
- shared.gradio['send-chat-to-default'].click(
- ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- chat.handle_send_chat_click, gradio('interface_state'), gradio('textbox-default'), show_progress=False).then(
- None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_default()}}')
-
shared.gradio['send-chat-to-notebook'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- chat.handle_send_chat_click, gradio('interface_state'), gradio('textbox-notebook'), show_progress=False).then(
+ chat.handle_send_chat_click, gradio('interface_state'), gradio('textbox-notebook', 'textbox-default', 'output_textbox'), show_progress=False).then(
None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_notebook()}}')
shared.gradio['show_controls'].change(None, gradio('show_controls'), None, js=f'(x) => {{{ui.show_controls_js}; toggle_controls(x)}}')
diff --git a/modules/ui_default.py b/modules/ui_default.py
index 8acc4b10..44af48a3 100644
--- a/modules/ui_default.py
+++ b/modules/ui_default.py
@@ -1,3 +1,5 @@
+from pathlib import Path
+
import gradio as gr
from modules import logits, shared, ui, utils
@@ -7,6 +9,7 @@ from modules.text_generation import (
get_token_ids,
stop_everything_event
)
+from modules.ui_notebook import store_notebook_state_and_debounce
from modules.utils import gradio
inputs = ('textbox-default', 'interface_state')
@@ -15,11 +18,12 @@ outputs = ('output_textbox', 'html-default')
def create_ui():
mu = shared.args.multi_user
- with gr.Tab('Default', elem_id='default-tab'):
+ with gr.Row(visible=shared.settings['show_two_notebook_columns']) as shared.gradio['default-tab']:
with gr.Row():
with gr.Column():
with gr.Row():
- shared.gradio['textbox-default'] = gr.Textbox(value=load_prompt(shared.settings['prompt-default']), lines=27, label='Input', elem_classes=['textbox_default', 'add_scrollbar'])
+ initial_text = load_prompt(shared.settings['prompt-notebook'])
+ shared.gradio['textbox-default'] = gr.Textbox(value=initial_text, lines=27, label='Input', elem_classes=['textbox_default', 'add_scrollbar'])
shared.gradio['token-counter-default'] = gr.HTML(value="0", elem_id="default-token-counter")
with gr.Row():
@@ -28,11 +32,21 @@ def create_ui():
shared.gradio['Generate-default'] = gr.Button('Generate', variant='primary')
with gr.Row():
- shared.gradio['prompt_menu-default'] = gr.Dropdown(choices=utils.get_available_prompts(), value=shared.settings['prompt-default'], label='Prompt', elem_classes='slim-dropdown')
+ shared.gradio['prompt_menu-default'] = gr.Dropdown(choices=utils.get_available_prompts(), value=shared.settings['prompt-notebook'], label='Prompt', elem_classes='slim-dropdown')
ui.create_refresh_button(shared.gradio['prompt_menu-default'], lambda: None, lambda: {'choices': utils.get_available_prompts()}, 'refresh-button', interactive=not mu)
- shared.gradio['save_prompt-default'] = gr.Button('💾', elem_classes='refresh-button', interactive=not mu)
+ shared.gradio['new_prompt-default'] = gr.Button('New', elem_classes='refresh-button', interactive=not mu)
+ shared.gradio['rename_prompt-default'] = gr.Button('Rename', elem_classes='refresh-button', interactive=not mu)
shared.gradio['delete_prompt-default'] = gr.Button('🗑️', elem_classes='refresh-button', interactive=not mu)
+ # Rename elements (initially hidden)
+ shared.gradio['rename_prompt_to-default'] = gr.Textbox(label="New name", elem_classes=['no-background'], visible=False)
+ shared.gradio['rename_prompt-cancel-default'] = gr.Button('Cancel', elem_classes=['refresh-button'], visible=False)
+ shared.gradio['rename_prompt-confirm-default'] = gr.Button('Confirm', elem_classes=['refresh-button'], variant='primary', visible=False)
+
+ # Delete confirmation elements (initially hidden)
+ shared.gradio['delete_prompt-cancel-default'] = gr.Button('Cancel', elem_classes=['refresh-button'], visible=False)
+ shared.gradio['delete_prompt-confirm-default'] = gr.Button('Confirm', variant='stop', elem_classes=['refresh-button'], visible=False)
+
with gr.Column():
with gr.Tab('Raw'):
shared.gradio['output_textbox'] = gr.Textbox(lines=27, label='Output', elem_id='textbox-default', elem_classes=['textbox_default_output', 'add_scrollbar'])
@@ -64,7 +78,7 @@ def create_event_handlers():
shared.gradio['Generate-default'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-default', 'Generate-default')).then(
- generate_reply_wrapper, gradio(inputs), gradio(outputs), show_progress=False).then(
+ generate_reply_wrapper, gradio('textbox-default', 'interface_state'), gradio(outputs), show_progress=False).then(
lambda state, left, right: state.update({'textbox-default': left, 'output_textbox': right}), gradio('interface_state', 'textbox-default', 'output_textbox'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-default', 'Generate-default')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
@@ -72,7 +86,7 @@ def create_event_handlers():
shared.gradio['textbox-default'].submit(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-default', 'Generate-default')).then(
- generate_reply_wrapper, gradio(inputs), gradio(outputs), show_progress=False).then(
+ generate_reply_wrapper, gradio('textbox-default', 'interface_state'), gradio(outputs), show_progress=False).then(
lambda state, left, right: state.update({'textbox-default': left, 'output_textbox': right}), gradio('interface_state', 'textbox-default', 'output_textbox'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-default', 'Generate-default')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
@@ -80,16 +94,60 @@ def create_event_handlers():
shared.gradio['Continue-default'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-default', 'Generate-default')).then(
- generate_reply_wrapper, [shared.gradio['output_textbox']] + gradio(inputs)[1:], gradio(outputs), show_progress=False).then(
+ generate_reply_wrapper, gradio('output_textbox', 'interface_state'), gradio(outputs), show_progress=False).then(
lambda state, left, right: state.update({'textbox-default': left, 'output_textbox': right}), gradio('interface_state', 'textbox-default', 'output_textbox'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-default', 'Generate-default')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
shared.gradio['Stop-default'].click(stop_everything_event, None, None, queue=False)
shared.gradio['markdown_render-default'].click(lambda x: x, gradio('output_textbox'), gradio('markdown-default'), queue=False)
- shared.gradio['prompt_menu-default'].change(load_prompt, gradio('prompt_menu-default'), gradio('textbox-default'), show_progress=False)
- shared.gradio['save_prompt-default'].click(handle_save_prompt, gradio('textbox-default'), gradio('save_contents', 'save_filename', 'save_root', 'file_saver'), show_progress=False)
- shared.gradio['delete_prompt-default'].click(handle_delete_prompt, gradio('prompt_menu-default'), gradio('delete_filename', 'delete_root', 'file_deleter'), show_progress=False)
+ shared.gradio['prompt_menu-default'].change(lambda x: (load_prompt(x), ""), gradio('prompt_menu-default'), gradio('textbox-default', 'output_textbox'), show_progress=False)
+ shared.gradio['new_prompt-default'].click(handle_new_prompt, None, gradio('prompt_menu-default'), show_progress=False)
+
+    # Autosave the Default tab input on change, reusing the notebook's debounced saving
+ shared.gradio['textbox-default'].change(
+ store_notebook_state_and_debounce,
+ gradio('textbox-default', 'prompt_menu-default'),
+ None,
+ show_progress=False
+ )
+
+ shared.gradio['delete_prompt-default'].click(
+ lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=True)],
+ None,
+ gradio('delete_prompt-default', 'delete_prompt-cancel-default', 'delete_prompt-confirm-default'),
+ show_progress=False)
+
+ shared.gradio['delete_prompt-cancel-default'].click(
+ lambda: [gr.update(visible=True), gr.update(visible=False), gr.update(visible=False)],
+ None,
+ gradio('delete_prompt-default', 'delete_prompt-cancel-default', 'delete_prompt-confirm-default'),
+ show_progress=False)
+
+ shared.gradio['delete_prompt-confirm-default'].click(
+ handle_delete_prompt_confirm_default,
+ gradio('prompt_menu-default'),
+ gradio('prompt_menu-default', 'delete_prompt-default', 'delete_prompt-cancel-default', 'delete_prompt-confirm-default'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-default'].click(
+ handle_rename_prompt_click_default,
+ gradio('prompt_menu-default'),
+ gradio('rename_prompt_to-default', 'rename_prompt-default', 'rename_prompt-cancel-default', 'rename_prompt-confirm-default'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-cancel-default'].click(
+ lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False)],
+ None,
+ gradio('rename_prompt_to-default', 'rename_prompt-default', 'rename_prompt-cancel-default', 'rename_prompt-confirm-default'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-confirm-default'].click(
+ handle_rename_prompt_confirm_default,
+ gradio('rename_prompt_to-default', 'prompt_menu-default'),
+ gradio('prompt_menu-default', 'rename_prompt_to-default', 'rename_prompt-default', 'rename_prompt-cancel-default', 'rename_prompt-confirm-default'),
+ show_progress=False)
+
shared.gradio['textbox-default'].change(lambda x: f"{count_tokens(x)}", gradio('textbox-default'), gradio('token-counter-default'), show_progress=False)
shared.gradio['get_logits-default'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
@@ -98,18 +156,61 @@ def create_event_handlers():
shared.gradio['get_tokens-default'].click(get_token_ids, gradio('textbox-default'), gradio('tokens-default'), show_progress=False)
-def handle_save_prompt(text):
+def handle_new_prompt():
+ new_name = utils.current_time()
+
+ # Create the new prompt file
+ prompt_path = Path("user_data/logs/notebook") / f"{new_name}.txt"
+ prompt_path.parent.mkdir(parents=True, exist_ok=True)
+ prompt_path.write_text("In this story,", encoding='utf-8')
+
+ return gr.update(choices=utils.get_available_prompts(), value=new_name)
+
+
+def handle_delete_prompt_confirm_default(prompt_name):
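+    # Capture the deleted prompt's position first so a neighboring prompt can be selected afterwards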
+ available_prompts = utils.get_available_prompts()
+ current_index = available_prompts.index(prompt_name) if prompt_name in available_prompts else 0
+
+ (Path("user_data/logs/notebook") / f"{prompt_name}.txt").unlink(missing_ok=True)
+ available_prompts = utils.get_available_prompts()
+
+ if available_prompts:
+ new_value = available_prompts[min(current_index, len(available_prompts) - 1)]
+ else:
+ new_value = utils.current_time()
+ Path("user_data/logs/notebook").mkdir(parents=True, exist_ok=True)
+        (Path("user_data/logs/notebook") / f"{new_value}.txt").write_text("In this story,", encoding='utf-8')
+ available_prompts = [new_value]
+
return [
- text,
- utils.current_time() + ".txt",
- "user_data/prompts/",
+ gr.update(choices=available_prompts, value=new_value),
+ gr.update(visible=True),
+ gr.update(visible=False),
+ gr.update(visible=False)
+ ]
+
+
+def handle_rename_prompt_click_default(current_name):
+ return [
+ gr.update(value=current_name, visible=True),
+ gr.update(visible=False),
+ gr.update(visible=True),
gr.update(visible=True)
]
-def handle_delete_prompt(prompt):
+def handle_rename_prompt_confirm_default(new_name, current_name):
+ old_path = Path("user_data/logs/notebook") / f"{current_name}.txt"
+ new_path = Path("user_data/logs/notebook") / f"{new_name}.txt"
+
+ if old_path.exists() and not new_path.exists():
+ old_path.rename(new_path)
+
+ available_prompts = utils.get_available_prompts()
return [
- prompt + ".txt",
- "user_data/prompts/",
- gr.update(visible=True)
+ gr.update(choices=available_prompts, value=new_name),
+ gr.update(visible=False),
+ gr.update(visible=True),
+ gr.update(visible=False),
+ gr.update(visible=False)
]
diff --git a/modules/ui_model_menu.py b/modules/ui_model_menu.py
index 9e982f0e..6b106203 100644
--- a/modules/ui_model_menu.py
+++ b/modules/ui_model_menu.py
@@ -135,7 +135,7 @@ def create_event_handlers():
# with the model defaults (if any), and then the model is loaded
shared.gradio['model_menu'].change(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
- handle_load_model_event_initial, gradio('model_menu', 'interface_state'), gradio(ui.list_interface_input_elements()) + gradio('interface_state'), show_progress=False).then(
+ handle_load_model_event_initial, gradio('model_menu', 'interface_state'), gradio(ui.list_interface_input_elements()) + gradio('interface_state') + gradio('vram_info'), show_progress=False).then(
partial(load_model_wrapper, autoload=False), gradio('model_menu', 'loader'), gradio('model_status'), show_progress=True).success(
handle_load_model_event_final, gradio('truncation_length', 'loader', 'interface_state'), gradio('truncation_length', 'filter_by_loader'), show_progress=False)
@@ -174,7 +174,12 @@ def create_event_handlers():
def load_model_wrapper(selected_model, loader, autoload=False):
- settings = get_model_metadata(selected_model)
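+    # If the model files or metadata are missing, show the traceback in the model status box instead of raising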
+ try:
+ settings = get_model_metadata(selected_model)
+ except FileNotFoundError:
+ exc = traceback.format_exc()
+ yield exc.replace('\n', '\n\n')
+ return
if not autoload:
yield "### {}\n\n- Settings updated: Click \"Load\" to load the model\n- Max sequence length: {}".format(selected_model, settings['truncation_length_info'])
@@ -374,7 +379,8 @@ def handle_load_model_event_initial(model, state):
output = ui.apply_interface_values(state)
update_model_parameters(state) # This updates the command-line flags
- return output + [state]
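+    # Keep any existing VRAM estimate from the interface state, falling back to a plain label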
+    vram_info = state.get('vram_info', "Estimated VRAM to load the model:\n")
+ return output + [state] + [vram_info]
def handle_load_model_event_final(truncation_length, loader, state):
diff --git a/modules/ui_notebook.py b/modules/ui_notebook.py
index 3f79a93c..939d81f7 100644
--- a/modules/ui_notebook.py
+++ b/modules/ui_notebook.py
@@ -1,3 +1,7 @@
+import threading
+import time
+from pathlib import Path
+
import gradio as gr
from modules import logits, shared, ui, utils
@@ -7,22 +11,27 @@ from modules.text_generation import (
get_token_ids,
stop_everything_event
)
-from modules.ui_default import handle_delete_prompt, handle_save_prompt
from modules.utils import gradio
+_notebook_file_lock = threading.Lock()
+_notebook_auto_save_timer = None
+_last_notebook_text = None
+_last_notebook_prompt = None
+
inputs = ('textbox-notebook', 'interface_state')
outputs = ('textbox-notebook', 'html-notebook')
def create_ui():
mu = shared.args.multi_user
- with gr.Tab('Notebook', elem_id='notebook-tab'):
+ with gr.Row(visible=not shared.settings['show_two_notebook_columns']) as shared.gradio['notebook-tab']:
shared.gradio['last_input-notebook'] = gr.State('')
with gr.Row():
with gr.Column(scale=4):
with gr.Tab('Raw'):
with gr.Row():
- shared.gradio['textbox-notebook'] = gr.Textbox(value=load_prompt(shared.settings['prompt-notebook']), lines=27, elem_id='textbox-notebook', elem_classes=['textbox', 'add_scrollbar'])
+ initial_text = load_prompt(shared.settings['prompt-notebook'])
+ shared.gradio['textbox-notebook'] = gr.Textbox(label="", value=initial_text, lines=27, elem_id='textbox-notebook', elem_classes=['textbox', 'add_scrollbar'])
shared.gradio['token-counter-notebook'] = gr.HTML(value="0", elem_id="notebook-token-counter")
with gr.Tab('Markdown'):
@@ -57,9 +66,19 @@ def create_ui():
gr.HTML('')
with gr.Row():
shared.gradio['prompt_menu-notebook'] = gr.Dropdown(choices=utils.get_available_prompts(), value=shared.settings['prompt-notebook'], label='Prompt', elem_classes='slim-dropdown')
- ui.create_refresh_button(shared.gradio['prompt_menu-notebook'], lambda: None, lambda: {'choices': utils.get_available_prompts()}, ['refresh-button', 'refresh-button-small'], interactive=not mu)
- shared.gradio['save_prompt-notebook'] = gr.Button('💾', elem_classes=['refresh-button', 'refresh-button-small'], interactive=not mu)
- shared.gradio['delete_prompt-notebook'] = gr.Button('🗑️', elem_classes=['refresh-button', 'refresh-button-small'], interactive=not mu)
+
+ with gr.Row():
+ ui.create_refresh_button(shared.gradio['prompt_menu-notebook'], lambda: None, lambda: {'choices': utils.get_available_prompts()}, ['refresh-button'], interactive=not mu)
+ shared.gradio['new_prompt-notebook'] = gr.Button('New', elem_classes=['refresh-button'], interactive=not mu)
+ shared.gradio['rename_prompt-notebook'] = gr.Button('Rename', elem_classes=['refresh-button'], interactive=not mu)
+ shared.gradio['delete_prompt-notebook'] = gr.Button('🗑️', elem_classes=['refresh-button'], interactive=not mu)
+ shared.gradio['delete_prompt-confirm-notebook'] = gr.Button('Confirm', variant='stop', elem_classes=['refresh-button'], visible=False)
+ shared.gradio['delete_prompt-cancel-notebook'] = gr.Button('Cancel', elem_classes=['refresh-button'], visible=False)
+
+ with gr.Row(visible=False) as shared.gradio['rename-row-notebook']:
+ shared.gradio['rename_prompt_to-notebook'] = gr.Textbox(label="New name", elem_classes=['no-background'])
+ shared.gradio['rename_prompt-cancel-notebook'] = gr.Button('Cancel', elem_classes=['refresh-button'])
+ shared.gradio['rename_prompt-confirm-notebook'] = gr.Button('Confirm', elem_classes=['refresh-button'], variant='primary')
def create_event_handlers():
@@ -67,7 +86,7 @@ def create_event_handlers():
lambda x: x, gradio('textbox-notebook'), gradio('last_input-notebook')).then(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
- generate_reply_wrapper, gradio(inputs), gradio(outputs), show_progress=False).then(
+ generate_and_save_wrapper_notebook, gradio('textbox-notebook', 'interface_state', 'prompt_menu-notebook'), gradio(outputs), show_progress=False).then(
lambda state, text: state.update({'textbox-notebook': text}), gradio('interface_state', 'textbox-notebook'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
@@ -76,7 +95,7 @@ def create_event_handlers():
lambda x: x, gradio('textbox-notebook'), gradio('last_input-notebook')).then(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
- generate_reply_wrapper, gradio(inputs), gradio(outputs), show_progress=False).then(
+ generate_and_save_wrapper_notebook, gradio('textbox-notebook', 'interface_state', 'prompt_menu-notebook'), gradio(outputs), show_progress=False).then(
lambda state, text: state.update({'textbox-notebook': text}), gradio('interface_state', 'textbox-notebook'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
@@ -85,7 +104,7 @@ def create_event_handlers():
lambda x: x, gradio('last_input-notebook'), gradio('textbox-notebook'), show_progress=False).then(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda: [gr.update(visible=True), gr.update(visible=False)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
- generate_reply_wrapper, gradio(inputs), gradio(outputs), show_progress=False).then(
+ generate_and_save_wrapper_notebook, gradio('textbox-notebook', 'interface_state', 'prompt_menu-notebook'), gradio(outputs), show_progress=False).then(
lambda state, text: state.update({'textbox-notebook': text}), gradio('interface_state', 'textbox-notebook'), None).then(
lambda: [gr.update(visible=False), gr.update(visible=True)], None, gradio('Stop-notebook', 'Generate-notebook')).then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
@@ -97,11 +116,173 @@ def create_event_handlers():
shared.gradio['markdown_render-notebook'].click(lambda x: x, gradio('textbox-notebook'), gradio('markdown-notebook'), queue=False)
shared.gradio['Stop-notebook'].click(stop_everything_event, None, None, queue=False)
shared.gradio['prompt_menu-notebook'].change(load_prompt, gradio('prompt_menu-notebook'), gradio('textbox-notebook'), show_progress=False)
- shared.gradio['save_prompt-notebook'].click(handle_save_prompt, gradio('textbox-notebook'), gradio('save_contents', 'save_filename', 'save_root', 'file_saver'), show_progress=False)
- shared.gradio['delete_prompt-notebook'].click(handle_delete_prompt, gradio('prompt_menu-notebook'), gradio('delete_filename', 'delete_root', 'file_deleter'), show_progress=False)
+ shared.gradio['new_prompt-notebook'].click(handle_new_prompt, None, gradio('prompt_menu-notebook'), show_progress=False)
+
+ shared.gradio['delete_prompt-notebook'].click(
+ lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=True)],
+ None,
+ gradio('delete_prompt-notebook', 'delete_prompt-cancel-notebook', 'delete_prompt-confirm-notebook'),
+ show_progress=False)
+
+ shared.gradio['delete_prompt-cancel-notebook'].click(
+ lambda: [gr.update(visible=True), gr.update(visible=False), gr.update(visible=False)],
+ None,
+ gradio('delete_prompt-notebook', 'delete_prompt-cancel-notebook', 'delete_prompt-confirm-notebook'),
+ show_progress=False)
+
+ shared.gradio['delete_prompt-confirm-notebook'].click(
+ handle_delete_prompt_confirm_notebook,
+ gradio('prompt_menu-notebook'),
+ gradio('prompt_menu-notebook', 'delete_prompt-notebook', 'delete_prompt-cancel-notebook', 'delete_prompt-confirm-notebook'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-notebook'].click(
+ handle_rename_prompt_click_notebook,
+ gradio('prompt_menu-notebook'),
+ gradio('rename_prompt_to-notebook', 'rename_prompt-notebook', 'rename-row-notebook'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-cancel-notebook'].click(
+ lambda: [gr.update(visible=True), gr.update(visible=False)],
+ None,
+ gradio('rename_prompt-notebook', 'rename-row-notebook'),
+ show_progress=False)
+
+ shared.gradio['rename_prompt-confirm-notebook'].click(
+ handle_rename_prompt_confirm_notebook,
+ gradio('rename_prompt_to-notebook', 'prompt_menu-notebook'),
+ gradio('prompt_menu-notebook', 'rename_prompt-notebook', 'rename-row-notebook'),
+ show_progress=False)
+
shared.gradio['textbox-notebook'].input(lambda x: f"{count_tokens(x)}", gradio('textbox-notebook'), gradio('token-counter-notebook'), show_progress=False)
+ shared.gradio['textbox-notebook'].change(
+ store_notebook_state_and_debounce,
+ gradio('textbox-notebook', 'prompt_menu-notebook'),
+ None,
+ show_progress=False
+ )
+
shared.gradio['get_logits-notebook'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
logits.get_next_logits, gradio('textbox-notebook', 'interface_state', 'use_samplers-notebook', 'logits-notebook'), gradio('logits-notebook', 'logits-notebook-previous'), show_progress=False)
shared.gradio['get_tokens-notebook'].click(get_token_ids, gradio('textbox-notebook'), gradio('tokens-notebook'), show_progress=False)
+
+
+def generate_and_save_wrapper_notebook(textbox_content, interface_state, prompt_name):
+ """Generate reply and automatically save the result for notebook mode with periodic saves"""
+ last_save_time = time.monotonic()
+ save_interval = 8
+ output = textbox_content
+
+ # Initial autosave
+ safe_autosave_prompt(output, prompt_name)
+
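+    # generate_reply_wrapper is a generator that streams partial outputs; re-yield them so the UI updates live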
+ for i, (output, html_output) in enumerate(generate_reply_wrapper(textbox_content, interface_state)):
+ yield output, html_output
+
+ current_time = time.monotonic()
+ # Save on first iteration or if save_interval seconds have passed
+ if i == 0 or (current_time - last_save_time) >= save_interval:
+ safe_autosave_prompt(output, prompt_name)
+ last_save_time = current_time
+
+ # Final autosave
+ safe_autosave_prompt(output, prompt_name)
+
+
+def handle_new_prompt():
+ new_name = utils.current_time()
+
+ # Create the new prompt file
+ prompt_path = Path("user_data/logs/notebook") / f"{new_name}.txt"
+ prompt_path.parent.mkdir(parents=True, exist_ok=True)
+ prompt_path.write_text("In this story,", encoding='utf-8')
+
+ return gr.update(choices=utils.get_available_prompts(), value=new_name)
+
+
+def handle_delete_prompt_confirm_notebook(prompt_name):
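+    # Capture the deleted prompt's position first so a neighboring prompt can be selected afterwards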
+ available_prompts = utils.get_available_prompts()
+ current_index = available_prompts.index(prompt_name) if prompt_name in available_prompts else 0
+
+ (Path("user_data/logs/notebook") / f"{prompt_name}.txt").unlink(missing_ok=True)
+ available_prompts = utils.get_available_prompts()
+
+ if available_prompts:
+ new_value = available_prompts[min(current_index, len(available_prompts) - 1)]
+ else:
+ new_value = utils.current_time()
+ Path("user_data/logs/notebook").mkdir(parents=True, exist_ok=True)
+        (Path("user_data/logs/notebook") / f"{new_value}.txt").write_text("In this story,", encoding='utf-8')
+ available_prompts = [new_value]
+
+ return [
+ gr.update(choices=available_prompts, value=new_value),
+ gr.update(visible=True),
+ gr.update(visible=False),
+ gr.update(visible=False)
+ ]
+
+
+def handle_rename_prompt_click_notebook(current_name):
+ return [
+ gr.update(value=current_name),
+ gr.update(visible=False),
+ gr.update(visible=True)
+ ]
+
+
+def handle_rename_prompt_confirm_notebook(new_name, current_name):
+ old_path = Path("user_data/logs/notebook") / f"{current_name}.txt"
+ new_path = Path("user_data/logs/notebook") / f"{new_name}.txt"
+
+ if old_path.exists() and not new_path.exists():
+ old_path.rename(new_path)
+
+ available_prompts = utils.get_available_prompts()
+ return [
+ gr.update(choices=available_prompts, value=new_name),
+ gr.update(visible=True),
+ gr.update(visible=False)
+ ]
+
+
+def autosave_prompt(text, prompt_name):
+ """Automatically save the text to the selected prompt file"""
+ if prompt_name and text.strip():
+ prompt_path = Path("user_data/logs/notebook") / f"{prompt_name}.txt"
+ prompt_path.parent.mkdir(parents=True, exist_ok=True)
+ prompt_path.write_text(text, encoding='utf-8')
+
+
+def safe_autosave_prompt(content, prompt_name):
+ """Thread-safe wrapper for autosave_prompt to prevent file corruption"""
+ with _notebook_file_lock:
+ autosave_prompt(content, prompt_name)
+
+
+def store_notebook_state_and_debounce(text, prompt_name):
+ """Store current notebook state and trigger debounced save"""
+ global _notebook_auto_save_timer, _last_notebook_text, _last_notebook_prompt
+
+ if shared.args.multi_user:
+ return
+
+ _last_notebook_text = text
+ _last_notebook_prompt = prompt_name
+
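+    # Debounce: cancel any pending save and restart the timer so only the latest text gets written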
+ if _notebook_auto_save_timer is not None:
+ _notebook_auto_save_timer.cancel()
+
+ _notebook_auto_save_timer = threading.Timer(1.0, _perform_notebook_debounced_save)
+ _notebook_auto_save_timer.start()
+
+
+def _perform_notebook_debounced_save():
+ """Actually perform the notebook save using the stored state"""
+ try:
+ if _last_notebook_text is not None and _last_notebook_prompt is not None:
+ safe_autosave_prompt(_last_notebook_text, _last_notebook_prompt)
+ except Exception as e:
+ print(f"Notebook auto-save failed: {e}")
diff --git a/modules/ui_parameters.py b/modules/ui_parameters.py
index e2b10554..e42e4c0c 100644
--- a/modules/ui_parameters.py
+++ b/modules/ui_parameters.py
@@ -93,7 +93,7 @@ def create_ui():
with gr.Column():
shared.gradio['truncation_length'] = gr.Number(precision=0, step=256, value=get_truncation_length(), label='Truncate the prompt up to this length', info='The leftmost tokens are removed if the prompt exceeds this length.')
shared.gradio['seed'] = gr.Number(value=shared.settings['seed'], label='Seed (-1 for random)')
-
+ shared.gradio['custom_system_message'] = gr.Textbox(value=shared.settings['custom_system_message'], lines=2, label='Custom system message', info='If not empty, will be used instead of the default one.', elem_classes=['add_scrollbar'])
shared.gradio['custom_stopping_strings'] = gr.Textbox(lines=2, value=shared.settings["custom_stopping_strings"] or None, label='Custom stopping strings', info='Written between "" and separated by commas.', placeholder='"\\n", "\\nYou:"')
shared.gradio['custom_token_bans'] = gr.Textbox(value=shared.settings['custom_token_bans'] or None, label='Token bans', info='Token IDs to ban, separated by commas. The IDs can be found in the Default or Notebook tab.')
shared.gradio['negative_prompt'] = gr.Textbox(value=shared.settings['negative_prompt'], label='Negative prompt', info='For CFG. Only used when guidance_scale is different than 1.', lines=3, elem_classes=['add_scrollbar'])
diff --git a/modules/ui_session.py b/modules/ui_session.py
index 0673828e..a69e155b 100644
--- a/modules/ui_session.py
+++ b/modules/ui_session.py
@@ -11,7 +11,9 @@ def create_ui():
with gr.Column():
gr.Markdown("## Settings")
shared.gradio['toggle_dark_mode'] = gr.Button('Toggle light/dark theme 💡', elem_classes='refresh-button')
+ shared.gradio['show_two_notebook_columns'] = gr.Checkbox(label='Show two columns in the Notebook tab', value=shared.settings['show_two_notebook_columns'])
shared.gradio['paste_to_attachment'] = gr.Checkbox(label='Turn long pasted text into attachments in the Chat tab', value=shared.settings['paste_to_attachment'], elem_id='paste_to_attachment')
+ shared.gradio['include_past_attachments'] = gr.Checkbox(label='Include attachments/search results from previous messages in the chat prompt', value=shared.settings['include_past_attachments'])
with gr.Column():
gr.Markdown("## Extensions & flags")
@@ -33,6 +35,12 @@ def create_ui():
lambda x: 'dark' if x == 'light' else 'light', gradio('theme_state'), gradio('theme_state')).then(
None, None, None, js=f'() => {{{ui.dark_theme_js}; toggleDarkMode(); localStorage.setItem("theme", document.body.classList.contains("dark") ? "dark" : "light")}}')
+ shared.gradio['show_two_notebook_columns'].change(
+ handle_default_to_notebook_change,
+ gradio('show_two_notebook_columns', 'textbox-default', 'output_textbox', 'prompt_menu-default', 'textbox-notebook', 'prompt_menu-notebook'),
+ gradio('default-tab', 'notebook-tab', 'textbox-default', 'output_textbox', 'prompt_menu-default', 'textbox-notebook', 'prompt_menu-notebook')
+ )
+
# Reset interface event
shared.gradio['reset_interface'].click(
set_interface_arguments, gradio('extensions_menu', 'bool_menu'), None).then(
@@ -49,6 +57,31 @@ def handle_save_settings(state, preset, extensions, show_controls, theme):
]
+def handle_default_to_notebook_change(show_two_columns, default_input, default_output, default_prompt, notebook_input, notebook_prompt):
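+    # Return order matches the outputs: default-tab, notebook-tab, textbox-default, output_textbox, prompt_menu-default, textbox-notebook, prompt_menu-notebook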
+ if show_two_columns:
+        # Switching to the two-column Default view: carry the notebook text and prompt over
+ return [
+ gr.update(visible=True),
+ gr.update(visible=False),
+ notebook_input,
+ "",
+ gr.update(value=notebook_prompt, choices=utils.get_available_prompts()),
+ gr.update(),
+ gr.update(),
+ ]
+ else:
+        # Switching back to the single-column Notebook view: carry the Default tab input and prompt over
+ return [
+ gr.update(visible=False),
+ gr.update(visible=True),
+ gr.update(),
+ gr.update(),
+ gr.update(),
+ default_input,
+ gr.update(value=default_prompt, choices=utils.get_available_prompts())
+ ]
+
+
def set_interface_arguments(extensions, bool_active):
shared.args.extensions = extensions
diff --git a/modules/utils.py b/modules/utils.py
index 21873541..c285d401 100644
--- a/modules/utils.py
+++ b/modules/utils.py
@@ -53,7 +53,7 @@ def delete_file(fname):
def current_time():
- return f"{datetime.now().strftime('%Y-%m-%d-%H%M%S')}"
+ return f"{datetime.now().strftime('%Y-%m-%d_%Hh%Mm%Ss')}"
def atoi(text):
@@ -159,10 +159,12 @@ def get_available_presets():
def get_available_prompts():
- prompt_files = list(Path('user_data/prompts').glob('*.txt'))
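+    # Prompts now live under user_data/logs/notebook; make sure the directory exists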
+ notebook_dir = Path('user_data/logs/notebook')
+ notebook_dir.mkdir(parents=True, exist_ok=True)
+
+ prompt_files = list(notebook_dir.glob('*.txt'))
sorted_files = sorted(prompt_files, key=lambda x: x.stat().st_mtime, reverse=True)
prompts = [file.stem for file in sorted_files]
- prompts.append('None')
return prompts
diff --git a/modules/web_search.py b/modules/web_search.py
index ffd7e483..401a42bb 100644
--- a/modules/web_search.py
+++ b/modules/web_search.py
@@ -4,6 +4,7 @@ from datetime import datetime
import requests
+from modules import shared
from modules.logging_colors import logger
@@ -28,6 +29,8 @@ def download_web_page(url, timeout=10):
# Initialize the HTML to Markdown converter
h = html2text.HTML2Text()
h.body_width = 0
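+    # Skip image and hyperlink markup when converting the page to markdown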
+ h.ignore_images = True
+ h.ignore_links = True
# Convert the HTML to Markdown
markdown_text = h.handle(response.text)
@@ -90,6 +93,22 @@ def perform_web_search(query, num_pages=3, max_workers=5):
return []
+def truncate_content_by_tokens(content, max_tokens=8192):
+ """Truncate content to fit within token limit using binary search"""
+ if len(shared.tokenizer.encode(content)) <= max_tokens:
+ return content
+
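+    # Binary search over character positions for the longest prefix that still fits within max_tokens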
+ left, right = 0, len(content)
+ while left < right:
+ mid = (left + right + 1) // 2
+ if len(shared.tokenizer.encode(content[:mid])) <= max_tokens:
+ left = mid
+ else:
+ right = mid - 1
+
+ return content[:left]
+
+
def add_web_search_attachments(history, row_idx, user_message, search_query, state):
"""Perform web search and add results as attachments"""
if not search_query:
@@ -126,7 +145,7 @@ def add_web_search_attachments(history, row_idx, user_message, search_query, sta
"name": result['title'],
"type": "text/html",
"url": result['url'],
- "content": result['content']
+ "content": truncate_content_by_tokens(result['content'])
}
history['metadata'][key]["attachments"].append(attachment)
diff --git a/requirements/full/requirements.txt b/requirements/full/requirements.txt
index a71e5240..19e5e0fe 100644
--- a/requirements/full/requirements.txt
+++ b/requirements/full/requirements.txt
@@ -34,10 +34,10 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
diff --git a/requirements/full/requirements_amd.txt b/requirements/full/requirements_amd.txt
index db1ead1a..ebef87a6 100644
--- a/requirements/full/requirements_amd.txt
+++ b/requirements/full/requirements_amd.txt
@@ -33,7 +33,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
diff --git a/requirements/full/requirements_amd_noavx2.txt b/requirements/full/requirements_amd_noavx2.txt
index a08aa392..f1fccc93 100644
--- a/requirements/full/requirements_amd_noavx2.txt
+++ b/requirements/full/requirements_amd_noavx2.txt
@@ -33,7 +33,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
diff --git a/requirements/full/requirements_apple_intel.txt b/requirements/full/requirements_apple_intel.txt
index fa217c3e..734f22c7 100644
--- a/requirements/full/requirements_apple_intel.txt
+++ b/requirements/full/requirements_apple_intel.txt
@@ -33,7 +33,7 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4-py3-none-any.whl
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl
diff --git a/requirements/full/requirements_apple_silicon.txt b/requirements/full/requirements_apple_silicon.txt
index 52581f1a..f837aade 100644
--- a/requirements/full/requirements_apple_silicon.txt
+++ b/requirements/full/requirements_apple_silicon.txt
@@ -33,8 +33,8 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4-py3-none-any.whl
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl
diff --git a/requirements/full/requirements_cpu_only.txt b/requirements/full/requirements_cpu_only.txt
index b72f22aa..9ec8a720 100644
--- a/requirements/full/requirements_cpu_only.txt
+++ b/requirements/full/requirements_cpu_only.txt
@@ -33,5 +33,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
diff --git a/requirements/full/requirements_cpu_only_noavx2.txt b/requirements/full/requirements_cpu_only_noavx2.txt
index e8de6057..3a3fcde9 100644
--- a/requirements/full/requirements_cpu_only_noavx2.txt
+++ b/requirements/full/requirements_cpu_only_noavx2.txt
@@ -33,5 +33,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
diff --git a/requirements/full/requirements_cuda128.txt b/requirements/full/requirements_cuda128.txt
index 7851041f..84ffa327 100644
--- a/requirements/full/requirements_cuda128.txt
+++ b/requirements/full/requirements_cuda128.txt
@@ -34,10 +34,10 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
diff --git a/requirements/full/requirements_cuda128_noavx2.txt b/requirements/full/requirements_cuda128_noavx2.txt
index c8015166..da995438 100644
--- a/requirements/full/requirements_cuda128_noavx2.txt
+++ b/requirements/full/requirements_cuda128_noavx2.txt
@@ -34,10 +34,10 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
diff --git a/requirements/full/requirements_noavx2.txt b/requirements/full/requirements_noavx2.txt
index 5e81ce1f..e68e8187 100644
--- a/requirements/full/requirements_noavx2.txt
+++ b/requirements/full/requirements_noavx2.txt
@@ -34,10 +34,10 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.4/exllamav3-0.0.4+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
diff --git a/requirements/portable/requirements.txt b/requirements/portable/requirements.txt
index 4ddcf43f..f596675c 100644
--- a/requirements/portable/requirements.txt
+++ b/requirements/portable/requirements.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
diff --git a/requirements/portable/requirements_apple_intel.txt b/requirements/portable/requirements_apple_intel.txt
index 38a21618..e472e428 100644
--- a/requirements/portable/requirements_apple_intel.txt
+++ b/requirements/portable/requirements_apple_intel.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
diff --git a/requirements/portable/requirements_apple_silicon.txt b/requirements/portable/requirements_apple_silicon.txt
index 0b70c800..b60eccf5 100644
--- a/requirements/portable/requirements_apple_silicon.txt
+++ b/requirements/portable/requirements_apple_silicon.txt
@@ -19,6 +19,6 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
diff --git a/requirements/portable/requirements_cpu_only.txt b/requirements/portable/requirements_cpu_only.txt
index 510a20f4..c6586848 100644
--- a/requirements/portable/requirements_cpu_only.txt
+++ b/requirements/portable/requirements_cpu_only.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
diff --git a/requirements/portable/requirements_cpu_only_noavx2.txt b/requirements/portable/requirements_cpu_only_noavx2.txt
index e6d9f0c5..d0f113a7 100644
--- a/requirements/portable/requirements_cpu_only_noavx2.txt
+++ b/requirements/portable/requirements_cpu_only_noavx2.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
diff --git a/requirements/portable/requirements_noavx2.txt b/requirements/portable/requirements_noavx2.txt
index 48f92e0a..df1c5762 100644
--- a/requirements/portable/requirements_noavx2.txt
+++ b/requirements/portable/requirements_noavx2.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
diff --git a/requirements/portable/requirements_vulkan.txt b/requirements/portable/requirements_vulkan.txt
index 9f93424f..2da3a81a 100644
--- a/requirements/portable/requirements_vulkan.txt
+++ b/requirements/portable/requirements_vulkan.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
diff --git a/requirements/portable/requirements_vulkan_noavx2.txt b/requirements/portable/requirements_vulkan_noavx2.txt
index 9070b9a6..f53432d8 100644
--- a/requirements/portable/requirements_vulkan_noavx2.txt
+++ b/requirements/portable/requirements_vulkan_noavx2.txt
@@ -19,5 +19,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.18.0/llama_cpp_binaries-0.18.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.20.0/llama_cpp_binaries-0.20.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
diff --git a/server.py b/server.py
index cfb21a6e..7ce3c208 100644
--- a/server.py
+++ b/server.py
@@ -33,7 +33,6 @@ import matplotlib
matplotlib.use('Agg') # This fixes LaTeX rendering on some systems
-import json
import os
import signal
import sys
@@ -144,12 +143,16 @@ def create_interface():
# Temporary clipboard for saving files
shared.gradio['temporary_text'] = gr.Textbox(visible=False)
- # Text Generation tab
+ # Chat tab
ui_chat.create_ui()
- ui_default.create_ui()
- ui_notebook.create_ui()
+
+ # Notebook tab
+ with gr.Tab("Notebook", elem_id='notebook-parent-tab'):
+ ui_default.create_ui()
+ ui_notebook.create_ui()
ui_parameters.create_ui() # Parameters tab
+ ui_chat.create_character_settings_ui() # Character tab
ui_model_menu.create_ui() # Model tab
if not shared.args.portable:
training.create_ui() # Training tab
diff --git a/user_data/prompts/Alpaca-with-Input.txt b/user_data/prompts/Alpaca-with-Input.txt
deleted file mode 100644
index 56df0e28..00000000
--- a/user_data/prompts/Alpaca-with-Input.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
-
-### Instruction:
-Instruction
-
-### Input:
-Input
-
-### Response:
-
diff --git a/user_data/prompts/QA.txt b/user_data/prompts/QA.txt
deleted file mode 100644
index 32b0e235..00000000
--- a/user_data/prompts/QA.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-Common sense questions and answers
-
-Question:
-Factual answer: