Compare commits


2564 commits
v1.7 ... main

Author SHA1 Message Date
oobabooga 256431f258 Security: server-side file save roots, image URL SSRF protection, extension allowlist 2026-03-17 22:31:20 -07:00
oobabooga 88a318894c
Merge pull request #7425 from oobabooga/dev
Merge dev branch
2026-03-16 12:51:33 -03:00
oobabooga 44810751de Update llama.cpp 2026-03-16 06:21:14 -07:00
oobabooga 6c05a964a7 docs: Mention supported tool-calling models 2026-03-16 06:00:16 -07:00
oobabooga 737ded6959 Web search: Fix SSRF validation to block all non-global IPs 2026-03-16 05:37:46 -07:00
oobabooga 50685c93f2 Update README 2026-03-16 05:29:27 -07:00
oobabooga 9d9f5d9860 Update README 2026-03-15 20:27:44 -07:00
oobabooga 5cfe9fe295 Update README 2026-03-15 20:12:22 -07:00
oobabooga b76a289e04 API: Respect --listen-host for the OpenAI API server
Closes #7429
2026-03-15 18:04:34 -07:00
oobabooga c0de1d176c UI: Add an incognito chat option 2026-03-15 17:57:31 -07:00
oobabooga 4f80b20859 UI: Follow-up to beab346f (fix scroll deadlock on chat-parent) 2026-03-15 16:38:54 -07:00
oobabooga f8ff7cf99e Update the custom gradio wheels 2026-03-15 14:12:59 -07:00
oobabooga 92d376e420 web_search: Return all results and improve URL extraction 2026-03-15 13:14:53 -07:00
oobabooga f6a749a151 API: Fix /v1/models to only list the currently loaded model 2026-03-15 10:17:31 -07:00
oobabooga 1a2b840938 UI: Fix scroll jump when toggling thinking blocks during streaming 2026-03-15 09:52:31 -07:00
oobabooga bfea49b197 Move top_p and top_k higher up in the UI and CLI help 2026-03-15 09:34:17 -07:00
oobabooga 80d0c03bab llama.cpp: Change the default --fit-target from 1024 to 512 2026-03-15 09:29:25 -07:00
oobabooga 9119ce0680 llama.cpp: Use --fit-ctx 8192 when --fit on is used
This sets the minimum acceptable context length, which by default is 4096.
2026-03-15 09:24:14 -07:00
oobabooga 5763cab3c4 Fix a crash loading the MiniMax-M2.5 jinja template 2026-03-15 07:13:26 -07:00
oobabooga f0c16813ef Remove the rope scaling parameters
Models now ship with 131k+ context lengths. The parameters can still be
passed to llama.cpp through --extra-flags.
2026-03-14 19:43:25 -07:00
oobabooga 2d3a3794c9 Add a Top-P preset, make it the new default, clean up the built-in presets 2026-03-14 19:22:12 -07:00
oobabooga 9955e54a1f UI: Fix autoscroll not engaging when regenerating short chats 2026-03-14 18:51:12 -07:00
oobabooga d1aba08561 UI: Set chat widths to 724px 2026-03-14 18:35:44 -07:00
oobabooga c126530061 UI: Minor color change 2026-03-14 18:22:41 -07:00
oobabooga b9bdbd638e Fix after 4ae2bd86e2 2026-03-14 18:18:33 -07:00
oobabooga 9eacd4a207 UI: Minor morphdom optimizations 2026-03-14 16:07:16 -07:00
oobabooga e11425d5f8 Fix relative redirect handling in web page fetcher 2026-03-14 15:46:21 -07:00
oobabooga 4ae2bd86e2 Change the default ctx-size to 0 (auto) for llama.cpp 2026-03-14 15:30:01 -07:00
oobabooga 9f657d3976 UI: Fix a minor glitch 2026-03-14 14:19:12 -07:00
oobabooga c09a367c64 UI: Fix dark theme using light theme syntax highlighting 2026-03-14 14:08:03 -07:00
oobabooga beab346f48 UI: Fix a minor glitch 2026-03-14 12:45:37 -07:00
oobabooga 573617157a Optimize tool call detection
Skips templates that don't contain a required keyword
2026-03-14 12:09:41 -07:00
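
A minimal sketch of the optimization described above, with illustrative names (the real parsers differ): a cheap substring check rules a tool-call format out before any expensive regex or JSON parsing is attempted.

    import json
    import re

    def parse_qwen(text):
        m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.S)
        if not m:
            return None
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            return None

    # (keyword, parser) pairs; the actual code covers more formats
    TOOL_FORMATS = [("<tool_call>", parse_qwen)]

    def detect_tool_call(text):
        for keyword, parser in TOOL_FORMATS:
            if keyword not in text:  # the optimization: cheap check first
                continue
            result = parser(text)
            if result is not None:
                return result
        return None
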
oobabooga d0a4993cf4 UI: Increase ctx-size slider maximum to 1M and step to 1024 2026-03-14 09:53:12 -07:00
oobabooga c7953fb923 Add ROCm version to portable package filenames 2026-03-14 09:44:37 -07:00
oobabooga c908ac00d7 Replace html2text with trafilatura for better web content extraction
After this change, a lot of boilerplate is removed from web pages, saving tokens in agentic loops.
2026-03-14 09:29:17 -07:00
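
For reference, the basic trafilatura calls look like this (the URL is a placeholder):

    import trafilatura

    downloaded = trafilatura.fetch_url("https://example.com/")
    text = trafilatura.extract(downloaded)  # main content only; boilerplate stripped
    print(text)
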
oobabooga 8bff331893 UI: Fix tool call markup flashing before accordion appears during streaming 2026-03-14 09:26:20 -07:00
oobabooga cb08ba63dc Fix GPT-OSS channel markup leaking into UI when model skips analysis block 2026-03-14 09:08:05 -07:00
oobabooga 09a6549816 API: Stream reasoning_content separately from content in OpenAI-compatible responses 2026-03-14 06:52:40 -07:00
oobabooga accb2ef661 UI/API: Prevent tool call markup from leaking into streamed UI output (closes #7427) 2026-03-14 06:26:47 -07:00
oobabooga 998b9bfb2a UI: Make all chat styles better match instruct style 2026-03-13 21:07:40 -07:00
oobabooga 5f1707af35 UI: Increase the width of non-instruct chat styles 2026-03-13 20:38:40 -07:00
oobabooga 16636c04b8 UI: Minor fix/optimization 2026-03-13 19:06:04 -07:00
oobabooga e8d1c66303 Clean up tool calling code 2026-03-13 18:27:01 -07:00
oobabooga cb88066d15 Update llama.cpp 2026-03-13 13:17:41 -07:00
oobabooga 0cd245bcbb UI: Make autoscroll more robust after the optimizations 2026-03-13 12:58:56 -07:00
oobabooga 24e7e77b55 Clean up 2026-03-13 12:37:10 -07:00
oobabooga cabb95f0d6 UI: Increase the instruct width to 768px 2026-03-13 12:24:48 -07:00
oobabooga 5362bbb413 Make web_search not download the page contents, use fetch_webpage instead 2026-03-13 12:09:08 -07:00
oobabooga d4c22ced83 UI: Optimize syntax highlighting and autoscroll by moving from MutationObserver to morphdom updates 2026-03-13 15:47:14 -03:00
oobabooga aab2596d29 UI: Fix multiple thinking blocks rendering as raw text in HTML generator 2026-03-13 15:47:11 -03:00
oobabooga e0a38da9f3 Improve tool call parsing for Devstral/GPT-OSS and preserve thinking across tool turns 2026-03-13 11:04:06 -03:00
oobabooga e50b823eee Update llama.cpp 2026-03-13 06:22:28 -07:00
oobabooga b7670cc762 Add a tool calling tutorial 2026-03-13 04:35:42 -07:00
oobabooga d0b72c73c0 Update diffusers to 0.37 2026-03-13 03:43:02 -07:00
oobabooga c39c187f47 UI: Improve the style of table scrollbars 2026-03-13 03:21:47 -07:00
oobabooga 4628825651 Better solution to fef95b9e56 2026-03-13 03:17:36 -07:00
oobabooga fef95b9e56 UI: Fix an autoscroll race condition during chat streaming 2026-03-13 03:05:09 -07:00
oobabooga 5833d94d7f UI: Prevent word breaks in tables 2026-03-13 02:56:49 -07:00
oobabooga a4bef860b6 UI: Optimize chat streaming by batching morphdom to one update per animation frame
The monitor physically cannot paint faster than its refresh rate, so
intermediate morphdom calls between frames do redundant parsing, diffing,
and patching work that is never displayed.
2026-03-13 06:45:47 -03:00
oobabooga 5ddc1002d2 Update ExLlamaV3 to 0.0.25 2026-03-13 02:40:17 -07:00
oobabooga c094bc943c UI: Skip output extensions on intermediate tool-calling turns 2026-03-12 21:45:38 -07:00
oobabooga 85ec85e569 UI: Fix Continue while in a tool-calling loop, remove the upper limit on number of tool calls 2026-03-12 20:22:35 -07:00
oobabooga 04213dff14 Address copilot feedback 2026-03-12 19:55:20 -07:00
oobabooga 24fdcc52b3 Merge branch 'main' into dev 2026-03-12 19:33:03 -07:00
oobabooga 58f26a4cc7 UI: Skip redundant work in chat loop when no tools are selected 2026-03-12 19:18:55 -07:00
oobabooga 0e35421593 API: Always extract reasoning_content, even with tool calls 2026-03-12 18:52:41 -07:00
oobabooga 1ed56aee85 Add a calculate tool 2026-03-12 18:45:19 -07:00
oobabooga 286ae475f6 UI: Clean up tool calling code 2026-03-12 22:39:38 -03:00
oobabooga 4c7a56c18d Add num_pages and max_tokens kwargs to web search tools 2026-03-12 22:17:23 -03:00
oobabooga a09f21b9de UI: Fix tool calling for GPT-OSS and Continue 2026-03-12 22:17:20 -03:00
oobabooga 1b7e6c5705 Add the fetch_webpage tool source 2026-03-12 17:11:05 -07:00
oobabooga f8936ec47c Truncate web_search and fetch_webpage tools to 8192 tokens 2026-03-12 17:10:41 -07:00
oobabooga 5c02b7f603 Allow the fetch_webpage tool to return links 2026-03-12 17:08:30 -07:00
oobabooga 09d5e049d6 UI: Improve the Tools checkbox list style 2026-03-12 16:53:49 -07:00
oobabooga fdd8e5b1fd Make repeated Ctrl+C force a shutdown 2026-03-12 15:48:50 -07:00
oobabooga 4f82b71ef3 UI: Bump the ctx-size max from 131072 to 262144 (256K) 2026-03-12 14:56:35 -07:00
oobabooga bbd43d9463 UI: Correctly propagate truncation_length when ctx_size is auto 2026-03-12 14:54:05 -07:00
oobabooga 3e6bd1a310 UI: Prepend thinking tag when template appends it to prompt
Makes Qwen models have a thinking block straight away during streaming.
2026-03-12 14:30:51 -07:00
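
A hedged sketch of the mechanism (names are illustrative, not the actual code): when the rendered prompt already ends with the opening tag, the UI re-attaches it to the streamed reply.

    def displayed_reply(prompt, reply, tag="<think>"):
        # if the template already appended the opening tag to the prompt,
        # re-attach it so the UI renders a thinking block from the first token
        if prompt.rstrip().endswith(tag):
            return tag + "\n" + reply
        return reply
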
oobabooga 9a7428b627 UI: Add collapsible accordions for tool calling steps 2026-03-12 14:16:04 -07:00
oobabooga 2d0cc7726e API: Add reasoning_content field to non-streaming chat completions
Extract thinking/reasoning blocks (e.g. <think>...</think>) into a
separate reasoning_content field on the assistant message, matching
the convention used by DeepSeek, llama.cpp, and SGLang.
2026-03-12 16:30:46 -03:00
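
A usage sketch, assuming the default local OpenAI-compatible endpoint on port 5000:

    import requests

    url = "http://127.0.0.1:5000/v1/chat/completions"  # assumed default address
    payload = {
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "max_tokens": 512,
    }
    msg = requests.post(url, json=payload).json()["choices"][0]["message"]
    print(msg.get("reasoning_content"))  # extracted <think>...</think> text, if any
    print(msg["content"])                # the final answer, thinking block removed
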
oobabooga d45c9b3c59 API: Minor logprobs fixes 2026-03-12 16:09:49 -03:00
oobabooga 2466305f76 Add tool examples 2026-03-12 16:03:57 -03:00
oobabooga a916fb0e5c API: Preserve mid-conversation system message positions 2026-03-12 14:27:24 -03:00
oobabooga fb1b3b6ddf API: Rewrite logprobs for OpenAI spec compliance across all backends
- Rewrite logprobs output format to match the OpenAI specification for
  both chat completions and completions endpoints
- Fix top_logprobs count being ignored for llama.cpp and ExLlamav3
  backends in chat completions (always returned 1 instead of requested N)
- Fix non-streaming responses only returning logprobs for the last token
  instead of all generated tokens (affects all HF-based loaders)
- Fix logprobs returning null for non-streaming chat requests on HF loaders
- Fix off-by-one returning one extra top alternative on HF loaders
2026-03-12 14:17:32 -03:00
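
After the rewrite, responses should follow the OpenAI logprobs shape; a sketch of reading it, assuming the same local endpoint:

    import requests

    payload = {
        "messages": [{"role": "user", "content": "Say hi"}],
        "max_tokens": 5,
        "logprobs": True,
        "top_logprobs": 3,  # N alternatives per token, no longer ignored
    }
    r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload).json()
    for entry in r["choices"][0]["logprobs"]["content"]:
        alts = {t["token"]: round(t["logprob"], 3) for t in entry["top_logprobs"]}
        print(entry["token"], alts)
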
oobabooga 5a017aa338 API: Several OpenAI spec compliance fixes
- Return proper OpenAI error format ({"error": {...}}) instead of HTTP 500 for validation errors
- Send data: [DONE] at the end of SSE streams
- Fix finish_reason so "tool_calls" takes priority over "length"
- Stop including usage in streaming chunks when include_usage is not set
- Handle "developer" role in messages (treated same as "system")
- Add logprobs and top_logprobs parameters for chat completions
- Fix chat completions logprobs not working with llama.cpp and ExLlamav3 backends
- Add max_completion_tokens as an alias for max_tokens in chat completions
2026-03-12 13:30:38 -03:00
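
A minimal streaming client that honors the data: [DONE] sentinel, again assuming the default local endpoint:

    import json
    import requests

    payload = {
        "messages": [{"role": "user", "content": "Tell me a joke"}],
        "stream": True,
    }
    with requests.post("http://127.0.0.1:5000/v1/chat/completions",
                       json=payload, stream=True) as r:
        for line in r.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":  # end-of-stream sentinel
                break
            delta = json.loads(data)["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)
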
oobabooga 4b6c9db1c9 UI: Fix stale tool_sequence after edit and chat-instruct tool rendering 2026-03-12 13:12:18 -03:00
oobabooga 09723c9988 API: Include /v1 in the printed API URL for easier integration 2026-03-12 12:43:15 -03:00
oobabooga 2549f7c33b API: Add tool_choice support and fix tool_calls spec compliance 2026-03-12 10:29:23 -03:00
oobabooga b5cac2e3b2 Fix swipes and edit for tool calling in the UI 2026-03-12 01:53:37 -03:00
oobabooga 0d62038710 Add tools refresh button and _tool_turn comment 2026-03-12 01:36:07 -03:00
oobabooga cf9ad8eafe Initial tool-calling support in the UI 2026-03-12 01:16:19 -03:00
oobabooga 980a9d1657 UI: Minor defensive changes to autosave 2026-03-11 15:50:16 -07:00
oobabooga bb00d96dc3 Use a new gr.DragDrop element for Sampler priority + update gradio 2026-03-11 19:35:12 -03:00
oobabooga 66c976e995 Update README with ROCm 7.2 torch install URL 2026-03-11 19:35:12 -03:00
oobabooga 24977846fb Update AMD ROCm from 6.4 to 7.2 2026-03-11 13:14:26 -07:00
oobabooga 7a63a56043 Update llama.cpp 2026-03-11 12:53:19 -07:00
oobabooga f1cfeae372 API: Improve OpenAI spec compliance in streaming and non-streaming responses 2026-03-10 20:55:49 -07:00
oobabooga 3304b57bdf Add native logit_bias and logprobs support for ExLlamav3 2026-03-10 11:03:25 -03:00
oobabooga 8aeaa76365 Forward logit_bias, logprobs, and n to llama.cpp backend
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
2026-03-10 10:41:45 -03:00
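
A hypothetical request exercising the newly forwarded parameters; the token ID in logit_bias is a placeholder, since real IDs are model-specific:

    import requests

    payload = {
        "prompt": "The capital of France is",
        "max_tokens": 8,
        "n": 2,                        # two completions; seed incremented for diversity
        "seed": 42,
        "logit_bias": {"1234": -100},  # suppress token id 1234 (placeholder id)
    }
    r = requests.post("http://127.0.0.1:5000/v1/completions", json=payload).json()
    for choice in r["choices"]:
        print(choice["text"])
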
oobabooga 6ec4ca8b10 Add missing custom_token_bans to llama.cpp and reasoning_effort to ExLlamav3 2026-03-10 09:58:00 -03:00
oobabooga 307c085d1b Minor warning change 2026-03-09 21:44:53 -07:00
oobabooga c604ca66de Update the --multi-user warning 2026-03-09 21:36:04 -07:00
oobabooga 15792c3cb8 Update ExLlamaV3 to 0.0.24 2026-03-09 20:31:05 -07:00
oobabooga 3b71932658 Update README 2026-03-09 20:18:09 -07:00
oobabooga 83b7e47d77 Update README 2026-03-09 20:12:54 -07:00
oobabooga 7f485274eb Fix ExLlamaV3 EOS handling, load order, and perplexity evaluation
- Use config.eos_token_id_list for all EOS tokens as stop conditions
  (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
  for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
  sequences longer than 2048 tokens get correct perplexity values
2026-03-09 23:56:38 -03:00
oobabooga 39e6c997cc Refactor to not import gradio in --nowebui mode 2026-03-09 19:29:24 -07:00
oobabooga 970055ca00 Update Intel GPU support to use native PyTorch XPU wheels
PyTorch 2.9+ includes native XPU support, making
intel-extension-for-pytorch and the separate oneAPI conda
install unnecessary.

Closes #7308
2026-03-09 17:08:59 -03:00
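
With native XPU wheels, device selection reduces to the stock PyTorch API:

    import torch

    # PyTorch 2.9+ ships native XPU support; no intel-extension-for-pytorch needed
    device = "xpu" if torch.xpu.is_available() else "cpu"
    x = torch.ones(2, 2, device=device)
    print(x.device)
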
oobabooga d6643bb4bc One-click installer: Optimize wheel downloads to only re-download changed wheels 2026-03-09 12:30:43 -07:00
oobabooga 9753b2342b Fix crash on non-UTF-8 Windows locales (e.g. Chinese GBK)
Closes #7416
2026-03-09 16:22:37 -03:00
oobabooga eb4a20137a Update README 2026-03-08 20:38:50 -07:00
oobabooga 634609acca Fix pip installing to system Miniconda on Windows, revert 0132966d 2026-03-08 20:35:41 -07:00
oobabooga 40f1837b42 README: Minor updates 2026-03-08 08:38:29 -07:00
oobabooga f6ffecfff2 Add guard against training with llama.cpp loader 2026-03-08 10:47:59 -03:00
oobabooga 5a91b8462f Remove ctx_size_draft from ExLlamav3 loader 2026-03-08 09:53:48 -03:00
oobabooga 7a8ca9f2b0 Fix passing adaptive-p to llama-server 2026-03-08 04:09:40 -07:00
oobabooga 7170a16b91 Fix passing adaptive-p to llama-server 2026-03-08 04:09:18 -07:00
oobabooga b3705d87bf Add PyPI fallback for PyTorch install commands 2026-03-07 18:07:09 -08:00
oobabooga 0132966d09 Add PyPI fallback for PyTorch install commands 2026-03-07 23:06:15 -03:00
oobabooga baf4e13ff1 ExLlamav3: fix draft cache size to match main cache 2026-03-07 22:34:48 -03:00
oobabooga 6ff111d18e ExLlamav3: handle exceptions in ConcurrentGenerator iterate loop 2026-03-07 22:05:31 -03:00
oobabooga aeeff41cc0
Merge pull request #7412 from oobabooga/dev
Merge dev branch
2026-03-07 12:02:24 -03:00
oobabooga 0cecc0a041 Use tar.gz for Linux/macOS portable builds to preserve symlinks 2026-03-07 06:59:48 -08:00
oobabooga e1bf0b866f Update the macos workflow 2026-03-07 06:46:46 -08:00
oobabooga 3b7cf44406
Merge pull request #7411 from oobabooga/dev
Merge dev branch
2026-03-07 11:15:38 -03:00
oobabooga b686193fe2 Reapply "Update Miniforge from 25.3.0 to 26.1.0"
This reverts commit 085c4ef5d7.
2026-03-07 06:10:05 -08:00
oobabooga 328215b0c7 API: Stop generation on client disconnect for non-streaming requests 2026-03-07 06:06:13 -08:00
oobabooga 304510eb3d ExLlamav3: route all generation through ConcurrentGenerator 2026-03-07 05:54:14 -08:00
oobabooga 085c4ef5d7 Revert "Update Miniforge from 25.3.0 to 26.1.0"
This reverts commit 9576c5a5f4.
2026-03-07 05:09:49 -08:00
oobabooga aa634c77c0 Update llama.cpp 2026-03-06 21:00:36 -08:00
oobabooga abc699db9b Minor UI change 2026-03-06 19:03:38 -08:00
oobabooga f2fe001cc4 Fix message copy buttons not working over HTTP 2026-03-06 19:01:38 -08:00
oobabooga 7ea5513263 Handle Qwen 3.5 thinking blocks 2026-03-06 19:01:28 -08:00
oobabooga 5fa709a3f4 llama.cpp server: use port+5 offset and suppress "No parser definition detected" logs 2026-03-06 18:52:34 -08:00
oobabooga e8e0d02406 Remove outdated ROCm environment variable overrides from one_click.py 2026-03-06 18:15:05 -08:00
oobabooga 1eead661c3 Portable mode: always use ../user_data if it exists 2026-03-06 18:04:48 -08:00
oobabooga d48b53422f Training: Optimize _peek_json_keys to avoid loading entire file into memory 2026-03-06 15:39:08 -08:00
oobabooga 2beaa4b971 Update llama.cpp 2026-03-06 14:39:35 -08:00
oobabooga 5f6754c267 Fix stop button being ignored when token throttling is off 2026-03-06 17:12:34 -03:00
oobabooga b8b4471ab5 Security: restrict file writes to user_data_dir, block extra_flags from API 2026-03-06 16:58:11 -03:00
oobabooga d03923924a Several small fixes
- Stop llama-server subprocess on model unload instead of relying on GC
- Fix tool_calls[].index being string instead of int in API responses
- Omit tool_calls key from API response when empty per OpenAI spec
- Prevent division by zero when micro_batch_size > batch_size in training
- Copy sampler_priority list before mutating in ExLlamaV3
- Normalize presence/frequency_penalty names for ExLlamaV3 sampler sorting
- Restore original chat_template after training instead of leaving it mutated
2026-03-06 16:52:13 -03:00
oobabooga 044566d42d API: Add tool call parsing for DeepSeek, GLM, MiniMax, and Kimi models 2026-03-06 15:06:56 -03:00
oobabooga f5acf55207 Add --chat-template-file flag to override the default instruction template for API requests
Matches llama.cpp's flag name. Supports .jinja, .jinja2, and .yaml files.
Priority: per-request params > --chat-template-file > model's built-in template.
2026-03-06 14:04:16 -03:00
oobabooga 3531069824 API: Support Llama 4 tool calling and fix tool calling edge cases 2026-03-06 13:12:14 -03:00
oobabooga 160f7ad6b4 Handle SIGTERM to stop llama-server on pkill 2026-03-06 12:56:33 -03:00
oobabooga 8e24a20873 Installer: Fix libstdcxx-ng version pin causing conda solver to hang on Python 3.13 2026-03-06 07:39:50 -08:00
oobabooga 3bab7fbfd4 Update Colab notebook: new default model, direct GGUF URL support 2026-03-06 06:52:49 -08:00
oobabooga e7e0df0101 Fix hover menu shifting down when chat input grows 2026-03-06 11:52:16 -03:00
oobabooga 3323dedd08 Update llama.cpp 2026-03-06 06:30:01 -08:00
oobabooga 36dbc4ccce Remove unused colorama and psutil requirements 2026-03-06 06:28:35 -08:00
oobabooga 86d59b4404 Installer: Fix edge case in wheel re-download caching 2026-03-06 06:16:57 -08:00
oobabooga 0e0e3ceb97 Update the custom gradio wheels 2026-03-06 05:46:08 -08:00
oobabooga 6d7018069c Installer: Use absolute Python path in Windows batch scripts 2026-03-05 21:56:01 -08:00
oobabooga f9ed8820de API: Make tool function description and parameters optional 2026-03-05 21:43:33 -08:00
oobabooga 3880c1a406 API: Accept content:null and complex tool definitions in tool calling requests 2026-03-06 02:41:38 -03:00
oobabooga 93ebfa2b7e Fix llama-server output filter for new log format 2026-03-06 02:38:13 -03:00
oobabooga d0ac58ad31 API: Fix tool_calls placement and other response compatibility issues 2026-03-05 21:25:03 -08:00
oobabooga f06583b2b9 API: Use \n instead of \r\n as the SSE separator to match OpenAI 2026-03-05 21:16:37 -08:00
oobabooga 8be444a559 Update the custom gradio wheels 2026-03-05 21:05:15 -08:00
oobabooga 1729fb07b9 Update llama.cpp 2026-03-05 21:04:24 -08:00
oobabooga eba262d47a Security: prevent path traversal in character/user/file save and delete 2026-03-06 02:00:10 -03:00
oobabooga 521ddbb722 Security: restrict API model loading args to UI-exposed parameters
The /v1/internal/model/load endpoint previously allowed setting any
shared.args attribute, including security-sensitive flags like
trust_remote_code. Now only keys from list_model_elements() are accepted.
2026-03-06 01:57:02 -03:00
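
The allowlist pattern in sketch form; the key names are illustrative, and the real list comes from list_model_elements():

    ALLOWED_KEYS = {"ctx_size", "gpu_layers", "cache_type"}  # illustrative subset

    def sanitize_load_args(request_args: dict) -> dict:
        # silently drop anything not exposed in the UI, e.g. trust_remote_code
        return {k: v for k, v in request_args.items() if k in ALLOWED_KEYS}
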
oobabooga 66fb79fe15 llama.cpp: Add --fit-target param 2026-03-06 01:55:48 -03:00
oobabooga e81a47f708 Improve the API generation defaults --help message 2026-03-05 20:41:45 -08:00
oobabooga 27bcc45c18 API: Add command-line flags to override default generation parameters 2026-03-06 01:36:45 -03:00
oobabooga 8a9afcbec6 Allow extensions to skip output post-processing 2026-03-06 01:19:46 -03:00
oobabooga 2e7e966ef2 Docs: Better Tool/Function calling examples 2026-03-05 20:06:34 -08:00
oobabooga ddcad3cc51 Follow-up to e2548f69: add missing paths module, fix gallery extension 2026-03-06 00:58:03 -03:00
oobabooga 8d43123f73 API: Fix function calling for Qwen, Mistral, GPT-OSS, and other models
The tool call response parser only handled JSON-based formats, causing
tool_calls to always be empty for models that use non-JSON formats.

Add parsers for three additional tool call formats:
- Qwen3.5: <tool_call><function=name><parameter=key>value</parameter>
- Mistral/Devstral: functionName{"arg": "value"}
- GPT-OSS: <|channel|>commentary to=functions.name<|message|>{...}

Also fix multi-turn tool conversations crashing with Jinja2
UndefinedError on tool_call_id by preserving tool_calls and
tool_call_id metadata through the chat history conversion.
2026-03-06 00:55:33 -03:00
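
A simplified sketch of a parser for the Mistral/Devstral shape; the actual implementation handles more edge cases:

    import json
    import re

    CALL_RE = re.compile(r"(\w+)\s*(\{.*\})", re.S)  # functionName{"arg": "value"}

    def parse_mistral_call(text):
        m = CALL_RE.search(text)
        if not m:
            return None
        try:
            return {"name": m.group(1), "arguments": json.loads(m.group(2))}
        except json.JSONDecodeError:
            return None
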
oobabooga e2548f69a9 Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
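
The resolution order, as a minimal sketch:

    from pathlib import Path

    def resolve_user_data(cli_value=None):
        if cli_value:                  # --user-data-dir always wins
            return Path(cli_value)
        local, parent = Path("user_data"), Path("../user_data")
        # portable builds can share one folder by placing it one level up
        return parent if (not local.exists() and parent.exists()) else local
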
oobabooga 4c406e024f API: Speed up chat completions by ~85ms per request 2026-03-05 18:36:07 -08:00
oobabooga 249bd6eea2 UI: Update the parallel info message 2026-03-05 18:11:55 -08:00
oobabooga f52d9336e5 TensorRT-LLM: Migrate from ModelRunner to LLM API, add concurrent API request support 2026-03-05 18:09:45 -08:00
oobabooga 9824c82cb6 API: Add parallel request support for llama.cpp and ExLlamaV3 2026-03-05 16:49:58 -08:00
oobabooga 2f08dce7b0 Remove ExLlamaV2 backend
- archived upstream: 7dc12af3a8
- replaced by ExLlamaV3, which has much better quantization accuracy
2026-03-05 14:02:13 -08:00
oobabooga 134ac8fc29 Update README 2026-03-05 12:30:28 -08:00
oobabooga 409db3df1e Training: Docs improvements 2026-03-05 11:30:57 -08:00
oobabooga 86d8291e58 Training: UI cleanup and better defaults 2026-03-05 11:20:55 -08:00
oobabooga 33ff3773a0 Clean up LoRA loading parameter handling 2026-03-05 16:00:13 -03:00
oobabooga 7a1fa8c9ea Training: fix checkpoint resume and surface training errors to UI 2026-03-05 15:50:39 -03:00
oobabooga 275810c843 Training: wire up HF Trainer checkpoint resumption for full state recovery 2026-03-05 15:32:49 -03:00
oobabooga 438e59498e Update ExLlamaV3 to v0.0.23 2026-03-05 10:24:31 -08:00
oobabooga 63f28cb4a2 Training: align defaults with peft/axolotl (rank 8, alpha 16, dropout 0, cutoff 512, eos on) 2026-03-05 15:12:32 -03:00
oobabooga 33a38d7ece Training: drop conversations exceeding cutoff length instead of truncating 2026-03-05 14:56:27 -03:00
oobabooga c2e494963f Training: fix silent error on model reload failure, minor cleanups 2026-03-05 14:41:44 -03:00
oobabooga 5b18be8582 Training: unify instruction training through apply_chat_template()
Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
2026-03-05 14:39:37 -03:00
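
A minimal sketch of the unified path; the model name is a stand-in for any chat model:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")  # placeholder
    messages = [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]
    ids = tok.apply_chat_template(messages, tokenize=True)
    # assistant-only masking: labels are -100 everywhere except the tokens
    # belonging to the assistant turn, so only those contribute to the loss
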
oobabooga d337ba0390 Training: fix apply_chat_template returning BatchEncoding instead of list 2026-03-05 13:45:28 -03:00
oobabooga 5be68cc073 Remove Training_PRO extension
The built-in training tab now covers its essential functionality
with a more modern and correct implementation (apply_chat_template,
dynamic padding, JSONL datasets, stride overlap).
2026-03-05 12:55:07 -03:00
oobabooga 1ffe540c97 Full documentation update to match current codebase 2026-03-05 12:46:54 -03:00
oobabooga 1c2548fd89 Training: use dynamic padding (pad to batch max instead of cutoff_len)
- Remove pre-padding from tokenize() and tokenize_conversation()
- Collate function now right-pads each batch to the longest sequence
- Set tokenizer padding_side to "right" (standard for training)
- Remove dead natural_keys import
- Reduces wasted compute on batches with short sequences
- Aligns with axolotl/unsloth approach
2026-03-05 12:45:32 -03:00
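
A sketch of dynamic right-padding in a collate function, under the assumptions above:

    import torch

    def collate_fn(batch, pad_token_id):
        # right-pad every sequence to the longest one in this batch
        max_len = max(len(x["input_ids"]) for x in batch)
        input_ids, labels = [], []
        for x in batch:
            pad = max_len - len(x["input_ids"])
            input_ids.append(x["input_ids"] + [pad_token_id] * pad)
            labels.append(x["labels"] + [-100] * pad)  # -100 is ignored by the loss
        return {
            "input_ids": torch.tensor(input_ids),
            "labels": torch.tensor(labels),
        }
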
oobabooga da2d4f1a6a Training: replace raw text file with JSONL text dataset, re-add stride overlap
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
2026-03-05 12:33:12 -03:00
oobabooga d278bb46a2 Add apply_chat_template() support for LoRA training
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
2026-03-05 11:47:25 -03:00
oobabooga b16a1a874a Update TensorRT-LLM Dockerfile for v1.1.0 2026-03-05 06:23:56 -08:00
oobabooga 45188eccef Overhaul LoRA training tab
- Use peft's "all-linear" for target modules instead of the old
  model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs
2026-03-05 10:52:59 -03:00
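
A hedged sketch of the peft configuration this implies, using the defaults the training tab settled on (rank 8, alpha 16, dropout 0; see the defaults commit above):

    from peft import LoraConfig

    config = LoraConfig(
        r=8,                          # rank
        lora_alpha=16,
        lora_dropout=0.0,
        target_modules="all-linear",  # every linear layer; no per-model mapping
    )
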
oobabooga 268cc3f100 Update TensorRT-LLM to v1.1.0 2026-03-05 09:32:28 -03:00
oobabooga 69fa4dd0b1 llama.cpp: allow ctx_size=0 for auto context via --fit 2026-03-04 19:33:20 -08:00
oobabooga fbfcd59fe0 llama.cpp: Use -1 instead of 0 for auto gpu_layers 2026-03-04 19:21:45 -08:00
oobabooga d45aa6606a Fix blank prompt dropdown in Notebook/Default tabs on first startup 2026-03-04 19:07:55 -08:00
oobabooga 0804296f4d Revert "UI: Remove unnecessary server round-trips from button click chains"
This reverts commit ff48956cb0.
2026-03-04 18:41:30 -08:00
oobabooga 6a08e79fa5 Update the custom gradio wheels 2026-03-04 18:22:50 -08:00
oobabooga ff48956cb0 UI: Remove unnecessary server round-trips from button click chains 2026-03-04 18:19:56 -08:00
oobabooga 5a22970ba8 Docker: fix and clean up configs, update docs 2026-03-04 23:13:47 -03:00
oobabooga 387cf9d8df Remove obsolete DeepSpeed inference code (2023 relic) 2026-03-04 17:20:34 -08:00
oobabooga 942ff8fcb4 Remove obsolete stuff after custom gradio updates 2026-03-04 16:43:32 -08:00
oobabooga da3010c3ed tiny improvements to llama_cpp_server.py 2026-03-04 15:54:37 -08:00
oobabooga 83cc207ef7 Update the custom gradio wheels 2026-03-04 14:31:18 -08:00
thecaptain789 2ac4eb33c8
fix: correct typo 'occured' to 'occurred' (#7389) 2026-03-04 18:09:28 -03:00
Sense_wang 7bf15ad933
fix: replace bare except clauses with except Exception (#7400) 2026-03-04 18:06:17 -03:00
mamei16 1d1f4dfc88
Disable uncommonly used indented codeblocks (#7401) 2026-03-04 17:51:00 -03:00
mamei16 abb7cc02e9
Re-introduce inline LaTeX rendering with more robust exception handling (#7402) 2026-03-04 17:44:19 -03:00
mamei16 68109bc5da
Improve process_markdown_content (#7403) 2026-03-04 17:26:13 -03:00
weiguang li 952e2c404a
Bump sentence-transformers from 2.2.2 to 3.3.1 in superbooga (#7406) 2026-03-04 17:08:08 -03:00
oobabooga cdf0e392e6 llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults 2026-03-04 12:05:08 -08:00
oobabooga eb90daf098 ExLlamaV2: Don't expose unused seed parameter 2026-03-04 11:14:50 -08:00
oobabooga 0ffb75de7c Update Transformers to 5.3.0 2026-03-04 11:12:54 -08:00
oobabooga d8af0505a8 ExLlamav3_HF: Optimize prefill and fix CFG cache initialization 2026-03-04 11:09:58 -08:00
oobabooga 9b916f02cd ExLlamaV3: Attach AdaptiveP, fix speculative decoding parameter, add seed 2026-03-04 10:51:15 -08:00
oobabooga 5d93f4e800 Fix requires_grad warning in logits API 2026-03-04 10:43:23 -08:00
oobabooga 64eb77e782 Fix the logits API endpoint with transformers 2026-03-04 10:41:47 -08:00
oobabooga 22141679e3 Update the custom gradio wheels 2026-03-04 10:01:31 -08:00
oobabooga 65de4c30c8 Add adaptive-p sampler and n-gram speculative decoding support 2026-03-04 09:41:29 -08:00
oobabooga f010aa1612 Replace PyPDF2 with pymupdf for PDF text extraction
pymupdf produces cleaner text (e.g. no concatenated words in headers),
handles encrypted and malformed PDFs that PyPDF2 failed on, and
supports non-Latin scripts.
2026-03-04 06:43:37 -08:00
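
Basic PyMuPDF extraction, for reference (the filename is a placeholder):

    import pymupdf  # PyMuPDF; older code imports the same package as "fitz"

    with pymupdf.open("document.pdf") as doc:
        text = "\n".join(page.get_text() for page in doc)
    print(text[:500])
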
oobabooga f4d787ab8d Delegate GPU layer allocation to llama.cpp's --fit 2026-03-04 06:37:50 -08:00
oobabooga 8a3d866401 Fix temperature_last having no effect in llama.cpp server sampler order 2026-03-04 06:10:51 -08:00
oobabooga 11dc6fdfce Update the custom gradio wheels 2026-03-04 06:04:33 -08:00
oobabooga 7d42b6900e Update the custom gradio wheels 2026-03-04 05:47:59 -08:00
oobabooga 8cbb7661a8 Remove no longer needed dark theme localstorage code 2026-03-03 18:51:24 -08:00
oobabooga 866c48e55b Simplify dark theme handling using gradio fork's new dark_theme parameter 2026-03-03 18:41:47 -08:00
oobabooga b3fd0d16e0 Use a new gr.Headless component for efficient chat streaming 2026-03-03 18:12:03 -08:00
oobabooga d584ede72e Avoid a circular import 2026-03-03 17:59:47 -08:00
oobabooga c0bff831e3 Update custom gradio wheels 2026-03-03 17:21:18 -08:00
oobabooga 2260e530c9 Remove gradio monkey-patches (moved to gradio fork) 2026-03-03 17:17:36 -08:00
oobabooga e9f22813e4 Replace gradio with my gradio 4.37.2 fork 2026-03-03 16:51:27 -08:00
dependabot[bot] 3519890c8e
Bump flask-cloudflared from 0.0.14 to 0.0.15 in /requirements/full (#7380) 2026-03-03 21:41:51 -03:00
dependabot[bot] 9c604628a0
Bump flask-cloudflared from 0.0.14 to 0.0.15 in /requirements/portable (#7382) 2026-03-03 21:41:46 -03:00
oobabooga fbd2acfa19 Remove triton-windows from non-CUDA requirements 2026-03-03 16:16:55 -08:00
oobabooga 5fd79b23d1 Add CUDA 13.1 portable builds 2026-03-03 15:36:41 -08:00
oobabooga b8fcc8ea32 Update llama.cpp, remove noavx2 builds, add ROCm Windows portable builds 2026-03-03 15:27:19 -08:00
Pádraic Slattery d7dd533b99
chore: Update outdated GitHub Actions versions (#7384) 2026-03-03 17:54:12 -03:00
oobabooga 9576c5a5f4 Update Miniforge from 25.3.0 to 26.1.0 2026-03-03 12:33:20 -08:00
oobabooga 9814d3d0ae Patch gradio 4.x for huggingface-hub 1.x compatibility 2026-03-03 12:20:37 -08:00
oobabooga 38d0eeefc0 Update dependencies: torch 2.9.1, transformers 5.2, exllamav3 0.0.22, accelerate 1.12, huggingface-hub 1.5 2026-03-03 12:01:02 -08:00
oobabooga ddd74324fe Update PyTorch to 2.9.1 and ROCm to 6.4 2026-03-03 11:38:52 -08:00
oobabooga efc72d5c32 Update Python from 3.11 to 3.13 2026-03-03 11:03:26 -08:00
oobabooga aecbc5a8ac Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2026-01-28 08:30:28 -08:00
oobabooga c54e8a2b3d Try to spawn llama.cpp on port 5001 instead of random port 2026-01-28 08:23:55 -08:00
oobabooga dc2bbf1861 Refactor thinking block detection and add Solar Open support 2026-01-28 08:21:34 -08:00
dependabot[bot] cae1fef42d
Bump triton-windows in /requirements/full (#7368) 2026-01-14 21:30:59 -03:00
q5sys (JT) 7493fe7841
feat: Add a dropdown to save/load user personas (#7367) 2026-01-14 20:35:08 -03:00
jakubartur 21b979c02a
Fix code block copy button on HTTP (Clipboard API fallback) (#7358) 2026-01-14 19:34:21 -03:00
oobabooga a731861127 Update README 2026-01-13 15:38:32 -08:00
oobabooga 910456ba31
Merge pull request #7366 from oobabooga/dev
Merge dev branch
2026-01-08 17:54:12 -03:00
oobabooga d79cdc614c Update llama.cpp 2026-01-08 11:24:15 -08:00
oobabooga 332fd40653 Update llama.cpp 2026-01-07 19:06:23 -08:00
dependabot[bot] 50a35b483c
Update bitsandbytes requirement in /requirements/full (#7353) 2026-01-06 15:27:23 -03:00
dependabot[bot] 45fbec0320
Update torchao requirement in /requirements/full (#7356) 2026-01-06 15:27:10 -03:00
oobabooga b0968ed8b4 Update flash-linear-attention 2026-01-06 10:26:43 -08:00
oobabooga 36747cf99c Lint 2026-01-06 10:24:34 -08:00
oobabooga 2fcbadec67 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2026-01-06 10:24:07 -08:00
oobabooga bb3b7bc197 Update llama.cpp 2026-01-06 10:23:58 -08:00
Sergey 'Jin' Bostandzhyan 6e2c4e9c23
Fix loading models which have their eos token disabled (#7363) 2026-01-06 11:31:10 -03:00
oobabooga a2ed640aa6
UI: Improved border color for tables + hr 2025-12-21 15:38:48 -03:00
oobabooga 1066fe8c21
UI: Improve table styles (more minimalistic) 2025-12-21 15:32:02 -03:00
oobabooga 9530d3a6d8
UI: Improve hr (horizontal separator) style 2025-12-21 15:30:54 -03:00
oobabooga a0b5599e9b
Merge pull request #7355 from oobabooga/dev
Merge dev branch
2025-12-20 02:18:31 -03:00
oobabooga 09d88f91e8 Update llama.cpp 2025-12-19 21:00:13 -08:00
oobabooga 34804f9354
Merge pull request #7352 from oobabooga/dev
Merge dev branch
2025-12-14 22:59:34 -03:00
oobabooga 6e8fb0e7b1 Update llama.cpp 2025-12-14 13:32:14 -08:00
oobabooga 9fe40ff90f Update exllamav3 to 0.0.18 2025-12-10 05:37:33 -08:00
oobabooga 8e762e04b4 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-12-09 05:27:43 -08:00
oobabooga aa16266c38 Update llama.cpp 2025-12-09 03:19:23 -08:00
dependabot[bot] 85269d7fbb
Update safetensors requirement in /requirements/full (#7323) 2025-12-08 17:58:27 -03:00
dependabot[bot] c4ebab9b29
Bump triton-windows in /requirements/full (#7346) 2025-12-08 17:56:07 -03:00
oobabooga bb004bacb1
Merge pull request #7345 from oobabooga/dev
Merge dev branch
2025-12-08 10:14:49 -03:00
oobabooga 502f59d39b Update diffusers to 0.36 2025-12-08 05:08:54 -08:00
oobabooga 4d94f66832
Merge pull request #7343 from oobabooga/dev
Merge dev branch
2025-12-07 23:49:19 -03:00
oobabooga e7c8b51fec Revert "Use flash_attention_2 by default for Transformers models"
This reverts commit 85f2df92e9.
2025-12-07 18:48:41 -08:00
oobabooga 652d13c003
Merge pull request #7339 from oobabooga/dev
Merge dev branch
2025-12-07 17:58:00 -03:00
oobabooga b758059e95 Revert "Clear the torch cache between sequential image generations"
This reverts commit 1ec9f708e5.
2025-12-07 12:23:19 -08:00
oobabooga 1ec9f708e5 Clear the torch cache between sequential image generations 2025-12-07 11:49:22 -08:00
oobabooga 3b8369a679 Update llama.cpp 2025-12-07 11:18:36 -08:00
oobabooga 058e78411d docs: Small changes 2025-12-07 10:16:08 -08:00
oobabooga 17bd8d10f0 Update exllamav3 to 0.0.17 2025-12-07 09:37:18 -08:00
oobabooga 85f2df92e9 Use flash_attention_2 by default for Transformers models 2025-12-07 06:56:58 -08:00
oobabooga 1762312fb4 Use random instead of np.random for image seeds (makes it work on Windows) 2025-12-06 20:10:32 -08:00
oobabooga 160a25165a docs: Small change 2025-12-06 08:41:12 -08:00
oobabooga f93cc4b5c3 Add an API example to the image generation tutorial 2025-12-06 08:33:06 -08:00
oobabooga c026dbaf64 Fix API requests always returning the same 'created' time 2025-12-06 08:23:21 -08:00
oobabooga 194e4c285f Update llama.cpp 2025-12-06 08:14:48 -08:00
oobabooga 1c36559e2b Add a News section to the README 2025-12-06 07:05:00 -08:00
oobabooga 02518a96a9 Lint 2025-12-06 06:55:06 -08:00
oobabooga 0100ad1bd7 Add user_data/image_outputs to the Gradio allowed paths 2025-12-06 06:39:30 -08:00
oobabooga 6411142111 docs: Small changes 2025-12-06 06:36:16 -08:00
oobabooga 455dc06db0 Serve the original PNG images in the UI instead of webp 2025-12-06 05:43:00 -08:00
oobabooga 1a9ed1fe98 Fix the height of the image output gallery 2025-12-06 05:21:26 -08:00
oobabooga 17b12567d8 docs: Small changes 2025-12-05 14:15:15 -08:00
oobabooga e20b2d38ff docs: Add VRAM measurements for Z-Image-Turbo 2025-12-05 14:12:08 -08:00
oobabooga 6ca99910ba Image: Quantize the text encoder for lower VRAM 2025-12-05 13:08:46 -08:00
oobabooga 11937de517 Use flash attention for image generation by default 2025-12-05 12:13:24 -08:00
oobabooga eba8a59466 docs: Improve the image generation tutorial 2025-12-05 12:10:41 -08:00
oobabooga 5848c7884d Increase the height of the image output gallery 2025-12-05 10:24:51 -08:00
oobabooga c11c14590a Image: Better LLM variation default prompt 2025-12-05 08:08:11 -08:00
oobabooga 0dd468245c Image: Add back the gallery cache (for performance) 2025-12-05 07:11:38 -08:00
oobabooga b63d57158d Image: Add TGW as a prefix to output images 2025-12-05 05:59:54 -08:00
oobabooga afa29b9554 Image: Several fixes 2025-12-05 05:58:57 -08:00
oobabooga 8eac99599a Image: Better LLM variation default prompt 2025-12-04 19:58:06 -08:00
oobabooga b4f06a50b0 fix: Pass bos_token and eos_token from metadata to jinja2
Fixes loading Seed-Instruct-36B
2025-12-04 19:11:31 -08:00
oobabooga 15c6e43597 Image: Add a revised_prompt field to API results for OpenAI compatibility 2025-12-04 17:41:09 -08:00
oobabooga 56f2a9512f Revert "Image: Add the LLM-generated prompt to the API result"
This reverts commit c7ad28a4cd.
2025-12-04 17:34:27 -08:00
oobabooga 3ef428efaa Image: Remove llm_variations from the API 2025-12-04 17:34:17 -08:00
oobabooga c7ad28a4cd Image: Add the LLM-generated prompt to the API result 2025-12-04 17:22:08 -08:00
oobabooga b451bac082 Image: Improve a log message 2025-12-04 16:33:46 -08:00
oobabooga 47a0fcd614 Image: PNG metadata improvements 2025-12-04 16:25:48 -08:00
oobabooga ac31a7c008 Image: Organize the UI 2025-12-04 15:45:04 -08:00
oobabooga a90739f498 Image: Better LLM variation default prompt 2025-12-04 10:50:40 -08:00
oobabooga ffef3c7b1d Image: Make the LLM Variations prompt configurable 2025-12-04 10:44:35 -08:00
oobabooga 5763947c37 Image: Simplify the API code, add the llm_variations option 2025-12-04 10:23:00 -08:00
oobabooga 2793153717 Image: Add LLM-generated prompt variations 2025-12-04 08:10:24 -08:00
oobabooga 7fb9f19bd8 Progress bar style improvements 2025-12-04 06:20:45 -08:00
oobabooga a838223d18 Image: Add a progress bar during generation 2025-12-04 05:49:57 -08:00
oobabooga 14dbc3488e Image: Clear the torch cache after generation, not before 2025-12-04 05:32:58 -08:00
oobabooga 235b94f097 Image: Add placeholder file for user_data/image_models 2025-12-03 18:43:30 -08:00
oobabooga c357eed4c7 Image: Remove the flash_attention_3 option (no idea how to get it working) 2025-12-03 18:40:34 -08:00
oobabooga c93d27add3 Update llama.cpp 2025-12-03 18:29:43 -08:00
oobabooga fbca54957e Image generation: Yield partial results for batch count > 1 2025-12-03 16:13:07 -08:00
oobabooga 49c60882bf Image generation: Safer image uploading 2025-12-03 16:07:51 -08:00
oobabooga 59285d501d Image generation: Small UI improvements 2025-12-03 16:03:31 -08:00
oobabooga 373baa5c9c UI: Minor image gallery improvements 2025-12-03 14:45:02 -08:00
oobabooga 906dc54969 Load --image-model before --model 2025-12-03 12:15:38 -08:00
oobabooga 4468c49439 Add semaphore to image generation API endpoint 2025-12-03 12:02:47 -08:00
oobabooga 5ad174fad2 docs: Add an image generation API example 2025-12-03 11:58:54 -08:00
oobabooga 5433ef3333 Add an API endpoint for generating images 2025-12-03 11:50:56 -08:00
oobabooga 9448bf1caa Image generation: add torchao quantization (supports torch.compile) 2025-12-02 14:22:51 -08:00
oobabooga 97281ff831 UI: Fix an index error in the new image gallery 2025-12-02 11:20:52 -08:00
oobabooga 9d07d3a229 Make portable builds functional again after b3666e140d 2025-12-02 10:06:57 -08:00
oobabooga 6291e72129 Remove quanto for now (requires messy compilation) 2025-12-02 09:57:18 -08:00
oobabooga b3666e140d
Add image generation support (#7328) 2025-12-02 14:55:38 -03:00
oobabooga a83821e941 Revert "UI: Optimize typing in all textareas"
This reverts commit e24ba92ef2.
2025-12-01 10:34:23 -08:00
oobabooga 24fd963c38 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-12-01 08:06:08 -08:00
oobabooga e24ba92ef2 UI: Optimize typing in all textareas 2025-12-01 08:05:21 -08:00
oobabooga bd9f2de73a
Merge pull request #7331 from oobabooga/dev
Merge dev branch
2025-11-28 23:00:01 -03:00
aidevtime 661e42d2b7
fix(deps): upgrade coqui-tts to >=0.27.0 for transformers 4.55 compatibility (#7329) 2025-11-28 22:59:36 -03:00
oobabooga 5327bc9397
Update modules/shared.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:48:05 -03:00
oobabooga 78b315344a Update exllamav3 2025-11-28 06:45:05 -08:00
oobabooga 3cad0cd4c1 Update llama.cpp 2025-11-28 03:52:37 -08:00
GodEmperor785 400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316) 2025-11-21 16:56:02 -03:00
oobabooga 8f0048663d More modular HTML generator 2025-11-21 07:09:16 -08:00
oobabooga b0baf7518b Remove macos x86-64 portable builds (macos-13 runner deprecated by GitHub) 2025-11-19 06:07:15 -08:00
oobabooga 1afe0827ba
Merge pull request #7317 from oobabooga/dev
Merge dev branch
2025-11-19 11:04:02 -03:00
oobabooga 0d4eff284c Add a --cpu-moe option for llama.cpp 2025-11-19 05:23:43 -08:00
oobabooga d6f39e1fef Add ROCm portable builds 2025-11-18 16:32:20 -08:00
oobabooga 327a234d23 Add ROCm requirements.txt files 2025-11-18 16:24:56 -08:00
oobabooga 4e4abd0841 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-11-18 14:07:05 -08:00
oobabooga c45f35ccc2 Remove the macos 13 wheels (deprecated by GitHub) 2025-11-18 14:06:42 -08:00
oobabooga d85b95bb15 Update llama.cpp 2025-11-18 14:06:04 -08:00
dependabot[bot] 4a36b7be5b
Bump triton-windows in /requirements/full (#7311) 2025-11-18 18:51:26 -03:00
dependabot[bot] 3d7e9856a2
Update peft requirement from ==0.17.* to ==0.18.* in /requirements/full (#7310) 2025-11-18 18:51:15 -03:00
oobabooga a26e28bdea Update exllamav3 to 0.0.15 2025-11-18 11:24:16 -08:00
oobabooga 6a3bf1de92 Update exllamav3 to 0.0.14 2025-11-09 19:43:53 -08:00
oobabooga 9ad9afad7d
Merge pull request #7296 from oobabooga/dev
Merge dev branch
2025-11-06 00:38:25 -03:00
oobabooga e7534a90d8 Update llama.cpp 2025-11-05 18:46:01 -08:00
oobabooga 6be1bfcc87 Remove the CUDA 11.7 portable builds 2025-11-05 05:45:10 -08:00
oobabooga 92d9cd36a6 Update llama.cpp 2025-11-05 05:43:34 -08:00
oobabooga 67f9288891 Pin huggingface-hub to 0.36.0 (solves #7284 and #7289) 2025-11-02 14:01:00 -08:00
oobabooga 16f77b74c4 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-11-01 19:58:53 -07:00
oobabooga cd645f80f8 Update exllamav3 to 0.0.12 2025-11-01 19:58:18 -07:00
Trenten Miller 6871484398
fix: Rename 'evaluation_strategy' to 'eval_strategy' in training 2025-10-28 16:48:04 -03:00
oobabooga 338ae36f73 Add weights_only=True to torch.load in Training_PRO 2025-10-28 12:43:16 -07:00
dependabot[bot] c8cd840b24
Bump flash-linear-attention from 0.3.2 to 0.4.0 in /requirements/full (#7285)
Bumps [flash-linear-attention](https://github.com/fla-org/flash-linear-attention) from 0.3.2 to 0.4.0.
- [Release notes](https://github.com/fla-org/flash-linear-attention/releases)
- [Commits](https://github.com/fla-org/flash-linear-attention/compare/v0.3.2...v0.4.0)

---
updated-dependencies:
- dependency-name: flash-linear-attention
  dependency-version: 0.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-28 10:07:03 -03:00
oobabooga fc67e5e692
Merge pull request #7279 from oobabooga/dev
Merge dev branch
2025-10-23 12:50:31 -03:00
oobabooga f4c9e67155 Update llama.cpp 2025-10-23 08:19:32 -07:00
Immanuel 9a84a828fc
Fixed python requirements for apple devices with macos tahoe (#7273) 2025-10-22 14:59:27 -03:00
reksarka 138cc654c4
Make it possible to run a portable Web UI build via a symlink (#7277) 2025-10-22 14:55:17 -03:00
oobabooga 24fd2b4dec Update exllamav3 to 0.0.11 2025-10-21 07:26:38 -07:00
oobabooga be81f050a7 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-10-20 19:43:36 -07:00
oobabooga 9476123ee6 Update llama.cpp 2025-10-20 19:43:26 -07:00
dependabot[bot] 0d85744205
Bump triton-windows in /requirements/full (#7274) 2025-10-20 20:36:55 -03:00
oobabooga 771130532c
Merge pull request #7267 from oobabooga/dev
Merge dev branch
2025-10-15 17:15:28 -03:00
oobabooga a156ebbf76 Lint 2025-10-15 13:15:01 -07:00
oobabooga c871d9cdbd Revert "Same as 7f06aec3a1 but for exllamav3_hf"
This reverts commit deb37b821b.
2025-10-15 13:05:41 -07:00
oobabooga 163d863443 Update llama.cpp 2025-10-15 11:23:10 -07:00
oobabooga c93d567f97 Update exllamav3 to 0.0.10 2025-10-15 06:41:09 -07:00
oobabooga b5a6904c4a Make --trust-remote-code immutable from the UI/API 2025-10-14 20:47:01 -07:00
oobabooga efaf2aef3d Update exllamav3 to 0.0.9 2025-10-13 15:32:25 -07:00
oobabooga 047855c591 Update llama.cpp 2025-10-13 15:32:03 -07:00
mamei16 308e726e11
log error when llama-server request exceeds context size (#7263) 2025-10-12 23:00:11 -03:00
oobabooga 611399e089 Update README 2025-10-11 17:22:48 -07:00
oobabooga 968c79db06 Minor README fix (closes #7251) 2025-10-11 17:20:49 -07:00
oobabooga 655c3e86e3 Fix "continue" missing an initial space in chat-instruct/chat modes 2025-10-11 17:00:25 -07:00
oobabooga c7dd920dc8 Fix metadata leaking into branched chats 2025-10-11 14:12:05 -07:00
oobabooga 1831b3fb51 Use my custom gradio_client build (small changes to work with pydantic 2.11) 2025-10-10 18:01:21 -07:00
oobabooga dd0b003493 Bump pydantic to 2.11.0 2025-10-10 17:52:16 -07:00
oobabooga a74596374d Reapply "Update exllamav3 to 0.0.8"
This reverts commit 748007f6ee.
2025-10-10 17:51:31 -07:00
oobabooga 78ff21d512 Organize the --help message 2025-10-10 15:21:08 -07:00
oobabooga 5d734cc7ca Remove unused CSS 2025-10-10 12:54:54 -07:00
oobabooga 25360387ec Downloader: Fix resuming downloads after HF moved to Xet 2025-10-10 08:27:40 -07:00
oobabooga 7833650aa1
Merge pull request #7260 from oobabooga/dev
Merge dev branch
2025-10-10 10:46:34 -03:00
oobabooga bf5d85c922 Revert "Downloader: Gracefully handle '416 Range Not Satisfiable' when continuing downloads"
This reverts commit 1aa2b924d2.
2025-10-09 17:22:41 -07:00
oobabooga 0d03813e98
Update modules/chat.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 21:01:13 -03:00
oobabooga 748007f6ee Revert "Update exllamav3 to 0.0.8"
This reverts commit 977ffbaa04.
2025-10-09 16:50:00 -07:00
dependabot[bot] af3c70651c
Update bitsandbytes requirement in /requirements/full (#7255) 2025-10-09 19:53:34 -03:00
oobabooga 977ffbaa04 Update exllamav3 to 0.0.8 2025-10-09 15:53:14 -07:00
oobabooga e0f0fae59d Exllamav3: Add fla to requirements for qwen3-next 2025-10-09 13:03:48 -07:00
oobabooga deb37b821b Same as 7f06aec3a1 but for exllamav3_hf 2025-10-09 13:02:38 -07:00
oobabooga 7f06aec3a1 exllamav3: Implement the logits function for /v1/internal/logits 2025-10-09 11:24:25 -07:00
oobabooga 218dc01b51 Add fallbacks after 93aa7b3ed3 2025-10-09 10:59:34 -07:00
oobabooga 1aa2b924d2 Downloader: Gracefully handle '416 Range Not Satisfiable' when continuing downloads 2025-10-09 10:52:31 -07:00
oobabooga 0f3793d608 Update llama.cpp 2025-10-09 09:38:22 -07:00
oobabooga 282aa19189 Safer profile picture uploading 2025-10-09 09:26:35 -07:00
oobabooga 93aa7b3ed3 Better handle multigpu setups with transformers + bitsandbytes 2025-10-09 08:49:44 -07:00
Ionoclast Laboratories d229dfe991
Fix portable apple intel requirement for llama binaries (issue #7238) (#7239) 2025-10-08 12:40:53 -03:00
oobabooga 292c91abbb Update llama.cpp 2025-10-08 08:31:34 -07:00
oobabooga f660e0836b Merge branch 'main' into dev 2025-10-08 05:38:33 -07:00
oobabooga 898a3ed2fe
Add sponsor (Warp) to README <3 2025-10-07 18:33:28 -03:00
oobabooga 22997c134e Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-10-05 20:34:49 -07:00
Remowylliams 38a7fd685d
chat.py: Fix Instruct mode history 2025-10-05 11:34:47 -03:00
oobabooga 64829071e0 Update llama.cpp 2025-10-05 07:32:41 -07:00
oobabooga 0eb8543d74 Update transformers 2025-10-05 07:30:33 -07:00
oobabooga b7effb22e0 Update exllamav3 2025-10-05 07:29:57 -07:00
oobabooga 042b828c73
Merge pull request #7231 from oobabooga/dev
Merge dev branch
2025-09-21 01:18:56 -03:00
oobabooga 8c9df34696 Update llama.cpp 2025-09-20 20:57:15 -07:00
oobabooga 1e863a7113 Fix exllamav3 ignoring the stop button 2025-09-19 16:12:50 -07:00
oobabooga 005fcf3f98 Formatting 2025-09-17 21:58:37 -07:00
oobabooga e4412f0634 Slightly more robust syntax highlighting 2025-09-17 21:57:17 -07:00
stevenxdavis dd6d2223a5
Changing transformers_loader.py to Match User Expectations for --bf16 and Flash Attention 2 (#7217) 2025-09-17 16:39:04 -03:00
oobabooga 9e9ab39892 Make exllamav3_hf and exllamav2_hf functional again 2025-09-17 12:29:22 -07:00
oobabooga 9c0a833a0a Revert "Update bitsandbytes requirement in /requirements/full (#7193)"
This reverts commit fe15b67160.
2025-09-17 11:58:54 -07:00
oobabooga 8087a57fd8 Bump transformers to 4.56 2025-09-17 08:19:18 -07:00
dependabot[bot] 7131a478b9
Update safetensors requirement in /requirements/full (#7192) 2025-09-17 12:18:13 -03:00
dependabot[bot] fe15b67160
Update bitsandbytes requirement in /requirements/full (#7193) 2025-09-17 12:17:58 -03:00
dependabot[bot] 8f731a566c
Update peft requirement from ==0.16.* to ==0.17.* in /requirements/full (#7172) 2025-09-17 12:17:16 -03:00
oobabooga 483927a5be Update llama.cpp 2025-09-17 05:09:12 -07:00
oobabooga 557b78d31e Update llama.cpp 2025-09-03 16:50:03 -07:00
oobabooga ba62783b72 UI: Don't use $ $ for LaTeX, only $$ $$ 2025-09-02 14:22:22 -07:00
oobabooga d3a7710c62
Merge pull request #7215 from oobabooga/dev
Merge dev branch
2025-09-02 16:51:50 -03:00
oobabooga f3829b268a llama.cpp: Always pass --flash-attn on 2025-09-02 12:12:17 -07:00
oobabooga 2395c647d4 Fix the instruct message height on mobile 2025-09-02 12:11:15 -07:00
oobabooga c6ea67bbdb Lint 2025-09-02 10:22:03 -07:00
oobabooga 00ed878b05 Slightly more robust model loading 2025-09-02 10:16:26 -07:00
oobabooga d843afcf66 Update llama.cpp 2025-09-02 05:43:33 -07:00
oobabooga 00ebb295d3 Update llama.cpp 2025-08-31 16:27:23 -07:00
oobabooga 387e249dec Change an info message 2025-08-31 16:27:10 -07:00
oobabooga 8028d88541 Lint 2025-08-30 21:29:20 -07:00
oobabooga 13876a1ee8 llama.cpp: Remove the --flash-attn flag (it's always on now) 2025-08-30 20:28:26 -07:00
oobabooga 7b80e9a2ad Update llama.cpp 2025-08-30 20:22:11 -07:00
oobabooga 5631d4e3d6 Minor change after 21d790f87e 2025-08-30 15:34:49 -07:00
oobabooga 5920ad8834 UI: Give streaming instruct messages more vertical space 2025-08-30 15:22:50 -07:00
oobabooga 21d790f87e Optimize LaTeX rendering during streaming for long replies 2025-08-30 14:52:22 -07:00
oobabooga 3a3e247f3c Even better way to handle continue for thinking blocks 2025-08-30 12:36:35 -07:00
oobabooga cf1aad2a68 Fix "continue" for Byte-OSS for partial thinking blocks 2025-08-30 12:16:45 -07:00
oobabooga 96136ea760 Fix LaTeX rendering for equations with asterisks 2025-08-30 10:13:32 -07:00
oobabooga a3eb67e466 Fix the UI failing to launch if the Notebook prompt is too long 2025-08-30 08:42:26 -07:00
oobabooga 08f90f4b64 Lint 2025-08-29 14:09:04 -07:00
oobabooga 07a2e226c1 UI: Minor font color fixes in instruct mode 2025-08-29 14:08:38 -07:00
oobabooga a2b37adb26 UI: Preload the correct fonts for chat mode 2025-08-29 09:25:44 -07:00
oobabooga 084675cf75 UI: Improve thinking blocks in chat-instruct mode 2025-08-29 09:11:10 -07:00
oobabooga d78b7d0fad Lint 2025-08-28 20:22:07 -07:00
oobabooga fc2eb48664 Style fixes after 73442a2b6d 2025-08-28 20:21:55 -07:00
oobabooga 2720955478 Fix a bug after d9eec31886 2025-08-28 19:48:16 -07:00
oobabooga d9eec31886 UI: Suppress "Attempted to select a non-interactive or hidden tab" warnings 2025-08-28 17:46:29 -07:00
oobabooga cb8780a4ce Safer check for is_multimodal when loading models
Avoids an unrelated multimodal error when a model fails to load due
to lack of memory.
2025-08-28 11:13:19 -07:00
oobabooga cfc83745ec UI: Improve right sidebar borders in light mode 2025-08-28 08:34:48 -07:00
oobabooga a336a8bbeb UI: Fix italic and quote color in headings 2025-08-28 08:26:40 -07:00
oobabooga ba6041251d UI: Minor change 2025-08-28 06:20:00 -07:00
oobabooga a92758a144 llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS 2025-08-27 16:15:40 -07:00
oobabooga 030ba7bfeb UI: Mention that Seed-OSS uses enable_thinking 2025-08-27 07:44:35 -07:00
oobabooga 0b4518e61c "Text generation web UI" -> "Text Generation Web UI" 2025-08-27 05:53:09 -07:00
oobabooga 73442a2b6d UI: Better handle the chat input position with CSS
This also solves scrolling issues with the main chat content
when the height of the textarea increases.
2025-08-27 05:43:13 -07:00
oobabooga 8042f76399 Make portable installs functional with Python 3.13 2025-08-27 05:37:01 -07:00
oobabooga ccc8a2229d Revert "UI: Preserve chat scroll position on textarea resize"
This reverts commit 750adf793d.
2025-08-26 13:59:54 -07:00
oobabooga 750adf793d UI: Preserve chat scroll position on textarea resize 2025-08-26 12:19:23 -07:00
oobabooga 02ca96fa44 Multiple fixes 2025-08-25 22:17:22 -07:00
oobabooga 6a7166fffa Add support for the Seed-OSS template 2025-08-25 19:46:48 -07:00
oobabooga 8fcb4b3102 Make bot_prefix extensions functional again 2025-08-25 19:10:46 -07:00
oobabooga 8f660aefe3 Fix chat-instruct replies leaking the bot name sometimes 2025-08-25 18:50:16 -07:00
oobabooga a531328f7e Fix the GPT-OSS stopping string 2025-08-25 18:41:58 -07:00
oobabooga 6c165d2e55 Fix the chat template 2025-08-25 18:28:43 -07:00
oobabooga b657be7381 Obtain stopping strings in chat mode 2025-08-25 18:22:08 -07:00
oobabooga ded6c41cf8 Fix impersonate for chat-instruct 2025-08-25 18:16:17 -07:00
oobabooga c1aa4590ea Code simplifications, fix impersonate 2025-08-25 18:05:40 -07:00
oobabooga b330ec3517 Simplifications 2025-08-25 17:54:15 -07:00
oobabooga 3ad5970374 Make the llama.cpp --verbose output less verbose 2025-08-25 17:43:21 -07:00
oobabooga adeca8a658 Remove changes to the jinja2 templates 2025-08-25 17:36:01 -07:00
oobabooga aad0104c1b Remove a function 2025-08-25 17:33:13 -07:00
oobabooga f919cdf881 chat.py code simplifications 2025-08-25 17:20:51 -07:00
oobabooga d08800c359 chat.py improvements 2025-08-25 17:03:37 -07:00
oobabooga 3bc48014a5 chat.py code simplifications 2025-08-25 16:48:21 -07:00
oobabooga 1f77427088 Update llama.cpp 2025-08-24 19:56:22 -07:00
oobabooga 2478294c06 UI: Preload the instruct and chat fonts 2025-08-24 12:37:41 -07:00
oobabooga 8be798e15f llama.cpp: Fix stderr deadlock while loading some multimodal models 2025-08-24 12:20:05 -07:00
oobabooga 7fe8da8944 Minor simplification after f247c2ae62 2025-08-22 14:42:56 -07:00
oobabooga f247c2ae62 Make --model work with absolute paths, eg --model /tmp/gemma-3-270m-it-IQ4_NL.gguf 2025-08-22 11:47:33 -07:00
oobabooga fd41f2fafc Update llama.cpp 2025-08-22 11:18:56 -07:00
oobabooga cb00db15c9
Merge pull request #7205 from oobabooga/dev
Merge dev branch
2025-08-19 11:51:06 -03:00
oobabooga 9e7b326e34 Lint 2025-08-19 06:50:40 -07:00
oobabooga 1972479610 Add the TP option to exllamav3_HF 2025-08-19 06:48:22 -07:00
oobabooga e0f5905a97 Code formatting 2025-08-19 06:34:05 -07:00
oobabooga 5b06284a8a UI: Keep ExLlamav3_HF selected if already selected for EXL3 models 2025-08-19 06:23:21 -07:00
oobabooga cbba58bef9 UI: Fix code blocks having an extra empty line 2025-08-18 15:50:09 -07:00
oobabooga 8805a50d24 Update llama.cpp 2025-08-18 15:31:01 -07:00
oobabooga 7d23a55901 Fix model unloading when switching loaders (closes #7203) 2025-08-18 09:05:47 -07:00
oobabooga 08594e5263 Installer: Slight improvement 2025-08-18 05:59:46 -07:00
oobabooga 15f99b1b71 Installer: Fix a requirement file 2025-08-18 05:51:46 -07:00
oobabooga 6b1b2e2373 Update README 2025-08-17 22:19:20 -07:00
oobabooga 8a14aa62ff Update README 2025-08-17 22:06:59 -07:00
oobabooga 8cdb911a6e Update README 2025-08-17 22:06:12 -07:00
oobabooga 6bf31479d9 Update README 2025-08-17 22:00:21 -07:00
oobabooga 320f7339cd Update README 2025-08-17 21:56:35 -07:00
oobabooga 3dec47eaf8 Small one-click installer changes 2025-08-17 21:43:46 -07:00
oobabooga 35707c2dd8 Update README 2025-08-17 21:39:57 -07:00
oobabooga 58797a9eb5 Minor change after 9651b5c873 2025-08-17 14:18:23 -07:00
oobabooga 64eba9576c mtmd: Fix a bug when "include past attachments" is unchecked 2025-08-17 14:08:40 -07:00
oobabooga 3a91ca2dd1 Update flash attention 2025-08-17 13:57:23 -07:00
oobabooga 9651b5c873 Make CUDA 12.8 the default CUDA option, remove the CUDA 12.4 option
Exllamav3 doesn't compile with torch 2.6 anymore, and torch 2.7
requires newer CUDA.
2025-08-17 13:26:09 -07:00
oobabooga a633793a00 Bump exllamav3 to 0.0.6 2025-08-17 13:19:42 -07:00
oobabooga dbabe67e77 ExLlamaV3: Enable the --enable-tp option, add a --tp-backend option 2025-08-17 13:19:11 -07:00
oobabooga d771ca4a13 Fix web search (attempt) 2025-08-14 12:05:14 -07:00
oobabooga 73a8a737b2 docs: Improve the multimodal examples slightly 2025-08-13 18:23:18 -07:00
altoiddealer 57f6e9af5a
Set multimodal status during Model Loading (#7199) 2025-08-13 16:47:27 -03:00
oobabooga 45e2935e87
Merge pull request #7198 from oobabooga/dev
Merge dev branch
2025-08-13 10:50:09 -03:00
oobabooga 725a8bcf60 Small docs change 2025-08-13 06:49:28 -07:00
oobabooga 331eab81f7 mtmd: Explain base64 inputs in the API docs 2025-08-13 06:46:10 -07:00
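For context on the base64 docs change above, a minimal sketch of such a request, assuming the OpenAI-compatible API on the default local port 5000; the payload shape follows the OpenAI multimodal convention and is not quoted from the docs commit itself:

```python
# Sketch: send a base64-encoded image to /v1/chat/completions.
# Host, port, and payload shape are assumptions based on the OpenAI
# multimodal convention, not a verbatim excerpt of the API docs.
import base64

import requests

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 512,
}

r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```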
oobabooga 8c9a7e1334
Merge pull request #7195 from oobabooga/dev
Merge dev branch
2025-08-12 18:20:24 -03:00
oobabooga bd05fb899e Update README 2025-08-12 14:19:18 -07:00
oobabooga 6c2fdfdbda
Merge pull request #7190 from oobabooga/dev
Merge dev branch
2025-08-12 18:14:53 -03:00
oobabooga 41b95e9ec3 Lint 2025-08-12 13:37:37 -07:00
oobabooga 2f979ce294 docs: Add a multimodal tutorial 2025-08-12 13:33:49 -07:00
oobabooga 7301452b41 UI: Minor info message change 2025-08-12 13:23:24 -07:00
oobabooga 8d7b88106a Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)"
This reverts commit d8fcc71616.
2025-08-12 13:20:16 -07:00
oobabooga 2f6a629393 UI: Minor improvement after 0e88a621fd 2025-08-12 08:51:01 -07:00
oobabooga 2238302b49 ExLlamaV3: Add speculative decoding 2025-08-12 08:50:45 -07:00
oobabooga 0882970a94 Update llama.cpp 2025-08-12 07:00:24 -07:00
oobabooga d8fcc71616 mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp) 2025-08-11 18:02:33 -07:00
oobabooga e6447cd24a mtmd: Update the llama-server request 2025-08-11 17:42:35 -07:00
oobabooga c47e6deda2 Update README 2025-08-11 16:20:20 -07:00
oobabooga 0e3def449a llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked 2025-08-11 15:17:25 -07:00
oobabooga 0e88a621fd UI: Better organize the right sidebar 2025-08-11 15:16:03 -07:00
oobabooga 1e3c4e8bdb Update llama.cpp 2025-08-11 14:40:59 -07:00
oobabooga 765af1ba17 API: Improve a validation 2025-08-11 12:39:48 -07:00
oobabooga a78ca6ffcd Remove a comment 2025-08-11 12:33:38 -07:00
oobabooga dfd9c60d80 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-08-11 12:33:27 -07:00
oobabooga 999471256c Lint 2025-08-11 12:32:17 -07:00
Mykeehu 1ba1211ca0
Fix edit window and buttons in Messenger theme (#7100) 2025-08-11 16:13:56 -03:00
oobabooga b10d525bf7 UI: Update a tooltip 2025-08-11 12:05:22 -07:00
oobabooga b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
oobabooga 38c0b4a1ad Default ctx-size to 8192 when not found in the metadata 2025-08-11 07:39:53 -07:00
oobabooga 52d1cbbbe9 Fix an import 2025-08-11 07:38:39 -07:00
oobabooga 1cb800d392 Docs: small change 2025-08-11 07:37:10 -07:00
oobabooga 4809ddfeb8 Exllamav3: small sampler fixes 2025-08-11 07:35:22 -07:00
oobabooga 4d8dbbab64 API: Fix sampler_priority usage for ExLlamaV3 2025-08-11 07:26:11 -07:00
oobabooga c5340533c0 mtmd: Add another API example 2025-08-10 20:39:04 -07:00
oobabooga 9ec310d858 UI: Fix the color of italic text 2025-08-10 07:54:21 -07:00
oobabooga cc964ee579 mtmd: Increase the size of the UI image preview 2025-08-10 07:44:38 -07:00
oobabooga 6fbf162d71 Default max_tokens to 512 in the API instead of 16 2025-08-10 07:21:55 -07:00
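A short illustration of the max_tokens default change: clients that silently relied on the old implicit 16 now get 512, so passing the value explicitly keeps behavior stable across versions. Endpoint and port here are assumptions:

```python
# Hypothetical illustration: max_tokens now defaults to 512 server-side.
# Passing it explicitly avoids depending on either default.
import requests

payload = {
    "prompt": "Once upon a time",
    "max_tokens": 200,  # explicit value; omitting it now yields 512, not 16
}
r = requests.post("http://127.0.0.1:5000/v1/completions", json=payload)
print(r.json()["choices"][0]["text"])
```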
oobabooga 1fb5807859 mtmd: Fix API text completion when no images are sent 2025-08-10 06:54:44 -07:00
oobabooga 0ea62d88f6 mtmd: Fix "continue" when an image is present 2025-08-09 21:47:02 -07:00
oobabooga 4663b1a56e Update docs 2025-08-09 21:45:50 -07:00
oobabooga 2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga c6b4d1e87f Fix the exllamav2 loader ignoring add_bos 2025-08-09 21:34:35 -07:00
oobabooga d86b0ec010
Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga eb16f64017 Update llama.cpp 2025-08-09 17:12:16 -07:00
oobabooga a289a92b94 Fix exllamav3 token count 2025-08-09 17:10:58 -07:00
oobabooga d489eb589a Attempt at fixing new exllamav3 loader undefined behavior when switching conversations 2025-08-09 14:11:31 -07:00
oobabooga a6d6bee88c Change a comment 2025-08-09 07:51:03 -07:00
oobabooga 2fe79a93cc mtmd: Handle another case after 3f5ec9644f 2025-08-09 07:50:24 -07:00
oobabooga 59c6138e98 Remove a log message 2025-08-09 07:32:15 -07:00
oobabooga f396b82a4f mtmd: Better way to detect if an EXL3 model is multimodal 2025-08-09 07:31:36 -07:00
oobabooga fa9be444fa Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models 2025-08-09 07:26:59 -07:00
oobabooga d9db8f63a7 mtmd: Simplifications 2025-08-09 07:25:42 -07:00
oobabooga 3f5ec9644f mtmd: Place the image <__media__> at the top of the prompt 2025-08-09 07:06:07 -07:00
oobabooga 1168004067 Minor change 2025-08-09 07:01:55 -07:00
oobabooga 9e260332cc Remove some unnecessary code 2025-08-08 21:22:47 -07:00
oobabooga 544c3a7c9f Polish the new exllamav3 loader 2025-08-08 21:15:53 -07:00
oobabooga 8fcadff8d3 mtmd: Use the base64 attachment for the UI preview instead of the file 2025-08-08 20:13:54 -07:00
oobabooga 6e9de75727 Support loading chat templates from chat_template.json files 2025-08-08 19:35:09 -07:00
Katehuuh 88127f46c1
Add multimodal support (ExLlamaV3) (#7174) 2025-08-08 23:31:16 -03:00
oobabooga b391ac8eb1 Fix getting the ctx-size for EXL3/EXL2/Transformers models 2025-08-08 18:11:45 -07:00
oobabooga 88ba4b1ebf
Merge pull request #7181 from oobabooga/dev
Merge dev branch
2025-08-07 00:30:46 -03:00
oobabooga f1147c9926 Update llama.cpp 2025-08-06 19:32:36 -07:00
oobabooga 3e24f455c8 Fix continue for GPT-OSS (hopefully the final fix) 2025-08-06 10:18:42 -07:00
oobabooga 0c1403f2c7 Handle GPT-OSS as a special case when continuing 2025-08-06 08:05:37 -07:00
oobabooga 6ce4b353c4 Fix the GPT-OSS template 2025-08-06 07:12:39 -07:00
oobabooga fefdb20f69
Merge pull request #7180 from oobabooga/dev
Merge dev branch
2025-08-05 23:54:32 -03:00
oobabooga 7c82d65a9d Handle GPT-OSS as a special template case 2025-08-05 18:05:09 -07:00
oobabooga fbea21a1f1 Only use enable_thinking if the template supports it 2025-08-05 17:33:27 -07:00
oobabooga bfbbfc2361 Ignore add_generation_prompt in GPT-OSS 2025-08-05 17:33:01 -07:00
oobabooga 20adc3c967 Start over new template handling (to avoid overcomplicating) 2025-08-05 16:58:45 -07:00
oobabooga 80f6abb07e Begin fixing 'Continue' with GPT-OSS 2025-08-05 16:01:19 -07:00
oobabooga e5b8d4d072 Fix a typo 2025-08-05 15:52:56 -07:00
oobabooga 701048cf33 Try to avoid breaking jinja2 parsing for older models 2025-08-05 15:51:24 -07:00
oobabooga 7d98ca6195 Make web search functional with thinking models 2025-08-05 15:44:33 -07:00
oobabooga 0e42575c57 Fix thinking block parsing for GPT-OSS under llama.cpp 2025-08-05 15:36:20 -07:00
oobabooga 498778b8ac Add a new 'Reasoning effort' UI element 2025-08-05 15:19:11 -07:00
oobabooga 6bb8212731 Fix thinking block rendering for GPT-OSS 2025-08-05 15:06:22 -07:00
oobabooga 42e3a7a5ae Update llama.cpp 2025-08-05 14:56:12 -07:00
oobabooga 5c5a4dfc14 Fix impersonate 2025-08-05 13:04:10 -07:00
oobabooga ecd16d6bf9 Automatically set skip_special_tokens to False for channel-based templates 2025-08-05 12:57:49 -07:00
oobabooga 178c3e75cc Handle templates with channels separately 2025-08-05 12:52:17 -07:00
oobabooga 9f28f53cfc Better parsing of the gpt-oss template 2025-08-05 11:56:00 -07:00
oobabooga 3b28dc1821 Don't pass torch_dtype to transformers loader, let it be autodetected 2025-08-05 11:35:53 -07:00
oobabooga 3039aeffeb Fix parsing the gpt-oss-20b template 2025-08-05 11:35:17 -07:00
oobabooga 5989043537 Transformers: Support standalone .jinja chat templates (for GPT-OSS) 2025-08-05 11:22:18 -07:00
oobabooga 02a3420a50 Bump transformers to 4.55 (adds gpt-oss support) 2025-08-05 10:09:30 -07:00
oobabooga 74230f559a Bump transformers to 4.54 2025-08-01 11:03:15 -07:00
oobabooga f08bb9a201 Handle edge case in chat history loading (closes #7155) 2025-07-24 10:34:59 -07:00
oobabooga d746484521 Handle both int and str types in grammar char processing 2025-07-23 11:52:51 -07:00
oobabooga 714f745713
Merge pull request #7141 from oobabooga/dev
Merge dev branch
2025-07-19 17:54:06 -03:00
oobabooga 0c667de7a7 UI: Add a None option for the speculative decoding model (closes #7145) 2025-07-19 12:14:41 -07:00
oobabooga ccf5e3e3a7 Update exllamav3 2025-07-19 12:07:38 -07:00
oobabooga a00983b2ba Update llama.cpp 2025-07-19 12:07:20 -07:00
oobabooga 9371867238 Update exllamav2 2025-07-15 07:38:03 -07:00
oobabooga 03fb85e49a Update llama.cpp 2025-07-15 07:37:13 -07:00
oobabooga 845432b9b4 Remove the obsolete modules/relative_imports.py file 2025-07-14 21:03:18 -07:00
oobabooga 1d1b20bd77 Remove the --torch-compile option (it doesn't do anything currently) 2025-07-11 10:51:23 -07:00
oobabooga 5a8a9c22e8 Update llama.cpp 2025-07-11 09:20:27 -07:00
oobabooga 273888f218 Revert "Use eager attention by default instead of sdpa"
This reverts commit bd4881c4dc.
2025-07-10 18:56:46 -07:00
oobabooga caf69d871a Revert "Standardize margins and paddings across all chat styles"
This reverts commit 86cb5e0587.
2025-07-10 18:43:01 -07:00
oobabooga 188c7c8f2b Revert "CSS simplifications"
This reverts commit c6c1b725e9.
2025-07-10 18:42:52 -07:00
oobabooga 635e6efd18 Ignore add_bos_token in instruct prompts, let the jinja2 template decide 2025-07-10 07:14:01 -07:00
oobabooga 0f3a88057c Don't downgrade triton-windows on CUDA 12.8 2025-07-10 05:39:04 -07:00
oobabooga e523f25b9f Downgrade triton-windows to 3.2.0.post19
https://github.com/oobabooga/text-generation-webui/issues/7107#issuecomment-3057250374
2025-07-10 05:35:57 -07:00
oobabooga a7a3a0c700 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-07-09 21:07:42 -07:00
oobabooga 21e0e9f32b Add the triton-windows requirement on Windows to make transformers functional 2025-07-09 21:05:17 -07:00
dependabot[bot] d1f4622a96
Update peft requirement from ==0.15.* to ==0.16.* in /requirements/full (#7127) 2025-07-10 00:15:50 -03:00
oobabooga e015355e4a Update README 2025-07-09 20:03:53 -07:00
oobabooga bd4881c4dc Use eager attention by default instead of sdpa 2025-07-09 19:57:37 -07:00
oobabooga b69f435311 Fix latest transformers being super slow 2025-07-09 19:56:50 -07:00
oobabooga 8b3c7aa795 Bump bitsandbytes to 0.46 2025-07-09 19:46:55 -07:00
oobabooga f045b72826 Bump accelerate to 1.8 2025-07-09 19:46:26 -07:00
oobabooga c357601c01 Bump transformers to 4.53 2025-07-09 18:48:04 -07:00
oobabooga 6c2bdda0f0 Transformers loader: replace use_flash_attention_2/use_eager_attention with a unified attn_implementation
Closes #7107
2025-07-09 18:39:37 -07:00
oobabooga 6338dc0051
Merge pull request #7129 from oobabooga/dev
Merge dev branch
2025-07-09 00:10:16 -03:00
oobabooga 511bb31646 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-07-08 20:04:37 -07:00
oobabooga d1e9301a43 Remove fragile js from 9a58964834 2025-07-08 19:57:46 -07:00
Cats cd5d867b62
docs: Add Mirostat Explanation (#7128) 2025-07-08 17:54:38 -03:00
oobabooga 3e24a127c7 Remove more unnecessary files from portable builds 2025-07-08 09:13:11 -07:00
oobabooga 2f544fe199 Update the keyboard shortcuts documentation 2025-07-08 09:02:42 -07:00
oobabooga 93e08c0d4a Update README 2025-07-08 08:59:29 -07:00
oobabooga 42191a36ab Keep navigation icons visible when switching versions 2025-07-08 07:10:04 -07:00
oobabooga c6c1b725e9 CSS simplifications 2025-07-07 21:11:13 -07:00
oobabooga 86cb5e0587 Standardize margins and paddings across all chat styles 2025-07-07 21:02:19 -07:00
oobabooga b7d5982944
Merge pull request #7125 from oobabooga/dev
Merge dev branch
2025-07-07 18:19:58 -03:00
oobabooga e8266b0356 Use windows-2022 in workflows 2025-07-07 14:19:20 -07:00
oobabooga e1034fc79e
Merge pull request #7124 from oobabooga/dev
Merge dev branch
2025-07-07 18:13:30 -03:00
oobabooga 74d98186fc Slightly more robust autoscroll 2025-07-07 13:23:23 -07:00
oobabooga ca226a54c6 Disable the message version navigation hover effects during streaming 2025-07-07 11:29:37 -07:00
oobabooga 07e6f004c5 Rename a button in the Session tab for clarity 2025-07-07 11:28:47 -07:00
oobabooga 426e7a4cec Update the extensions documentation 2025-07-07 08:43:01 -07:00
oobabooga e52bc0acb2 Update llama.cpp 2025-07-06 20:28:35 -07:00
oobabooga cbef2720ce Revert "Fix: use embedded Python in start_windows.bat to avoid system interpreter conflicts (#7120)"
This reverts commit 8df1127ce2.
2025-07-06 20:14:02 -07:00
Alidr79 e5767d4fc5
Update ui_model_menu.py to block --multi-user access in the backend (#7098) 2025-07-06 21:48:53 -03:00
oobabooga 60123a67ac Better log message when extension requirements are not found 2025-07-06 17:44:41 -07:00
oobabooga e6bc7742fb Support installing user extensions in user_data/extensions/ 2025-07-06 17:30:23 -07:00
Philipp Claßen 959d4ddb91
Fix for chat sidebars toggle buttons disappearing (#7106) 2025-07-06 20:51:42 -03:00
Ali 8df1127ce2
Fix: use embedded Python in start_windows.bat to avoid system interpreter conflicts (#7120) 2025-07-06 20:42:34 -03:00
oobabooga de4ccffff8 Fix the duckduckgo search 2025-07-06 16:24:57 -07:00
oobabooga 0f258774d3 Minor README changes 2025-07-05 14:25:59 -07:00
oobabooga 4583924ce7 Remove torchvision/torchaudio mentions from the README 2025-07-05 14:24:15 -07:00
oobabooga c4d738f39f Update llama.cpp 2025-07-05 14:09:29 -07:00
oobabooga c4d5331c03 Fix autoscroll after fonts load 2025-07-04 13:21:52 -07:00
oobabooga 92ec8dda03 Fix chat history getting lost if the UI is inactive for a long time (closes #7109) 2025-07-04 06:04:04 -07:00
oobabooga 23bb94a5fb Update llama.cpp 2025-07-03 20:36:54 -07:00
zombiegreedo 877c651c04
Handle either missing <think> start or </think> end tags (#7102) 2025-07-03 23:05:46 -03:00
oobabooga cbba88f565 Fix scrolling during streaming when thinking blocks are present 2025-07-03 18:16:29 -07:00
oobabooga 13373391df Rename miniconda -> miniforge everywhere 2025-07-03 14:13:22 -07:00
oobabooga ab162f976c Use miniforge instead of miniconda to avoid anaconda licensing issues 2025-07-03 11:31:52 -07:00
oobabooga 9a58964834 Keep the last message visible when the input height changes 2025-06-22 20:44:04 -07:00
oobabooga c3faecfd27 Minor change 2025-06-22 17:51:09 -07:00
oobabooga 1b19dd77a4 Move 'Enable thinking' to the Chat tab 2025-06-22 17:29:17 -07:00
oobabooga 02f604479d Remove the pre-jinja2 custom stopping string handling (closes #7094) 2025-06-21 14:03:35 -07:00
oobabooga 58282f7107 Replace 'Generate' with 'Send' in the Chat tab 2025-06-20 06:59:48 -07:00
oobabooga bb97ca1b22 Fix a small issue with the chat input 2025-06-19 21:41:41 -07:00
oobabooga f154aeafea Optimize chat scrolling for the 40th time, hopefully the last one 2025-06-19 21:23:10 -07:00
oobabooga 17f9c188bd
Merge pull request #7092 from oobabooga/dev
Merge dev branch
2025-06-19 19:42:16 -03:00
oobabooga acd57b6a85 Minor UI change 2025-06-19 15:39:43 -07:00
oobabooga f08db63fbc Change some comments 2025-06-19 15:26:45 -07:00
oobabooga 2517ea9c9e Lint 2025-06-19 15:23:06 -07:00
oobabooga 90f42f311a Update README 2025-06-19 12:43:05 -07:00
oobabooga ee945517ff Update README 2025-06-19 12:39:53 -07:00
oobabooga a1b606a6ac Fix obtaining the maximum number of GPU layers for DeepSeek-R1-0528-GGUF 2025-06-19 12:30:57 -07:00
oobabooga 3344510553 Force dark theme on the Gradio login page 2025-06-19 12:11:34 -07:00
oobabooga 645463b9f0 Add fallback values for theme colors 2025-06-19 11:28:12 -07:00
oobabooga 09cd1cb4e2 Update README 2025-06-19 10:51:45 -07:00
oobabooga c4029914e8 Update README 2025-06-19 10:48:33 -07:00
oobabooga 84617abdeb Properly fix the /v1/models endpoint 2025-06-19 10:25:55 -07:00
oobabooga 93cd47c948 Bump numpy to 2.2 (closes #7090) 2025-06-19 08:00:30 -07:00
oobabooga dcdc42fa06 Fix the /v1/models output format (closes #7089) 2025-06-19 07:57:17 -07:00
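A sketch of querying the corrected endpoint; the response shape noted in the comment is the standard OpenAI list format, assumed rather than quoted from the fix:

```python
# Sketch: list models from the (now OpenAI-conformant) endpoint.
# Expected OpenAI-style shape: {"object": "list", "data": [{"id": ...}, ...]}
# — an assumption based on the OpenAI convention.
import requests

r = requests.get("http://127.0.0.1:5000/v1/models")
for model in r.json()["data"]:
    print(model["id"])
```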
oobabooga 9c6913ad61 Show file sizes on "Get file list" 2025-06-18 21:35:07 -07:00
oobabooga 9bd114b5d7 Merge branch 'main' into dev 2025-06-18 21:03:52 -07:00
oobabooga 76a722dc90 Remove .github and .gitignore folders from portable builds 2025-06-18 21:03:45 -07:00
oobabooga 4e0dfbdde3 Remove .github and .gitignore folders from portable builds 2025-06-18 21:02:57 -07:00
oobabooga 92547becff
Merge pull request #7085 from oobabooga/dev
Merge dev branch
2025-06-18 22:43:07 -03:00
oobabooga 0cb82483ef Lint 2025-06-18 18:26:59 -07:00
oobabooga e33921a629 Fix jittering while typing on Firefox (closes #7086) 2025-06-18 17:54:34 -07:00
oobabooga 6af3598cfa API: Remove obsolete list_dummy_models function 2025-06-18 16:15:42 -07:00
NoxWorld2660 0b26650f47
Expose real model list via /v1/models endpoint (#7088) 2025-06-18 20:14:24 -03:00
oobabooga 6cc7bbf009 Better autosave behavior for notebook tab when there are 2 columns 2025-06-18 15:54:32 -07:00
oobabooga 197b327374 Minor log message change 2025-06-18 13:36:54 -07:00
oobabooga 2f45d75309 Increase the area of the notebook textbox 2025-06-18 13:22:06 -07:00
oobabooga 7cb2b1bfdb Fix some events 2025-06-18 10:27:38 -07:00
oobabooga 8b7eb5c87c Code simplification 2025-06-18 10:22:36 -07:00
oobabooga 22cc9e0115 Remove 'Send to Default' 2025-06-18 10:21:48 -07:00
oobabooga 678f40297b Clear the default tab output when switching prompts 2025-06-17 17:40:48 -07:00
oobabooga a2cdd06afc Revert "Workaround for jittering while typing on firefox"
This reverts commit b4edfce993.
2025-06-17 15:29:40 -07:00
oobabooga 2d37602382 Small improvements to wpp style 2025-06-17 15:26:59 -07:00
oobabooga da148232eb Better filenames for new prompts in the Notebook tab 2025-06-17 15:10:44 -07:00
oobabooga fc23345c6d Send the default input to the notebook textbox when switching 2 columns to 1 (instead of the output) 2025-06-17 15:03:14 -07:00
oobabooga 75217d3713 Change issue template 2025-06-17 09:37:24 -07:00
oobabooga b4edfce993 Workaround for jittering while typing on firefox 2025-06-17 09:30:03 -07:00
oobabooga 01ef4c61bd Only open/close both sidebars at the same time on desktop 2025-06-17 08:45:11 -07:00
oobabooga 315e06f695 Update llama.cpp 2025-06-17 07:51:16 -07:00
oobabooga 73138a29fa Small change 2025-06-17 07:49:24 -07:00
oobabooga 87ae09ecd6 Improve the basic API examples 2025-06-17 07:46:58 -07:00
oobabooga aa44e542cb Revert "Safer usage of mkdir across the project"
This reverts commit 0d1597616f.
2025-06-17 07:11:59 -07:00
oobabooga 0d1597616f Safer usage of mkdir across the project 2025-06-17 07:09:33 -07:00
oobabooga 8689d7ecea Update README 2025-06-16 21:21:39 -07:00
oobabooga 8f49e6144e Update README 2025-06-16 21:09:45 -07:00
oobabooga 66e991841a Fix the character pfp not appearing when switching from instruct to chat modes 2025-06-16 18:45:44 -07:00
oobabooga be3d371290 Close the big profile picture when switching to instruct mode 2025-06-16 18:42:17 -07:00
oobabooga 26eda537f0 Add auto-save for notebook textbox while typing 2025-06-16 17:48:23 -07:00
oobabooga 88c0204357 Disable start_with when generating the websearch query 2025-06-16 14:53:05 -07:00
oobabooga 97a539cab6 Minor style change 2025-06-16 13:55:45 -07:00
oobabooga faae4dc1b0
Autosave generated text in the Notebook tab (#7079) 2025-06-16 17:36:05 -03:00
oobabooga d0befe0729 Add a comment 2025-06-16 09:22:22 -07:00
oobabooga de24b3bb31
Merge the Default and Notebook tabs into a single Notebook tab (#7078) 2025-06-16 13:19:29 -03:00
oobabooga db67d69ddc Lint 2025-06-16 07:28:14 -07:00
oobabooga cac225b589 Small style improvements 2025-06-16 07:26:39 -07:00
oobabooga 7ba3d4425f Remove the 'Send to negative prompt' button 2025-06-16 07:23:09 -07:00
oobabooga 34bf93ef47 Move 'Custom system message' to the Parameters tab 2025-06-16 07:22:14 -07:00
oobabooga c9c3b716fb Move character settings to a new 'Character' main tab 2025-06-16 07:21:25 -07:00
oobabooga f77f1504f5 Improve the style of the Character and User tabs 2025-06-16 06:12:37 -07:00
oobabooga 949b7ec9cf Further optimize scrolling in the chat tab 2025-06-15 18:50:21 -07:00
oobabooga d347b056e3 Always close/open the two sidebars at the same time 2025-06-15 18:12:11 -07:00
oobabooga 9bcef8a648 Fix "show controls" conflicting with manually hiding the sidebars 2025-06-15 17:57:41 -07:00
oobabooga bc2b0f54e9 Only save extensions settings on manual save 2025-06-15 15:53:16 -07:00
oobabooga cc757f6226 Small style improvements to the chat tab 2025-06-15 08:32:06 -07:00
oobabooga b279460a81 Improve the wpp style 2025-06-15 08:25:07 -07:00
oobabooga e8dc7b0ee9 Bump exllamav3 to 0.0.4 2025-06-15 08:15:29 -07:00
oobabooga 4fc254c1dd Optimize syntax highlighting on long conversations 2025-06-15 08:13:13 -07:00
oobabooga 609c3ac893 Optimize the end of generation with llama.cpp 2025-06-15 08:03:27 -07:00
oobabooga db7d717df7 Remove images and links from websearch results
This reduces noise a lot
2025-06-14 20:00:25 -07:00
oobabooga e263dbf852 Improve user input truncation 2025-06-14 19:43:51 -07:00
oobabooga 09606a38d3 Truncate web search results to at most 8192 tokens 2025-06-14 19:37:32 -07:00
oobabooga ad0be25c46 Update llama.cpp 2025-06-14 15:00:14 -07:00
oobabooga 7c0225931a Merge branch 'main' into dev 2025-06-14 14:59:37 -07:00
oobabooga 1c1cf09a59 Update workflows 2025-06-14 14:52:49 -07:00
oobabooga 58c3b549ba Merge branch 'main' into dev 2025-06-14 10:16:13 -07:00
oobabooga 8e9c0287aa UI: Fix edge case where gpu-layers slider maximum is incorrectly limited 2025-06-14 10:12:11 -07:00
oobabooga 8e0ef5b419 Hide the header bar on Ctrl+S 2025-06-14 09:09:46 -07:00
oobabooga 1d23159837 Increase the size of the enlarged character profile picture 2025-06-14 08:45:59 -07:00
oobabooga d2da40b0e4 Remember the last selected chat for each mode/character 2025-06-14 08:25:00 -07:00
oobabooga 879fa3d8c4 Improve the wpp style & simplify the code 2025-06-14 07:14:22 -07:00
oobabooga 09eb326486 Merge README.md changes from dev branch 2025-06-13 07:46:43 -07:00
oobabooga dfab11f0b5 Update README 2025-06-13 07:45:42 -07:00
oobabooga 9a2353f97b Better log message when the user input gets truncated 2025-06-13 05:44:02 -07:00
oobabooga 322cd28e24 Update README 2025-06-13 01:27:33 -07:00
oobabooga 7cb650237c Update the README 2025-06-13 01:12:52 -07:00
oobabooga aab28398ef Update README 2025-06-13 01:06:44 -07:00
oobabooga 5ba52967ac Update README 2025-06-13 01:04:41 -07:00
oobabooga b58e80cb99 Update README 2025-06-13 01:02:11 -07:00
Miriam f4f621b215
Ensure estimated VRAM is updated when switching between different models (#7071) 2025-06-13 02:56:33 -03:00
oobabooga f337767f36 Add error handling for non-llama.cpp models in portable mode 2025-06-12 22:17:39 -07:00
oobabooga a25a1fc8d0 Disable message action icons during streaming for better performance 2025-06-12 22:01:02 -07:00
oobabooga 2dee3a66ff Add an option to include/exclude attachments from previous messages in the chat prompt 2025-06-12 21:37:18 -07:00
oobabooga 2cfb77d16f
Merge pull request #7070 from oobabooga/dev
Merge dev branch
2025-06-12 12:38:47 -03:00
oobabooga b4d2a00e20 Update README 2025-06-12 08:35:33 -07:00
oobabooga 9ff5961853
Merge pull request #7067 from oobabooga/dev
Merge dev branch
2025-06-11 11:58:52 -03:00
oobabooga 9d6a7f1bcf Minor changes 2025-06-11 07:55:35 -07:00
oobabooga 004fd8316c Minor changes 2025-06-11 07:49:51 -07:00
oobabooga 570d5b8936 Only save extensions on manual save 2025-06-11 07:39:49 -07:00
oobabooga 27140f3563 Revert "Don't save active extensions through the UI"
This reverts commit df98f4b331.
2025-06-11 07:25:27 -07:00
oobabooga 2ebc8ff252
Merge pull request #7065 from oobabooga/dev
Merge dev branch
2025-06-11 01:09:06 -03:00
oobabooga 13a5288d01 Fix an error when upgrading from cuda 12.4 to cuda 12.8 2025-06-10 21:08:18 -07:00
oobabooga 801db438b0 Undo changes to portable builds 2025-06-10 19:55:40 -07:00
oobabooga 00fbbd6f57 Undo changes to portable builds 2025-06-10 19:54:42 -07:00
oobabooga e8041069e2
Merge pull request #7064 from oobabooga/dev
Merge dev branch
2025-06-10 23:43:10 -03:00
oobabooga fe0685a742 New attempt 2025-06-10 19:42:22 -07:00
oobabooga 036976aeb8
Merge pull request #7063 from oobabooga/dev
Merge dev branch
2025-06-10 23:35:22 -03:00
oobabooga 43fc170224 Fix the Windows workflow 2025-06-10 19:34:41 -07:00
oobabooga e9a433832e
Merge pull request #7062 from oobabooga/dev
Merge dev branch
2025-06-10 23:26:21 -03:00
oobabooga a86a5a026e Fix the GitHub Actions workflows 2025-06-10 19:25:22 -07:00
oobabooga 1e96dcf369
Merge pull request #7057 from oobabooga/dev
Merge dev branch
2025-06-10 23:08:44 -03:00
oobabooga 552cb09f09 Do not bump Transformers to 4.52 on CUDA 12.8
Performance is slow, and the older version works fine with torch 2.7.
2025-06-10 18:45:42 -07:00
LawnMauer bc921c66e5
Load js and css sources in UTF-8 (#7059) 2025-06-10 22:16:50 -03:00
oobabooga 4cf39120fc Fix chat area sometimes not scrolling up to edit message 2025-06-10 18:03:00 -07:00
oobabooga 75da90190f Fix character dropdown sometimes disappearing in the Parameters tab 2025-06-10 17:34:54 -07:00
oobabooga 1c1fd3be46 Remove some log messages 2025-06-10 14:29:28 -07:00
oobabooga 3f9eb3aad1 Fix the preset dropdown when the default preset file is not present 2025-06-10 14:22:37 -07:00
oobabooga 18bd78f1f0 Make the llama.cpp prompt processing messages shorter 2025-06-10 14:03:25 -07:00
oobabooga 889153952f Lint 2025-06-10 09:02:52 -07:00
oobabooga 2dabdbc7da Update llama.cpp 2025-06-10 05:25:23 -07:00
oobabooga c92eba0b0a Reorganize the Parameters tab (left: preset parameters, right: everything else) 2025-06-09 22:05:20 -07:00
oobabooga efd9c9707b Fix random seeds being saved to settings.yaml 2025-06-09 20:57:25 -07:00
oobabooga df98f4b331 Don't save active extensions through the UI
Prevents command-line activated extensions from becoming permanently active due to autosave.
2025-06-09 20:28:16 -07:00
Mykeehu ec73121020
Fix 'Continue'/'Start reply with' when using translation extensions (#6944)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-06-10 00:17:05 -03:00
Miriam 331d03c33f
fix failure when --nowebui called without --api (#7055) 2025-06-09 23:25:39 -03:00
Miriam 1443612e72
Check .attention.head_count if .attention.head_count_kv doesn't exist (#7048) 2025-06-09 23:22:01 -03:00
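The fallback above can be pictured with a small sketch; the metadata key prefix ("llama.") varies by model architecture and is assumed here:

```python
# Illustrative sketch of the fallback described in the commit above:
# when the grouped-query key is absent, fall back to the plain head count.
def get_kv_head_count(metadata: dict) -> int | None:
    return metadata.get(
        "llama.attention.head_count_kv",
        metadata.get("llama.attention.head_count"),
    )

# Example with metadata lacking the _kv key:
print(get_kv_head_count({"llama.attention.head_count": 32}))  # -> 32
```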
oobabooga d085dc6a93 Minor optimization after e976a5ddc7 2025-06-09 18:40:54 -07:00
oobabooga 263b5d5557 Use html2text to extract the text of web searches without losing formatting 2025-06-09 17:55:26 -07:00
oobabooga f5a5d0c0cb Add the URL of web attachments to the prompt 2025-06-09 17:32:25 -07:00
oobabooga 747a4a0e56 Reposition the ... typing dots 2025-06-09 13:41:29 -07:00
oobabooga 14efd42084 Improve scroll performance by disabling hover effects during scroll 2025-06-09 11:43:15 -07:00
oobabooga 1602ac1c8f Improve the style of thinking blocks in dark mode 2025-06-09 09:03:39 -07:00
oobabooga eefbf96f6a Don't save truncation_length to user_data/settings.yaml 2025-06-08 22:14:56 -07:00
oobabooga 80637cae28 Add version to portable build folder names 2025-06-08 21:55:49 -07:00
oobabooga f9a007c6a8 Properly filter out failed web search downloads from attachments 2025-06-08 19:25:23 -07:00
oobabooga f3388c2ab4 Fix selecting next chat when deleting with active search 2025-06-08 18:53:04 -07:00
oobabooga 4a369e070a Add buttons for easily deleting past chats 2025-06-08 18:47:48 -07:00
oobabooga 0b8d2d65a2 Minor style improvement 2025-06-08 18:11:27 -07:00
oobabooga 06dfb7e772 Improve the style of the hover menu 2025-06-08 18:03:07 -07:00
oobabooga b5e021fc49 Make the dark theme darker 2025-06-08 17:44:04 -07:00
oobabooga e976a5ddc7 Re-highlight code blocks when switching light/dark themes 2025-06-08 17:35:36 -07:00
oobabooga 7ed1926ce7 Small change after previous commit 2025-06-08 15:38:40 -07:00
oobabooga ff01bcb870 Use user_data/cache/gradio for Gradio temp files 2025-06-08 15:33:05 -07:00
oobabooga f81b1540ca Small style improvements 2025-06-08 15:19:25 -07:00
oobabooga eb0ab9db1d Fix light/dark theme persistence across page reloads 2025-06-08 15:04:05 -07:00
oobabooga 78899244d5 Remove settings-template.yaml 2025-06-08 09:40:09 -07:00
oobabooga 1f1435997a Don't show the new 'Restore character' button in the Chat tab 2025-06-08 09:37:54 -07:00
oobabooga 84f66484c5 Make it optional to paste long pasted content to an attachment 2025-06-08 09:31:38 -07:00
oobabooga 42e7864d62 Reorganize the Session tab 2025-06-08 09:21:23 -07:00
oobabooga af6bb7513a Add back the "Save UI defaults" button
It's useful for saving extensions settings.
2025-06-08 09:09:36 -07:00
oobabooga 1cab149c1a Remove the contrastive search preset 2025-06-07 22:26:13 -07:00
oobabooga ae150fa24f Remove the null preset 2025-06-07 22:25:46 -07:00
oobabooga 1bdf11b511 Use the Qwen3 - Thinking preset by default 2025-06-07 22:23:09 -07:00
oobabooga 0dbc4cbc71 Add Qwen3 presets 2025-06-07 22:20:58 -07:00
oobabooga fe955cac1f Small UI changes 2025-06-07 22:15:19 -07:00
oobabooga caf9fca5f3 Avoid some code repetition 2025-06-07 22:11:35 -07:00
oobabooga 3650a6fd1f Small UI changes 2025-06-07 22:02:34 -07:00
oobabooga 6436bf1920
More UI persistence: presets and characters (#7051) 2025-06-08 01:58:02 -03:00
oobabooga 35ed55d18f
UI persistence (#7050) 2025-06-07 22:46:52 -03:00
rakha abadi susilo db847eed4c
Add RTX 50XX NVIDIA Blackwell support (ExLlamaV2/V3 and Transformers) (#7011)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-06-07 21:44:15 -03:00
oobabooga 2d263f227d Fix the chat input reappearing when the page is reloaded 2025-06-06 22:38:20 -07:00
oobabooga 379dd01ca7 Filter out failed web search downloads from attachments 2025-06-06 22:32:07 -07:00
oobabooga f8f23b5489 Simplify the llama.cpp stderr filter code 2025-06-06 22:25:13 -07:00
oobabooga 45f823ddf6 Print \n after the llama.cpp progress bar reaches 1.0 2025-06-06 22:23:34 -07:00
oobabooga d47c8eb956 Remove quotes from LLM-generated websearch query (closes #7045).
Fix by @Quiet-Joker
2025-06-05 06:57:59 -07:00
oobabooga 977ec801b7 Improve table colors in instruct mode 2025-06-05 06:33:45 -07:00
Hanusz Leszek 3829507d0f
Stop model during graceful shutdown (#7042) 2025-06-04 15:13:36 -03:00
oobabooga 3d676cd50f Optimize syntax highlighting 2025-06-04 11:02:04 -07:00
oobabooga 66a75c899a Improve the scrollbars in code blocks 2025-06-04 10:59:43 -07:00
oobabooga 9bd7359ffa Scroll the textarea into view when editing a message 2025-06-04 10:47:14 -07:00
oobabooga 93b3752cdf Revert "Remove the "Is typing..." yield by default"
This reverts commit b30a73016d.
2025-06-04 09:40:30 -07:00
oobabooga b38ec0ec38 Update llama.cpp 2025-06-02 11:33:17 -07:00
oobabooga b30a73016d Remove the "Is typing..." yield by default 2025-06-02 07:49:22 -07:00
oobabooga 7278548cd1
Simplify the one-click installer (#7039) 2025-06-02 09:57:55 -03:00
oobabooga bb409c926e
Update only the last message during streaming + add back dynamic UI update speed (#7038) 2025-06-02 09:50:17 -03:00
oobabooga 45c9ae312c Use the flash-attention wheels in https://github.com/kingbri1/flash-attention 2025-06-01 22:17:22 -07:00
oobabooga 2db7745cbd Show llama.cpp prompt processing on one line instead of many lines 2025-06-01 22:12:24 -07:00
oobabooga ad6d0218ae Fix after 219f0a7731 2025-06-01 19:27:14 -07:00
oobabooga 92adceb7b5 UI: Fix the model downloader progress bar 2025-06-01 19:22:21 -07:00
oobabooga 7a81beb0c1 Turn long pasted text into an attachment automatically 2025-06-01 18:26:14 -07:00
oobabooga bf42b2c3a1 Fix thinking blocks sometimes showing a white outline 2025-06-01 11:02:04 -07:00
oobabooga 83849336d8 Improve how Show controls looks in the hover menu 2025-06-01 10:58:49 -07:00
oobabooga 3e3746283c Improve the typing dots position 2025-06-01 10:55:31 -07:00
oobabooga 88ff3e6ad8 CSS fixes after 98a7508a99 2025-06-01 08:04:35 -07:00
oobabooga 9e80193008 Add the model name to each message's metadata 2025-05-31 22:41:35 -07:00
oobabooga 0816ecedb7 Lint 2025-05-31 22:25:09 -07:00
oobabooga 98a7508a99 UI: Move 'Show controls' inside the hover menu 2025-05-31 22:22:13 -07:00
oobabooga 85f2f01a3a UI: Fix extra gaps on the right sidebar 2025-05-31 21:29:57 -07:00
oobabooga f8d220c1e6 Add a tooltip to the web search checkbox 2025-05-31 21:22:36 -07:00
oobabooga 4a2727b71d Add a tooltip to the file upload button 2025-05-31 20:24:31 -07:00
oobabooga 1d88456659 Add support for .docx attachments 2025-05-31 20:15:07 -07:00
oobabooga dc8ed6dbe7 Bump exllamav3 to 0.0.3 2025-05-31 14:27:33 -07:00
oobabooga c55d3c61c6 Bump exllamav2 to 0.3.1 2025-05-31 14:21:42 -07:00
oobabooga ae61c1a0f4
Merge pull request #7034 from oobabooga/dev
Merge dev branch
2025-05-30 23:07:56 -03:00
oobabooga 15f466ca3f Update README 2025-05-30 15:49:57 -07:00
oobabooga 219f0a7731 Fix exllamav3_hf models failing to unload (closes #7031) 2025-05-30 12:05:49 -07:00
oobabooga 298d4719c6 Multiple small style improvements 2025-05-30 11:32:24 -07:00
oobabooga 7c29879e79 Fix 'Start reply with' (closes #7033) 2025-05-30 11:17:47 -07:00
oobabooga af1eef1b08
Merge pull request #7028 from oobabooga/dev
Merge dev branch
2025-05-29 19:07:56 -03:00
oobabooga 28e6bd4fcd Revert "Update transformers requirement in /requirements/full (#7017)"
This reverts commit cc9b7253c1.
2025-05-29 14:49:07 -07:00
oobabooga d1bfb08e8d Improve the style of message editing 2025-05-29 14:27:47 -07:00
oobabooga acbcc12e7b Clean up 2025-05-29 14:11:21 -07:00
oobabooga dce02732a4 Fix timestamp issues when editing/swiping messages 2025-05-29 14:08:48 -07:00
oobabooga 8078c41ec6 Revert "Bump llama.cpp"
This reverts commit a8d02dec8f.
2025-05-29 13:32:19 -07:00
oobabooga a45a652130 CSS fix 2025-05-29 13:28:51 -07:00
oobabooga f59998d268 Don't limit the number of prompt characters printed with --verbose 2025-05-29 13:08:48 -07:00
oobabooga aff41f3482 Update README 2025-05-29 12:53:41 -07:00
oobabooga e7129f9dbe Prevent footer buttons below last assistant message from always appearing 2025-05-29 12:47:07 -07:00
oobabooga 724147ffab Better detect when no model is available 2025-05-29 10:49:29 -07:00
oobabooga faa5c82c64 Fix message version count not updating during regeneration streaming 2025-05-29 09:16:26 -07:00
oobabooga 3f37a2e915 Update README 2025-05-29 08:49:31 -07:00
oobabooga c970c5f166 Make scrollbars darker in dark theme 2025-05-29 08:15:13 -07:00
oobabooga 81794692ab UI: Make the dark theme darker 2025-05-29 08:07:14 -07:00
oobabooga 36bc276005 Update README 2025-05-29 05:39:26 -07:00
oobabooga 0986d075fb Update README 2025-05-29 05:03:59 -07:00
oobabooga 9a94d7b4f6 Update README 2025-05-29 05:02:52 -07:00
oobabooga 2a9699033d Update README 2025-05-29 04:55:59 -07:00
oobabooga f2ee917d4f Update README 2025-05-29 04:55:05 -07:00
oobabooga 685cfe2540 Lint 2025-05-29 04:26:43 -07:00
oobabooga a8d02dec8f Bump llama.cpp 2025-05-29 04:24:21 -07:00
Underscore 63234b9b6f
UI: Fix impersonate (#7025) 2025-05-29 08:22:03 -03:00
oobabooga 75d6cfd14d Download fetched web search results in parallel 2025-05-28 20:36:24 -07:00
oobabooga 7080a02252 Reduce the timeout for downloading web pages 2025-05-28 18:15:21 -07:00
oobabooga 3eb0b77427 Improve the web search query generation 2025-05-28 18:14:51 -07:00
oobabooga 27641ac182 UI: Make message editing work the same for user and assistant messages 2025-05-28 17:23:46 -07:00
oobabooga 6c3590ba9a Make web search attachments clickable 2025-05-28 05:28:15 -07:00
oobabooga 0aedb89921 UI: Small style improvement to attachments 2025-05-28 00:35:20 -07:00
oobabooga 75c6ae8502 UI: Don't edit messages on double click 2025-05-28 00:29:17 -07:00
oobabooga 077bbc6b10
Add web search support (#7023) 2025-05-28 04:27:28 -03:00
oobabooga 1b0e2d8750 UI: Add a token counter to the chat tab (counts input + history) 2025-05-27 22:36:24 -07:00
oobabooga f6ca0ee072 Fix regenerate sometimes not creating a new message version 2025-05-27 21:20:51 -07:00
oobabooga 2db36da979 UI: Make scrollbars more discrete in dark mode 2025-05-27 21:00:11 -07:00
Underscore 5028480eba
UI: Add footer buttons for editing messages (#7019)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-05-28 00:55:27 -03:00
Underscore 355b5f6c8b
UI: Add message version navigation (#6947)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-05-27 22:54:18 -03:00
dependabot[bot] cc9b7253c1
Update transformers requirement in /requirements/full (#7017) 2025-05-26 23:13:10 -03:00
Underscore 8531100109
Fix textbox text usage in methods (#7009) 2025-05-26 22:40:09 -03:00
djholtby 73bfc936a0
Close response generator when stopping API generation (#7014) 2025-05-26 22:39:03 -03:00
oobabooga bae1aa34aa Fix loading Llama-3_3-Nemotron-Super-49B-v1 and similar models (closes #7012) 2025-05-25 17:19:26 -07:00
oobabooga 7f6579ab20 Minor style change 2025-05-20 21:49:44 -07:00
oobabooga 0d3f854778 Improve the style of thinking blocks 2025-05-20 21:40:42 -07:00
oobabooga 8620d6ffe7 Make it possible to upload multiple text files/pdfs at once 2025-05-20 21:34:07 -07:00
oobabooga cc8a4fdcb1 Minor improvement to attachments prompt format 2025-05-20 21:31:18 -07:00
oobabooga 409a48d6bd
Add attachments support (text files, PDF documents) (#7005) 2025-05-21 00:36:20 -03:00
oobabooga 5d00574a56 Minor UI fixes 2025-05-20 16:20:49 -07:00
oobabooga 51c50b265d Update llama.cpp to b7a17463ec 2025-05-20 11:16:12 -07:00
oobabooga 616ea6966d
Store previous reply versions on regenerate (#7004) 2025-05-20 12:51:28 -03:00
Daniel Dengler c25a381540
Add a "Branch here" footer button to chat messages (#6967) 2025-05-20 11:07:40 -03:00
oobabooga 8e10f9894a
Add a metadata field to the chat history & add date/time to chat messages (#7003) 2025-05-20 10:48:46 -03:00
oobabooga 9ec46b8c44 Remove the HQQ loader (HQQ models can be loaded through Transformers) 2025-05-19 09:23:24 -07:00
oobabooga 0c7237e4b7 Update README 2025-05-18 20:01:29 -07:00
oobabooga bad1da99db Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-05-18 14:09:08 -07:00
oobabooga 0c1bc6d1d0 Bump llama.cpp 2025-05-18 14:08:54 -07:00
Tiago Silva 9cd6ea6c0b
Fix Dockerfile in AMD and Intel (#6995) 2025-05-18 18:07:16 -03:00
oobabooga 83bfd5c64b Fix API issues 2025-05-18 12:45:01 -07:00
oobabooga 126b3a768f Revert "Dynamic Chat Message UI Update Speed (#6952)" (for now)
This reverts commit 8137eb8ef4.
2025-05-18 12:38:36 -07:00
oobabooga 9d7a36356d Remove unnecessary js that was causing scrolling issues 2025-05-18 10:56:16 -07:00
oobabooga 2faaf18f1f Add back the "Common values" to the ctx-size slider 2025-05-18 09:06:20 -07:00
oobabooga f1ec6c8662 Minor label changes 2025-05-18 09:04:51 -07:00
oobabooga bd13a8f255 UI: Light theme improvement 2025-05-17 22:31:55 -07:00
oobabooga 076aa67963 Fix API issues 2025-05-17 22:22:18 -07:00
oobabooga 366de4b561 UI: Fix the chat area height when "Show controls" is unchecked 2025-05-17 17:11:38 -07:00
oobabooga e8595730b4
Merge pull request #6992 from oobabooga/dev
Merge dev branch
2025-05-17 11:58:46 -03:00
oobabooga 61276f6a37 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-05-17 07:22:51 -07:00
oobabooga 4800d1d522 More robust VRAM calculation 2025-05-17 07:20:38 -07:00
mamei16 052c82b664
Fix KeyError: 'gpu_layers' when loading existing model settings (#6991) 2025-05-17 11:19:13 -03:00
oobabooga 0f77ff9670 UI: Use total VRAM (not free) for layers calculation when a model is loaded 2025-05-16 19:19:22 -07:00
oobabooga 17c29fa0a2
Merge pull request #6987 from oobabooga/dev
Merge dev branch
2025-05-16 22:23:59 -03:00
oobabooga 4bf763e1d9 Multiple small CSS fixes 2025-05-16 18:22:43 -07:00
oobabooga c0e295dd1d Remove the 'None' option from the model menu 2025-05-16 17:53:20 -07:00
oobabooga e3bba510d4 UI: Only add a blank space to streaming messages in instruct mode 2025-05-16 17:49:17 -07:00
oobabooga 71fa046c17 Minor changes after 1c549d176b 2025-05-16 17:38:08 -07:00
oobabooga d99fb0a22a Add backward compatibility with saved n_gpu_layers values 2025-05-16 17:29:18 -07:00
oobabooga 1c549d176b Fix GPU layers slider: honor saved settings and show true maximum 2025-05-16 17:26:13 -07:00
oobabooga dc3094549e
Merge pull request #6984 from oobabooga/dev
Merge dev branch
2025-05-16 17:13:26 -03:00
oobabooga e4d3f4449d API: Fix a regression 2025-05-16 13:02:27 -07:00
oobabooga 470c822f44 API: Hide the uvicorn access logs from the terminal 2025-05-16 12:54:39 -07:00
oobabooga adb975a380 Prevent fractional gpu-layers in the UI 2025-05-16 12:52:43 -07:00
oobabooga fc483650b5 Set the maximum gpu_layers value automatically when the model is loaded with --model 2025-05-16 11:58:17 -07:00
oobabooga 38c50087fe Prevent a crash on systems without an NVIDIA GPU 2025-05-16 11:55:30 -07:00
oobabooga 253e85a519 Only compute VRAM/GPU layers for llama.cpp models 2025-05-16 10:02:30 -07:00
oobabooga 9ec9b1bf83 Auto-adjust GPU layers after model unload to utilize freed VRAM 2025-05-16 09:56:23 -07:00
oobabooga ee7b3028ac Always cache GGUF metadata calls 2025-05-16 09:12:36 -07:00
oobabooga 4925c307cf Auto-adjust GPU layers on context size and cache type changes + many fixes 2025-05-16 09:07:38 -07:00
oobabooga 93e1850a2c Only show the VRAM info for llama.cpp 2025-05-15 21:42:15 -07:00
oobabooga cbf4daf1c8 Hide the LoRA menu in portable mode 2025-05-15 21:21:54 -07:00
oobabooga fd61297933 Lint 2025-05-15 21:19:19 -07:00
oobabooga 8cb73b78e1 Update ExLlamaV3 2025-05-15 20:10:34 -07:00
oobabooga 041248cc9f Update llama.cpp 2025-05-15 20:10:02 -07:00
oobabooga 5534d01da0
Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) 2025-05-16 00:07:37 -03:00
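As a rough illustration of the idea behind the estimator (not its actual formula), a back-of-envelope sketch where every constant is an assumption:

```python
# Back-of-envelope sketch only: weight VRAM scales with the fraction of
# layers offloaded, plus a KV-cache term that grows with context size.
# All constants here are illustrative assumptions, not the real estimator.
def estimate_vram_gb(file_size_gb: float, gpu_layers: int, total_layers: int,
                     ctx_size: int, kv_bytes_per_token: int = 128 * 1024) -> float:
    frac = gpu_layers / total_layers
    weights = file_size_gb * frac
    kv_cache = ctx_size * kv_bytes_per_token * frac / 1e9
    return weights + kv_cache

# Roughly 9.07 for an 8 GB model fully offloaded at 8k context:
print(round(estimate_vram_gb(8.0, 32, 32, 8192), 2))
```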
oobabooga c4a715fd1e UI: Move the LoRA menu under "Other options" 2025-05-13 20:14:09 -07:00
oobabooga 035cd3e2a9 UI: Hide the extension install menu in portable builds 2025-05-13 20:09:22 -07:00
oobabooga 2826c60044 Use logger for "Output generated in ..." messages 2025-05-13 14:45:46 -07:00
oobabooga 3fa1a899ae UI: Fix gpu-layers being ignored (closes #6973) 2025-05-13 12:07:59 -07:00
oobabooga c375b69413 API: Fix llama.cpp generating after disconnect, improve disconnect detection, fix deadlock on simultaneous requests 2025-05-13 11:23:33 -07:00
oobabooga 62c774bf24 Revert "New attempt"
This reverts commit e7ac06c169.
2025-05-13 06:42:25 -07:00
oobabooga e7ac06c169 New attempt 2025-05-10 19:20:04 -07:00
oobabooga 0c5fa3728e Revert "Fix API failing to cancel streams (attempt), closes #6966"
This reverts commit 006a866079.
2025-05-10 19:12:40 -07:00
oobabooga 006a866079 Fix API failing to cancel streams (attempt), closes #6966 2025-05-10 17:55:48 -07:00
oobabooga 47d4758509 Fix #6970 2025-05-10 17:46:00 -07:00
oobabooga 4920981b14 UI: Remove the typing cursor 2025-05-09 20:35:38 -07:00
oobabooga 8984e95c67 UI: More friendly message when no model is loaded 2025-05-09 07:21:05 -07:00
oobabooga 2bde625d57 Update README 2025-05-09 00:19:25 -07:00
oobabooga 512bc2d0e0 UI: Update some labels 2025-05-08 23:43:55 -07:00
oobabooga f8ef6e09af UI: Make ctx-size a slider 2025-05-08 18:19:04 -07:00
oobabooga bf7e4a4597 Docs: Add a tool/function calling example (from https://github.com/oobabooga/text-generation-webui/pull/6827#issuecomment-2854716960) 2025-05-08 16:12:07 -07:00
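A hedged sketch of what such a tool-calling request looks like in the OpenAI style; the function name and schema are invented for illustration:

```python
# Hypothetical tool-calling request in the OpenAI style. The get_weather
# function and its schema are made up for this example.
import requests

payload = {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"].get("tool_calls"))
```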
oobabooga 9ea2a69210 llama.cpp: Add --no-webui to the llama-server command 2025-05-08 10:41:25 -07:00
oobabooga 3bc2ec2b11 Fix #6965 2025-05-08 10:34:09 -07:00
oobabooga 1c7209a725 Save the chat history periodically during streaming 2025-05-08 09:46:43 -07:00
oobabooga a1b3307b66 Bump llama.cpp 2025-05-08 08:58:43 -07:00
Jonas fa960496d5
Tools support for OpenAI compatible API (#6827) 2025-05-08 12:30:27 -03:00
Scott Z ed6e16191d
Docker fix for NVIDIA (#6964) 2025-05-08 12:21:52 -03:00
oobabooga 13a434f351 Bump exllamav3 2025-05-08 08:06:07 -07:00
oobabooga a2ab42d390 UI: Remove the exllamav2 info message 2025-05-08 08:00:38 -07:00
oobabooga 348d4860c2 UI: Create a "Main options" section in the Model tab 2025-05-08 07:58:59 -07:00
oobabooga d2bae7694c UI: Change the ctx-size description 2025-05-08 07:26:23 -07:00
oobabooga b28fa86db6 Default --gpu-layers to 256 2025-05-06 17:51:55 -07:00
oobabooga 760b4dd115 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-05-06 14:02:57 -07:00
oobabooga e4fb2475d2 UI: Multiple small style improvements (light/dark themes) 2025-05-06 14:02:15 -07:00
Downtown-Case 5ef564a22e
Fix model config loading in shared.py for Python 3.13 (#6961) 2025-05-06 17:03:33 -03:00
oobabooga c4f36db0d8 llama.cpp: remove tfs (it doesn't get used) 2025-05-06 08:41:13 -07:00
oobabooga 05115e42ee Set top_n_sigma before temperature by default 2025-05-06 08:27:21 -07:00
oobabooga 1927afe894 Fix top_n_sigma not showing for llama.cpp 2025-05-06 08:18:49 -07:00
oobabooga 605cc9ab14 Update exllamav3 2025-05-06 06:43:35 -07:00
oobabooga 89590adc14 Update llama.cpp 2025-05-06 06:41:17 -07:00
oobabooga d1c0154d66 llama.cpp: Add top_n_sigma, fix typical_p in sampler priority 2025-05-06 06:38:39 -07:00
oobabooga cbef35054c UI: CSS fix 2025-05-05 17:46:09 -07:00
Evgenii Novikov 4e8f628d3c
docker: App uid typo in other docker composes (#6958) 2025-05-05 20:05:15 -03:00
oobabooga 530223bf0b UI: Fix the hover menu colors 2025-05-05 16:03:43 -07:00
oobabooga 76f947e3cf UI: Minor style change 2025-05-05 15:58:29 -07:00
Alireza Ghasemi 99bd66445f
SuperboogaV2: minor update to avoid JSON serialization errors (#6945) 2025-05-05 19:04:06 -03:00
Evgenii Novikov 987505ead3
docker: Fix app uid typo in cpu docker compose (#6957) 2025-05-05 19:03:33 -03:00
oobabooga 941e0663da Update README 2025-05-05 14:18:16 -07:00
oobabooga f82667f0b4 Remove more multimodal extension references 2025-05-05 14:17:00 -07:00
oobabooga 85bf2e15b9 API: Remove obsolete multimodal extension handling
Multimodal support will be added back once it's implemented in llama-server.
2025-05-05 14:14:48 -07:00
mamei16 8137eb8ef4
Dynamic Chat Message UI Update Speed (#6952) 2025-05-05 18:05:23 -03:00
oobabooga 53d8e46502 Ensure environment isolation in portable installs 2025-05-05 12:28:17 -07:00
oobabooga bf5290bc0f Fix the hover menu in light theme 2025-05-05 08:04:12 -07:00
oobabooga 967b70327e Light theme improvement 2025-05-05 07:59:02 -07:00
oobabooga 6001d279c6 Light theme improvement 2025-05-05 07:42:13 -07:00
oobabooga 475e012ee8 UI: Improve the light theme colors 2025-05-05 06:16:29 -07:00
oobabooga b817bb33fd Minor fix after df7bb0db1f 2025-05-05 05:00:20 -07:00
oobabooga f3da45f65d ExLlamaV3_HF: Change max_chunk_size to 256 2025-05-04 20:37:15 -07:00
oobabooga df7bb0db1f Rename --n-gpu-layers to --gpu-layers 2025-05-04 20:03:55 -07:00
oobabooga d0211afb3c Save the chat history right after sending a message 2025-05-04 18:52:01 -07:00
oobabooga 2da197bba4 Refinement after previous commit 2025-05-04 18:29:05 -07:00
oobabooga 690d693913 UI: Add padding to only show the last message/reply after sending a message
To avoid scrolling
2025-05-04 18:13:29 -07:00
oobabooga d9da16edba UI: Remove the chat input textarea border 2025-05-04 16:53:52 -07:00
oobabooga 84ab1f95be UI: Increase the chat area a bit 2025-05-04 15:21:52 -07:00
oobabooga d186621926 UI: Fixes after previous commit 2025-05-04 15:19:46 -07:00
oobabooga 7853fb1c8d
Optimize the Chat tab (#6948) 2025-05-04 18:58:37 -03:00
oobabooga b7a5c7db8d llama.cpp: Handle short arguments in --extra-flags 2025-05-04 07:14:42 -07:00
oobabooga 5f5569e9ac Update README 2025-05-04 06:20:36 -07:00
oobabooga 4c2e3b168b llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails) 2025-05-03 06:51:20 -07:00
oobabooga ea60f14674 UI: Show the list of files if the user tries to download a GGUF repository 2025-05-03 06:06:50 -07:00
oobabooga b71ef50e9d UI: Add a min-height to prevent constant scrolling during chat streaming 2025-05-02 23:45:58 -07:00
oobabooga b21bd8bb1e UI: Invert user/assistant message colors in instruct mode
The goal is to make assistant messages more readable.
2025-05-02 22:43:33 -07:00
oobabooga d08acb4af9 UI: Rename enable_thinking -> Enable thinking 2025-05-02 20:50:52 -07:00
oobabooga 3526b7923c Remove extensions with requirements from portable builds 2025-05-02 17:40:53 -07:00
oobabooga 4cea720da8 UI: Remove the "Autoload the model" feature 2025-05-02 16:38:28 -07:00
oobabooga 905afced1c Add a --portable flag to hide things in portable mode 2025-05-02 16:34:29 -07:00
oobabooga 3f26b0408b Fix after 9e3867dc83 2025-05-02 16:17:22 -07:00
oobabooga 9e3867dc83 llama.cpp: Fix manual random seeds 2025-05-02 09:36:15 -07:00
oobabooga d5c407cf35 Use Vulkan instead of ROCm for llama.cpp on AMD 2025-05-01 20:05:36 -07:00
oobabooga f8aaf3c23a Use ROCm 6.2.4 on AMD 2025-05-01 19:50:46 -07:00
oobabooga c12a53c998 Use turboderp's exllamav2 wheels 2025-05-01 19:46:56 -07:00
oobabooga ace8afb825
Merge dev branch 2025-05-01 12:25:04 -03:00
oobabooga 89090d9a61 Update README 2025-05-01 08:22:54 -07:00
oobabooga a41da1ec95
Merge pull request #6939 from oobabooga/dev
Merge dev branch
2025-05-01 00:15:11 -03:00
oobabooga b950a0c6db Lint 2025-04-30 20:02:10 -07:00
oobabooga 307d13b540 UI: Minor label change 2025-04-30 18:58:14 -07:00
oobabooga 55283bb8f1 Fix CFG with ExLlamaV2_HF (closes #6937) 2025-04-30 18:43:45 -07:00
oobabooga ec2e641749 Update settings-template.yaml 2025-04-30 15:25:26 -07:00
oobabooga a6c3ec2299 llama.cpp: Explicitly send cache_prompt = True 2025-04-30 15:24:07 -07:00
oobabooga 195a45c6e1 UI: Make thinking blocks closed by default 2025-04-30 15:12:46 -07:00
oobabooga cd5c32dc19 UI: Fix max_updates_second not working 2025-04-30 14:54:05 -07:00
oobabooga b46ca01340 UI: Set max_updates_second to 12 by default
When the tokens/second are at ~50 and the model is a thinking model,
the markdown rendering for the streaming message becomes a CPU
bottleneck.
2025-04-30 14:53:15 -07:00
oobabooga a4bf339724 Bump llama.cpp 2025-04-30 11:13:14 -07:00
oobabooga e9569c3984 Fixes after c5fe92d152 2025-04-30 06:57:23 -07:00
oobabooga 771d3d8ed6 Fix getting the llama.cpp logprobs for Qwen3-30B-A3B 2025-04-30 06:48:32 -07:00
oobabooga 7f49e3c3ce Bump ExLlamaV3 2025-04-30 05:25:09 -07:00
oobabooga c5fe92d152 Bump llama.cpp 2025-04-30 05:24:58 -07:00
oobabooga 1dd4aedbe1 Fix the streaming_llm UI checkbox not being interactive 2025-04-29 05:28:46 -07:00
oobabooga c5fb51e5d1 Update README 2025-04-28 22:40:26 -07:00
oobabooga d10bded7f8 UI: Add an enable_thinking option to enable/disable Qwen3 thinking 2025-04-28 22:37:01 -07:00
oobabooga 1ee0acc852 llama.cpp: Make --verbose print the llama-server command 2025-04-28 15:56:25 -07:00
oobabooga 15a29e99f8 Lint 2025-04-27 21:41:34 -07:00
oobabooga be13f5199b UI: Add an info message about how to use Speculative Decoding 2025-04-27 21:40:38 -07:00
oobabooga c6c2855c80 llama.cpp: Remove the timeout while loading models (closes #6907) 2025-04-27 21:22:21 -07:00
oobabooga bbcaec75b4 API: Find a new port if the default one is taken (closes #6918) 2025-04-27 21:13:16 -07:00
oobabooga ee0592473c Fix ExLlamaV3_HF leaking memory (attempt) 2025-04-27 21:04:02 -07:00
oobabooga 6e6f9971a2
Merge pull request #6919 from oobabooga/dev
Merge dev branch
2025-04-27 11:35:19 -03:00
oobabooga 965ca7948f Update README 2025-04-27 07:33:08 -07:00
oobabooga 1180bb0d80
Merge pull request #6913 from oobabooga/dev
Merge dev branch
2025-04-27 00:12:16 -03:00
oobabooga f5b59d2b0b Fix the vulkan workflow 2025-04-26 20:11:24 -07:00
oobabooga 9bb9ce079e
Merge pull request #6912 from oobabooga/dev
Merge dev branch
2025-04-27 00:03:16 -03:00
oobabooga 765fea5e36 UI: minor style change 2025-04-26 19:33:46 -07:00
oobabooga 70952553c7 Lint 2025-04-26 19:29:08 -07:00
oobabooga 363b632a0d Lint 2025-04-26 19:22:36 -07:00
oobabooga fa861de05b Fix portable builds with Python 3.12 2025-04-26 18:52:44 -07:00
oobabooga 7b80acd524 Fix parsing --extra-flags 2025-04-26 18:40:03 -07:00
oobabooga 943451284f Fix the Notebook tab not loading its default prompt 2025-04-26 18:25:06 -07:00
oobabooga 511eb6aa94 Fix saving settings to settings.yaml 2025-04-26 18:20:00 -07:00
oobabooga 8b83e6f843 Prevent Gradio from saying 'Thank you for being a Gradio user!' 2025-04-26 18:14:57 -07:00
oobabooga 4a32e1f80c UI: show draft_max for ExLlamaV2 2025-04-26 18:01:44 -07:00
oobabooga 0fe3b033d0 Fix parsing of --n_ctx and --max_seq_len (2nd attempt) 2025-04-26 17:52:21 -07:00
oobabooga c4afc0421d Fix parsing of --n_ctx and --max_seq_len 2025-04-26 17:43:53 -07:00
oobabooga 234aba1c50 llama.cpp: Simplify the prompt processing progress indicator
The progress bar was unreliable
2025-04-26 17:33:47 -07:00
oobabooga 4ff91b6588 Better default settings for Speculative Decoding 2025-04-26 17:24:40 -07:00
oobabooga bf2aa19b21 Bump llama.cpp 2025-04-26 16:39:22 -07:00
oobabooga 029aab6404 Revert "Add -noavx2 portable builds"
This reverts commit 0dd71e78c9.
2025-04-26 16:38:13 -07:00
oobabooga 35717a088c API: Add an /v1/internal/health endpoint 2025-04-26 15:42:27 -07:00
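A minimal liveness probe against the new endpoint; only the path comes from the commit message, the response format is assumed:

```python
# Sketch of a health check. The /v1/internal/health path is taken from the
# commit message above; the response body/shape is an assumption.
import requests

r = requests.get("http://127.0.0.1:5000/v1/internal/health", timeout=5)
print("healthy" if r.status_code == 200 else f"unhealthy ({r.status_code})")
```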
oobabooga bc55feaf3e Improve host header validation in local mode 2025-04-26 15:42:17 -07:00
oobabooga a317450dfa Update README 2025-04-26 14:59:29 -07:00
oobabooga d1e7d9c5d5 Update CMD_FLAGS.txt 2025-04-26 09:00:56 -07:00
oobabooga 3a207e7a57 Improve the --help formatting a bit 2025-04-26 07:31:04 -07:00
oobabooga 6acb0e1bee Change a UI description 2025-04-26 05:13:08 -07:00
oobabooga cbd4d967cc Update a --help message 2025-04-26 05:09:52 -07:00
oobabooga 19c8dced67 Move settings-template.yaml into user_data 2025-04-26 05:03:23 -07:00
oobabooga b976112539 Remove the WSL installation scripts
They were useful in 2023 but now everything runs natively on Windows.
2025-04-26 05:02:17 -07:00
oobabooga 763a7011c0 Remove an ancient/obsolete migration check 2025-04-26 04:59:05 -07:00
oobabooga d9de14d1f7
Restructure the repository (#6904) 2025-04-26 08:56:54 -03:00
oobabooga d4017fbb6d
ExLlamaV3: Add kv cache quantization (#6903) 2025-04-25 21:32:00 -03:00
oobabooga d4b1e31c49 Use --ctx-size to specify the context size for all loaders
Old flags are still recognized as alternatives.
2025-04-25 16:59:03 -07:00
oobabooga faababc4ea llama.cpp: Add a prompt processing progress bar 2025-04-25 16:42:30 -07:00
oobabooga 877cf44c08 llama.cpp: Add StreamingLLM (--streaming-llm) 2025-04-25 16:21:41 -07:00
oobabooga d35818f4e1
UI: Add a collapsible thinking block to messages with <think> steps (#6902) 2025-04-25 18:02:02 -03:00
oobabooga 0dd71e78c9 Add -noavx2 portable builds 2025-04-25 09:07:14 -07:00
oobabooga 98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
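A sketch of the flag-expansion idea behind --extra-flags; the exact comma-separated syntax accepted by the webui is an assumption (the short single-letter form is the case handled by the 2025-05-04 "Handle short arguments" commit above):

```python
# Illustrative parser for a comma-separated --extra-flags string, e.g.
# "flash-attn,ctx-size=4096,v". The real syntax is an assumption; this only
# sketches the expansion into llama-server arguments, including the short
# single-letter form.
def expand_extra_flags(extra_flags: str) -> list[str]:
    args = []
    for item in filter(None, (s.strip() for s in extra_flags.split(","))):
        key, _, value = item.partition("=")
        prefix = "-" if len(key) == 1 else "--"  # short vs. long flag
        args.append(prefix + key)
        if value:
            args.append(value)
    return args

print(expand_extra_flags("flash-attn,ctx-size=4096,v"))
# -> ['--flash-attn', '--ctx-size', '4096', '-v']
```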
oobabooga b6fffbd216 UI: minor style change 2025-04-25 05:37:44 -07:00
oobabooga 2c7ff86015 Bump exllamav3 to de83084184 2025-04-25 05:28:22 -07:00
oobabooga 5993ebeb1b Bump exllamav2 to 0.2.9 2025-04-25 05:27:59 -07:00
oobabooga 23399aff3c UI: minor style change 2025-04-24 20:39:00 -07:00
oobabooga 5861013e68 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-04-24 20:36:20 -07:00
oobabooga a90df27ff5 UI: Add a greeting when the chat history is empty 2025-04-24 20:33:40 -07:00
oobabooga ae1fe87365
ExLlamaV2: Add speculative decoding (#6899) 2025-04-25 00:11:04 -03:00
Matthew Jenkins 8f2493cc60
Prevent llama.cpp defaults from locking up consumer hardware (#6870) 2025-04-24 23:38:57 -03:00
oobabooga 370fe7b7cf Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-04-24 09:33:17 -07:00
oobabooga 8ebe868916 Fix typos in b313adf653 2025-04-24 09:32:17 -07:00
oobabooga 93fd4ad25d llama.cpp: Document the --device-draft syntax 2025-04-24 09:20:11 -07:00
oobabooga f1b64df8dd EXL2: add another torch.cuda.synchronize() call to prevent errors 2025-04-24 09:03:49 -07:00
Ziya 60ac495d59
extensions/superboogav2: existing embedding check bug fix (#6898) 2025-04-24 12:42:05 -03:00
oobabooga b313adf653 Bump llama.cpp, make the wheels work with any Python >= 3.7 2025-04-24 08:26:12 -07:00
oobabooga c71a2af5ab Handle CMD_FLAGS.txt in the main code (closes #6896) 2025-04-24 08:21:06 -07:00
oobabooga bfbde73409 Make 'instruct' the default chat mode 2025-04-24 07:08:49 -07:00
oobabooga e99c20bcb0
llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
oobabooga 9424ba17c8 UI: show only part 00001 of multipart GGUF models in the model menu 2025-04-22 19:56:42 -07:00
oobabooga 1aa76b3beb
Merge pull request #6885 from oobabooga/dev
Merge dev branch
2025-04-22 22:38:24 -03:00
oobabooga bce1b68ca9 Minor fix after previous commit 2025-04-22 18:37:36 -07:00
oobabooga 812d878812 Make the dependabot less spammy 2025-04-22 18:35:22 -07:00
oobabooga 1df2b0d3ae
Merge pull request #6884 from oobabooga/dev
Merge dev branch
2025-04-22 22:02:30 -03:00
oobabooga 8228822a6c Revert "Temporary change"
This reverts commit 765de6f678.
2025-04-22 18:01:47 -07:00
oobabooga 62455b415c
Merge pull request #6883 from oobabooga/dev
Merge dev branch
2025-04-22 21:54:34 -03:00
oobabooga 765de6f678 Temporary change 2025-04-22 17:53:56 -07:00
oobabooga 89ec4c9ba6 Add vulkan workflow 2025-04-22 17:51:08 -07:00
oobabooga 06619e5f03 Add vulkan requirements.txt files 2025-04-22 17:46:54 -07:00
oobabooga 022664f2bd
Merge pull request #6881 from oobabooga/dev
Merge dev branch
2025-04-22 12:15:34 -03:00
oobabooga 4335a24ff8 Fix the workflow 2025-04-22 08:14:13 -07:00
oobabooga a778270536
Merge pull request #6869 from oobabooga/dev
Merge dev branch
2025-04-22 12:09:20 -03:00
oobabooga 25cf3600aa Lint 2025-04-22 08:04:02 -07:00
oobabooga 39cbb5fee0 Lint 2025-04-22 08:03:25 -07:00
oobabooga da1919baae Update the README 2025-04-22 08:03:22 -07:00
oobabooga a3031795a3 Update the zip filename 2025-04-22 08:03:16 -07:00
oobabooga 008c6dd682 Lint 2025-04-22 08:02:37 -07:00
oobabooga ee09e44c85
Portable version (#6868) 2025-04-22 09:25:57 -03:00
oobabooga 78aeabca89 Fix the transformers loader 2025-04-21 18:33:14 -07:00
oobabooga 8320190184 Fix the exllamav2_HF and exllamav3_HF loaders 2025-04-21 18:32:23 -07:00
oobabooga 15989c2ed8 Make llama.cpp the default loader 2025-04-21 16:36:35 -07:00
oobabooga 86c3ed3218 Small change to the unload_model() function 2025-04-20 20:00:56 -07:00
oobabooga c178ea02fe Revert "Move the requirements*.txt to a requirements folder"
This reverts commit 6117ef7d64.
2025-04-20 19:27:38 -07:00
oobabooga 6117ef7d64 Move the requirements*.txt to a requirements folder 2025-04-20 19:12:04 -07:00
oobabooga fe8e80e04a Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-04-20 19:09:27 -07:00
oobabooga ff1c00bdd9 llama.cpp: set the random seed manually 2025-04-20 19:08:44 -07:00
Matthew Jenkins d3e7c655e5
Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862) 2025-04-20 23:06:24 -03:00
oobabooga 99588be576 Organize one_click.py 2025-04-20 18:57:26 -07:00
oobabooga e243424ba1 Fix an import 2025-04-20 17:51:28 -07:00
oobabooga 8cfd7f976b Revert "Remove the old --model-menu flag"
This reverts commit 109de34e3b.
2025-04-20 13:35:42 -07:00
oobabooga d5e1bccef9 Remove the SpeechRecognition requirement 2025-04-20 11:47:28 -07:00
oobabooga b3bf7a885d Fix ExLlamaV2_HF and ExLlamaV3_HF after ae02ffc605 2025-04-20 11:32:48 -07:00
oobabooga 9c59acf820 Remove the numba requirement (it's no longer used) 2025-04-20 10:02:40 -07:00
oobabooga ae02ffc605
Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
oobabooga c19b995b8e
Merge pull request #6857 from oobabooga/dev
Merge dev branch
2025-04-19 21:45:55 -03:00
oobabooga 6ba0164c70 Lint 2025-04-19 17:45:21 -07:00
oobabooga 5ab069786b llama.cpp: add back the two encode calls (they are harmless now) 2025-04-19 17:38:36 -07:00
oobabooga b9da5c7e3a Use 127.0.0.1 instead of localhost for faster llama.cpp on Windows 2025-04-19 17:36:04 -07:00
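Editor's note: the speedup comes from skipping name resolution. On Windows, 'localhost' can resolve to ::1 first and stall before falling back to IPv4, while the literal 127.0.0.1 connects directly. The difference is visible with getaddrinfo (port number is arbitrary):

```python
import socket
import time

for host in ('localhost', '127.0.0.1'):
    start = time.perf_counter()
    info = socket.getaddrinfo(host, 8080, proto=socket.IPPROTO_TCP)
    elapsed = (time.perf_counter() - start) * 1000
    families = {a[0].name for a in info}
    print(f'{host}: {len(info)} candidates {families} in {elapsed:.2f} ms')
```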
oobabooga 9c9df2063f llama.cpp: fix unicode decoding (closes #6856) 2025-04-19 16:38:15 -07:00
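Editor's note: streamed responses can split a multi-byte UTF-8 character across chunk boundaries, so decoding each chunk independently garbles it. The standard fix is an incremental decoder that buffers incomplete sequences, roughly:

```python
import codecs

decoder = codecs.getincrementaldecoder('utf-8')()

def decode_chunk(chunk: bytes) -> str:
    """Decode a streamed chunk; bytes of a split multi-byte character
    are buffered until the rest arrives instead of raising/garbling."""
    return decoder.decode(chunk)

# '€' is 3 bytes (e2 82 ac); feed it split across two chunks.
print(repr(decode_chunk(b'ok \xe2\x82')))  # 'ok ' (incomplete char held back)
print(repr(decode_chunk(b'\xac!')))        # '€!'
```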
oobabooga ba976d1390 llama.cpp: avoid two 'encode' calls 2025-04-19 16:35:01 -07:00
oobabooga ed42154c78 Revert "llama.cpp: close the connection immediately on 'Stop'"
This reverts commit 5fdebc554b.
2025-04-19 05:32:36 -07:00
oobabooga 5fdebc554b llama.cpp: close the connection immediately on 'Stop' 2025-04-19 04:59:24 -07:00
oobabooga b1495d52e5
Merge pull request #6855 from oobabooga/dev
Merge dev branch
2025-04-19 01:53:11 -03:00
oobabooga 6589ebeca8 Revert "llama.cpp: new optimization attempt"
This reverts commit e2e73ed22f.
2025-04-18 21:16:21 -07:00
oobabooga e2e73ed22f llama.cpp: new optimization attempt 2025-04-18 21:05:08 -07:00
oobabooga e2e90af6cd llama.cpp: don't include --rope-freq-base in the launch command if null 2025-04-18 20:51:18 -07:00
oobabooga 44a6d8a761
Merge pull request #6854 from oobabooga/dev
Merge dev branch
2025-04-18 23:41:56 -03:00
oobabooga 9f07a1f5d7 llama.cpp: new attempt at optimizing the llama-server connection 2025-04-18 19:30:53 -07:00
oobabooga f727b4a2cc llama.cpp: close the connection properly when generation is cancelled 2025-04-18 19:01:39 -07:00
oobabooga b3342b8dd8 llama.cpp: optimize the llama-server connection 2025-04-18 18:46:36 -07:00
oobabooga 4fa52a1302
Merge pull request #6852 from oobabooga/dev
Merge dev branch
2025-04-18 22:15:40 -03:00
oobabooga 2002590536 Revert "Attempt at making the llama-server streaming more efficient."
This reverts commit 5ad080ff25.
2025-04-18 18:13:54 -07:00
oobabooga 71ae05e0a4 llama.cpp: Fix the sampler priority handling 2025-04-18 18:06:36 -07:00
oobabooga 5ad080ff25 Attempt at making the llama-server streaming more efficient. 2025-04-18 18:04:49 -07:00
oobabooga 4fabd729c9 Fix the API without streaming or without 'sampler_priority' (closes #6851) 2025-04-18 17:25:22 -07:00
oobabooga 5135523429 Fix the new llama.cpp loader failing to unload models 2025-04-18 17:10:26 -07:00
oobabooga 4eecb6611f
Merge pull request #6850 from oobabooga/dev
Merge dev branch
2025-04-18 15:33:32 -03:00
oobabooga 8d481ef9d5 Update README 2025-04-18 11:31:22 -07:00
oobabooga caa6afc88b Only show 'GENERATE_PARAMS=...' in the logits endpoint if use_logits is True 2025-04-18 09:57:57 -07:00
oobabooga c5e54c0b37
Merge pull request #6848 from oobabooga/dev
Merge dev branch
2025-04-18 13:36:06 -03:00
oobabooga e52f62d3ff Update README 2025-04-18 09:29:57 -07:00
oobabooga 85c4486d4a Update the colab notebook 2025-04-18 08:53:44 -07:00
oobabooga d00d713ace Rename get_max_context_length to get_vocabulary_size in the new llama.cpp loader 2025-04-18 08:14:15 -07:00
oobabooga c1cc65e82e Lint 2025-04-18 08:06:51 -07:00
oobabooga d68f0fbdf7 Remove obsolete references to llamacpp_HF 2025-04-18 07:46:04 -07:00
oobabooga a0abf93425 Connect --rope-freq-base to the new llama.cpp loader 2025-04-18 06:53:51 -07:00
oobabooga ef9910c767 Fix a bug after c6901aba9f 2025-04-18 06:51:28 -07:00
oobabooga 1c4a2c9a71 Make exllamav3 safer as well 2025-04-18 06:17:58 -07:00
oobabooga 03544d4fb6 Bump llama.cpp and exllamav3 to the latest commits 2025-04-18 06:14:13 -07:00
oobabooga c6901aba9f Remove deprecation warning code 2025-04-18 06:05:47 -07:00
oobabooga 170ad3d3ec Update the README 2025-04-18 06:03:35 -07:00
oobabooga 8144e1031e Remove deprecated command-line flags 2025-04-18 06:02:28 -07:00
oobabooga ae54d8faaa
New llama.cpp loader (#6846) 2025-04-18 09:59:37 -03:00
oobabooga 5c2f8d828e Fix exllamav2 generating eos randomly after previous fix 2025-04-18 05:42:38 -07:00
oobabooga 2fc58ad935 Consider files with .pt extension in the new model menu function 2025-04-17 23:10:43 -07:00
Googolplexed d78abe480b
Allow for model subfolder organization for GGUF files (#6686)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-04-18 02:53:59 -03:00
oobabooga ce9e2d94b1 Revert "Attempt at solving the ExLlamaV2 issue"
This reverts commit c9b3c9dfbf.
2025-04-17 22:03:21 -07:00
oobabooga 5dfab7d363 New attempt at solving the exl2 issue 2025-04-17 22:03:11 -07:00
oobabooga c9b3c9dfbf Attempt at solving the ExLlamaV2 issue 2025-04-17 21:45:15 -07:00
oobabooga 2c2d453c8c Revert "Use ExLlamaV2 (instead of the HF one) for EXL2 models for now"
This reverts commit 0ef1b8f8b4.
2025-04-17 21:31:32 -07:00
oobabooga 0ef1b8f8b4 Use ExLlamaV2 (instead of the HF one) for EXL2 models for now
It doesn't seem to have the "OverflowError" bug
2025-04-17 05:47:40 -07:00
oobabooga 38dc09dca5 Bump exllamav3 to the latest commit 2025-04-15 09:50:36 -07:00
oobabooga 038a012581 Installer: Remove .installer_state.json on reinstalling 2025-04-11 21:12:32 -07:00
oobabooga 682c78ea42 Add back detection of GPTQ models (closes #6841) 2025-04-11 21:00:42 -07:00
oobabooga 454366f93e Change the ExLlamaV3 wheel version to 0.0.1a1 2025-04-10 18:33:29 -07:00
oobabooga d7b336d37e Update the README 2025-04-09 20:12:14 -07:00
oobabooga 4ed0da74a8 Remove the obsolete 'multimodal' extension 2025-04-09 20:09:48 -07:00
oobabooga 598568b1ed Revert "UI: remove the streaming cursor"
This reverts commit 6ea0206207.
2025-04-09 16:03:14 -07:00
oobabooga 297a406e05 UI: smoother chat streaming
This removes the throttling associated with gr.Textbox that made words appear in chunks rather than one at a time.
2025-04-09 16:02:37 -07:00
oobabooga 6ea0206207 UI: remove the streaming cursor 2025-04-09 14:59:34 -07:00
oobabooga 14e6baeb48
Merge pull request #6838 from oobabooga/dev
Merge dev branch
2025-04-09 14:48:37 -03:00
oobabooga 9025848df5 Small change to installer 2025-04-09 10:25:47 -07:00
oobabooga d337ea31fa Revert "Reapply "Update transformers requirement from ==4.50.* to ==4.51.* (#6834)""
This reverts commit 8229736ec4.
2025-04-09 10:16:47 -07:00
oobabooga 8229736ec4 Reapply "Update transformers requirement from ==4.50.* to ==4.51.* (#6834)"
This reverts commit 0b3503c91f.
2025-04-09 08:38:06 -07:00
oobabooga 89f40cdcf7 Update libstdcxx-ng for GLIBCXX_3.4.30 support on Linux 2025-04-09 08:28:44 -07:00
oobabooga ad1ada6574 Change one message in the installer 2025-04-09 05:17:10 -07:00
oobabooga d8aad6da94 Fix an update bug 2025-04-08 20:20:24 -07:00
oobabooga 8b8d39ec4e
Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga 0b3503c91f Revert "Update transformers requirement from ==4.50.* to ==4.51.* (#6834)"
This reverts commit f1f32386b4.
2025-04-08 12:26:03 -07:00
oobabooga 649ee729c1 Remove Python 3.10 support 2025-04-08 09:22:06 -07:00
oobabooga bf48ec8c44 Remove an unnecessary UI message 2025-04-07 17:43:41 -07:00
oobabooga a5855c345c
Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835) 2025-04-07 21:42:33 -03:00
dependabot[bot] f1f32386b4
Update transformers requirement from ==4.50.* to ==4.51.* (#6834) 2025-04-07 19:29:39 -03:00
oobabooga 204db28362 Update the dockerfiles 2025-04-06 18:48:31 -07:00
oobabooga eef90a4964 Update some intel arc installation commands 2025-04-06 17:44:07 -07:00
oobabooga a8a64b6c1c Update the README 2025-04-06 17:40:18 -07:00
oobabooga c010cea7be Remove CUDA 11.8 support 2025-04-06 17:17:25 -07:00
Shixian Sheng cbffcf67ef
Fix links in the ngrok extension README (#6826) 2025-04-02 14:28:29 -03:00
dependabot[bot] 77a73cc561
Update peft requirement from ==0.12.* to ==0.15.* (#6820) 2025-03-31 21:01:27 -03:00
oobabooga 109de34e3b Remove the old --model-menu flag 2025-03-31 09:24:03 -07:00
oobabooga bb1905ebc5 Fix the colab notebook 2025-03-29 19:17:36 -07:00
oobabooga 1981327285 Fix the colab notebook 2025-03-29 19:17:14 -07:00
oobabooga 79a26d7a5c Lint 2025-03-29 18:49:48 -07:00
oobabooga 1bd208c219
Add a new chat style: Dark (#6817) 2025-03-29 22:47:10 -03:00
oobabooga 9b80d1d6c2 Remove the stalebot 2025-03-29 13:44:37 -07:00
oobabooga 525b1e0207 Remove the stalebot 2025-03-29 13:43:16 -07:00
dependabot[bot] 2bfaf44df0
Update accelerate requirement from ==1.4.* to ==1.5.* (#6802) 2025-03-26 10:03:21 -03:00
oobabooga 01e42a00ff Bump transformers to 4.50 2025-03-26 06:01:57 -07:00
oobabooga 80cdbe4e09
Merge pull request #6797 from oobabooga/dev
Merge dev branch
2025-03-15 00:11:25 -03:00
oobabooga 758c3f15a5 Lint 2025-03-14 20:04:43 -07:00
SeanScripts 60d67994d9
Perplexity colors extension updates (#6764) 2025-03-14 16:45:53 -03:00
oobabooga 5bcd2d7ad0
Add the top N-sigma sampler (#6796) 2025-03-14 16:45:11 -03:00
oobabooga 677d74a6a0 Revert "UI: improved scrollbar styles", add just a small change instead 2025-03-14 12:10:48 -07:00
oobabooga 6ab04698f6 UI: improve the light mode left sidebar color 2025-03-14 12:03:49 -07:00
oobabooga 26317a4c7e Fix jinja2 error while loading c4ai-command-a-03-2025 2025-03-14 10:59:05 -07:00
oobabooga f04a37adc2 UI: improved scrollbar styles 2025-03-14 05:20:15 -07:00
oobabooga 0261338910 Bump llama-cpp-python to 0.3.8 2025-03-12 17:55:25 -07:00
oobabooga 39fded487a Bump ExllamaV2 to 0.2.8 2025-03-12 17:54:30 -07:00
dependabot[bot] a12e05d9c0
Bump jinja2 from 3.1.5 to 3.1.6 (#6786) 2025-03-12 16:11:03 -03:00
Kelvie Wong 769eee1ff3 Fix OpenAI API with new param (show_after), closes #6747 (#6749)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-02-18 07:02:19 -08:00
Kelvie Wong 16fa9215c4
Fix OpenAI API with new param (show_after), closes #6747 (#6749)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-02-18 12:01:30 -03:00
SeanScripts b131f86584
Perplexity colors extension v2 (#6756) 2025-02-18 11:56:28 -03:00
Alireza Ghasemi 01f20d2d9f
Improve SuperboogaV2 with Date/Time Embeddings, GPU Support, and Multiple File Formats (#6748) 2025-02-17 22:38:15 -03:00
dependabot[bot] 12f6f7ba9f
Update accelerate requirement from ==1.3.* to ==1.4.* (#6753) 2025-02-17 22:35:38 -03:00
oobabooga dba17c40fc Make transformers 4.49 functional 2025-02-17 17:31:11 -08:00
oobabooga 16f4f1a1c3 Bump transformers to 4.49 2025-02-17 17:20:10 -08:00
oobabooga 7c883ef2f0
Merge pull request #6746 from oobabooga/dev
Merge dev branch
2025-02-14 23:25:31 -03:00
oobabooga cf9676c4d5 Update README 2025-02-14 18:05:36 -08:00
Manuel Schmid b54bf359bf
sd_api_pictures model reload fix (#6720) 2025-02-03 00:11:49 -03:00
oobabooga edbe0af647 Minor fixes after 0360f54ae8 2025-02-02 17:04:56 -08:00
oobabooga 6724d2bfa4 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2025-02-02 16:59:03 -08:00
oobabooga 44e569c3a2 Remove obsolete convert-to-safetensors.py from the repository 2025-02-02 16:15:33 -08:00
SamAcctX f28f39792d
update deprecated deepspeed import for transformers 4.46+ (#6725) 2025-02-02 20:41:36 -03:00
oobabooga f074ffc31b UI: minor light theme improvement 2025-02-02 15:39:36 -08:00
oobabooga c6f2c2fd7e UI: style improvements 2025-02-02 15:34:03 -08:00
oobabooga 0360f54ae8 UI: add a "Show after" parameter (to use with DeepSeek </think>) 2025-02-02 15:30:09 -08:00
oobabooga 01c46f8b56 Merge branch 'main' into dev 2025-01-30 09:49:30 -08:00
oobabooga 32cdaa540f Update README 2025-01-30 09:49:25 -08:00
oobabooga 461d1fdb76 Update README 2025-01-30 09:48:52 -08:00
SpyTech Labs fea98f82c5
DOCS FIX: WSL Port Forwarding Loop. (#6519) 2025-01-30 14:34:23 -03:00
oobabooga 9ac4d81c8b
Merge pull request #6713 from oobabooga/dev
Merge dev branch
2025-01-29 19:12:56 -03:00
oobabooga b614ea6596 Installer: small fixes 2025-01-29 14:05:39 -08:00
oobabooga f01cc079b9 Lint 2025-01-29 14:00:59 -08:00
oobabooga b7c17727b0 Update .gitignore 2025-01-29 13:57:56 -08:00
oobabooga 9ddcc91a91 Bump llama-cpp-python to 0.3.7 2025-01-29 13:56:46 -08:00
oobabooga e3fd4a0ea7 Merge branch 'main' into dev 2025-01-28 12:54:57 -08:00
oobabooga a1c353a4b3 Update README 2025-01-28 12:54:25 -08:00
oobabooga 3936589755 Update README 2025-01-28 12:53:55 -08:00
oobabooga 0b9ab1438d Clean up 2025-01-27 10:28:59 -08:00
oobabooga bac652bb1d Another fix 2025-01-27 10:25:26 -08:00
oobabooga 340022d4b0 Fix after previous commit 2025-01-27 10:02:21 -08:00
oobabooga 053911b629 Installer: don't ignore .whl requirements if the commit has changed
e.g. when the user manually switches branches or runs git pull.
2025-01-27 09:24:44 -08:00
oobabooga 1c9dfa871b Revert "Installer: change a message"
This reverts commit c49251e95d.
2025-01-26 18:17:31 -08:00
oobabooga 87de91dd65 Docs: fix an API example 2025-01-25 18:29:11 -08:00
oobabooga c49251e95d Installer: change a message 2025-01-25 15:03:09 -08:00
oobabooga 75ff3f3815 UI: Mention common context length values 2025-01-25 08:22:23 -08:00
oobabooga 3d4f3e423c Downloader: Make progress bars not jump around
Adapted from: https://gist.github.com/NiklasBeierl/13096bfdd8b2084da8c1163dd06f91d3
2025-01-25 07:44:24 -08:00
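Editor's note: with several files downloading in parallel, unmanaged tqdm bars fight over the same terminal row and "jump". The usual remedy, in the spirit of the linked gist, is to pin each bar to a fixed row via tqdm's position argument. The slot bookkeeping below is illustrative, not the gist's code; read_chunks is an assumed callable yielding byte chunks:

```python
import threading
from tqdm import tqdm

_slots = list(range(4))          # one screen row per concurrent download
_slots_lock = threading.Lock()

def download_with_bar(name, total_bytes, read_chunks):
    with _slots_lock:
        pos = _slots.pop(0)      # pin this file's bar to a fixed row
    try:
        with tqdm(total=total_bytes, desc=name, position=pos, leave=False,
                  unit='B', unit_scale=True) as bar:
            for chunk in read_chunks():
                bar.update(len(chunk))
    finally:
        with _slots_lock:
            _slots.append(pos)   # free the row for the next file
```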
FP HAM 71a551a622
Add strftime_now to Jinja to satisfy Llama 3.1 and 3.2 (and Granite) (#6692) 2025-01-24 11:37:20 -03:00
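Editor's note: Llama 3.1+ chat templates call a strftime_now() helper to embed the current date, which stock Jinja2 does not define. Exposing it is a one-liner on the environment; a minimal sketch:

```python
from datetime import datetime
from jinja2 import Environment

def strftime_now(fmt):
    """Llama 3.x templates use strftime_now("%d %b %Y") for the date."""
    return datetime.now().strftime(fmt)

env = Environment()
env.globals['strftime_now'] = strftime_now
template = env.from_string('Today Date: {{ strftime_now("%d %b %Y") }}')
print(template.render())
```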
FP HAM 5d6f3e6f92
Training Pro: removed monkeypatch references (#6695) 2025-01-24 11:23:44 -03:00
oobabooga 0485ff20e8 Workaround for convert_to_markdown bug 2025-01-23 06:21:40 -08:00
oobabooga 7f8c1c1f07 Docs: update the API examples 2025-01-22 08:48:02 -08:00
Shay Molcho b76b7f6bf5
Minor README change (#6687) 2025-01-22 12:02:43 -03:00
FP HAM 4bd260c60d
Give SillyTavern a bit of leeway in the way they do OpenAI (#6685) 2025-01-22 12:01:44 -03:00
oobabooga b56eb0b9cd Merge branch 'main' into dev 2025-01-22 06:44:22 -08:00
oobabooga 39799adc47 Add a helpful error message when llama.cpp fails to load the model 2025-01-21 12:49:12 -08:00
oobabooga 079ace63ec Installer: minor change 2025-01-21 10:14:05 -08:00
oobabooga 41f4fee085 Lint 2025-01-21 10:01:52 -08:00
oobabooga ff250dd800 Installer: simplify the script 2025-01-21 09:58:13 -08:00
oobabooga 2bf8788c30 Installer: Fix a bug after ecb5d3c485 2025-01-21 09:35:22 -08:00
oobabooga 5e99dded4e UI: add "Continue" and "Remove" buttons below the last chat message 2025-01-21 09:05:44 -08:00
oobabooga ecb5d3c485 Installer: do not redownload wheels for each update 2025-01-21 08:45:13 -08:00
dependabot[bot] f8a5b0bc43
Update accelerate requirement from ==1.2.* to ==1.3.* (#6683) 2025-01-20 17:41:03 -03:00
oobabooga 096272f49e Update README 2025-01-17 09:47:45 -08:00
oobabooga c32f06d62f Update README 2025-01-17 07:03:22 -08:00
oobabooga 878f378e9f
Merge pull request #6670 from oobabooga/dev
Merge dev branch
2025-01-16 10:22:49 -03:00
oobabooga 0258a6f877 Fix the Google Colab notebook 2025-01-16 05:21:18 -08:00
oobabooga fe96678692 Update some comments in the requirements 2025-01-14 19:28:48 -08:00
oobabooga ddb0f71741
Merge pull request #6666 from oobabooga/dev
Merge dev branch
2025-01-14 22:24:39 -03:00
oobabooga 2344366c9b Remove a debug message 2025-01-14 17:23:44 -08:00
oobabooga 7e80266ae9
Merge pull request #6665 from oobabooga/dev
Merge dev branch
2025-01-14 22:01:08 -03:00
oobabooga 5d25739767 Make the update wizards nice 2025-01-14 16:59:36 -08:00
oobabooga 1ef748fb20 Lint 2025-01-14 16:44:15 -08:00
oobabooga f843cb475b UI: update a help message 2025-01-14 08:12:51 -08:00
oobabooga c832953ff7 UI: Activate auto_max_new_tokens by default 2025-01-14 05:59:55 -08:00
Underscore 53b838d6c5
HTML: Fix quote pair RegEx matching for all quote types (#6661) 2025-01-13 18:01:50 -03:00
oobabooga c85e5e58d0 UI: move the new morphdom code to a .js file 2025-01-13 06:20:42 -08:00
oobabooga facb4155d4 Fix morphdom leaving ghost elements behind 2025-01-11 20:57:28 -08:00
Lounger ed16374ece
Fix the gallery extension (#6656) 2025-01-11 23:35:22 -03:00
oobabooga a0492ce325
Optimize syntax highlighting during chat streaming (#6655) 2025-01-11 21:14:10 -03:00
mamei16 f1797f4323
Unescape backslashes in html_output (#6648) 2025-01-11 18:39:44 -03:00
oobabooga 1b9121e5b8 Add a "refresh" button below the last message, add a missing file 2025-01-11 12:42:25 -08:00
oobabooga a5d64b586d
Add a "copy" button below each message (#6654) 2025-01-11 16:59:21 -03:00
oobabooga 58342740a5 Bump flash-attn to 2.7.3 2025-01-11 07:59:49 -08:00
oobabooga 3a722a36c8
Use morphdom to make chat streaming 1902381098231% faster (#6653) 2025-01-11 12:55:19 -03:00
oobabooga 02db4b0d06 Bump transformers to 4.48 2025-01-10 15:05:08 -08:00
oobabooga d2f6c0f65f Update README 2025-01-10 13:25:40 -08:00
oobabooga c393f7650d Update settings-template.yaml, organize modules/shared.py 2025-01-10 13:22:18 -08:00
oobabooga 83c426e96b
Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga 17aa97248f Installer: make the hashsum verification more robust on Windows 2025-01-10 07:22:25 -08:00
oobabooga 7fe46764fb Improve the --help message about --tensorcores as well 2025-01-10 07:07:41 -08:00
oobabooga da6d868f58 Remove old deprecated flags (~6 months or more) 2025-01-09 16:11:46 -08:00
oobabooga 15bfe36619 Installer: update miniconda to 24.11.1 (experimental) 2025-01-09 15:58:14 -08:00
oobabooga e6eda6a3bb
Merge pull request #6645 from oobabooga/dev
Merge dev branch
2025-01-09 18:46:28 -03:00
oobabooga f3c0f964a2 Lint 2025-01-09 13:18:23 -08:00
oobabooga 0e94d7075e UI: minor style fix on Windows 2025-01-09 13:12:30 -08:00
oobabooga 3020f2e5ec UI: improve the info message about --tensorcores 2025-01-09 12:44:03 -08:00
oobabooga c08d87b78d Make the huggingface loader more readable 2025-01-09 12:23:38 -08:00
oobabooga 03b4067f31 Installer: ask 1 question for NVIDIA users instead of 2 2025-01-09 12:03:49 -08:00
BPplays 619265b32c
add ipv6 support to the API (#6559) 2025-01-09 10:23:44 -03:00
oobabooga 5c89068168 UI: add an info message for the new Static KV cache option 2025-01-08 17:36:30 -08:00
oobabooga 4ffc9ffc7a UI: fix a list style 2025-01-08 17:24:38 -08:00
oobabooga e6796c3859 Bump llama-cpp-python to 0.3.6, add macOS 14 and 15 wheels 2025-01-08 17:24:21 -08:00
nclok1405 b9e2ded6d4
Added UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-01-08 21:17:31 -03:00
oobabooga 91a8a87887 Remove obsolete code 2025-01-08 15:07:21 -08:00
oobabooga ad118056b8 Update README 2025-01-08 14:29:46 -08:00
oobabooga 7157257c3f
Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
Jack Cloudman d3adcbf64b
Add --exclude-pattern flag to download-model.py script (#6542) 2025-01-08 17:30:21 -03:00
dependabot[bot] 1f86722977
Update safetensors requirement from ==0.4.* to ==0.5.* (#6634) 2025-01-08 16:56:55 -03:00
FP HAM 03a0f236a4
Training_PRO fix: add if 'quantization_config' in shared.model.config.to_dict() 2025-01-08 16:54:09 -03:00
oobabooga c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga 11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
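Editor's note: the two transformers options above (static KV cache and --torch-compile) pair naturally: a static cache preallocates fixed-shape key/value tensors, which lets torch.compile trace generation once instead of recompiling as the cache grows. A hedged sketch, not the repository's code; the model id is caller-supplied, and static-cache support is per-architecture (Llama-family models, for example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fast_generate(model_id, prompt, max_new_tokens=64):
    """Sketch: static KV cache + torch.compile for faster generation."""
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16,
        device_map='auto')  # device_map requires the accelerate package
    # Compile the forward pass; the static cache keeps tensor shapes
    # fixed so the compiled graph is reused across decoding steps.
    model.forward = torch.compile(model.forward, mode='reduce-overhead')
    inputs = tok(prompt, return_tensors='pt').to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         cache_implementation='static')
    return tok.decode(out[0], skip_special_tokens=True)
```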
oobabooga 3967520e71 Connect XTC, DRY, smoothing_factor, and dynatemp to ExLlamaV2 loader (non-HF) 2025-01-04 16:25:06 -08:00
oobabooga d56b500568 UI: add padding to file saving dialog 2025-01-04 16:22:40 -08:00
oobabooga 049297fa66 UI: reduce the size of CSS sent to the UI during streaming 2025-01-04 14:09:36 -08:00
oobabooga 0e673a7a42 UI: reduce the size of HTML sent to the UI during streaming 2025-01-04 11:40:24 -08:00
mamei16 9f24885bd2
Sane handling of markdown lists (#6626) 2025-01-04 15:41:31 -03:00
oobabooga 3815f46838 UI: minor style improvements to chat tab 2025-01-03 04:35:29 -08:00
oobabooga e2702200e1 UI: fix the font size of lists in chat mode 2025-01-02 19:26:50 -08:00
oobabooga 4b3e1b3757 UI: add a "Search chats" input field 2025-01-02 18:46:40 -08:00
oobabooga b8fc9010fa UI: fix orjson.JSONDecodeError error on page reload 2025-01-02 16:57:04 -08:00
oobabooga 973255cb0b UI: fix codeblocks overflowing on mobile 2025-01-02 16:48:49 -08:00
oobabooga 75f1b5ccde UI: add a "Branch chat" button 2025-01-02 16:24:18 -08:00
Petr Korolev 13c033c745
Fix CUDA error on MPS backend during API request (#6572)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-01-02 00:06:11 -03:00
oobabooga 979e1f1bd6 Fix a bug after 9163951f3a 2025-01-01 17:57:09 -08:00
oobabooga f011787a83 UI: make codeblocks scroll horizontally on overflow 2025-01-01 17:55:18 -08:00
oobabooga 9163951f3a UI: reduce the CPU usage during text streaming 2025-01-01 17:49:57 -08:00
oobabooga 725639118a UI: Use a tab length of 2 for lists (rather than 4) 2025-01-01 13:53:50 -08:00
oobabooga 7b88724711
Make responses start faster by removing unnecessary cleanup calls (#6625) 2025-01-01 18:33:38 -03:00
oobabooga 88a6331abf
Merge pull request #6623 from oobabooga/dev
Merge dev branch
2024-12-31 20:47:48 -03:00
oobabooga 64853f8509 Reapply a necessary change that I removed from #6599 (thanks @mamei16!) 2024-12-31 14:43:22 -08:00
mamei16 e953af85cd
Fix newlines in the markdown renderer (#6599)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2024-12-31 01:04:02 -03:00
dependabot[bot] d24b83132b
Bump jinja2 from 3.1.4 to 3.1.5 (#6601) 2024-12-30 09:35:20 -03:00
mamei16 cca4ac56fa
Fix interface loading with dark theme even when 'dark_theme' is set to false (#6614) 2024-12-30 09:34:19 -03:00
oobabooga 292cd489e9 Bump ExLlamaV2 to 0.2.7 2024-12-30 04:31:10 -08:00
oobabooga 4ce9d13dbe
Preset cleanup (#6619) 2024-12-29 12:25:26 -03:00
oobabooga 39a5c9a49c
UI organization (#6618) 2024-12-29 11:16:17 -03:00
oobabooga 0490ee620a UI: increase the threshold for a <li> to be considered long (some more) 2024-12-19 16:51:34 -08:00
oobabooga ee3a533e5c UI: improve the message width in instruct mode 2024-12-19 16:11:29 -08:00
oobabooga 89888bef56 UI: increase the threshold for a <li> to be considered long 2024-12-19 14:38:36 -08:00
oobabooga 2acec386fc UI: improve the streaming cursor 2024-12-19 14:08:56 -08:00
oobabooga e2fb86e5df UI: further improve the style of lists and headings 2024-12-19 13:59:24 -08:00
oobabooga c8ddb86c22 UI: improve some light mode colors 2024-12-19 12:24:04 -08:00
oobabooga 24a4c98d42 UI: improve the style of links in messages 2024-12-19 12:23:03 -08:00
oobabooga 836a868abc UI: improve the heading fonts 2024-12-19 12:21:28 -08:00
oobabooga 4d466d5c80
Merge pull request #6585 from oobabooga/dev
Merge dev branch
2024-12-18 23:24:55 -03:00
oobabooga fee23df1a5 Update README.md 2024-12-18 18:13:01 -08:00
oobabooga 9fd12605ac Update README.md 2024-12-18 17:58:53 -08:00
oobabooga 228caf0f3c UI: add a scrollbar to the right sidebar 2024-12-18 15:33:05 -08:00
oobabooga d01dd2e1c8 UI: fix a margin 2024-12-18 13:35:40 -08:00
Aluísio Pires 2bea4dfa96
Fix an issue caused during the installation of tts (#6496) 2024-12-18 18:16:56 -03:00
oobabooga 0a15cff6a0 UI: close sidebars by clicking outside their areas on mobile 2024-12-18 12:27:06 -08:00
oobabooga 636a6621cc UI: fix sidebars closing when typing on mobile 2024-12-18 12:16:59 -08:00
oobabooga 0c069e5b3f UI: remove obsolete js event 2024-12-18 12:16:26 -08:00
oobabooga c48e4622e8 UI: update a link 2024-12-18 06:28:14 -08:00
oobabooga b27f6f8915 Lint 2024-12-17 20:13:32 -08:00
oobabooga e83235a0cc UI: fix a font color 2024-12-17 20:11:51 -08:00
oobabooga ac0f60eb1a UI: make dropdown menus more readable 2024-12-17 20:02:04 -08:00
oobabooga b051e2c161 UI: improve a margin for readability 2024-12-17 19:58:21 -08:00
oobabooga 60c93e0c66 UI: Set cache_type to fp16 by default 2024-12-17 19:44:20 -08:00
oobabooga ddccc0d657 UI: minor change to log messages 2024-12-17 19:39:00 -08:00
oobabooga 3030c79e8c UI: show progress while loading a model 2024-12-17 19:37:43 -08:00
Diner Burger addad3c63e
Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga c43ee5db11 UI: very minor color change 2024-12-17 07:59:55 -08:00
oobabooga 517fcc1f23 Better centralize the chat tab 2024-12-16 20:12:16 -08:00
oobabooga d769618591
Improved UI (#6575) 2024-12-17 00:47:41 -03:00
dependabot[bot] dc56fcff12
Update bitsandbytes requirement from ==0.44.* to ==0.45.* (#6584) 2024-12-16 19:48:51 -03:00
dependabot[bot] 25c640ec0c
Update accelerate requirement from ==1.1.* to ==1.2.* (#6583) 2024-12-16 18:58:50 -03:00
oobabooga 97f5615661 Bump llama-cpp-python to 0.3.5, remove macos 12 wheels (workflow is failing) 2024-12-11 07:14:59 -08:00
oobabooga 27398428f6 Bump flash-attention to v2.7.2.post1 2024-12-09 10:17:17 -08:00
oobabooga baa566b0c6 Bump exllamav2 to 0.2.6 2024-12-09 10:16:33 -08:00
oobabooga f7836c4bd8 Bump transformers to 4.47 2024-12-09 07:00:15 -08:00
oobabooga aa629e2809 Bump exllamav2 to 0.2.5 2024-12-01 12:00:28 -08:00
oobabooga 350758f81c UI: Fix the history upload event 2024-11-19 20:34:53 -08:00
oobabooga d01293861b Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-11-18 10:15:36 -08:00
oobabooga 3d19746a5d UI: improve HTML rendering for lists with sub-lists 2024-11-18 10:14:09 -08:00
mefich 1c937dad72
Filter whitespaces in downloader fields in model tab (#6518) 2024-11-18 12:01:40 -03:00
dependabot[bot] f93196e306
Update accelerate requirement from ==1.0.* to ==1.1.* (#6515) 2024-11-18 12:00:24 -03:00
hronoas 9b3a3d8f12
openai extension fix: Handle Multiple Content Items in Messages (#6528) 2024-11-18 11:59:52 -03:00
oobabooga 5fa9336dab Bump flash-attention to 2.7.0.post2 2024-11-18 06:55:29 -08:00
oobabooga 0c48ecf359 Bump exllamav2 to 0.2.4 2024-11-18 06:51:56 -08:00
oobabooga 8d5cf7b134 Bump llama-cpp-python to 0.3.2 2024-11-18 06:51:06 -08:00
oobabooga cc8c7ed209
Merge pull request #6491 from oobabooga/dev
Merge dev branch
2024-10-25 01:10:23 -03:00
oobabooga 3a92fa517b Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-10-24 11:26:21 -07:00
oobabooga 8deea2936d Remove lm_eval from requirements 2024-10-24 11:25:42 -07:00
PIRI e1061ba7e3
Make token bans work again on HF loaders (#6488) 2024-10-24 15:24:02 -03:00
oobabooga b50dc3bf57 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-10-24 11:22:54 -07:00
oobabooga 386c0d8289 Bump transformers to 4.46 2024-10-24 11:09:09 -07:00
Paul Richardson 6a0837451e
Minor Documentation update - query cuda compute for docker .env (#6469) 2024-10-15 10:39:00 -03:00
Molly Sophia 18f836b280
Add RWKV-World instruction template (#6456) 2024-10-14 17:51:20 -03:00
dependabot[bot] e784938654
Update accelerate requirement from ==0.33.* to ==1.0.* (#6441) 2024-10-14 17:32:53 -03:00
oobabooga f1a8eae04d Remove optimum from requirements 2024-10-14 13:30:45 -07:00
oobabooga 2468cfd8bb Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-10-14 13:25:27 -07:00
oobabooga bb62e796eb Fix locally compiled llama-cpp-python failing to import 2024-10-14 13:24:13 -07:00
oobabooga c9a9f63d1b Fix llama.cpp loader not being random (thanks @reydeljuego12345) 2024-10-14 13:07:07 -07:00
PIRI 03a2e70054
Fix temperature_last when temperature not in sampler priority (#6439) 2024-10-09 11:25:14 -03:00
Grzegorz Lippe 9d8b1c5fd9
Fix intel bug described in #6253 (#6433) 2024-10-05 11:58:17 -03:00
Luana 22baa5378f
Fix for systems that have bash in a non-standard directory (#6428) 2024-10-03 00:35:13 -03:00
SeanScripts e1338a1804
Add whisper turbo (#6423) 2024-10-01 17:49:35 -03:00
oobabooga d1af7a41ad
Merge pull request #6422 from oobabooga/dev
Merge dev branch
2024-10-01 15:21:53 -03:00
oobabooga 49dfa0adaf Fix the "save preset" event 2024-10-01 11:20:48 -07:00
oobabooga 93c250b9b6 Add a UI element for enable_tp 2024-10-01 11:16:15 -07:00
oobabooga 3b06cb4523
Merge pull request #6421 from oobabooga/dev
Merge dev branch
2024-10-01 14:48:41 -03:00
oobabooga d364aa0a3c Lint 2024-10-01 10:22:57 -07:00
oobabooga cca9d6e22d Lint 2024-10-01 10:21:06 -07:00
oobabooga c6b50f88da Lint 2024-10-01 10:19:28 -07:00
oobabooga 7cb98351da
Merge branch 'main' into dev 2024-10-01 14:18:32 -03:00
oobabooga 617cd7b705 Revert "Update accelerate requirement from ==0.33.* to ==0.34.* (#6416)"
This reverts commit 6063a66414.
2024-10-01 09:06:25 -07:00
dependabot[bot] 6063a66414
Update accelerate requirement from ==0.33.* to ==0.34.* (#6416) 2024-09-30 18:50:38 -03:00
oobabooga 4d9ce586d3 Update llama_cpp_python_hijack.py, fix llamacpp_hf 2024-09-30 14:49:21 -07:00
oobabooga 9ca0cd7749 Bump llama-cpp-python to 0.3.1 2024-09-29 20:47:04 -07:00
oobabooga bbdeed3cf4 Make sampler priority high if unspecified 2024-09-29 20:45:27 -07:00
oobabooga 01362681f2 Bump exllamav2 to 0.2.4 2024-09-29 07:42:44 -07:00
Hanusz Leszek e4b0467f9f
Add beforeunload event to add confirmation dialog when leaving page (#6279) 2024-09-29 01:14:19 -03:00
Manuel Schmid 0f90a1b50f
Do not set value for histories in chat when --multi-user is used (#6317) 2024-09-29 01:08:55 -03:00
oobabooga 055f3f5632 Fix after #6386 (thanks @Touch-Night) 2024-09-28 20:55:26 -07:00
oobabooga 57160cd6fa Update README 2024-09-28 20:50:41 -07:00
oobabooga 3f0571b62b Update README 2024-09-28 20:48:30 -07:00
oobabooga 3fb02f43f6 Update README 2024-09-28 20:38:43 -07:00
oobabooga 3b99532e02 Remove HQQ and AQLM from requirements 2024-09-28 20:34:59 -07:00
oobabooga c61b29b9ce Simplify the warning when flash-attn fails to import 2024-09-28 20:33:17 -07:00
oobabooga b92d7fd43e Add warnings for when AutoGPTQ, TensorRT-LLM, or HQQ are missing 2024-09-28 20:30:24 -07:00
oobabooga 65e5864084 Update README 2024-09-28 20:25:26 -07:00
oobabooga 1a870b3ea7 Remove AutoAWQ and AutoGPTQ from requirements (no wheels available) 2024-09-28 19:38:56 -07:00
oobabooga 85994e3ef0 Bump pytorch to 2.4.1 2024-09-28 09:44:08 -07:00
oobabooga ca5a2dba72 Bump rocm to 6.1.2 2024-09-28 09:39:53 -07:00
oobabooga 7276dca933 Fix a typo 2024-09-27 20:28:17 -07:00
RandoInternetPreson 46996f6519
ExllamaV2 tensor parallelism to increase multi gpu inference speeds (#6356) 2024-09-28 00:26:03 -03:00
Philipp Emanuel Weidmann 301375834e
Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335) 2024-09-27 22:50:12 -03:00
oobabooga 3492e33fd5 Bump bitsandbytes to 0.44 2024-09-27 16:59:30 -07:00
Thireus ☠ 626b0a0437
Force /bin/bash shell for conda (#6386) 2024-09-27 19:47:04 -03:00
oobabooga 5c918c5b2d Make it possible to sort DRY 2024-09-27 15:40:48 -07:00
oobabooga 78b8705400 Bump llama-cpp-python to 0.3.0 (except for AMD) 2024-09-27 15:06:31 -07:00
oobabooga c5f048e912 Bump ExLlamaV2 to 0.2.2 2024-09-27 15:04:08 -07:00
oobabooga 7424f789bf
Fix the sampling monkey patch (and add more options to sampler_priority) (#6411) 2024-09-27 19:03:25 -03:00
oobabooga c497a32372 Bump transformers to 4.45 2024-09-26 11:55:51 -07:00
oobabooga f98431c744 Apply the change to all requirements (oops) 2024-09-06 18:48:13 -07:00
oobabooga a50477ec85 Apply the change to all requirements (oops) 2024-09-06 18:47:25 -07:00
oobabooga ac30b004ef Pin fastapi/pydantic requirement versions 2024-09-06 18:45:15 -07:00
oobabooga e86ab37aaf Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-09-06 18:44:43 -07:00
oobabooga 27797a92d0 Pin fastapi/pydantic requirement versions 2024-09-06 18:38:57 -07:00
Jean-Sylvain Boige 4924ee2901
typo in OpenAI response format (#6365) 2024-09-05 21:42:23 -03:00
oobabooga bba5b36d33 Don't import PEFT unless necessary 2024-09-03 19:40:53 -07:00
oobabooga c5b40eb555 llama.cpp: prevent prompt evaluation progress bar with just 1 step 2024-09-03 17:37:06 -07:00
oobabooga 2cb8d4c96e Bump llama-cpp-python to 0.2.90 2024-09-03 05:53:18 -07:00
oobabooga 64919e0d69 Bump flash-attention to 2.6.3 2024-09-03 05:51:46 -07:00
oobabooga 68d52c60f3 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-09-02 21:16:39 -07:00
oobabooga d1168afa76 Bump ExLlamaV2 to 0.2.0 2024-09-02 21:15:51 -07:00
Stefan Merettig 9a150c3368
API: Relax multimodal format, fixes HuggingFace Chat UI (#6353) 2024-09-02 23:03:15 -03:00
GralchemOz 4c74c7a116
Fix UnicodeDecodeError for BPE-based Models (especially GLM-4) (#6357) 2024-09-02 23:00:59 -03:00
FartyPants (FP HAM) 41a8eb4eeb
Training Pro: update script.py (#6359) 2024-09-02 23:00:15 -03:00
oobabooga 1f288b4072 Bump ExLlamaV2 to 0.1.9 2024-08-22 12:40:15 -07:00
joachimchauvet c24966c591
update API documentation with examples to list/load models (#5902) 2024-08-21 15:33:45 -03:00
oobabooga 5522584992
Merge pull request #6339 from oobabooga/dev
Merge dev branch
2024-08-20 11:20:52 -03:00
oobabooga 1124f71cf3
Update README.md 2024-08-20 11:19:46 -03:00
oobabooga 1b62cd8508
Merge pull request #6337 from oobabooga/dev
Merge dev branch
2024-08-20 01:54:47 -03:00
oobabooga d9a031fcad
Update README.md 2024-08-20 01:52:30 -03:00
oobabooga 073694bf15
Merge pull request #6336 from oobabooga/dev
Merge dev branch
2024-08-20 01:27:58 -03:00
oobabooga 9d99156ca3
Update README.md 2024-08-20 01:27:02 -03:00
oobabooga 406995f722 Update README 2024-08-19 21:24:01 -07:00
oobabooga 1b1518aa6a
Update README.md 2024-08-20 00:36:18 -03:00
oobabooga 5058269143 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-08-19 19:55:45 -07:00
oobabooga fd9cb26619 UI: update the DRY parameters descriptions/order 2024-08-19 19:40:17 -07:00
dependabot[bot] 64e16e9a46
Update accelerate requirement from ==0.32.* to ==0.33.* (#6291) 2024-08-19 23:34:10 -03:00
dependabot[bot] 68f928b5e0
Update peft requirement from ==0.8.* to ==0.12.* (#6292) 2024-08-19 23:33:56 -03:00
oobabooga 8bac1a9382
Update README.md 2024-08-19 23:10:04 -03:00
oobabooga bb987ffe66
Update README.md 2024-08-19 23:06:52 -03:00
oobabooga 4d8c1801c2 Bump llama-cpp-python to 0.2.89 2024-08-19 17:45:01 -07:00
oobabooga bf8187124d Bump llama-cpp-python to 0.2.88 2024-08-13 12:40:18 -07:00
oobabooga 089d5a9415 Bump llama-cpp-python to 0.2.87 2024-08-07 20:36:28 -07:00
oobabooga 81773f7f36 Bump transformers to 4.44 2024-08-06 20:07:05 -07:00
oobabooga e926c03b3d Add a --tokenizer-dir command-line flag for llamacpp_HF 2024-08-06 19:41:18 -07:00
oobabooga f106e780ba downloader: use 1 session for all files for better speed 2024-08-06 19:41:12 -07:00
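Editor's note: reusing one requests.Session across all files keeps TCP/TLS connections alive between downloads instead of re-handshaking per file. A minimal sketch (pool sizes are illustrative):

```python
import requests
from requests.adapters import HTTPAdapter

# One session shared by every file download: connections are reused
# instead of being re-established per file.
session = requests.Session()
session.mount('https://', HTTPAdapter(pool_connections=4, pool_maxsize=4))

def fetch(url, dest):
    with session.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```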
oobabooga d011040f43
Merge pull request #6300 from oobabooga/dev
Merge dev branch
2024-08-01 02:26:12 -03:00
oobabooga 608545d282 Bump llama-cpp-python to 0.2.85 2024-07-31 18:44:46 -07:00
oobabooga 30b4d8c8b2 Fix Llama 3.1 template including lengthy "tools" headers 2024-07-29 11:52:17 -07:00
oobabooga f4d95f33b8 downloader: better progress bar 2024-07-28 22:21:56 -07:00
oobabooga 9dcff21da9 Remove unnecessary shared.previous_model_name variable 2024-07-28 18:35:11 -07:00
oobabooga addcb52c56 Make --idle-timeout work for API requests 2024-07-28 18:31:40 -07:00
oobabooga 514fb2e451 Fix UI error caused by --idle-timeout 2024-07-28 18:30:06 -07:00
oobabooga 3aa646c1d0 UI: improve the style of headers in chat messages 2024-07-28 15:26:15 -07:00
oobabooga 92ab3a9a6a Bump llama-cpp-python to 0.2.84 2024-07-28 15:13:06 -07:00
oobabooga 5223c009fe Minor change after previous commit 2024-07-27 23:13:34 -07:00
oobabooga 7050bb880e UI: make n_ctx/max_seq_len/truncation_length numbers rather than sliders 2024-07-27 23:11:53 -07:00
Harry 078e8c8969
Make compress_pos_emb float (#6276) 2024-07-28 03:03:19 -03:00
oobabooga ffc713f72b UI: fix multiline LaTeX equations 2024-07-27 15:36:10 -07:00
oobabooga 493f8c3242 UI: remove animation after clicking on "Stop" in the Chat tab 2024-07-27 15:22:34 -07:00
oobabooga e4d411b841 UI: fix rendering LaTeX enclosed between \[ and \] 2024-07-27 15:21:44 -07:00
oobabooga 6bab4c2faa UI: add back single $ for equations 2024-07-26 23:03:53 -07:00
oobabooga f32d26240d UI: Fix the chat "stop" event 2024-07-26 23:03:05 -07:00
oobabooga 9e82f8c394 UI: Fix chat sometimes not scrolling down after sending a message 2024-07-26 22:35:30 -07:00
oobabooga c5814db173 UI: fix double quotes in instruct mode 2024-07-25 20:22:07 -07:00
oobabooga 498fec2c7c UI: fix saving characters 2024-07-25 15:11:27 -07:00
oobabooga b80d5906c2 UI: fix saving characters 2024-07-25 15:09:31 -07:00
oobabooga dd97a83534
Merge pull request #6271 from oobabooga/dev
Merge dev branch
2024-07-25 12:12:04 -03:00
oobabooga e4624fbc68
Merge branch 'main' into dev 2024-07-25 12:03:45 -03:00
oobabooga 42e80108f5 UI: clear the markdown LRU cache when using the default/notebook tabs 2024-07-25 08:01:42 -07:00
oobabooga a34273755b Revert "Updater: don't reinstall requirements if no updates after git pull"
This reverts commit ac30e7fe9c.
2024-07-25 07:34:01 -07:00
oobabooga d581334a41 Don't install AutoAWQ on CUDA 11.8 2024-07-25 05:38:52 -07:00
oobabooga 14584fda36 UI: don't change the color of italics in instruct mode 2024-07-24 20:55:18 -07:00
oobabooga b85ae6bc96 Fix after previous commit 2024-07-24 19:10:17 -07:00
oobabooga b6830bcdae Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-07-24 19:04:38 -07:00
oobabooga ac30e7fe9c Updater: don't reinstall requirements if no updates after git pull 2024-07-24 19:03:34 -07:00
oobabooga 1f101ee3e5 UI: improve the quote colors 2024-07-24 18:56:54 -07:00
Luana 3170b6efc9
Fixes Linux shebangs (#6110) 2024-07-24 22:23:29 -03:00
oobabooga 7e2851e505 UI: fix "Command for chat-instruct mode" not appearing by default 2024-07-24 15:04:12 -07:00
oobabooga 947016d010 UI: make the markdown LRU cache infinite (for really long conversations) 2024-07-24 11:54:26 -07:00
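Editor's note: functools.lru_cache with maxsize=None gives an unbounded cache, so in a very long conversation each message is converted to HTML once and re-served on every streaming update. A toy sketch; the conversion body is a stand-in for the real markdown renderer:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # maxsize=None makes the cache unbounded
def convert_to_markdown(text: str) -> str:
    """Stand-in for the expensive markdown-to-HTML conversion."""
    return '<p>%s</p>' % text  # placeholder conversion

convert_to_markdown('hello')
convert_to_markdown('hello')             # second call is a cache hit
print(convert_to_markdown.cache_info())  # hits=1, misses=1
```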
oobabooga 3b2c23dfb5 Add AutoAWQ 0.2.6 wheels for PyTorch 2.2.2 2024-07-24 11:15:00 -07:00
oobabooga 8a5f110c14 Bump ExLlamaV2 to 0.1.8 2024-07-24 09:22:48 -07:00
oobabooga e637b702ff UI: make text between quotes colored in chat mode 2024-07-23 21:30:32 -07:00
oobabooga 98ed6d3a66 Don't use flash attention on Google Colab 2024-07-23 19:50:56 -07:00
oobabooga af839d20ac Remove the AutoAWQ requirement 2024-07-23 19:38:39 -07:00
oobabooga 9d5513fda0 Remove the AutoAWQ requirement 2024-07-23 19:38:04 -07:00
oobabooga 8b52b93e85 Make the Google Colab notebook functional again (attempt) 2024-07-23 19:35:00 -07:00
oobabooga e777b73349 UI: prevent LaTeX from being rendered for inline "$" 2024-07-23 19:04:19 -07:00
oobabooga 1815877061 UI: fix the default character not loading correctly on startup 2024-07-23 18:48:10 -07:00
oobabooga e6181e834a Remove AutoAWQ as a standalone loader
(it works better through transformers)
2024-07-23 15:31:17 -07:00
oobabooga f66ab63d64 Bump transformers to 4.43 2024-07-23 14:06:34 -07:00
oobabooga 6b4d762120
Merge pull request #6261 from oobabooga/dev
Merge dev branch
2024-07-23 03:11:02 -03:00
oobabooga 95b3e98c36 UI: Fix code syntax highlighting 2024-07-22 23:08:48 -07:00
oobabooga d1115f18b9
Merge pull request #6260 from oobabooga/dev
Merge dev branch
2024-07-23 02:30:35 -03:00
oobabooga 3ee682208c Revert "Bump hqq from 0.1.7.post3 to 0.1.8 (#6238)"
This reverts commit 1c3671699c.
2024-07-22 19:53:56 -07:00
oobabooga 5e7f4ee88a UI: simplify the interface load events 2024-07-22 19:11:55 -07:00
oobabooga 5c5e7264ec Update README 2024-07-22 18:20:01 -07:00
oobabooga 7e73058943 UI: fix h1/h2/h3/h4 color in light mode 2024-07-22 18:18:02 -07:00
oobabooga f18c947a86 Update the tensorcores description 2024-07-22 18:06:41 -07:00
oobabooga aa809e420e Bump llama-cpp-python to 0.2.83, add back tensorcore wheels
Also add back the progress bar patch
2024-07-22 18:05:11 -07:00
oobabooga 11bbf71aa5
Bump back llama-cpp-python (#6257) 2024-07-22 16:19:41 -03:00
oobabooga 0f53a736c1 Revert the llama-cpp-python update 2024-07-22 12:02:25 -07:00
oobabooga a687f950ba Remove the tensorcores llama.cpp wheels
They are not faster than the default wheels anymore and they use a lot of space.
2024-07-22 11:54:35 -07:00
oobabooga 017d2332ea Remove no longer necessary llama-cpp-python patch 2024-07-22 11:50:36 -07:00
oobabooga 7d2449f8b0 Bump llama-cpp-python to 0.2.82.3 (unofficial build) 2024-07-22 11:49:20 -07:00
oobabooga f2d802e707 UI: make Default/Notebook contents persist on page reload 2024-07-22 11:07:10 -07:00
oobabooga 8768b69a2d Lint 2024-07-21 22:08:14 -07:00
oobabooga 79e8dbe45f UI: minor optimization 2024-07-21 22:06:49 -07:00
oobabooga e1085180cf UI: better handle scrolling when the input area grows 2024-07-21 21:20:22 -07:00
oobabooga 7ef2414357 UI: Make the file saving dialogs more robust 2024-07-21 15:38:20 -07:00
oobabooga 423372d6e7 Organize ui_file_saving.py 2024-07-21 13:23:18 -07:00
oobabooga af99e0697e UI: increase the font weight of chat messages 2024-07-21 10:45:27 -07:00
oobabooga 17df2d7bdf UI: don't export the instruction template on "Save UI defaults to settings.yaml" 2024-07-21 10:45:01 -07:00
oobabooga d05846eae5 UI: refresh the pfp cache on handle_your_picture_change 2024-07-21 10:17:22 -07:00
oobabooga 58a1581b96 Add missing dark_theme.js (oops) 2024-07-21 09:47:55 -07:00
oobabooga e9d4bff7d0 Update the --tensor_split description 2024-07-20 22:04:48 -07:00
oobabooga 916d1d8283 UI: improve the style of code blocks in light theme 2024-07-20 20:32:57 -07:00
Patrick Leiser 9b205f94a4
Fix for issue #6024, don't auto-hide the chat contents (#6247) 2024-07-21 00:05:28 -03:00
oobabooga 564d8c8c0d Make alpha_value a float number 2024-07-20 20:02:54 -07:00
oobabooga 79c4d3da3d
Optimize the UI (#6251) 2024-07-21 00:01:42 -03:00
Alberto Cano a14c510afb
Customize the subpath for gradio, use with reverse proxy (#5106) 2024-07-20 19:10:39 -03:00
FartyPants (FP HAM) 6ab477f375
training: Added ChatML-format.json format example (#5899) 2024-07-20 19:05:09 -03:00
Vhallo a9a6d72d8c
Use gr.Number for RoPE scaling parameters (#6233)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-07-20 18:57:09 -03:00
dependabot[bot] 1c3671699c
Bump hqq from 0.1.7.post3 to 0.1.8 (#6238) 2024-07-20 18:20:26 -03:00
oobabooga aa7c14a463 Use chat-instruct mode by default 2024-07-19 21:43:52 -07:00
oobabooga 0315122cf0
Merge pull request #6232 from oobabooga/dev
Merge dev branch
2024-07-13 14:52:34 -03:00
oobabooga b19d239a60 Bump flash-attention to 2.6.1 2024-07-12 20:16:11 -07:00
InvectorGator 4148a9201f
Fix for macOS users encountering model load errors (#6227)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
Co-authored-by: Invectorgator <Kudzu12gaming@outlook.com>
2024-07-13 00:04:19 -03:00
oobabooga d01c68f2a3
Merge pull request #6224 from oobabooga/dev
Merge dev branch
2024-07-11 20:42:46 -03:00
oobabooga 05676caf70 Update README 2024-07-11 16:25:52 -07:00
oobabooga f5599656b4 Update README 2024-07-11 16:22:00 -07:00
oobabooga d4eac58f2d Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-07-11 16:21:16 -07:00
oobabooga a30ec2e7db Update README 2024-07-11 16:20:44 -07:00
dependabot[bot] 063d2047dd
Update accelerate requirement from ==0.31.* to ==0.32.* (#6217) 2024-07-11 19:56:42 -03:00
oobabooga e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
oobabooga 512b311137 Improve the llama-cpp-python exception messages 2024-07-11 13:00:29 -07:00
oobabooga 01e4721da7 Bump ExLlamaV2 to 0.1.7 2024-07-11 12:33:46 -07:00
oobabooga fa075e41f4 Bump llama-cpp-python to 0.2.82 2024-07-10 06:03:24 -07:00
oobabooga f957b17d18 UI: update an obsolete message 2024-07-10 06:01:36 -07:00
oobabooga c176244327 UI: Move cache_8bit/cache_4bit further up 2024-07-05 12:16:21 -07:00
oobabooga e813b322cf
Merge pull request #6203 from oobabooga/dev
Merge dev branch
2024-07-05 07:37:19 -03:00
oobabooga aa653e3b5a Prevent llama.cpp from being monkey patched more than once (closes #6201) 2024-07-05 03:34:15 -07:00
oobabooga a210e61df1 UI: Fix broken chat histories not showing (closes #6196) 2024-07-04 20:31:25 -07:00
oobabooga 3315d00651
Merge pull request #6200 from oobabooga/dev
Merge dev branch
2024-07-05 00:22:24 -03:00
oobabooga e79e7b90dc UI: Move the cache_8bit and cache_4bit elements up 2024-07-04 20:21:28 -07:00
oobabooga 363efe54f4
Merge pull request #6199 from oobabooga/dev
Merge dev branch
2024-07-05 00:17:14 -03:00
oobabooga 8b44d7b12a Lint 2024-07-04 20:16:44 -07:00
oobabooga a47de06088 Force only 1 llama-cpp-python version at a time for now 2024-07-04 19:43:34 -07:00
oobabooga f243b4ca9c Make llama-cpp-python not crash immediately 2024-07-04 19:16:00 -07:00
oobabooga f77cf159ba UI: fix a glitch when switching tabs with "show controls" unchecked 2024-07-02 20:57:03 -07:00
oobabooga 7e22eaa36c Bump llama-cpp-python to 0.2.81 2024-07-02 20:29:35 -07:00
oobabooga 907137a13d Automatically set bf16 & use_eager_attention for Gemma-2 2024-07-01 21:46:35 -07:00
TimStrauven 8074fba18d
Whisper STT: JS overhaul (#6194)
---------

Co-authored-by: RandoInternetPreson <aaronalai1@gmail.com>
2024-07-01 23:27:18 -03:00
GralchemOz 8a39f579d8
transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga 19a56dd538 UI: Minor CSS improvement to chat mode 2024-06-30 21:09:54 -07:00
oobabooga 1ea3826333 UI: improve the chat area width on mobile devices 2024-06-30 17:08:23 -07:00
oobabooga ed01322763 Obtain the EOT token from the jinja template (attempt)
To use as a stopping string.
2024-06-30 15:09:22 -07:00
oobabooga 3e3f8637d6 Fix the AUTOMATIC1111 request in sd-api-pictures (closes #5993) 2024-06-29 11:43:57 -07:00
oobabooga 4ea260098f llama.cpp: add 4-bit/8-bit kv cache options 2024-06-29 09:10:33 -07:00
oobabooga 220c1797fc UI: do not show the "save character" button in the Chat tab 2024-06-28 22:11:31 -07:00
oobabooga f62aad3d59 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-06-28 21:42:03 -07:00
oobabooga 8803ae1845 UI: decrease the number of lines for "Command for chat-instruct mode" 2024-06-28 21:41:30 -07:00
mamei16 cc825dd1f4
Addressing Whisper STT issues (#5929) 2024-06-29 01:32:54 -03:00
oobabooga 5c6b9c610d
UI: allow the character dropdown to coexist in the Chat tab and the Parameters tab (#6177) 2024-06-29 01:20:27 -03:00
oobabooga de69a62004 Revert "UI: move "Character" dropdown to the main Chat tab"
This reverts commit 83534798b2.
2024-06-28 15:38:11 -07:00
oobabooga 38d58764db UI: remove unused gr.State variable from the Default tab 2024-06-28 15:17:44 -07:00
oobabooga 04cb197ed6 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-06-27 21:25:23 -07:00
oobabooga da196707cf UI: improve the light theme a bit 2024-06-27 21:05:38 -07:00
dependabot[bot] 9660f6f10e
Bump aqlm[cpu,gpu] from 1.1.5 to 1.1.6 (#6157) 2024-06-27 21:13:02 -03:00
dependabot[bot] a5df8f4e3c
Bump jinja2 from 3.1.2 to 3.1.4 (#6172) 2024-06-27 21:12:39 -03:00
dependabot[bot] c6cec0588c
Update accelerate requirement from ==0.30.* to ==0.31.* (#6156) 2024-06-27 21:12:02 -03:00
oobabooga 2f71515cb0 Make dependabot target the dev branch 2024-06-27 17:08:59 -07:00
oobabooga 1da47f2ae6 Make dependabot target the dev branch 2024-06-27 17:07:04 -07:00
oobabooga 9dbcb1aeea Small fix to make transformers 4.42 functional 2024-06-27 17:05:29 -07:00
oobabooga 66090758df Bump transformers to 4.42 (for gemma support) 2024-06-27 11:26:02 -07:00
oobabooga 6915c5077a
Merge pull request #6166 from oobabooga/dev
Merge dev branch
2024-06-26 23:33:09 -03:00
oobabooga 8ec8bc0b85 UI: handle another edge case while streaming lists 2024-06-26 18:40:43 -07:00
oobabooga 0e138e4be1 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-06-26 18:30:08 -07:00
mefich a85749dcbe
Update models_settings.py: add default alpha_value, add proper compress_pos_emb for newer GGUFs (#6111) 2024-06-26 22:17:56 -03:00
oobabooga 5fe532a5ce UI: remove DRY info text
It was visible for loaders without DRY.
2024-06-26 15:33:11 -07:00
oobabooga b1187fc9a5 UI: prevent flickering while streaming lists / bullet points 2024-06-25 19:19:45 -07:00
oobabooga 3691451d00
Add back the "Rename chat" feature (#6161) 2024-06-25 22:28:58 -03:00
oobabooga 53fbd2f245 Add TensorRT-LLM to the README 2024-06-25 14:45:37 -07:00
oobabooga ac3f92d36a UI: store chat history in the browser 2024-06-25 14:18:07 -07:00
oobabooga 46ca15cb79 Minor bug fixes after e7e1f5901e 2024-06-25 11:49:33 -07:00
oobabooga 83534798b2 UI: move "Character" dropdown to the main Chat tab 2024-06-25 11:25:57 -07:00
oobabooga 279cba607f UI: don't show an animation when updating the "past chats" menu 2024-06-25 11:10:17 -07:00
oobabooga 3290edfad9 Bug fix: force chat history to be loaded on launch 2024-06-25 11:06:05 -07:00
oobabooga e7e1f5901e
Prompts in the "past chats" menu (#6160) 2024-06-25 15:01:43 -03:00
oobabooga 602b455507 Bump llama-cpp-python to 0.2.79 2024-06-24 20:26:38 -07:00
oobabooga a43c210617
Improved past chats menu (#6158) 2024-06-25 00:07:22 -03:00
oobabooga 96ba53d916 Handle another fix after 57119c1b30 2024-06-24 15:51:12 -07:00
oobabooga 7db8b3b532 Bump ExLlamaV2 to 0.1.6 2024-06-24 05:38:11 -07:00
oobabooga 35f32d08bc GitHub: Increase the stalebot time to 6 months 2024-06-23 22:34:18 -07:00
oobabooga 564a3e1553 Remove the awkward "Tab" keyboard shortcut 2024-06-23 22:31:07 -07:00
oobabooga 577a8cd3ee
Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga 536f8d58d4 Do not expose alpha_value to llama.cpp & rope_freq_base to transformers
To avoid confusion
2024-06-23 22:09:24 -07:00
oobabooga b48ab482f8 Remove obsolete "gptq_for_llama_info" message 2024-06-23 22:05:19 -07:00
oobabooga 5e8dc56f8a Fix after previous commit 2024-06-23 21:58:28 -07:00
Louis Del Valle 57119c1b30
Update block_requests.py to resolve unexpected type error (500 error) (#5976) 2024-06-24 01:56:51 -03:00
oobabooga 125bb7b03b Revert "Bump llama-cpp-python to 0.2.78"
This reverts commit b6eaf7923e.
2024-06-23 19:54:28 -07:00
CharlesCNorton 5993904acf
Fix several typos in the codebase (#6151) 2024-06-22 21:40:25 -03:00
GodEmperor785 2c5a9eb597
Change limits of RoPE scaling sliders in UI (#6142) 2024-06-19 21:42:17 -03:00
oobabooga 5904142777 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-06-19 17:41:09 -07:00
oobabooga b10d735176 Minor CSS linting 2024-06-19 17:40:33 -07:00
Guanghua Lu 229d89ccfb
Make logs more readable, no more \u7f16\u7801 (#6127) 2024-06-15 23:00:13 -03:00
oobabooga fd7c3c5bb0 Don't git pull on installation (to make past releases installable) 2024-06-15 06:38:05 -07:00
oobabooga b6eaf7923e Bump llama-cpp-python to 0.2.78 2024-06-14 21:22:09 -07:00
oobabooga 9420973b62
Downgrade PyTorch to 2.2.2 (#6124) 2024-06-14 16:42:03 -03:00
Forkoz 1576227f16
Fix GGUFs with no BOS token present, mainly qwen2 models. (#6119)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-06-14 13:51:01 -03:00
dependabot[bot] fdd8fab9cf
Bump hqq from 0.1.7.post2 to 0.1.7.post3 (#6090) 2024-06-14 13:46:35 -03:00
oobabooga 10601850d9 Fix after previous commit 2024-06-13 19:54:12 -07:00
oobabooga 0f3a423de1 Alternative solution to "get next logits" deadlock (#6106) 2024-06-13 19:34:16 -07:00
oobabooga 9aef01551d Revert "Use reentrant generation lock (#6107)"
This reverts commit b675151f25.
2024-06-13 17:53:07 -07:00
oobabooga 8930bfc5f4
Bump PyTorch, ExLlamaV2, flash-attention (#6122) 2024-06-13 20:38:31 -03:00
oobabooga 386500aa37 Avoid unnecessary calls UI -> backend, to make it faster 2024-06-12 20:52:42 -07:00
oobabooga 4820ae9aef
Merge pull request #6118 from oobabooga/dev
Merge dev branch
2024-06-13 00:38:03 -03:00
Forkoz 1d79aa67cf
Fix flash-attn UI parameter to actually store true. (#6076) 2024-06-13 00:34:54 -03:00
Belladore 3abafee696
DRY sampler improvements (#6053) 2024-06-12 23:39:11 -03:00
theo77186 b675151f25
Use reentrant generation lock (#6107) 2024-06-12 23:25:05 -03:00
oobabooga a36fa73071 Lint 2024-06-12 19:00:21 -07:00
oobabooga 2d196ed2fe Remove obsolete pre_layer parameter 2024-06-12 18:56:44 -07:00
Belladore 46174a2d33
Fix error when bos_token_id is None. (#6061) 2024-06-12 22:52:27 -03:00
Belladore a363cdfca1
Fix missing bos token for some models (including Llama-3) (#6050) 2024-05-27 09:21:30 -03:00
oobabooga 8df68b05e9 Remove MinPLogitsWarper (it's now a transformers built-in) 2024-05-27 05:03:30 -07:00
oobabooga 4f1e96b9e3 Downloader: Add --model-dir argument, respect --model-dir in the UI 2024-05-23 20:42:46 -07:00
oobabooga ad54d524f7 Revert "Fix stopping strings for llama-3 and phi (#6043)"
This reverts commit 5499bc9bc8.
2024-05-22 17:18:08 -07:00
oobabooga 5499bc9bc8
Fix stopping strings for llama-3 and phi (#6043) 2024-05-22 13:53:59 -03:00
rohitanshu 8aaa0a6f4e
Fixed minor typo in docs - Training Tab.md (#6038) 2024-05-21 14:52:22 -03:00
oobabooga 9e189947d1 Minor fix after bd7cc4234d (thanks @belladoreai) 2024-05-21 10:37:30 -07:00
oobabooga ae86292159 Fix getting Phi-3-small-128k-instruct logits 2024-05-21 10:35:00 -07:00
oobabooga bd7cc4234d
Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga 6a1682aa95 README: update command-line flags with raw --help output
This helps me keep this up-to-date more easily.
2024-05-19 20:28:46 -07:00
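Editor's note: one way to keep a README flag list in sync with argparse is to splice the raw --help output between marker comments. A sketch under that assumption; the marker strings are hypothetical, not ones the repository uses:

```python
import re
import subprocess

START, END = '<!-- flags start -->', '<!-- flags end -->'  # hypothetical markers

def update_readme_flags(readme_path='README.md'):
    """Splice the raw `--help` output between two marker comments so the
    README's flag list never drifts from the argparse definitions."""
    help_text = subprocess.run(
        ['python', 'server.py', '--help'],
        capture_output=True, text=True, check=True).stdout
    block = f'{START}\n\n{help_text}\n{END}'
    with open(readme_path) as f:
        readme = f.read()
    pattern = f'{re.escape(START)}.*?{re.escape(END)}'
    # A lambda replacement avoids backslash escapes inside help_text
    # being interpreted by re.sub.
    with open(readme_path, 'w') as f:
        f.write(re.sub(pattern, lambda m: block, readme, flags=re.DOTALL))
```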
Philipp Emanuel Weidmann 852c943769
DRY: A modern repetition penalty that reliably prevents looping (#5677) 2024-05-19 23:53:47 -03:00
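Editor's note: the idea behind DRY is that if the current context ends with a sequence that already occurred earlier, the token that previously followed that sequence gets an exponential penalty, so verbatim loops become increasingly unlikely. A simplified single-sequence sketch using the sampler's multiplier/base/allowed_length parameters, not the PR's actual implementation:

```python
def dry_penalty(input_ids, logits, multiplier=0.8, base=1.75, allowed_length=2):
    """Simplified DRY: a candidate that would stretch an earlier-seen
    repeat to `length` tokens loses
    multiplier * base**(length - allowed_length) from its logit."""
    n = len(input_ids)
    last = input_ids[-1]
    best = {}  # candidate token -> longest repeated suffix it would extend
    for i in range(n - 1):
        if input_ids[i] != last:
            continue
        length = 1  # tokens matched between the suffix and the earlier occurrence
        while i - length >= 0 and input_ids[i - length] == input_ids[n - 1 - length]:
            length += 1
        cand = input_ids[i + 1]  # token that followed the earlier match
        best[cand] = max(best.get(cand, 0), length)
    for token, length in best.items():
        if length >= allowed_length:
            logits[token] -= multiplier * base ** (length - allowed_length)
    return logits

# 'A B C A B' -> token C (id 2) would recreate 'A B C', so it is penalized.
logits = [0.0] * 5
print(dry_penalty([0, 1, 2, 0, 1], logits))  # [0.0, 0.0, -0.8, 0.0, 0.0]
```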
oobabooga 9f77ed1b98
--idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
altoiddealer 818b4e0354
Let grammar escape backslashes (#5865) 2024-05-19 20:26:09 -03:00
Tisjwlf 907702c204
Fix gguf multipart file loading (#5857) 2024-05-19 20:22:09 -03:00
Guanghua Lu d7bd3da35e
Add Llama 3 instruction template (#5891) 2024-05-19 20:17:26 -03:00
A0nameless0man 5cb59707f3
fix: grammar not supporting UTF-8 (#5900) 2024-05-19 20:10:39 -03:00
Jari Van Melckebeke 8456d13349
[docs] small docker changes (#5917) 2024-05-19 20:09:37 -03:00
Samuel Wein b63dc4e325
UI: Warn user if they are trying to load a model from no path (#6006) 2024-05-19 20:05:17 -03:00
dependabot[bot] 2de586f586
Update accelerate requirement from ==0.27.* to ==0.30.* (#5989) 2024-05-19 20:03:18 -03:00
chr 6b546a2c8b
llama.cpp: increase the max threads from 32 to 256 (#5889) 2024-05-19 20:02:19 -03:00
oobabooga abe5ddc883
Merge pull request #6027 from oobabooga/dev
Merge dev branch
2024-05-19 19:01:11 -03:00
oobabooga a38a37b3b3 llama.cpp: default n_gpu_layers to the maximum value for the model automatically 2024-05-19 10:57:42 -07:00
oobabooga a4611232b7 Make --verbose output less spammy 2024-05-18 09:57:00 -07:00
oobabooga 0d90b3a25c Bump llama-cpp-python to 0.2.75 2024-05-18 05:26:26 -07:00
oobabooga e225b0b995 downloader: fix downloading 01-ai/Yi-1.5-34B-Chat 2024-05-12 10:43:50 -07:00
oobabooga 9557f49f2f Bump llama-cpp-python to 0.2.73 2024-05-11 10:53:19 -07:00
oobabooga 9ac528715c
Merge pull request #5996 from oobabooga/dev
Merge dev branch
2024-05-08 16:37:26 -03:00
oobabooga 7a728a38eb Update README 2024-05-07 02:59:36 -07:00
oobabooga d5bde7babc UI: improve the performance of code syntax highlighting 2024-05-06 17:45:03 -07:00
oobabooga 0b193b8553 Downloader: handle one more retry case after 5770e06c48 2024-05-04 19:25:22 -07:00
oobabooga cb31998605 Add a template for NVIDIA ChatQA models 2024-05-03 08:19:04 -07:00
oobabooga e9c9483171 Improve the logging messages while loading models 2024-05-03 08:10:44 -07:00
oobabooga e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga 0476f9fe70 Bump ExLlamaV2 to 0.0.20 2024-05-01 16:20:50 -07:00
oobabooga ae0f28530c Bump llama-cpp-python to 0.2.68 2024-05-01 08:40:50 -07:00
oobabooga 8f12fb028d
Merge pull request #5970 from oobabooga/dev
Merge dev branch
2024-05-01 09:56:23 -03:00
oobabooga 1eba888af6 Update FUNDING.yml 2024-05-01 05:54:21 -07:00
oobabooga 51fb766bea
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga 81f603d09f
Merge pull request #5959 from oobabooga/dev
Merge dev branch
2024-04-29 15:45:48 -03:00
oobabooga 5770e06c48
Add a retry mechanism to the model downloader (#5943) 2024-04-27 12:25:28 -03:00
oobabooga dfdb6fee22 Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
2024-04-26 09:39:27 -07:00
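For reference, this maps to the transformers quantization config flag of the same name; a minimal sketch (the model name is only illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# llm_int8_enable_fp32_cpu_offload lets modules that don't fit on the GPU
# stay on the CPU in fp32 instead of failing to load -- at a large speed cost.
config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # illustrative model
    quantization_config=config,
    device_map="auto",
)
```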
oobabooga 70845c76fb
Add back the max_updates_second parameter (#5937) 2024-04-26 10:14:51 -03:00
oobabooga 6761b5e7c6
Improved instruct style (with syntax highlighting & LaTeX rendering) (#5936) 2024-04-26 10:13:11 -03:00
oobabooga 9c04365f54 Detect the airoboros-3_1-yi-34b-200k template 2024-04-25 16:50:54 -07:00
oobabooga 8b1dee3ec8 Detect platypus-yi-34b, CausalLM-RP-34B, 34b-beta instruction templates 2024-04-24 21:47:43 -07:00
oobabooga 4aa481282b Detect the xwin-lm-70b-v0.1 instruction template 2024-04-24 17:02:20 -07:00
oobabooga ad122361ea
Merge pull request #5927 from oobabooga/dev
Merge dev branch
2024-04-24 13:58:53 -03:00
oobabooga c9b0df16ee Lint 2024-04-24 09:55:00 -07:00
oobabooga 4094813f8d Lint 2024-04-24 09:53:41 -07:00
oobabooga 64e2a9a0a7 Fix the Phi-3 template when used in the UI 2024-04-24 01:34:11 -07:00
oobabooga f0538efb99 Remove obsolete --tensorcores references 2024-04-24 00:31:28 -07:00
Colin f3c9103e04
Revert walrus operator for params['max_memory'] (#5878) 2024-04-24 01:09:14 -03:00
Jari Van Melckebeke c725d97368
nvidia docker: make sure gradio listens on 0.0.0.0 (#5918) 2024-04-23 23:17:55 -03:00
oobabooga 9b623b8a78
Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
Ashley Kleynhans 0877741b03
Bumped ExLlamaV2 to version 0.0.19 to resolve #5851 (#5880) 2024-04-19 19:04:40 -03:00
oobabooga a4b732c30b
Merge pull request #5887 from oobabooga/dev
Merge dev branch
2024-04-19 12:34:50 -03:00
oobabooga f27e1ba302
Add a /v1/internal/chat-prompt endpoint (#5879) 2024-04-19 00:24:46 -03:00
oobabooga b30bce3b2f Bump transformers to 4.40 2024-04-18 16:19:31 -07:00
Philipp Emanuel Weidmann a0c69749e6
Revert sse-starlette version bump because it breaks API request cancellation (#5873) 2024-04-18 15:05:00 -03:00
mamei16 8985a8538b
Fix whisper STT (#5856) 2024-04-14 10:55:58 -03:00
oobabooga 26d822f64f
Merge pull request #5848 from oobabooga/dev
Merge dev branch
2024-04-12 12:46:25 -03:00
dependabot[bot] 597556cb77
Bump sse-starlette from 1.6.5 to 2.1.0 (#5831) 2024-04-11 18:54:05 -03:00
oobabooga e158299fb4 Fix loading sharded GGUF models through llamacpp_HF 2024-04-11 14:50:05 -07:00
wangshuai09 fd4e46bce2
Add Ascend NPU support (basic) (#5541) 2024-04-11 18:42:20 -03:00
zaypen a90509d82e
Model downloader: Take HF_ENDPOINT into consideration (#5571) 2024-04-11 18:28:10 -03:00
Ashley Kleynhans 70c637bf90
Fix saving of UI defaults to settings.yaml - Fixes #5592 (#5794) 2024-04-11 18:19:16 -03:00
oobabooga 3e3a7c4250 Bump llama-cpp-python to 0.2.61 & fix the crash 2024-04-11 14:15:34 -07:00
oobabooga 5f5ceaf025 Revert "Bump llama-cpp-python to 0.2.61"
This reverts commit 3ae61c0338.
2024-04-11 13:24:57 -07:00
dependabot[bot] bd71a504b8
Update gradio requirement from ==4.25.* to ==4.26.* (#5832) 2024-04-11 02:24:53 -03:00
Victorivus c423d51a83
Fix issue #5783 for character images with transparency (#5827) 2024-04-11 02:23:43 -03:00
Alex O'Connell b94cd6754e
UI: Respect model and lora directory settings when downloading files (#5842) 2024-04-11 01:55:02 -03:00
oobabooga 17c4319e2d Fix loading command-r context length metadata 2024-04-10 21:39:59 -07:00
oobabooga 3ae61c0338 Bump llama-cpp-python to 0.2.61 2024-04-10 21:39:46 -07:00
oobabooga cbd65ba767
Add a simple min_p preset, make it the default (#5836) 2024-04-09 12:50:16 -03:00
oobabooga ed4001e324 Bump ExLlamaV2 to 0.0.18 2024-04-08 18:05:16 -07:00
oobabooga 91a7370a65
Merge pull request #5823 from oobabooga/dev
Merge dev branch
2024-04-07 11:01:08 -03:00
oobabooga f6828de3f2 Downgrade llama-cpp-python to 0.2.56 2024-04-07 07:00:12 -07:00
Jared Van Bortel 39ff9c9dcf
requirements: add psutil (#5819) 2024-04-06 23:02:20 -03:00
oobabooga 65099dc192
Merge pull request #5822 from oobabooga/dev
Merge dev branch
2024-04-06 22:58:06 -03:00
oobabooga d02744282b Minor logging change 2024-04-06 18:56:58 -07:00
oobabooga dfb01f9a63 Bump llama-cpp-python to 0.2.60 2024-04-06 18:32:36 -07:00
oobabooga 096f75a432 Documentation: remove obsolete RWKV docs 2024-04-06 14:06:39 -07:00
oobabooga dd6e4ac55f Prevent double <BOS_TOKEN> with Command R+ 2024-04-06 13:14:32 -07:00
oobabooga 1bdceea2d4 UI: Focus on the chat input after starting a new chat 2024-04-06 12:57:57 -07:00
oobabooga 168a0f4f67 UI: do not load the "gallery" extension by default 2024-04-06 12:43:21 -07:00
oobabooga 64a76856bd Metadata: Fix loading Command R+ template with multiple options 2024-04-06 07:32:17 -07:00
oobabooga 1b87844928 Minor fix 2024-04-05 18:43:43 -07:00
oobabooga 6b7f7555fc Logging message to make transformers loader a bit more transparent 2024-04-05 18:40:02 -07:00
oobabooga 4e739dc211 Add an instruction template for Command R 2024-04-05 18:22:25 -07:00
oobabooga 8a8dbf2f16 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-04-05 12:42:23 -07:00
oobabooga 0f536dd97d UI: Fix the "Show controls" action 2024-04-05 12:18:33 -07:00
dependabot[bot] a4c67e1974
Bump aqlm[cpu,gpu] from 1.1.2 to 1.1.3 (#5790) 2024-04-05 13:26:49 -03:00
oobabooga 14f6194211 Bump Gradio to 4.25 2024-04-05 09:22:44 -07:00
oobabooga 5b91dbb73b
Merge pull request #5810 from oobabooga/dev
Merge dev branch
2024-04-05 10:55:16 -03:00
oobabooga 308452b783 Bitsandbytes: load preconverted 4bit models without additional flags 2024-04-04 18:10:24 -07:00
oobabooga d423021a48
Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga 13fe38eb27 Remove specialized code for gpt-4chan 2024-04-04 16:11:47 -07:00
oobabooga 3952560da8 Bump llama-cpp-python to 0.2.59 2024-04-04 11:20:48 -07:00
oobabooga 9ab7365b56 Read rope_theta for DBRX model (thanks turboderp) 2024-04-01 20:25:31 -07:00
oobabooga db5f6cd1d8 Fix ExLlamaV2 loaders using unnecessary "bits" metadata 2024-03-30 21:51:39 -07:00
oobabooga 624faa1438 Fix ExLlamaV2 context length setting (closes #5750) 2024-03-30 21:33:16 -07:00
oobabooga 70c58b5fc2 Bump ExLlamaV2 to 0.0.17 2024-03-30 21:08:26 -07:00
oobabooga 1a7c027386
Merge pull request #5772 from oobabooga/dev
Merge dev branch
2024-03-29 15:09:53 -03:00
oobabooga c37f792afa Better way to handle user_bio default in the API (alternative to bdcf31035f) 2024-03-29 10:54:01 -07:00
oobabooga 9653a9176c Minor improvements to Parameters tab 2024-03-29 10:41:24 -07:00
oobabooga 3ce0d9221b Bump transformers to 4.39 2024-03-28 19:40:31 -07:00
oobabooga e0e28ecb0b Set the gradio 4 allowed_paths 2024-03-28 15:10:54 -07:00
oobabooga 723f912c16 Fix the "typing dots" position in latest Gradio version 2024-03-28 12:57:35 -07:00
oobabooga 35da6b989d
Organize the parameters tab (#5767) 2024-03-28 16:45:03 -03:00
dependabot[bot] 3609ea69e4
Bump aqlm[cpu,gpu] from 1.1.0 to 1.1.2 (#5728) 2024-03-26 16:36:16 -03:00
Bartowski 9ad116a6e2
Add config for hyperion and hercules models to use chatml (#5742) 2024-03-26 16:35:29 -03:00
wldhx 7cbafc0540
docker: Remove obsolete CLI_ARGS variable (#5726) 2024-03-26 16:34:53 -03:00
Yiximail bdcf31035f
Set a default empty string for user_bio to fix #5717 issue (#5722) 2024-03-26 16:34:03 -03:00
Yiximail 8c9aca239a
Fix prompt incorrectly set to empty when suffix is empty string (#5757) 2024-03-26 16:33:09 -03:00
oobabooga 2a92a842ce
Bump gradio to 4.23 (#5758) 2024-03-26 16:32:20 -03:00
oobabooga 7cf1402bde
Merge pull request #5716 from oobabooga/dev
Merge dev branch
2024-03-17 12:34:53 -03:00
oobabooga 49b111e2dd Lint 2024-03-17 08:33:23 -07:00
oobabooga d890c99b53 Fix StreamingLLM when content is removed from the beginning of the prompt 2024-03-14 09:18:54 -07:00
oobabooga d828844a6f Small fix: don't save truncation_length to settings.yaml
It should derive from model metadata or from a command-line flag.
2024-03-14 08:56:28 -07:00
oobabooga 2ef5490a36 UI: make light theme less blinding 2024-03-13 08:23:16 -07:00
oobabooga 40a60e0297 Convert attention_sink_size to int (closes #5696) 2024-03-13 08:15:49 -07:00
oobabooga edec3bf3b0 UI: avoid caching convert_to_markdown calls during streaming 2024-03-13 08:14:34 -07:00
oobabooga 8152152dd6 Small fix after 28076928ac 2024-03-11 19:56:35 -07:00
oobabooga 28076928ac
UI: Add a new "User description" field for user personality/biography (#5691) 2024-03-11 23:41:57 -03:00
oobabooga 63701f59cf UI: mention that n_gpu_layers > 0 is necessary for the GPU to be used 2024-03-11 18:54:15 -07:00
oobabooga 46031407b5 Increase the cache size of convert_to_markdown to 4096 2024-03-11 18:43:04 -07:00
oobabooga 9eca197409 Minor logging change 2024-03-11 16:31:13 -07:00
oobabooga afadc787d7 Optimize the UI by caching convert_to_markdown calls 2024-03-10 20:10:07 -07:00
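The optimization here is plain memoization of the message-to-HTML conversion; a minimal sketch with a stand-in body (not the project's renderer), using the 4096-entry limit from the later bump above:

```python
from functools import lru_cache
import html

@lru_cache(maxsize=4096)  # repeated renders of the same message become free
def convert_to_markdown(text: str) -> str:
    # Stand-in for the real markdown-to-HTML pipeline.
    return "<p>" + html.escape(text) + "</p>"
```

The streaming exception above follows from the same picture: a partial message changes on every update, so caching it would only churn the cache.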
oobabooga 1934cb61ef
Merge pull request #5680 from oobabooga/dev
Merge dev branch
2024-03-10 23:39:20 -03:00
oobabooga 056717923f Document StreamingLLM 2024-03-10 19:15:23 -07:00
oobabooga 15d90d9bd5 Minor logging change 2024-03-10 18:20:50 -07:00
oobabooga abcdd0ad5b API: don't use settings.yaml for default values 2024-03-10 16:15:52 -07:00
oobabooga a102c704f5 Add numba to requirements.txt 2024-03-10 16:13:29 -07:00
oobabooga b3ade5832b Keep AQLM only for Linux (fails to install on Windows) 2024-03-10 09:41:17 -07:00
oobabooga 67b24b0b88 Bump llama-cpp-python to 0.2.56 2024-03-10 09:07:27 -07:00
oobabooga 763f9beb7e Bump bitsandbytes to 0.43, add official Windows wheel 2024-03-10 08:30:53 -07:00
oobabooga 52a34921ef Installer: validate the checksum for the miniconda installer on Windows 2024-03-09 16:33:12 -08:00
oobabooga cf0697936a Optimize StreamingLLM by over 10x 2024-03-08 21:48:28 -08:00
oobabooga afb51bd5d6
Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) 2024-03-09 00:25:33 -03:00
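StreamingLLM keeps a handful of initial "attention sink" tokens plus the most recent window when the context overflows, instead of dropping the oldest tokens wholesale. A conceptual sketch (not the project's implementation; the sink size of 4 follows the paper's default):

```python
def trim_context(tokens: list, max_len: int, sink: int = 4) -> list:
    """Keep the first `sink` tokens and the most recent tokens."""
    if len(tokens) <= max_len:
        return tokens
    return tokens[:sink] + tokens[-(max_len - sink):]
```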
oobabooga 9271e80914 Add back AutoAWQ for Windows
https://github.com/casper-hansen/AutoAWQ/issues/377#issuecomment-1986440695
2024-03-08 14:54:56 -08:00
oobabooga 549bb88975 Increase height of "Custom stopping strings" UI field 2024-03-08 12:54:30 -08:00
oobabooga 238f69accc Move "Command for chat-instruct mode" to the main chat tab (closes #5634) 2024-03-08 12:52:52 -08:00
oobabooga d0663bae31
Bump AutoAWQ to 0.2.3 (Linux only) (#5658) 2024-03-08 17:36:28 -03:00
oobabooga 0e6eb7c27a
Add AQLM support (transformers loader) (#5466) 2024-03-08 17:30:36 -03:00
oobabooga 2681f6f640
Make superbooga & superboogav2 functional again (#5656) 2024-03-07 15:03:18 -03:00
oobabooga bae14c8f13 Right-truncate long chat completion prompts instead of left-truncating
Instructions are usually at the beginning of the prompt.
2024-03-07 08:50:24 -08:00
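That is, when a chat-completion prompt exceeds the context window, the overflow is now dropped from the tail rather than the head. A one-function sketch of the distinction:

```python
def truncate(tokens: list, max_len: int) -> list:
    # Right-truncate: keep the beginning of the prompt, where the system
    # message and instructions live, and drop the overflow at the end.
    return tokens[:max_len]
    # The old behavior, tokens[-max_len:], kept the end instead and could
    # silently discard the instructions.
```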
oobabooga aa0da07af0
Merge pull request #5655 from oobabooga/dev
Merge dev branch
2024-03-07 13:13:10 -03:00
Bartowski 104573f7d4
Update cache_4bit documentation (#5649)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-03-07 13:08:21 -03:00
oobabooga bef08129bc Small fix for cuda 11.8 in the one-click installer 2024-03-06 21:43:36 -08:00
oobabooga 303433001f Fix a check in the installer 2024-03-06 21:13:54 -08:00
oobabooga bde7f00cae Change the exllamav2 version number 2024-03-06 21:08:29 -08:00
oobabooga 2ec1d96c91
Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
oobabooga fa0e68cefd Installer: add back INSTALL_EXTENSIONS environment variable (for docker) 2024-03-06 11:31:06 -08:00
oobabooga 992affefef
Merge pull request #5641 from oobabooga/dev
Merge dev branch
2024-03-06 12:40:10 -03:00
oobabooga fcc92caa30 Installer: add option to install requirements for just one extension 2024-03-06 07:36:23 -08:00
oobabooga 2174958362
Revert gradio to 3.50.2 (#5640) 2024-03-06 11:52:46 -03:00
oobabooga 7eee9e9470 Add -k to curl command to download miniconda on windows (closes #5628) 2024-03-06 06:46:50 -08:00
oobabooga 03f03af535 Revert "Update peft requirement from ==0.8.* to ==0.9.* (#5626)"
This reverts commit 72a498ddd4.
2024-03-05 02:56:37 -08:00
oobabooga d61e31e182
Save the extensions after Gradio 4 (#5632) 2024-03-05 07:54:34 -03:00
oobabooga ae12d045ea Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-03-05 02:35:04 -08:00
dependabot[bot] 72a498ddd4
Update peft requirement from ==0.8.* to ==0.9.* (#5626) 2024-03-05 07:34:32 -03:00
oobabooga 1437f757a1 Bump HQQ to 0.1.5 2024-03-05 02:33:51 -08:00
oobabooga 63a1d4afc8
Bump gradio to 4.19 (#5522) 2024-03-05 07:32:28 -03:00
oobabooga 164ff2440d Use the correct PyTorch in the Colab notebook 2024-03-05 01:05:19 -08:00
oobabooga 3cfcab63a5 Update an installation message 2024-03-04 20:37:44 -08:00
oobabooga 907bda0d56 Move update_wizard_wsl.sh to update_wizard_wsl.bat 2024-03-04 19:57:49 -08:00
oobabooga f697cb4609 Move update_wizard_windows.sh to update_wizard_windows.bat (oops) 2024-03-04 19:26:24 -08:00
oobabooga 2d74660733 Don't git pull on "Install/update extensions requirements" 2024-03-04 12:37:10 -08:00
oobabooga fbe83854ca Minor message change 2024-03-04 11:10:37 -08:00
oobabooga 90ab022856 Minor message change 2024-03-04 10:54:16 -08:00
oobabooga 97dc3602fc
Create an update wizard (#5623) 2024-03-04 15:52:24 -03:00
oobabooga 6adf222599 One-click installer: change an info message 2024-03-04 08:20:04 -08:00
oobabooga 4bb79c57ac One-click installer: change an info message 2024-03-04 08:11:55 -08:00
oobabooga 74564fe8d0 One-click installer: delete the Miniconda installer after completion 2024-03-04 08:11:03 -08:00
oobabooga dc2dd5b9d8 One-click installer: add an info message before git pull 2024-03-04 08:00:39 -08:00
oobabooga 527ba98105
Do not install extensions requirements by default (#5621) 2024-03-04 04:46:39 -03:00
oobabooga fa4ce0eee8 One-click installer: minor change to CMD_FLAGS.txt in CPU mode 2024-03-03 17:42:59 -08:00
oobabooga 8bd4960d05
Update PyTorch to 2.2 (also update flash-attn to 2.5.6) (#5618) 2024-03-03 19:40:32 -03:00
oobabooga 70047a5c57 Bump bitsandbytes to 0.42.0 on Windows 2024-03-03 13:19:27 -08:00
oobabooga 24e86bb21b Bump llama-cpp-python to 0.2.55 2024-03-03 12:14:48 -08:00
oobabooga 60f3d87309
Merge pull request #5617 from oobabooga/dev
Merge dev branch
2024-03-03 15:50:40 -03:00
oobabooga 314e42fd98 Fix transformers requirement 2024-03-03 10:49:28 -08:00
oobabooga 71b1617c1b Remove bitsandbytes from incompatible requirements.txt files 2024-03-03 08:24:54 -08:00
kalomaze cfb25c9b3f
Cubic sampling w/ curve param (#5551)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-03-03 13:22:21 -03:00
jeffbiocode 3168644152
Training: Update llama2-chat-format.json (#5593) 2024-03-03 12:42:14 -03:00
oobabooga 71dc5b4dee Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-02-28 19:59:20 -08:00
oobabooga 09b13acfb2 Perplexity evaluation: print to terminal after calculation is finished 2024-02-28 19:58:21 -08:00
dependabot[bot] dfdf6eb5b4
Bump hqq from 0.1.3 to 0.1.3.post1 (#5582) 2024-02-26 20:51:39 -03:00
oobabooga 332957ffec Bump llama-cpp-python to 0.2.52 2024-02-26 15:05:53 -08:00
oobabooga b64770805b Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-02-26 08:51:31 -08:00
oobabooga 830168d3d4 Revert "Replace hashlib.sha256 with hashlib.file_digest so we don't need to load entire files into ram before hashing them. (#4383)"
This reverts commit 0ced78fdfa.
2024-02-26 05:54:33 -08:00
Bartowski 21acf504ce
Bump transformers to 4.38 for gemma compatibility (#5575) 2024-02-25 20:15:13 -03:00
oobabooga 4164e29416 Block the "To create a public link, set share=True" gradio message 2024-02-25 15:06:08 -08:00
oobabooga ba852716fd
Merge pull request #5574 from oobabooga/dev
Merge dev branch
2024-02-25 14:29:35 -03:00
oobabooga d34126255d Fix loading extensions with "-" in the name (closes #5557) 2024-02-25 09:24:52 -08:00
Lounger 0f68c6fb5b
Big picture fixes (#5565) 2024-02-25 14:10:16 -03:00
jeffbiocode 45c4cd01c5
Add llama 2 chat format for lora training (#5553) 2024-02-25 02:36:36 -03:00
Devin Roark e0fc808980
fix: ngrok logging does not use the shared logger module (#5570) 2024-02-25 02:35:59 -03:00
oobabooga 32ee5504ed
Remove -k from curl command to download miniconda (#5535) 2024-02-25 02:35:23 -03:00
oobabooga c07dc56736 Bump llama-cpp-python to 0.2.50 2024-02-24 21:34:11 -08:00
oobabooga 98580cad8e Bump exllamav2 to 0.0.14 2024-02-24 18:35:42 -08:00
oobabooga 527f2652af Bump llama-cpp-python to 0.2.47 2024-02-22 19:48:49 -08:00
oobabooga 3f42e3292a Revert "Bump autoawq from 0.1.8 to 0.2.2 (#5547)"
This reverts commit d04fef6a07.
2024-02-22 19:48:04 -08:00
oobabooga 10aedc329f Logging: more readable messages when renaming chat histories 2024-02-22 07:57:06 -08:00
oobabooga faf3bf2503 Perplexity evaluation: make UI events more robust (attempt) 2024-02-22 07:13:22 -08:00
oobabooga ac5a7a26ea Perplexity evaluation: add some informative error messages 2024-02-21 20:20:52 -08:00
oobabooga 59032140b5 Fix CFG with llamacpp_HF (2nd attempt) 2024-02-19 18:35:42 -08:00
oobabooga c203c57c18 Fix CFG with llamacpp_HF 2024-02-19 18:09:49 -08:00
dependabot[bot] 5f7dbf454a
Update optimum requirement from ==1.16.* to ==1.17.* (#5548) 2024-02-19 19:15:21 -03:00
dependabot[bot] d04fef6a07
Bump autoawq from 0.1.8 to 0.2.2 (#5547) 2024-02-19 19:14:55 -03:00
dependabot[bot] ed6ff49431
Update accelerate requirement from ==0.25.* to ==0.27.* (#5546) 2024-02-19 19:14:04 -03:00
oobabooga d6bb6e7390
Merge pull request #5549 from oobabooga/dev
Merge dev branch
2024-02-19 18:53:25 -03:00
Kevin Pham 10df23efb7
Remove message.content from openai streaming API (#5503) 2024-02-19 18:50:27 -03:00
oobabooga 0b2279d031 Bump llama-cpp-python to 0.2.44 2024-02-19 13:42:31 -08:00
oobabooga ae05d9830f Replace {{char}}, {{user}} in the chat template itself 2024-02-18 19:57:54 -08:00
oobabooga 717c3494e8 Minor width change after daa140447e 2024-02-18 15:23:45 -08:00
oobabooga 1f27bef71b
Move chat UI elements to the right on desktop (#5538) 2024-02-18 14:32:05 -03:00
oobabooga d8064c00e8 UI: hide chat scrollbar on desktop when not hovered 2024-02-17 20:47:14 -08:00
oobabooga 36c29084bb UI: fix instruct style background for multiline inputs 2024-02-17 20:09:47 -08:00
oobabooga 904867a139 UI: fix scroll down after sending a multiline message 2024-02-17 19:27:13 -08:00
oobabooga 7838075990
Merge pull request #5534 from oobabooga/dev
Merge dev branch
2024-02-17 18:09:40 -03:00
oobabooga d6bd71db7f ExLlamaV2: fix loading when autosplit is not set 2024-02-17 12:54:37 -08:00
oobabooga dd46229487
Merge pull request #5530 from oobabooga/dev
Merge dev branch
2024-02-17 14:02:39 -03:00
oobabooga af0bbf5b13 Lint 2024-02-17 09:01:04 -08:00
fschuh fa1019e8fe
Removed extra spaces from Mistral instruction template that were causing Mistral to misbehave (#5517) 2024-02-16 21:40:51 -03:00
oobabooga c375c753d6 Bump bitsandbytes to 0.42 (Linux only) 2024-02-16 10:47:57 -08:00
oobabooga a6730f88f7
Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga 4039999be5 Autodetect llamacpp_HF loader when tokenizer exists 2024-02-16 09:29:26 -08:00
oobabooga 76d28eaa9e
Add a menu for customizing the instruction template for the model (#5521) 2024-02-16 14:21:17 -03:00
oobabooga 0e1d8d5601 Instruction template: make "Send to default/notebook" work without a tokenizer 2024-02-16 08:01:07 -08:00
oobabooga f465b7b486
Downloader: start one session per file (#5520) 2024-02-16 12:55:27 -03:00
oobabooga 44018c2f69
Add a "llamacpp_HF creator" menu (#5519) 2024-02-16 12:43:24 -03:00
oobabooga b2b74c83a6 Fix Qwen1.5 in llamacpp_HF 2024-02-15 19:04:19 -08:00
oobabooga 080f7132c0
Revert gradio to 3.50.2 (#5513) 2024-02-15 20:40:23 -03:00
oobabooga ea0e1feee7 Bump llama-cpp-python to 0.2.43 2024-02-14 21:58:24 -08:00
oobabooga 549f106879 Bump ExLlamaV2 to v0.0.13.2 2024-02-14 21:57:48 -08:00
oobabooga 7123ac3f77
Remove "Maximum UI updates/second" parameter (#5507) 2024-02-14 23:34:30 -03:00
DominikKowalczyk 33c4ce0720
Bump gradio to 4.19 (#5419)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-02-14 23:28:26 -03:00
oobabooga 771c59290a
Merge pull request #5502 from oobabooga/dev
Merge dev branch
2024-02-14 11:32:58 -03:00
oobabooga 04d8bdf929 Fix ExLlamaV2 requirement on Windows 2024-02-14 06:31:20 -08:00
oobabooga b16958575f Minor bug fix 2024-02-13 19:48:32 -08:00
oobabooga d47182d9d1
llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) 2024-02-14 00:28:51 -03:00
oobabooga 3a9ce3cfa6 Update stalebot message 2024-02-13 19:06:32 -08:00
oobabooga 93dd31fc0f Increase stalebot timeout 2024-02-13 16:07:33 -08:00
oobabooga dc6adefd87
Merge pull request #5496 from oobabooga/dev
Merge dev branch
2024-02-13 21:06:16 -03:00
oobabooga 069ed7c6ef Lint 2024-02-13 16:05:41 -08:00
oobabooga 193548edce Minor fix to ExLlamaV2 requirements 2024-02-13 16:00:06 -08:00
oobabooga 25b655faeb Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-02-13 15:49:53 -08:00
oobabooga f99f1fc68e Bump llama-cpp-python to 0.2.42 2024-02-13 15:49:20 -08:00
dependabot[bot] d8081e85ec
Update peft requirement from ==0.7.* to ==0.8.* (#5446) 2024-02-13 16:27:18 -03:00
dependabot[bot] 653b195b1e
Update numpy requirement from ==1.24.* to ==1.26.* (#5490) 2024-02-13 16:26:35 -03:00
dependabot[bot] 147b4cf3e0
Bump hqq from 0.1.2.post1 to 0.1.3 (#5489) 2024-02-13 16:25:02 -03:00
Steven K 512933fa44
Update main.css to allow scrolling in code blocks (#5495) 2024-02-13 16:24:30 -03:00
oobabooga e9fea353c5 Bump llama-cpp-python to 0.2.40 2024-02-13 11:22:34 -08:00
oobabooga 7342afaf19 Update the PyTorch installation instructions 2024-02-08 20:36:11 -08:00
oobabooga 86c320ab5a llama.cpp: add a progress bar for prompt evaluation 2024-02-07 21:56:10 -08:00
oobabooga acea6a6669 Add more exllamav2 wheels 2024-02-07 08:24:29 -08:00
oobabooga 35537ad3d1
Bump exllamav2 to 0.0.13.1 (#5463) 2024-02-07 13:17:04 -03:00
oobabooga b8e25e8678 Bump llama-cpp-python to 0.2.39 2024-02-07 06:50:47 -08:00
oobabooga c55b8ce932 Improved random preset generation 2024-02-06 08:51:52 -08:00
oobabooga 4e34ae0587 Minor logging improvements 2024-02-06 08:22:08 -08:00
oobabooga 3add2376cd Better warpers logging 2024-02-06 07:09:21 -08:00
oobabooga 494cc3c5b0 Handle empty sampler priority field, use default values 2024-02-06 07:05:32 -08:00
oobabooga 0f134bf744
Merge pull request #5453 from oobabooga/dev
Merge dev branch
2024-02-06 11:50:21 -03:00
oobabooga 775902c1f2 Sampler priority: better logging, always save to presets 2024-02-06 06:49:22 -08:00
oobabooga a329db062e
Merge pull request #5452 from oobabooga/dev
Merge dev branch
2024-02-06 11:36:00 -03:00
oobabooga acfbe6b3b3 Minor doc changes 2024-02-06 06:35:01 -08:00
oobabooga 8ee3cea7cb Improve some log messages 2024-02-06 06:31:27 -08:00
oobabooga 8a6d9abb41 Small fixes 2024-02-06 06:26:27 -08:00
oobabooga 2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)"
This reverts commit cde000d478.
2024-02-06 06:21:36 -08:00
oobabooga 8c35fefb3b
Add custom sampler order support (#5443) 2024-02-06 11:20:10 -03:00
oobabooga 7301c7618f Minor change to Models tab 2024-02-04 21:49:58 -08:00
oobabooga f234fbe83f Improve a log message after previous commit 2024-02-04 21:44:53 -08:00
oobabooga 7073665a10
Truncate long chat completions inputs (#5439) 2024-02-05 02:31:24 -03:00
oobabooga 9033fa5eee Organize the Model tab 2024-02-04 19:30:22 -08:00
oobabooga cd4ffd3dd4 Update docs 2024-02-04 18:48:04 -08:00
oobabooga 92d0617bce Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-02-04 18:40:46 -08:00
oobabooga a210999255 Bump safetensors version 2024-02-04 18:40:25 -08:00
Badis Ghoubali 9fdee65cf5
Improve ChatML template (#5411) 2024-02-04 23:39:15 -03:00
Forkoz 2a45620c85
Split by rows instead of layers for llama.cpp multi-gpu (#5435) 2024-02-04 23:36:40 -03:00
Badis Ghoubali 3df7e151f7
fix the n_batch slider (#5436) 2024-02-04 18:15:30 -03:00
oobabooga 4e188eeb80 Lint 2024-02-03 20:40:10 -08:00
oobabooga cde000d478
Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
kalomaze b6077b02e4
Quadratic sampling (#5403)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-02-04 00:20:02 -03:00
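Quadratic ("smooth") sampling reshapes each logit as a quadratic function of its distance from the top logit, controlled by a smoothing factor. A hedged sketch of the general transformation as I understand the PR, not its exact code:

```python
import torch

def quadratic_smoothing(logits: torch.Tensor, smoothing_factor: float) -> torch.Tensor:
    # Tokens near the top logit are pulled toward it while the far tail is
    # pushed further away, sharpening the distribution without a hard cutoff.
    max_logit = logits.max()
    return max_logit - smoothing_factor * (max_logit - logits) ** 2
```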
oobabooga e98d1086f5
Bump llama-cpp-python to 0.2.38 (#5420) 2024-02-01 20:09:30 -03:00
oobabooga 4f3fdf1b5f
Merge pull request #5404 from oobabooga/dev
Merge dev branch
2024-01-30 14:17:08 -03:00
oobabooga 167ee72d4e Lint 2024-01-30 09:16:23 -08:00
oobabooga ee65f4f014 Downloader: don't assume that huggingface_hub is installed 2024-01-30 09:14:11 -08:00
oobabooga 89f6036e98
Bump llama-cpp-python, remove python 3.8/3.9, cuda 11.7 (#5397) 2024-01-30 13:19:20 -03:00
Forkoz 528318b700
API: Remove tiktoken from logit bias (#5391) 2024-01-28 21:42:03 -03:00
Badis Ghoubali 40c7977f9b
Add roleplay.gbnf grammar (#5368) 2024-01-28 21:41:28 -03:00
smCloudInTheSky b1463df0a1
docker: add options for CPU only, Intel GPU, AMD GPU (#5380) 2024-01-28 11:18:14 -03:00
oobabooga d921f80322 one-click: minor fix after 5e87678fea 2024-01-28 06:14:15 -08:00
Evgenii 26c3ab367e
one-click: use f-strings to improve readability and unify with the rest of the code (#5068) 2024-01-27 17:31:22 -03:00
Andrew C. Dvorak 5e87678fea
Support running as a git submodule. (#5227) 2024-01-27 17:18:50 -03:00
Hubert Kasperek 69622930c7
Ability to run the Coqui TTS extension on the CPU (#5365) 2024-01-27 17:15:34 -03:00
Anthony Guijarro 828be63f2c
Downloader: use HF get_token function (#5381) 2024-01-27 17:13:09 -03:00
oobabooga e7a760e6b3
Merge pull request #5379 from oobabooga/dev
Merge dev branch
2024-01-26 11:18:45 -03:00
oobabooga de387069da Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-01-26 06:12:19 -08:00
sam-ngu c0bdcee646
added trust_remote_code to deepspeed init loaderClass (#5237) 2024-01-26 11:10:57 -03:00
dependabot[bot] bfe2326a24
Bump hqq from 0.1.2 to 0.1.2.post1 (#5349) 2024-01-26 11:10:18 -03:00
oobabooga 70648e75e6 Docs: minor change 2024-01-26 06:00:26 -08:00
oobabooga c1470870bb Update README 2024-01-26 05:58:40 -08:00
oobabooga 87dc421ee8
Bump exllamav2 to 0.0.12 (#5352) 2024-01-22 22:40:12 -03:00
oobabooga 837bd888e4
Merge pull request #5348 from oobabooga/dev
Merge dev branch
2024-01-22 11:18:46 -03:00
oobabooga 1343aa3d33
Merge pull request #5347 from oobabooga/dev
Merge dev branch
2024-01-22 09:44:53 -03:00
oobabooga aa575119e6 API: minor fix 2024-01-22 04:38:43 -08:00
oobabooga 821dd65fb3 API: add a comment 2024-01-22 04:15:51 -08:00
oobabooga 6247eafcc5 API: better handle temperature = 0 2024-01-22 04:12:23 -08:00
oobabooga 817866c9cf Lint 2024-01-22 04:07:25 -08:00
oobabooga b9d1873301 Bump transformers to 4.37 2024-01-22 04:07:12 -08:00
oobabooga aad73667af Lint 2024-01-22 03:25:55 -08:00
oobabooga 6ada77cf5a Update README.md 2024-01-22 03:17:15 -08:00
oobabooga 8b5495ebf8 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-01-22 03:15:29 -08:00
oobabooga cc6505df14 Update README.md 2024-01-22 03:14:56 -08:00
Cohee fbf8ae39f8
API: Allow content arrays for multimodal OpenAI requests (#5277) 2024-01-22 08:10:26 -03:00
Ercan 166fdf09f3
API: Properly handle Images with RGBA color format (#5332) 2024-01-22 08:08:51 -03:00
lmg-anon db1da9f98d
Fix logprobs tokens in OpenAI API (#5339) 2024-01-22 08:07:42 -03:00
oobabooga b5cabb6e9d
Bump llama-cpp-python to 0.2.31 (#5345) 2024-01-22 08:05:59 -03:00
oobabooga 8962bb173e
Bump llama-cpp-python to 0.2.29 (#5307) 2024-01-18 14:24:17 -03:00
Stefan Daniel Schwarz 232c07bf1f
API: set do_sample=false when temperature=0 (#5275) 2024-01-17 23:58:11 -03:00
Yiximail 3fef37cda8
UI: Update position of show-controls label to avoid line breaks due to font size (#5256) 2024-01-17 23:56:48 -03:00
oobabooga 7916cf863b Bump transformers (necessary for e055967974) 2024-01-17 12:37:31 -08:00
Forkoz 5c5ef4cef7
UI: change n_gpu_layers maximum to 256 for larger models. (#5262) 2024-01-17 17:13:16 -03:00
ilya sheprut 4d14eb8b82
LoRA: Fix error "Attempting to unscale FP16 gradients" when training (#5268) 2024-01-17 17:11:49 -03:00
Katehuuh 535ea9928a
Fixed whisper README Typo Hyperlinks (#5281) 2024-01-17 17:10:45 -03:00
oobabooga e055967974
Add prompt_lookup_num_tokens parameter (#5296) 2024-01-17 17:09:36 -03:00
oobabooga d8c3a5bee8
Merge pull request #5266 from oobabooga/dev
Merge dev branch (#5257)
2024-01-14 13:31:40 -03:00
Samuel Weinhardt 952a05a7c8
Correct field alias types for OpenAI extension (#5257) 2024-01-14 13:30:36 -03:00
oobabooga 61e4bfe305
Merge pull request #5253 from oobabooga/dev
Merge dev branch
2024-01-13 21:49:32 -03:00
Rimmy J d80b191b1c
Add requirement jinja2==3.1.* to fix error as described in issue #5240 (#5249)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
Co-authored-by: Rim <anonymous@mail.com>
2024-01-13 21:47:13 -03:00
oobabooga e1dd5ee2de UI: focus on the chat input when switching to the chat tab 2024-01-10 05:12:49 -08:00
oobabooga ec2da5adef Docs: document keyboard shortcuts 2024-01-10 03:58:39 -08:00
oobabooga b3fc2cd887 UI: Do not save unchanged extension settings to settings.yaml 2024-01-10 03:48:30 -08:00
oobabooga bb2c4707c4 API: fix bug after previous commit 2024-01-09 19:08:02 -08:00
oobabooga 4332e24740 API: Make user_name/bot_name the official and name1/name2 the alias 2024-01-09 19:06:11 -08:00
oobabooga a4c51b5a05 API: add "user_name" and "bot_name" aliases for name1 and name2 2024-01-09 19:02:45 -08:00
oobabooga 53dc1d8197 UI: Do not save unchanged settings to settings.yaml 2024-01-09 18:59:04 -08:00
oobabooga 2dc8db8aa4
Merge pull request #5220 from oobabooga/dev
Merge dev branch
2024-01-09 21:38:35 -03:00
oobabooga 038b4fc8af Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-01-09 16:28:14 -08:00
oobabooga 89e7e107fc Lint 2024-01-09 16:27:50 -08:00
Badis Ghoubali c44836c4d7
Fix spaces in Mistral/Mixtral instruct prompt (#5214) 2024-01-09 21:12:54 -03:00
mamei16 bec4e0a1ce
Fix update event in refresh buttons (#5197) 2024-01-09 14:49:37 -03:00
oobabooga 4333d82b9d Minor bug fix 2024-01-09 06:55:18 -08:00
oobabooga fbce30b09f
Reduce the number of built-in presets (#5217) 2024-01-09 11:50:10 -03:00
oobabooga 953343cced Improve the file saving/deletion menus 2024-01-09 06:33:47 -08:00
oobabooga 123f27a3c5 Load the nearest character after deleting a character
Instead of the first.
2024-01-09 06:24:27 -08:00
oobabooga ba87b9993d Change a label in the gallery extension 2024-01-09 06:06:57 -08:00
oobabooga b908ed318d Revert "Rename past chats -> chat history"
This reverts commit aac93a1fd6.
2024-01-09 05:26:07 -08:00
oobabooga 4ca82a4df9 Save light/dark theme on "Save UI defaults to settings.yaml" 2024-01-09 04:20:10 -08:00
oobabooga 7af50ede94 Reorder some buttons 2024-01-09 04:11:50 -08:00
oobabooga a9f49a7574 Confirm the chat history rename with enter 2024-01-09 04:00:53 -08:00
oobabooga 4d730a759a Focus on the rename text area when it becomes visible 2024-01-09 04:00:47 -08:00
oobabooga 6e9d814095 Change a padding after 4f7e1eeafd 2024-01-09 03:41:31 -08:00
oobabooga 7bdd2118a2 Change some log messages when deleting files 2024-01-09 03:32:01 -08:00
oobabooga aac93a1fd6 Rename past chats -> chat history 2024-01-09 03:14:30 -08:00
oobabooga 615fa11af8 Move new chat button, improve history deletion handling 2024-01-08 21:22:37 -08:00
oobabooga 4f7e1eeafd
Past chat histories in a side bar on desktop (#5098)
Lots of room for improvement, but that's a start.
2024-01-09 01:57:29 -03:00
oobabooga 372ef5e2d8 Fix dynatemp parameters always visible 2024-01-08 19:42:31 -08:00
oobabooga 29c2693ea0
dynatemp_low, dynatemp_high, dynatemp_exponent parameters (#5209) 2024-01-08 23:28:35 -03:00
oobabooga dc1df22a2b
Press Tab to switch between current tab and Parameters tab (#5210) 2024-01-08 23:23:55 -03:00
dependabot[bot] 32cdc66cf1
Bump hqq from 0.1.1.post1 to 0.1.2 (#5204) 2024-01-08 22:51:44 -03:00
oobabooga c4e005efec Fix dropdown menus sometimes failing to refresh 2024-01-08 17:49:54 -08:00
oobabooga 9cd2106303 Revert "Add dynamic temperature to the random preset button"
This reverts commit 4365fb890f.
2024-01-08 16:46:24 -08:00
oobabooga 4365fb890f Add dynamic temperature to the random preset button 2024-01-07 13:08:15 -08:00
oobabooga ad1ff53034
Merge pull request #5199 from oobabooga/dev
Merge dev branch
2024-01-07 17:06:02 -03:00
oobabooga 0d07b3a6a1
Add dynamic_temperature_low parameter (#5198) 2024-01-07 17:03:47 -03:00
oobabooga b8a0b3f925 Don't print torch tensors with --verbose 2024-01-07 10:35:55 -08:00
oobabooga e169993b7a
Merge pull request #5195 from oobabooga/dev
Merge dev branch
2024-01-07 15:12:27 -03:00
oobabooga cf820c69c5 Print generation parameters with --verbose (HF only) 2024-01-07 10:06:23 -08:00
oobabooga c4c7fc4ab3 Lint 2024-01-07 09:36:56 -08:00
Yilong Guo d93db3b486
Refine ipex setup (#5191) 2024-01-07 10:40:30 -03:00
kalomaze 48327cc5c4
Dynamic Temperature HF loader support (#5174)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-01-07 10:36:26 -03:00
Philipp Claßen 3eca20c015
Typo fixed in variable names (#5184) 2024-01-06 03:05:03 -03:00
oobabooga 8ea3f31601
Merge pull request #5181 from oobabooga/dev
Merge dev branch
2024-01-05 18:42:30 -03:00
oobabooga 91c2b8e11c Improvements to character_bias extension 2024-01-04 20:48:26 -08:00
oobabooga 248742df1c Save extension fields to settings.yaml on "Save UI defaults" 2024-01-04 20:33:42 -08:00
oobabooga 9e86bea8e9 Use requirements_cpu.txt for intel 2024-01-04 18:52:14 -08:00
oobabooga 3d854ee516
Pin PyTorch version to 2.1 (#5056) 2024-01-04 23:50:23 -03:00
Matthew Raaff c9c31f71b8
Various one-click installer improvements (#4994)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-01-04 23:41:54 -03:00
oobabooga c9d814592e Increase maximum temperature value to 5 2024-01-04 17:28:15 -08:00
Guanghua Lu 3bb4b0504e
Close the menu on second click. (#5110) 2024-01-04 13:52:11 -03:00
oobabooga e4d724eb3f Fix cache_folder bug introduced in 37eff915d6 2024-01-04 07:49:40 -08:00
Alberto Cano 37eff915d6
Use --disk-cache-dir for all caches 2024-01-04 00:27:26 -03:00
oobabooga c54d1daaaa
Merge pull request #5163 from oobabooga/dev
Merge dev branch
2024-01-03 22:57:00 -03:00
Lounger 7965f6045e
Fix loading latest history for file names with dots (#5162) 2024-01-03 22:39:41 -03:00
Adam Florizone 894e1a0700
Docker: added build args for non AVX2 CPU (#5154) 2024-01-03 20:43:02 -03:00
AstrisCantCode b80e6365d0
Fix various bugs for LoRA training (#5161) 2024-01-03 20:42:20 -03:00
oobabooga f6a204d7c9 Bump llama-cpp-python to 0.2.26 2024-01-03 11:06:36 -08:00
oobabooga 3a6cba9021 Add top_k=1 to Debug-deterministic preset
Makes it work with llama.cpp
2024-01-02 15:54:56 -08:00
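The reasoning: llama.cpp has no do_sample switch, so greedy decoding must be expressed through the samplers themselves, and top_k=1 does exactly that. A hypothetical rendering of the preset as generation parameters:

```python
# Hypothetical parameter set for a "Debug-deterministic" preset.
debug_deterministic = {
    "do_sample": False,  # greedy decoding for the HF-based loaders
    "top_k": 1,          # equivalent greedy behavior for llama.cpp
}
```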
oobabooga 3f28925a8d
Merge pull request #5152 from oobabooga/dev
Merge dev branch
2024-01-02 13:22:14 -03:00
oobabooga 7cce88c403 Remove an unnecessary exception 2024-01-02 07:20:59 -08:00
oobabooga 90c7e84b01 UI: improve chat style margin for last bot message 2024-01-01 19:50:13 -08:00
oobabooga a4b4708560 Decrease "Show controls" button opacity 2024-01-01 19:08:30 -08:00
oobabooga 94afa0f9cf Minor style changes 2024-01-01 16:00:22 -08:00
oobabooga 3e3a66e721
Merge pull request #5132 from oobabooga/dev
Merge dev branch
2023-12-31 02:32:25 -03:00
oobabooga cbf6f9e695 Update some UI messages 2023-12-30 21:31:17 -08:00
oobabooga 2aad91f3c9
Remove deprecated command-line flags (#5131) 2023-12-31 02:07:48 -03:00
TheInvisibleMage 485b85ee76
Superboogav2 Quick Fixes (#5089) 2023-12-31 02:03:23 -03:00
oobabooga 2734ce3e4c
Remove RWKV loader (#5130) 2023-12-31 02:01:40 -03:00
oobabooga 0e54a09bcb
Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga 8e397915c9
Remove --sdp-attention, --xformers flags (#5126) 2023-12-31 01:36:51 -03:00
B611 b7dd1f9542
Specify utf-8 encoding for model metadata file open (#5125) 2023-12-31 01:34:32 -03:00
oobabooga 20a2eaaf95 Add .vs to .gitignore 2023-12-27 12:58:07 -08:00
oobabooga a4079e879e CSS: don't change --chat-height when outside the chat tab 2023-12-27 11:51:55 -08:00
oobabooga c419206ce1 Lint the JS/CSS 2023-12-27 09:59:23 -08:00
oobabooga 3fd7073808
Merge pull request #5100 from oobabooga/dev
Merge dev branch
2023-12-27 13:23:28 -03:00
oobabooga 648c2d1cc2 Update settings-template.yaml 2023-12-25 15:25:16 -08:00
oobabooga c21e3d6300
Merge pull request #5044 from TheLounger/style_improvements
Improve chat styles
2023-12-25 20:00:50 -03:00
oobabooga 2ad6c526b8 Check if extensions block exists before changing it 2023-12-25 14:43:12 -08:00
oobabooga 63553b41ed Improve some paddings 2023-12-25 14:25:31 -08:00
oobabooga abd227594c Fix a border radius 2023-12-25 14:17:00 -08:00
oobabooga 8d0359a6d8 Rename some CSS variables 2023-12-25 14:10:07 -08:00
oobabooga 5466ae59a7 Prevent input/chat area overlap with new --my-delta variable 2023-12-25 14:07:31 -08:00
oobabooga 19d13743a6
Merge pull request #5078 from oobabooga/dev
Merge dev branch
2023-12-25 17:23:01 -03:00
oobabooga 02d063fb9f Fix extra space after 18ca35faaa 2023-12-25 08:38:17 -08:00
oobabooga ae927950a8 Remove instruct style border radius 2023-12-25 08:35:33 -08:00
oobabooga 18ca35faaa Space between chat tab and extensions block 2023-12-25 08:34:02 -08:00
oobabooga 73ba7a8921 Change height -> min-height for .chat 2023-12-25 08:32:02 -08:00
oobabooga 29b0f14d5a
Bump llama-cpp-python to 0.2.25 (#5077) 2023-12-25 12:36:32 -03:00
oobabooga af876095e2
Merge pull request #5073 from oobabooga/dev
Merge dev branch
2023-12-25 02:58:45 -03:00
oobabooga c06f630bcc Increase max_updates_second maximum value 2023-12-24 13:29:47 -08:00
Casper 92d5e64a82
Bump AutoAWQ to 0.1.8 (#5061) 2023-12-24 14:27:34 -03:00
oobabooga 4aeebfc571 Merge branch 'dev' into TheLounger-style_improvements 2023-12-24 09:24:55 -08:00
oobabooga d76b00c211 Pin lm_eval package version 2023-12-24 09:22:31 -08:00
oobabooga 8c60495878 UI: add "Maximum UI updates/second" parameter 2023-12-24 09:17:40 -08:00
zhangningboo 1b8b61b928
Fix output_ids decoding for Qwen/Qwen-7B-Chat (#5045) 2023-12-22 23:11:02 -03:00
kabachuha dbe438564e
Support for sending images into OpenAI chat API (#4827) 2023-12-22 22:45:53 -03:00
Stefan Daniel Schwarz 8956f3ebe2
Synthia instruction templates (#5041) 2023-12-22 22:19:43 -03:00
Yiximail afc91edcb2
Reset the model_name after unloading the model (#5051) 2023-12-22 22:18:24 -03:00
Lounger 554a8f910b Attempt at shrinking chat area when input box grows 2023-12-22 04:51:20 +01:00
oobabooga 4b25acf58f
Merge pull request #5039 from oobabooga/dev
Merge dev branch
2023-12-21 20:22:48 -03:00
Lounger 588b37c032 Add slight padding to top of message container 2023-12-21 22:04:41 +01:00
Lounger 568541aa31 Remove bottom padding on chat tab 2023-12-21 21:48:34 +01:00
oobabooga c1b99f45cb Make --help output instant 2023-12-21 09:32:20 -08:00
Lounger 0dd759c44f Claim more vertical space 2023-12-21 05:42:06 +01:00
Lounger 6fbd64db72 Set borders for all chat styles 2023-12-21 05:00:56 +01:00
oobabooga 2706149c65
Organize the CMD arguments by group (#5027) 2023-12-21 00:33:55 -03:00
oobabooga c727a70572 Remove redundancy from modules/loaders.py 2023-12-20 19:18:07 -08:00
Lounger e3e053ab99 UI: Expand chat vertically and handle header wrapping 2023-12-21 03:42:23 +01:00
Lounger a098c7eee3 Merge branch 'dev' into style_improvements 2023-12-20 23:09:15 +01:00
oobabooga 11288d11d4
Merge pull request #5022 from oobabooga/dev
Merge dev branch
2023-12-20 15:56:04 -03:00
luna 6efbe3009f
let exllama v1 models load safetensor loras (#4854) 2023-12-20 13:29:19 -03:00
oobabooga bcba200790 Fix EOS being ignored in ExLlamav2 after previous commit 2023-12-20 07:54:06 -08:00
oobabooga f0f6d9bdf9 Add HQQ back & update version
This reverts commit 2289e9031e.
2023-12-20 07:46:09 -08:00
oobabooga b15f510154 Optimize ExLlamav2 (non-HF) loader 2023-12-20 07:31:42 -08:00
oobabooga 489f4a23bf
Merge pull request #5012 from oobabooga/dev
Merge dev branch
2023-12-20 02:59:30 -03:00
oobabooga 258c695ead Add rich requirement 2023-12-19 21:58:36 -08:00
oobabooga c1f78dbd0f
Merge pull request #5011 from oobabooga/dev
Merge dev branch
2023-12-20 02:38:25 -03:00
oobabooga fadb295d4d Lint 2023-12-19 21:36:57 -08:00
oobabooga 2289e9031e Remove HQQ from requirements (after https://github.com/oobabooga/text-generation-webui/issues/4993) 2023-12-19 21:33:49 -08:00
oobabooga fb8ee9f7ff Add a specific error if HQQ is missing 2023-12-19 21:32:58 -08:00
oobabooga 366c93a008 Hide a warning 2023-12-19 21:03:20 -08:00
oobabooga 9992f7d8c0 Improve several log messages 2023-12-19 20:54:32 -08:00
oobabooga 23818dc098 Better logger
Credits: vladmandic/automatic
2023-12-19 20:38:33 -08:00
oobabooga 95600073bc Add an informative error when extension requirements are missing 2023-12-19 20:20:45 -08:00
Lounger f9accd38e0 UI: Update chat instruct styles 2023-12-20 02:54:08 +01:00
oobabooga d8279dc710 Replace character name placeholders in chat context (closes #5007) 2023-12-19 17:31:46 -08:00
Lounger ff3e845b04 UI: Header box is dropping shadows 2023-12-20 01:24:34 +01:00
Lounger 40d5bf6c35 Set margin on other tabs too 2023-12-19 23:42:13 +01:00
Lounger f42074b6c1 UI: Remove header margin on chat tab 2023-12-19 23:27:11 +01:00
oobabooga 5b791cae4a
Merge pull request #5005 from oobabooga/dev
Merge dev branch
2023-12-19 18:21:09 -03:00
oobabooga e83e6cedbe Organize the model menu 2023-12-19 13:18:26 -08:00
oobabooga f4ae0075e8 Fix conversion from old template format to jinja2 2023-12-19 13:16:52 -08:00
oobabooga de138b8ba6
Add llama-cpp-python wheels with tensor cores support (#5003) 2023-12-19 17:30:53 -03:00
oobabooga 71eb744b1c
Merge pull request #5002 from oobabooga/dev
Merge dev branch
2023-12-19 15:24:40 -03:00
oobabooga 0a299d5959
Bump llama-cpp-python to 0.2.24 (#5001) 2023-12-19 15:22:21 -03:00
oobabooga 83cf1a6b67 Fix Yi space issue (closes #4996) 2023-12-19 07:54:19 -08:00
oobabooga 781367bdc3
Merge pull request #4988 from oobabooga/dev
Merge dev branch
2023-12-18 23:42:16 -03:00
oobabooga 9847809a7a Add a warning about ppl evaluation without --no_use_fast 2023-12-18 18:09:24 -08:00
oobabooga f6d701624c UI: mention that QuIP# does not work on Windows 2023-12-18 18:05:02 -08:00
oobabooga a23a004434 Update the example template 2023-12-18 17:47:35 -08:00
oobabooga 3d10c574e7 Fix custom system messages in instruction templates 2023-12-18 17:45:06 -08:00
dependabot[bot] 9e48e50428
Update optimum requirement from ==1.15.* to ==1.16.* (#4986) 2023-12-18 21:43:29 -03:00
俞航 9fa3883630
Add ROCm wheels for exllamav2 (#4973)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-12-18 21:40:38 -03:00
Water 674be9a09a
Add HQQ quant loader (#4888)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-12-18 21:23:16 -03:00
oobabooga b28020a9e4
Merge pull request #4980 from oobabooga/dev
Merge dev branch
2023-12-18 10:11:32 -03:00
oobabooga 64a57d9dc2 Remove duplicate instruction templates 2023-12-17 21:39:47 -08:00
oobabooga 1f9e25e76a UI: update "Saved instruction templates" dropdown after loading template 2023-12-17 21:19:06 -08:00
oobabooga da1c8d77ea Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-17 21:05:10 -08:00
oobabooga cac89df97b Instruction templates: better handle unwanted bos tokens 2023-12-17 21:04:30 -08:00
oobabooga f0d6ead877
llama.cpp: read instruction template from GGUF metadata (#4975) 2023-12-18 01:51:58 -03:00
oobabooga 3f3cd4fbe4 UI: improve list style in chat modes 2023-12-17 20:26:57 -08:00
oobabooga 306c479d3a Minor fix to Vigogne-Chat template 2023-12-17 19:15:54 -08:00
Hirose 3f973e1fbf
Add detection for Eric Hartford's Dolphin models in models/config.yaml (#4966) 2023-12-17 23:56:34 -03:00
Eve 7c6f39382b
Add Orca-Vicuna instruction template (#4971) 2023-12-17 23:55:23 -03:00
FartyPants (FP HAM) 59da429cbd
Update Training PRO (#4972)
- rolling back safetensors to bi, until it is fixed correctly
- removing the ugly checkpoint detour
2023-12-17 23:54:06 -03:00
oobabooga 7be09836fc
Merge pull request #4961 from oobabooga/dev
Merge dev branch
2023-12-17 12:11:13 -03:00
oobabooga f1f2c4c3f4
Add --num_experts_per_token parameter (ExLlamav2) (#4955) 2023-12-17 12:08:33 -03:00
oobabooga 12690d3ffc
Better HF grammar implementation (#4953) 2023-12-17 02:01:23 -03:00
oobabooga aa200f8723 UI: remove no longer necessary js in Default/Notebook tabs 2023-12-16 19:39:00 -08:00
oobabooga 7a84d7b2da
Instruct style improvements (#4951) 2023-12-16 22:16:26 -03:00
oobabooga 41424907b1 Update README 2023-12-16 16:35:36 -08:00
oobabooga d2ed0a06bf Bump ExLlamav2 to 0.0.11 (adds Mixtral support) 2023-12-16 16:34:15 -08:00
oobabooga 0087dca286 Update README 2023-12-16 12:28:51 -08:00
oobabooga f8079d067d UI: save the sent chat message on "no model is loaded" error 2023-12-16 10:52:41 -08:00
oobabooga 443be391f2
Merge pull request #4937 from oobabooga/dev
Merge dev branch
2023-12-15 12:03:22 -03:00
oobabooga a060908d6c Mixtral Instruct: detect prompt format for llama.cpp loader
Workaround until the tokenizer.chat_template kv field gets implemented
2023-12-15 06:59:15 -08:00
oobabooga 3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga 7de10f4c8e Bump AutoGPTQ to 0.6.0 (adds Mixtral support) 2023-12-15 06:18:49 -08:00
oobabooga d0677caf2c Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-15 04:51:41 -08:00
oobabooga 69ba3cb0d9 Bump openai-whisper requirement (closes #4848) 2023-12-15 04:48:04 -08:00
Song Fuchang 127c71a22a
Update IPEX to 2.1.10+xpu (#4931)
* This will require Intel oneAPI Toolkit 2024.0
2023-12-15 03:19:01 -03:00
oobabooga 85816898f9
Bump llama-cpp-python to 0.2.23 (including Linux ROCm and MacOS >= 12) (#4930) 2023-12-15 01:58:08 -03:00
oobabooga 2cb5b68ad9
Bug fix: when generation fails, save the sent message (#4915) 2023-12-15 01:01:45 -03:00
Felipe Ferreira 11f082e417
[OpenAI Extension] Add more types to Embeddings Endpoint (#4895) 2023-12-15 00:26:16 -03:00
Kim Jaewon e53f99faa0
[OpenAI Extension] Add 'max_logits' parameter in logits endpoint (#4916) 2023-12-15 00:22:43 -03:00
oobabooga eaa1fe67f3
Remove elevenlabs extension (#4928) 2023-12-15 00:00:07 -03:00
oobabooga c3e0fcfc52
Merge pull request #4927 from oobabooga/dev
Merge dev branch
2023-12-14 22:39:08 -03:00
oobabooga f336f8a811 Merge branch 'main' into dev 2023-12-14 17:38:16 -08:00
oobabooga dde7921057 One-click installer: minor message change 2023-12-14 17:27:32 -08:00
oobabooga fd1449de20 One-click installer: fix minor bug introduced in previous commit 2023-12-14 16:52:44 -08:00
oobabooga 4ae2dcebf5 One-click installer: more friendly progress messages 2023-12-14 16:48:00 -08:00
oobabooga 8acecf3aee Bump llama-cpp-python to 0.2.23 (NVIDIA & CPU-only, no AMD, no Metal) (#4924) 2023-12-14 09:41:36 -08:00
oobabooga 8835ea3704
Bump llama-cpp-python to 0.2.23 (NVIDIA & CPU-only, no AMD, no Metal) (#4924) 2023-12-14 14:39:43 -03:00
oobabooga bf68d4499e
Merge pull request #4923 from oobabooga/dev
Merge dev branch
2023-12-14 13:01:05 -03:00
oobabooga 623c92792a Update README 2023-12-14 07:56:48 -08:00
oobabooga 3580bed041 Update README 2023-12-14 07:54:20 -08:00
oobabooga e91c09b8af
Merge pull request #4920 from oobabooga/dev
Merge dev branch
2023-12-14 11:24:00 -03:00
oobabooga d5ec3c3444 Update README 2023-12-14 06:20:52 -08:00
oobabooga 5b283fff22 Update README 2023-12-14 06:15:14 -08:00
oobabooga 958799221f Update README 2023-12-14 06:09:03 -08:00
oobabooga e7fa17740a Update README 2023-12-13 22:49:42 -08:00
oobabooga 03babe7d81 Update README 2023-12-13 22:47:08 -08:00
oobabooga aad14174e4 Update README 2023-12-13 22:46:18 -08:00
oobabooga 783947a2aa Update README 2023-12-13 22:44:25 -08:00
oobabooga 7fef16950f Update README 2023-12-13 22:42:54 -08:00
oobabooga d36e7f1762 Update README 2023-12-13 22:35:22 -08:00
oobabooga 9695db0ee4 Update README 2023-12-13 22:30:31 -08:00
oobabooga d354f5009c Update README 2023-12-13 22:21:29 -08:00
oobabooga 0a4fad2d46 Update README 2023-12-13 22:20:37 -08:00
oobabooga fade6abfe9 Update README 2023-12-13 22:18:40 -08:00
oobabooga aafd15109d Update README 2023-12-13 22:15:58 -08:00
oobabooga 634518a412 Update README 2023-12-13 22:08:41 -08:00
oobabooga 0d5ca05ab9 Update README 2023-12-13 22:06:04 -08:00
oobabooga d241de86c4 Update README 2023-12-13 22:02:26 -08:00
Lounger 5754f0c357
Fix deleting chat logs (#4914) 2023-12-13 21:54:43 -03:00
Bartowski f51156705d
Allow symlinked folder within root directory (#4863) 2023-12-13 18:08:21 -03:00
oobabooga 36e850fe89
Update README.md 2023-12-13 17:55:41 -03:00
oobabooga 3e0c11a758
Merge pull request #4912 from oobabooga/dev
Merge dev branch
2023-12-13 15:49:36 -03:00
oobabooga 1bfee1d12e Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-13 10:48:34 -08:00
oobabooga d14d4cad4a Lint 2023-12-13 10:48:15 -08:00
Ixion 3f3960dbfb
Fixed invalid Jinja2 syntax in instruction templates (#4911) 2023-12-13 15:46:23 -03:00
oobabooga 4eeac70af7 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-13 10:40:41 -08:00
oobabooga fcf5512364 Jinja templates: fix a potential small bug 2023-12-13 10:19:39 -08:00
missionfloyd bdcc769e6f
Bypass coqui TTS EULA check (#4905) 2023-12-13 02:26:46 -03:00
oobabooga 7f1a6a70e3 Update the llamacpp_HF comment 2023-12-12 21:04:20 -08:00
oobabooga 314a095c74
Merge pull request #4903 from oobabooga/dev
Merge dev branch
2023-12-12 23:10:45 -03:00
oobabooga c2802bc3ac Lint 2023-12-12 18:05:10 -08:00
oobabooga b2cae6cac6 Docs: minor update 2023-12-12 14:11:13 -08:00
oobabooga 21a5bfc67f Relax optimum requirement 2023-12-12 14:05:58 -08:00
oobabooga 12f58e2cac Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-12 13:28:24 -08:00
oobabooga 1c531a3713 Minor cleanup 2023-12-12 13:25:21 -08:00
Penagwin 85a1d8965c
Updated Docker Docs (#4900) 2023-12-12 18:03:50 -03:00
oobabooga 8513028968 Fix lag in the chat tab during streaming 2023-12-12 13:01:25 -08:00
oobabooga 736fe4aa3e Fix server refusing to close on Ctrl+C 2023-12-12 12:27:40 -08:00
oobabooga 39d2fe1ed9
Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga aab0dd962d Revert "Update callbacks.py to show tracebacks on ValueError (#4892)"
This reverts commit 993ca51a65.
2023-12-12 11:47:11 -08:00
dependabot[bot] 7a987417bb
Bump optimum from 1.14.0 to 1.15.0 (#4885) 2023-12-12 02:32:19 -03:00
dependabot[bot] a17750db91
Update peft requirement from ==0.6.* to ==0.7.* (#4886) 2023-12-12 02:31:30 -03:00
dependabot[bot] a8a92c6c87
Update transformers requirement from ==4.35.* to ==4.36.* (#4882) 2023-12-12 02:30:25 -03:00
Nehereus 993ca51a65
Update callbacks.py to show tracebacks on ValueError (#4892) 2023-12-12 02:29:27 -03:00
Morgan Schweers 602b8c6210
Make new browser reloads recognize current model. (#4865) 2023-12-11 02:51:01 -03:00
oobabooga 8c8825b777 Add QuIP# to README 2023-12-08 08:40:42 -08:00
oobabooga 2a335b8aa7 Cleanup: set shared.model_name only once 2023-12-08 06:35:23 -08:00
oobabooga 62d59a516f Add trust_remote_code to all HF loaders 2023-12-08 06:29:26 -08:00
oobabooga 705f04a0c9
Merge pull request #4851 from oobabooga/dev
Merge dev branch
2023-12-08 10:25:57 -03:00
oobabooga 181743fd97 Fix missing spaces tokenizer issue (closes #4834) 2023-12-08 05:16:46 -08:00
oobabooga 884871c107
Merge pull request #4849 from oobabooga/dev
Merge dev branch
2023-12-08 10:05:02 -03:00
oobabooga 00aedf9209 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-08 05:02:25 -08:00
oobabooga 7bbe7e803a Minor fix 2023-12-08 05:01:25 -08:00
Yiximail 1c74b3ab45
Fix partial unicode characters issue (#4837) 2023-12-08 09:50:53 -03:00
oobabooga 2c5a1e67f9
Parameters: change max_new_tokens & repetition_penalty_range defaults (#4842) 2023-12-07 20:04:52 -03:00
Song Fuchang e16e5997ef
Update IPEX install URL. (#4825)
* The old pip URL no longer works; use the latest URL from
  https://intel.github.io/intel-extension-for-pytorch/index.html#installation
2023-12-06 21:07:01 -03:00
oobabooga d516815c9c Model downloader: download only fp16 if both fp16 and GGUF are present 2023-12-05 21:09:12 -08:00
oobabooga 98361af4d5
Add QuIP# support (#4803)
It has to be installed manually for now.
2023-12-06 00:01:01 -03:00
oobabooga 6430acadde Minor bug fix after https://github.com/oobabooga/text-generation-webui/pull/4814 2023-12-05 10:08:11 -08:00
oobabooga c21a9668a5 Lint 2023-12-04 21:17:05 -08:00
erew123 f786aa3caa
Clean-up Ctrl+C Shutdown (#4802) 2023-12-05 02:16:16 -03:00
oobabooga 2694ef45a3 Do not limit API updates/second 2023-12-04 20:46:18 -08:00
oobabooga 0f828ea441 Do not limit API updates/second 2023-12-04 20:45:43 -08:00
oobabooga af261e5dd4
Merge pull request #4815 from oobabooga/dev
Merge dev branch
2023-12-05 01:30:57 -03:00
oobabooga 9edb193def
Optimize HF text generation (#4814) 2023-12-05 00:00:40 -03:00
oobabooga 1ccbcb967e
Merge pull request #4811 from oobabooga/dev
Merge dev branch
2023-12-04 21:29:45 -03:00
俞航 ac9f154bcc
Bump exllamav2 from 0.0.8 to 0.0.10 & Fix code change (#4782) 2023-12-04 21:15:05 -03:00
oobabooga 131a5212ce UI: update context upper limit to 200000 2023-12-04 15:48:34 -08:00
oobabooga f7145544f9 Update README 2023-12-04 15:44:44 -08:00
oobabooga 8e1f86a866 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-12-04 15:41:56 -08:00
oobabooga be88b072e9 Update --loader flag description 2023-12-04 15:41:25 -08:00
dependabot[bot] 801ba87c68
Update accelerate requirement from ==0.24.* to ==0.25.* (#4810) 2023-12-04 20:36:01 -03:00
oobabooga 7fc9033b2e Recommend ExLlama_HF and ExLlamav2_HF 2023-12-04 15:28:46 -08:00
oobabooga e4e35f357b
Merge pull request #4807 from oobabooga/dev
Merge dev branch
2023-12-04 12:28:34 -03:00
oobabooga 3f993280e4 Minor changes 2023-12-04 07:27:44 -08:00
oobabooga 0931ed501b Minor changes 2023-12-04 07:25:18 -08:00
oobabooga 427a165597 Bump TTS version in coqui_tts 2023-12-04 07:21:56 -08:00
Song Fuchang 0bfd5090be
Import accelerate very early to make Intel GPU happy (#4704) 2023-12-03 22:51:18 -03:00
dependabot[bot] 2e83844f35
Bump safetensors from 0.4.0 to 0.4.1 (#4750) 2023-12-03 22:50:10 -03:00
Ikko Eltociear Ashimine 06cc9a85f7
README: minor typo fix (#4793) 2023-12-03 22:46:34 -03:00
Lounger 7c0a17962d
Gallery improvements (#4789) 2023-12-03 22:45:50 -03:00
oobabooga 96df4f10b9
Merge pull request #4777 from oobabooga/dev
Merge dev branch
2023-12-01 00:00:17 -03:00
oobabooga 77d6ccf12b Add a LOADER debug message while loading models 2023-11-30 12:00:32 -08:00
oobabooga 1c90e02243 Update Colab-TextGen-GPU.ipynb 2023-11-30 11:55:18 -08:00
oobabooga 092a2c3516 Fix a bug in llama.cpp get_logits() function 2023-11-30 11:21:40 -08:00
oobabooga 6d3a9b8689
Merge pull request #4773 from oobabooga/dev
Merge dev branch
2023-11-30 02:31:37 -03:00
oobabooga 000b77a17d Minor docker changes 2023-11-29 21:27:23 -08:00
Callum 88620c6b39
feature/docker_improvements (#4768) 2023-11-30 02:20:23 -03:00
oobabooga 2698d7c9fd Fix llama.cpp model unloading 2023-11-29 15:19:48 -08:00
oobabooga fa89d305e3 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-29 15:13:17 -08:00
oobabooga 9940ed9c77 Sort the loaders 2023-11-29 15:13:03 -08:00
Manu Kashyap 78fd7f6aa8
Fixed naming for sentence-transformers library (#4764) 2023-11-29 12:15:03 -03:00
oobabooga a7670c31ca Sort 2023-11-28 18:43:33 -08:00
oobabooga 6e51bae2e0 Sort the loaders menu 2023-11-28 18:41:11 -08:00
oobabooga f4b956b47c Detect yi instruction template 2023-11-27 10:45:47 -08:00
oobabooga 68059d7c23 llama.cpp: minor log change & lint 2023-11-27 10:44:55 -08:00
Denis Iskandarov 1b05832f9a
Add direnv artifacts to gitignore (#4737) 2023-11-27 15:43:42 -03:00
xr4dsh b5b3d18773
reasonable CLI args for docker container (#4727) 2023-11-27 15:43:01 -03:00
tsukanov-as 9f7ae6bb2e
fix detection of stopping strings when HTML escaping is used (#4728) 2023-11-27 15:42:08 -03:00
Eve d06ce7b75c
add openhermes mistral support (#4730) 2023-11-27 15:41:06 -03:00
oobabooga b6d16a35b1 Minor API fix 2023-11-21 17:56:28 -08:00
oobabooga 51add248c8
Merge pull request #4702 from oobabooga/dev
Merge dev branch
2023-11-21 21:18:27 -03:00
oobabooga cb0dbffccc Merge branch 'main' into dev 2023-11-21 16:12:45 -08:00
oobabooga 8d811a4d58 one-click: move on instead of crashing if extension fails to install 2023-11-21 16:09:44 -08:00
oobabooga 0589ff5b12
Bump llama-cpp-python to 0.2.19 & add min_p and typical_p parameters to llama.cpp loader (#4701) 2023-11-21 20:59:39 -03:00
oobabooga 2769a1fa25 Hide deprecated args from Session tab 2023-11-21 15:15:16 -08:00
oobabooga 0047d9f5e0 Do not install coqui_tts requirements by default
It breaks the one-click installer on Windows.
2023-11-21 15:13:42 -08:00
oobabooga fb124ab6e2 Bump to flash-attention 2.3.4 + switch to Github Actions wheels on Windows (#4700) 2023-11-21 15:07:17 -08:00
oobabooga e9cdaa2ada
Bump to flash-attention 2.3.4 + switch to Github Actions wheels on Windows (#4700) 2023-11-21 20:06:56 -03:00
oobabooga b81d6ad8a4
Detect Orca 2 template (#4697) 2023-11-21 15:26:42 -03:00
oobabooga 360eeb9ff1
Merge pull request #4686 from oobabooga/dev
Merge dev branch
2023-11-21 08:38:50 -03:00
oobabooga 54a4eb60a3
Remove --no-dependencies from TTS installation command 2023-11-21 08:30:50 -03:00
oobabooga efdd99623c
Merge pull request #4683 from oobabooga/dev
Merge dev branch
2023-11-21 00:36:58 -03:00
oobabooga b02dc4dc0d Add --no-dependencies to TTS installation command 2023-11-20 19:02:12 -08:00
oobabooga 55f2a3643b Update multimodal API example 2023-11-20 18:41:09 -08:00
oobabooga 829c6d4f78 Add "remove_trailing_dots" option to XTTSv2 2023-11-20 18:33:29 -08:00
kanttouchthis 8dc9ec3491
add XTTSv2 (coqui_tts extension) (#4673)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-11-20 22:37:52 -03:00
oobabooga ff24648510 Credit llama-cpp-python in the README 2023-11-20 12:13:15 -08:00
oobabooga be78d79811 Revert accidental noavx2 changes 2023-11-20 11:48:04 -08:00
oobabooga 4b84e45116 Use +cpuavx2 instead of +cpuavx 2023-11-20 11:46:38 -08:00
oobabooga d7f1bc102b
Fix "Illegal instruction" bug in llama.cpp CPU only version (#4677) 2023-11-20 16:36:38 -03:00
drew9781 5e70263e25
docker: install xformers with specific cuda version, matching the docker image. (#4670) 2023-11-19 21:43:15 -03:00
oobabooga f11092ac2a
Merge pull request #4664 from oobabooga/dev
Merge dev branch
2023-11-19 15:12:55 -03:00
oobabooga f0d66cf817 Add missing file 2023-11-19 10:12:13 -08:00
oobabooga 22e7a22d1e
Merge pull request #4662 from oobabooga/dev
Merge dev branch
2023-11-19 14:23:19 -03:00
oobabooga a2e6d00128 Use convert_ids_to_tokens instead of decode in logits endpoint
This preserves the llama tokenizer spaces.
2023-11-19 09:22:08 -08:00
oobabooga d1bba48a83
Merge pull request #4660 from oobabooga/dev
Merge dev branch
2023-11-19 13:32:08 -03:00
oobabooga 8cf05c1b31 Fix disappearing character gallery 2023-11-19 08:31:01 -08:00
oobabooga 9da7bb203d Minor LoRA bug fix 2023-11-19 07:59:29 -08:00
oobabooga 78af3b0a00 Update docs/What Works.md 2023-11-19 07:57:16 -08:00
oobabooga a6f1e1bcc5 Fix PEFT LoRA unloading 2023-11-19 07:55:25 -08:00
oobabooga a290d17386 Add hover cursor to bot pfp 2023-11-19 06:56:42 -08:00
oobabooga ab94f0d9bf Minor style change 2023-11-18 21:11:04 -08:00
oobabooga 5fcee696ea
New feature: enlarge character pictures on click (#4654) 2023-11-19 02:05:17 -03:00
Jordan Tucker cb836dd49c
fix: use shared chat-instruct_command with api (#4653) 2023-11-19 01:19:10 -03:00
oobabooga 771e62e476
Add /v1/internal/lora endpoints (#4652) 2023-11-19 00:35:22 -03:00
oobabooga ef6feedeb2
Add --nowebui flag for pure API mode (#4651) 2023-11-18 23:38:39 -03:00
oobabooga 0fa1af296c
Add /v1/internal/logits endpoint (#4650) 2023-11-18 23:19:31 -03:00
oobabooga 8f4f4daf8b
Add --admin-key flag for API (#4649) 2023-11-18 22:33:27 -03:00
wizd af76fbedb8
Openai embedding fix to support jina-embeddings-v2 (#4642) 2023-11-18 20:24:29 -03:00
Jordan Tucker baab894759
fix: use system message in chat-instruct mode (#4648) 2023-11-18 20:20:13 -03:00
oobabooga 47d9e2618b Refresh the Preset menu after saving a preset 2023-11-18 14:03:42 -08:00
oobabooga 83b64e7fc1
New feature: "random preset" button (#4647) 2023-11-18 18:31:41 -03:00
oobabooga d1a58da52f Update ancient Docker instructions 2023-11-17 19:52:53 -08:00
oobabooga e0ca49ed9c
Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637)
* Update requirements*.txt

* Add back seed
2023-11-18 00:31:27 -03:00
oobabooga 3146124ec0
Merge pull request #4632 from oobabooga/dev
Merge dev branch
2023-11-17 10:18:31 -03:00
oobabooga 9d6f79db74 Revert "Bump llama-cpp-python to 0.2.18 (#4611)"
This reverts commit 923c8e25fb.
2023-11-17 05:14:25 -08:00
oobabooga e0a7cc5e0f Simplify CORS code 2023-11-16 20:11:55 -08:00
oobabooga 13dc3b61da Update README 2023-11-16 19:57:55 -08:00
oobabooga 8b66d83aa9 Set use_fast=True by default, create --no_use_fast flag
This increases tokens/second for HF loaders.
2023-11-16 19:55:28 -08:00
oobabooga f889302d24
Merge pull request #4628 from oobabooga/dev
Merge dev branch
2023-11-16 23:47:07 -03:00
oobabooga b2ce8dc7ee Update a message 2023-11-16 18:46:26 -08:00
oobabooga 0ee8d2b66b
Merge pull request #4627 from oobabooga/dev
Merge dev branch
2023-11-16 23:41:18 -03:00
oobabooga 780b00e1cf Minor bug fix 2023-11-16 18:39:39 -08:00
oobabooga c0233bb9d3 Minor message change 2023-11-16 18:36:57 -08:00
oobabooga 94b7177174 Update docs/07 - Extensions 2023-11-16 18:24:46 -08:00
oobabooga 6525707a7f Fix "send instruction template to..." buttons (closes #4625) 2023-11-16 18:16:42 -08:00
oobabooga 510a01ef46 Lint 2023-11-16 18:03:06 -08:00
oobabooga 923c8e25fb
Bump llama-cpp-python to 0.2.18 (#4611) 2023-11-16 22:55:14 -03:00
Casper 61f429563e
Bump AutoAWQ to 0.1.7 (#4620) 2023-11-16 17:08:08 -03:00
oobabooga e7d460d932 Make sure that API requirements are installed 2023-11-16 10:08:41 -08:00
oobabooga cbf2b47476 Strip trailing "\" characters in CMD_FLAGS.txt 2023-11-16 09:33:36 -08:00
oobabooga 58c6001be9 Add missing exllamav2 samplers 2023-11-16 07:09:40 -08:00
oobabooga cd41f8912b Warn users about n_ctx / max_seq_len 2023-11-15 18:56:42 -08:00
oobabooga a475aa7816 Improve API documentation 2023-11-15 18:39:08 -08:00
oobabooga 9be48e83a9 Start API when "api" checkbox is checked 2023-11-15 16:35:47 -08:00
oobabooga a85ce5f055 Add more info messages for truncation / instruction template 2023-11-15 16:20:31 -08:00
oobabooga 883701bc40 Alternative solution to 025da386a0
Fixes an error.
2023-11-15 16:04:02 -08:00
oobabooga 8ac942813c Revert "Fix CPU memory limit error (issue #3763) (#4597)"
This reverts commit 025da386a0.
2023-11-15 16:01:54 -08:00
oobabooga e6f44d6d19 Print context length / instruction template to terminal when loading models 2023-11-15 16:00:51 -08:00
oobabooga e05d8fd441 Style changes 2023-11-15 15:51:37 -08:00
oobabooga be125e2708 Add /v1/internal/model/unload endpoint 2023-11-15 15:48:33 -08:00
David Nielson 564d0cde82
Use standard hyphens in filenames (#4576) 2023-11-15 20:29:00 -03:00
Andy Bao 025da386a0
Fix CPU memory limit error (issue #3763) (#4597)
get_max_memory_dict() was not properly formatting shared.args.cpu_memory

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-11-15 20:27:20 -03:00
Anton Rogozin 8a9d5a0cea
update AutoGPTQ to higher version for lora applying error fixing (#4604) 2023-11-15 20:23:22 -03:00
oobabooga 8a2af87d3a
Merge pull request #4608 from oobabooga/dev
Merge dev branch
2023-11-15 13:19:15 -03:00
oobabooga 072cfe19e9 Minor Colab fix 2023-11-15 08:18:32 -08:00
oobabooga 2337aebe4d
Merge pull request #4606 from oobabooga/dev
Merge dev branch
2023-11-15 13:16:44 -03:00
oobabooga 3d861a459d Minor Colab fix 2023-11-15 08:15:43 -08:00
oobabooga dea90c7b67 Bump exllamav2 to 0.0.8 2023-11-13 10:34:10 -08:00
oobabooga 454fcf39a9
Merge pull request #4579 from oobabooga/dev
Merge dev branch
2023-11-13 11:39:08 -03:00
oobabooga 4f9bc63edf Installer: update a message for clarity 2023-11-10 09:43:02 -08:00
oobabooga 74fee4f312 Update Colab-TextGen-GPU.ipynb 2023-11-10 09:18:25 -08:00
oobabooga 52758f15da Remove sentence-transformers requirement (for #1575) 2023-11-10 07:35:29 -08:00
oobabooga c5be3f7acb Make /v1/embeddings functional, add request/response types 2023-11-10 07:34:27 -08:00
oobabooga 7ed2143cd6
Update 12 - OpenAI API.md 2023-11-10 11:56:04 -03:00
oobabooga 0777b0d3c7 Add system_message parameter, document model (unused) parameter 2023-11-10 06:47:18 -08:00
oobabooga 4aabff3728 Remove old API, launch OpenAI API with --api 2023-11-10 06:39:08 -08:00
GuizzyQC 6a7cd01ebf
Fix bug with /internal/model/load (#4549)
Update shared.model_name after loading model through API call
2023-11-10 00:16:38 -03:00
oobabooga 2af7e382b1 Revert "Bump llama-cpp-python to 0.2.14"
This reverts commit 5c3eb22ce6.

The new version has issues:

https://github.com/oobabooga/text-generation-webui/issues/4540
https://github.com/abetlen/llama-cpp-python/issues/893
2023-11-09 10:02:13 -08:00
oobabooga 07d66e45b4
Merge pull request #4541 from oobabooga/dev
Merge dev branch
2023-11-09 14:53:34 -03:00
Ashley Kleynhans 372d712921
Fix deprecated API (#4539) 2023-11-09 14:51:50 -03:00
oobabooga d86f1fd2c3 OpenAI API: stop streaming on client disconnect (closes #4521) 2023-11-09 06:37:32 -08:00
oobabooga f7534b2f4b
Merge pull request #4532 from oobabooga/dev
Merge dev branch
2023-11-09 09:33:55 -03:00
oobabooga effb3aef42 Prevent deadlocks in OpenAI API with simultaneous requests 2023-11-08 20:55:39 -08:00
oobabooga 4da00b6032
Merge pull request #4522 from oobabooga/dev
Merge dev branch
2023-11-08 22:57:08 -03:00
oobabooga 21ed9a260e Document the new "Custom system message" field 2023-11-08 17:54:10 -08:00
oobabooga 678fd73aef Document /v1/internal/model/load and fix a bug 2023-11-08 17:41:12 -08:00
MrMojoR 1754a3761b
Include trust remote code usage in openai api's embedder (#4513) 2023-11-08 11:25:43 -03:00
hronoas 6c7aad11f3
openai extension: wrong frequency_penalty type (#4512) 2023-11-08 11:23:51 -03:00
oobabooga 881e8a6e70
Small bug fix in /v1/internal/model/load 2023-11-08 02:34:13 -03:00
oobabooga 050ff36bd6 Revert "Add a comment to /v1/models"
This reverts commit 38b07493a0.
2023-11-07 21:09:47 -08:00
oobabooga 38b07493a0 Add a comment to /v1/models 2023-11-07 21:07:12 -08:00
oobabooga 2358706453 Add /v1/internal/model/load endpoint (tentative) 2023-11-07 20:58:06 -08:00
oobabooga 43c53a7820 Refactor the /v1/models endpoint 2023-11-07 19:59:27 -08:00
oobabooga 1b69694fe9 Add types to the encode/decode/token-count endpoints 2023-11-07 19:32:14 -08:00
oobabooga f6ca9cfcdc Add /v1/internal/model-info endpoint 2023-11-07 18:59:02 -08:00
oobabooga 6e2e0317af
Separate context and system message in instruction formats (#4499) 2023-11-07 20:02:58 -03:00
oobabooga 322c170566 Document logits_all 2023-11-07 14:45:11 -08:00
oobabooga 5c0559da69 Training: fix .txt files not showing in dropdowns 2023-11-07 14:41:11 -08:00
oobabooga af3d25a503 Disable logits_all in llamacpp_HF (makes processing 3x faster) 2023-11-07 14:35:48 -08:00
oobabooga 5c3eb22ce6 Bump llama-cpp-python to 0.2.14 2023-11-07 14:20:43 -08:00
oobabooga 3fc505dc0f Document unused parameters 2023-11-07 08:56:09 -08:00
oobabooga 3d59346871 Implement echo/suffix parameters 2023-11-07 08:43:45 -08:00
oobabooga cee099f131 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-07 08:25:22 -08:00
oobabooga 48c9c31440 Document the "preset" option in the API 2023-11-07 08:23:17 -08:00
oobabooga d59f1ad89a
Update README.md 2023-11-07 13:05:06 -03:00
oobabooga 0c440877de
Update 12 - OpenAI API.md 2023-11-07 12:59:40 -03:00
oobabooga 55dc9845cb
Update 12 - OpenAI API.md 2023-11-07 12:51:41 -03:00
oobabooga b0b999dd68 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-07 07:46:08 -08:00
oobabooga 2bda1a9c9b Mention --api-key 2023-11-07 07:45:55 -08:00
oobabooga cc04abda49
Update 12 - OpenAI API.md 2023-11-07 12:40:52 -03:00
oobabooga ddca6948b2
Update 12 - OpenAI API.md 2023-11-07 12:39:59 -03:00
oobabooga 40e73aafce
Update 12 - OpenAI API.md 2023-11-07 12:38:39 -03:00
oobabooga 6ec997f195
Update 12 - OpenAI API.md 2023-11-07 12:36:52 -03:00
oobabooga 15d4ea180d Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-07 07:35:36 -08:00
oobabooga b2afdda4e8 Add more API examples 2023-11-07 07:35:04 -08:00
Morgan Cheng 349604458b
Update 12 - OpenAI API.md (#4501)
Fix the typo in argument. It should be `--api-port` instead of `--port`.

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-11-07 11:22:17 -03:00
dependabot[bot] fd893baba1
Bump optimum from 1.13.1 to 1.14.0 (#4492) 2023-11-07 00:13:41 -03:00
dependabot[bot] 18739c8b3a
Update peft requirement from ==0.5.* to ==0.6.* (#4494) 2023-11-07 00:12:59 -03:00
oobabooga 79b3f5a546
Add /v1/internal/stop-generation to OpenAI API (#4498) 2023-11-07 00:10:42 -03:00
oobabooga 97c21e5667 Don't strip leading spaces in OpenAI API 2023-11-06 19:09:41 -08:00
oobabooga 4a45dc4041 Reorder the parameters in the FastAPI documentation 2023-11-06 09:55:36 -08:00
oobabooga 1fba6db69f
Merge pull request #4488 from oobabooga/dev
Merge dev branch
2023-11-06 12:18:55 -03:00
oobabooga 0ed6a17ed4 Update warning 2023-11-06 07:17:49 -08:00
oobabooga 0db81355bc Reorder a parameter 2023-11-06 07:11:49 -08:00
oobabooga b87c6213ae Remove obsolete endpoint 2023-11-06 05:45:45 -08:00
oobabooga fcc9114b58 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-06 05:38:47 -08:00
oobabooga ceb8c92dfc
Update 12 - OpenAI API.md 2023-11-06 10:38:22 -03:00
oobabooga 28fd535f9c Make chat API more robust 2023-11-06 05:22:01 -08:00
oobabooga 5b5ef57049 Remove file 2023-11-05 21:39:59 -08:00
oobabooga ec17a5d2b7
Make OpenAI API the default API (#4430) 2023-11-06 02:38:29 -03:00
俞航 84d957ba62
[Fix] fix openai embedding_model loading as str (#4147) 2023-11-05 20:42:45 -03:00
kabachuha e18a0460d4
fix openai extension not working because of absent new defaults (#4477) 2023-11-04 16:12:51 -03:00
oobabooga b7a409ef57
Merge pull request #4476 from oobabooga/dev
Merge dev branch
2023-11-04 15:04:43 -03:00
oobabooga fb3bd0203d Update docs 2023-11-04 11:02:24 -07:00
oobabooga 1d8c7c1fc4 Update docs 2023-11-04 11:01:15 -07:00
oobabooga b5c53041b8
Merge pull request #4475 from oobabooga/dev
Merge dev branch
2023-11-04 14:19:55 -03:00
oobabooga 40f7f37009 Update requirements 2023-11-04 10:12:06 -07:00
Orang 2081f43ac2
Bump transformers to 4.35.* (#4474) 2023-11-04 14:00:24 -03:00
feng lui 4766a57352
transformers: add use_flash_attention_2 option (#4373) 2023-11-04 13:59:33 -03:00
wouter van der plas add359379e
fixed two links in the ui (#4452) 2023-11-04 13:41:42 -03:00
Casper cfbd108826
Bump AWQ to 0.1.6 (#4470) 2023-11-04 13:09:41 -03:00
oobabooga aa5d671579
Add temperature_last parameter (#4472) 2023-11-04 13:09:07 -03:00
oobabooga 1ab8700d94 Change frequency/presence penalty ranges 2023-11-03 17:38:19 -07:00
oobabooga 45fcb60e7a Make truncation_length_max apply to max_seq_len/n_ctx 2023-11-03 11:29:31 -07:00
oobabooga 7f9c1cbb30 Change min_p default to 0.0 2023-11-03 08:25:22 -07:00
oobabooga 4537853e2c Change min_p default to 1.0 2023-11-03 08:13:50 -07:00
kalomaze 367e5e6e43
Implement Min P as a sampler option in HF loaders (#4449) 2023-11-02 16:32:51 -03:00
oobabooga fcb7017b7a Remove a checkbox 2023-11-02 12:24:09 -07:00
Julien Chaumond fdcaa955e3
transformers: Add a flag to force load from safetensors (#4450) 2023-11-02 16:20:54 -03:00
oobabooga c0655475ae Add cache_8bit option 2023-11-02 11:23:04 -07:00
oobabooga 42f816312d Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2023-11-02 11:09:26 -07:00
oobabooga 77abd9b69b Add no_flash_attn option 2023-11-02 11:08:53 -07:00
Julien Chaumond a56ef2a942
make torch.load a bit safer (#4448) 2023-11-02 14:07:08 -03:00
deevis deba039c03
(fix): OpenOrca-Platypus2 models should use correct instruction_template and custom_stopping_strings (#4435) 2023-11-01 01:51:00 -03:00
Mehran Ziadloo aaf726dbfb
Updating the shared settings object when loading a model (#4425) 2023-11-01 01:29:57 -03:00
oobabooga 9bd0724d85 Change frequency/presence penalty ranges 2023-10-31 20:57:56 -07:00
Orang 6b7fa45cc3
Update exllamav2 version (#4417) 2023-10-31 19:12:14 -03:00
Casper 41e159e88f
Bump AutoAWQ to v0.1.5 (#4410) 2023-10-31 19:11:22 -03:00
Meheret 0707ed7677
updated wiki link (#4415) 2023-10-31 19:09:05 -03:00
oobabooga 262f8ae5bb Use default gr.Dataframe for evaluation table 2023-10-27 06:49:14 -07:00
James Braza f481ce3dd8
Adding platform_system to autoawq (#4390) 2023-10-27 01:02:28 -03:00
dependabot[bot] af98587580
Update accelerate requirement from ==0.23.* to ==0.24.* (#4400) 2023-10-27 00:46:16 -03:00
oobabooga 839a87bac8 Fix is_ccl_available & is_xpu_available imports 2023-10-26 20:27:04 -07:00
Abhilash Majumder 778a010df8
Intel Gpu support initialization (#4340) 2023-10-26 23:39:51 -03:00
GuizzyQC 317e2c857e
sd_api_pictures: fix Gradio warning message regarding custom value (#4391) 2023-10-26 23:03:21 -03:00
oobabooga 92b2f57095 Minor metadata bug fix (second attempt) 2023-10-26 18:57:32 -07:00
oobabooga 2d97897a25 Don't install flash-attention on windows + cuda 11 2023-10-25 11:21:18 -07:00
LightningDragon 0ced78fdfa
Replace hashlib.sha256 with hashlib.file_digest so we don't need to load entire files into ram before hashing them. (#4383) 2023-10-25 12:15:34 -03:00
tdrussell 72f6fc6923
Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (#4376) 2023-10-25 12:10:28 -03:00
oobabooga ef1489cd4d Remove unused parameter in AutoAWQ 2023-10-23 20:45:43 -07:00
oobabooga 1edf321362 Lint 2023-10-23 13:09:03 -07:00
oobabooga 280ae720d7 Organize 2023-10-23 13:07:17 -07:00
oobabooga 49e5eecce4 Merge remote-tracking branch 'refs/remotes/origin/main' 2023-10-23 12:54:05 -07:00
oobabooga 82c11be067 Update 04 - Model Tab.md 2023-10-23 12:49:07 -07:00
oobabooga 306d764ff6 Minor metadata bug fix 2023-10-23 12:46:24 -07:00
adrianfiedler 4bc411332f
Fix broken links (#4367)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-10-23 14:09:57 -03:00
oobabooga 92691ee626 Disable trust_remote_code by default 2023-10-23 09:57:44 -07:00
tdrussell 4440f87722
Add additive_repetition_penalty sampler setting. (#3627) 2023-10-23 02:28:07 -03:00
oobabooga 6086768309 Bump gradio to 3.50.* 2023-10-22 21:21:26 -07:00
oobabooga b8183148cf
Update 04 ‐ Model Tab.md 2023-10-22 17:15:55 -03:00
oobabooga cea7fc2435 Update html_instruct_style.css 2023-10-22 12:28:23 -07:00
oobabooga df90d03e0b Replace --mul_mat_q with --no_mul_mat_q 2023-10-22 12:23:03 -07:00
Googulator d0c3b407b3
transformers loader: multi-LoRAs support (#3120) 2023-10-22 16:06:22 -03:00
omo 4405513ca5
Option to select/target additional linear modules/layers in LORA training (#4178) 2023-10-22 15:57:19 -03:00
oobabooga 7a3f885ea8
Update 03 ‐ Parameters Tab.md 2023-10-22 14:52:23 -03:00
oobabooga 63688004dc Add default cmd flags to colab 2023-10-22 09:56:43 -07:00
oobabooga 613feca23b Make colab functional for llama.cpp
- Download only Q4_K_M for GGUF repositories by default
- Use maximum n-gpu-layers by default
2023-10-22 09:08:25 -07:00
oobabooga 994502d41b Colab fixes 2023-10-22 08:57:16 -07:00
Jiashu Xu c544f5cc51
Support LLaVA v1.5 7B (#4348) 2023-10-22 12:49:04 -03:00
oobabooga 05741821a5 Minor colab changes 2023-10-22 08:44:35 -07:00
FartyPants (FP HAM) 6a61158adf
Training PRO a month worth of updates (#4345) 2023-10-22 12:38:09 -03:00
mongolu c18504f369
Fix USE_CUDA118 from ENV remaining null in one_click.py + cuda-toolkit (#4352) 2023-10-22 12:37:24 -03:00
oobabooga cd45635f53 tqdm improvement for colab 2023-10-21 22:00:29 -07:00
oobabooga ae79c510cc Merge remote-tracking branch 'refs/remotes/origin/main' 2023-10-21 21:46:15 -07:00
oobabooga 2d1b3332e4 Ignore warnings on Colab 2023-10-21 21:45:25 -07:00
oobabooga caf6db07ad
Update README.md 2023-10-22 01:22:17 -03:00
oobabooga 1a34927314 Make API URLs more visible 2023-10-21 21:11:07 -07:00
oobabooga 09f807af83 Use ExLlama_HF for GPTQ models by default 2023-10-21 20:45:38 -07:00
oobabooga 619093483e Add Colab notebook 2023-10-21 20:27:52 -07:00
oobabooga 506d05aede Organize command-line arguments 2023-10-21 18:52:59 -07:00
oobabooga b1f33b55fd
Update 01 ‐ Chat Tab.md 2023-10-21 20:17:56 -03:00
oobabooga ac6d5d50b7
Update README.md 2023-10-21 20:03:43 -03:00
oobabooga 6efb990b60
Add a proper documentation (#3885) 2023-10-21 19:15:54 -03:00
Adam White 5a5bc135e9
Docker: Remove explicit CUDA 11.8 Reference (#4343) 2023-10-21 15:09:34 -03:00
oobabooga b98fbe0afc Add download link 2023-10-20 23:58:05 -07:00
oobabooga fbac6d21ca Add missing exception 2023-10-20 23:53:24 -07:00
Brian Dashore 3345da2ea4
Add flash-attention 2 for windows (#4235) 2023-10-21 03:46:23 -03:00
oobabooga 258d046218 More robust way of initializing empty .git folder 2023-10-20 23:13:09 -07:00
Johan 1d5a015ce7
Enable special token support for exllamav2 (#4314) 2023-10-21 01:54:06 -03:00
mjbogusz 8f6405d2fa
Python 3.11, 3.9, 3.8 support (#4233)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-10-20 21:13:33 -03:00
oobabooga 9be74fb57c Change 2 margins 2023-10-20 14:04:14 -07:00
oobabooga e208128d68 Lint the CSS files 2023-10-20 13:02:18 -07:00
oobabooga dedbdb46c2 Chat CSS improvements 2023-10-20 12:49:36 -07:00
Haotian Liu 32984ea2f0
Support LLaVA v1.5 (#4305) 2023-10-20 02:28:14 -03:00
oobabooga bb71272903 Detect WizardCoder-Python-34B & Phind-CodeLlama-34B 2023-10-19 14:35:56 -07:00
oobabooga eda7126b25 Organize the .gitignore 2023-10-19 14:33:44 -07:00
turboderp ae8cd449ae
ExLlamav2_HF: Convert logits to FP32 (#4310) 2023-10-18 23:16:05 -03:00
missionfloyd c0ffb77fd8
More silero languages (#3950) 2023-10-16 17:12:32 -03:00
hronoas db7ecdd274
openai: fix empty models list on query present in url (#4139) 2023-10-16 17:02:47 -03:00
oobabooga f17f7a6913 Increase the evaluation table height 2023-10-16 12:55:35 -07:00
oobabooga 8ea554bc19 Check for torch.xpu.is_available() 2023-10-16 12:53:40 -07:00
oobabooga 188d20e9e5 Reduce the evaluation table height 2023-10-16 10:53:42 -07:00
oobabooga 2d44adbb76 Clear the torch cache while evaluating 2023-10-16 10:52:50 -07:00
oobabooga 388d1864a6 Merge remote-tracking branch 'refs/remotes/origin/main' 2023-10-15 21:58:16 -07:00
oobabooga 71cac7a1b2 Increase the height of the evaluation table 2023-10-15 21:56:40 -07:00
oobabooga e14bde4946 Minor improvements to evaluation logs 2023-10-15 20:51:43 -07:00
oobabooga b88b2b74a6 Experimental Intel Arc transformers support (untested) 2023-10-15 20:51:11 -07:00
Sam d331501ebc
Fix for using Torch with CUDA 11.8 (#4298) 2023-10-15 19:27:19 -03:00
oobabooga 3bb4046fad
Update auto-release.yml 2023-10-15 17:27:16 -03:00
oobabooga 45fa803943
Create auto-release.yml 2023-10-15 17:25:29 -03:00
Johan 2706394bfe
Relax numpy version requirements (#4291) 2023-10-15 12:05:06 -03:00
Forkoz 8cce1f1126
Exllamav2 lora support (#4229)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-10-14 16:12:41 -03:00
jllllll 1f5a2c5597
Use Pytorch 2.1 exllama wheels (#4285) 2023-10-14 15:27:59 -03:00
oobabooga cd1cad1b47 Bump exllamav2 2023-10-14 11:23:07 -07:00
Eve 6e2dec82f1
add chatml support + mistral-openorca (#4275) 2023-10-13 11:49:17 -03:00
Jesus Alvarez ed66ca3cdf
Add HTTPS support to APIs (openai and default) (#4270)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-10-13 01:31:13 -03:00
oobabooga 43be1be598 Manually install CUDA runtime libraries 2023-10-12 21:02:44 -07:00
oobabooga faf5c4dd58 Fix code blocks in instruct mode 2023-10-11 12:18:46 -07:00
oobabooga 773c17faec Fix a warning 2023-10-10 20:53:38 -07:00
oobabooga f63361568c Fix safetensors kwarg usage in AutoAWQ 2023-10-10 19:03:09 -07:00
oobabooga 39f16ff83d Fix default/notebook tabs css 2023-10-10 18:45:12 -07:00
oobabooga fae8062d39
Bump to latest gradio (3.47) (#4258) 2023-10-10 22:20:49 -03:00
Haotian Liu 2b75d725e6
Initial support for LLaVA-LLaMA-2. (#3377) 2023-10-10 18:40:52 -03:00
oobabooga 9fab9a1ca6 Minor fix 2023-10-10 14:08:11 -07:00
oobabooga a49cc69a4a Ignore rope_freq_base if value is 10000 2023-10-10 13:57:40 -07:00
oobabooga 3a9d90c3a1 Download models with 4 threads by default 2023-10-10 13:52:10 -07:00
dependabot[bot] 520cbb2ab1
Bump safetensors from 0.3.2 to 0.4.0 (#4249) 2023-10-10 17:41:09 -03:00
Forkoz 35695e18c7
Remove import. (#4247)
For real this time.
2023-10-09 18:06:11 -03:00
501 changed files with 28296 additions and 14527 deletions

.github/FUNDING.yml

@@ -1 +0,0 @@
-ko_fi: oobabooga


@@ -46,7 +46,7 @@ body:
     id: system-info
     attributes:
       label: System Info
-      description: "Please share your system info with us: operating system, GPU brand, and GPU model. If you are using a Google Colab notebook, mention that instead."
+      description: "Please share your operating system and GPU type (NVIDIA/AMD/Intel/Apple). If you are using a Google Colab notebook, mention that instead."
       render: shell
       placeholder:
     validations:

.github/dependabot.yml

@@ -5,7 +5,10 @@
 version: 2
 updates:
-  - package-ecosystem: "pip" # See documentation for possible values
-    directory: "/" # Location of package manifests
+  - package-ecosystem: "pip"
+    directories:
+      - "/requirements/full/"
+      - "/requirements/portable/"
+    target-branch: "dev"
     schedule:
       interval: "weekly"


@@ -0,0 +1,70 @@
name: Build Everything TGW

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string

permissions:
  contents: write

jobs:
  build_release_cuda_windows:
    name: CUDA Windows
    uses: ./.github/workflows/build-portable-release-cuda.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_cuda_linux:
    name: CUDA Linux
    uses: ./.github/workflows/build-portable-release-cuda.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_vulkan_windows:
    name: Vulkan Windows
    uses: ./.github/workflows/build-portable-release-vulkan.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_vulkan_linux:
    name: Vulkan Linux
    uses: ./.github/workflows/build-portable-release-vulkan.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_rocm_linux:
    name: ROCm Linux
    uses: ./.github/workflows/build-portable-release-rocm.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_cpu_windows:
    name: CPU Windows
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_cpu_linux:
    name: CPU Linux
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_macos:
    name: macOS
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:macos-15-intel,macos-14'
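The aggregate workflow above only fans out to the per-backend builders via `workflow_call`, so a release build is kicked off manually with a single `version` input. As a rough sketch, a dispatch from the GitHub CLI could look like this (the `gh` invocation is illustrative and not part of the diff; it assumes the workflow file is present on the default branch):

```bash
# Illustrative dispatch of the aggregate build workflow via the GitHub CLI.
# "Build Everything TGW" is the workflow's name field; -f sets the
# workflow_dispatch input declared above.
gh workflow run "Build Everything TGW" \
  --repo oobabooga/text-generation-webui \
  -f version=v3.0
```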


@@ -0,0 +1,175 @@
name: Build CUDA

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}
    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
              'cuda' = @("12.4", "13.1")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }} CUDA ${{ matrix.cuda }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}
    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"

          cd ..
          cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
          find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

          # Define common variables
          CUDA_VERSION="${{ matrix.cuda }}"
          VERSION="${{ inputs.version }}"

          # 1. Set platform-specific variables
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            PLATFORM="windows"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
            PIP_PATH="portable_env/python.exe -m pip"
            PACKAGES_PATH="portable_env/Lib/site-packages"
            rm start_linux.sh start_macos.sh
          else
            PLATFORM="linux"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
            PIP_PATH="portable_env/bin/python -m pip"
            PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
            rm start_macos.sh start_windows.bat
          fi

          # 2. Download and extract Python
          cd ..
          echo "Downloading Python for $PLATFORM..."
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
          mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file based on CUDA version
          cd "text-generation-webui-${VERSION_CLEAN}"
          if [[ "$CUDA_VERSION" == "13.1" ]]; then
            REQ_FILE="requirements/portable/requirements_cuda131.txt"
          else
            REQ_FILE="requirements/portable/requirements.txt"
          fi

          # 4. Install packages
          echo "Installing Python packages from $REQ_FILE..."
          $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

          # 5. Clean up
          rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

          # 6. Create archive
          cd ..
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.zip"
            echo "Creating archive: $ARCHIVE_NAME"
            powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
            tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true
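The `config` and `exclude` inputs use a small `key1:item1-1,item1-2;key2:...` grammar: the PowerShell step above splits `config` entries on `;` and `:` to overwrite whole matrix axes, and turns `exclude` entries into matrix exclusions. A hedged example of narrowing a manual dispatch to a single combination (values are illustrative):

```bash
# Illustrative: build only the Linux + CUDA 13.1 combination by overriding
# the 'os' and 'cuda' matrix axes through the config input.
gh workflow run "Build CUDA" \
  --repo oobabooga/text-generation-webui \
  -f version=v3.0 \
  -f config='os:ubuntu-22.04;cuda:13.1'
```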


@@ -0,0 +1,170 @@
name: Build ROCm

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}
    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}
    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"

          cd ..
          cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
          find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

          # Define common variables
          VERSION="${{ inputs.version }}"

          # 1. Set platform-specific variables
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            PLATFORM="windows"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
            PIP_PATH="portable_env/python.exe -m pip"
            PACKAGES_PATH="portable_env/Lib/site-packages"
            rm start_linux.sh start_macos.sh
          else
            PLATFORM="linux"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
            PIP_PATH="portable_env/bin/python -m pip"
            PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
            rm start_macos.sh start_windows.bat
          fi

          # 2. Download and extract Python
          cd ..
          echo "Downloading Python for $PLATFORM..."
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
          mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file
          REQ_FILE="requirements/portable/requirements_amd.txt"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # 4. Install packages
          echo "Installing Python packages from $REQ_FILE..."
          $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

          # 5. Clean up
          rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

          # 6. Create archive
          cd ..
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-rocm7.2.zip"
            echo "Creating archive: $ARCHIVE_NAME"
            powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-rocm7.2.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
            tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true


@@ -0,0 +1,170 @@
name: Build Vulkan

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}
    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}
    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"

          cd ..
          cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
          find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

          # Define common variables
          VERSION="${{ inputs.version }}"

          # 1. Set platform-specific variables
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            PLATFORM="windows"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
            PIP_PATH="portable_env/python.exe -m pip"
            PACKAGES_PATH="portable_env/Lib/site-packages"
            rm start_linux.sh start_macos.sh
          else
            PLATFORM="linux"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
            PIP_PATH="portable_env/bin/python -m pip"
            PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
            rm start_macos.sh start_windows.bat
          fi

          # 2. Download and extract Python
          cd ..
          echo "Downloading Python for $PLATFORM..."
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
          mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file
          REQ_FILE="requirements/portable/requirements_vulkan.txt"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # 4. Install packages
          echo "Installing Python packages from $REQ_FILE..."
          $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

          # 5. Clean up
          rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

          # 6. Create archive
          cd ..
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-vulkan.zip"
            echo "Creating archive: $ARCHIVE_NAME"
            powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-vulkan.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
            tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true


@@ -0,0 +1,196 @@
name: Build CPU and macOS

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}
    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022', 'macos-14')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}
    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
          VERSION_CLEAN="${{ inputs.version }}"
          VERSION_CLEAN="${VERSION_CLEAN#v}"

          cd ..
          cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
          cd "text-generation-webui-${VERSION_CLEAN}"

          # Remove extensions that need additional requirements
          allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
          find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

          # Define common variables
          VERSION="${{ inputs.version }}"
          OS_TYPE="${{ matrix.os }}"

          # 1. Set platform-specific variables
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            PLATFORM="windows-cpu"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
            PIP_PATH="portable_env/python.exe -m pip"
            PACKAGES_PATH="portable_env/Lib/site-packages"
            rm start_linux.sh start_macos.sh
          elif [[ "$RUNNER_OS" == "macOS" ]]; then
            if [[ "$OS_TYPE" == "macos-15-intel" ]]; then
              PLATFORM="macos-x86_64"
              PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-apple-darwin-install_only.tar.gz"
              REQ_TYPE="apple_intel"
            else
              PLATFORM="macos-arm64"
              PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-aarch64-apple-darwin-install_only.tar.gz"
              REQ_TYPE="apple_silicon"
            fi
            PIP_PATH="portable_env/bin/python -m pip"
            PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
            rm start_linux.sh start_windows.bat
          else
            # Linux case
            PLATFORM="linux-cpu"
            PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
            PIP_PATH="portable_env/bin/python -m pip"
            PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
            rm start_macos.sh start_windows.bat
          fi

          # 2. Download and extract Python
          echo "Downloading Python for $PLATFORM..."
          cd ..
          curl -L -o python-build.tar.gz "$PYTHON_URL"
          tar -xzf python-build.tar.gz
          mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

          # 3. Prepare requirements file based on platform
          cd "text-generation-webui-${VERSION_CLEAN}"

          # Select requirements file based on platform
          if [[ "$RUNNER_OS" == "macOS" ]]; then
            if [[ "$OS_TYPE" == "macos-15-intel" ]]; then
              REQ_FILE="requirements/portable/requirements_apple_intel.txt"
            else
              REQ_FILE="requirements/portable/requirements_apple_silicon.txt"
            fi
          else
            REQ_FILE="requirements/portable/requirements_cpu_only.txt"
          fi

          echo "Using requirements file: $REQ_FILE"

          # 4. Install packages
          echo "Installing Python packages from $REQ_FILE..."
          $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

          # 5. Clean up
          rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

          # 6. Create archive
          cd ..
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}.zip"
            echo "Creating archive: $ARCHIVE_NAME"
            powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
          else
            ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}.tar.gz"
            echo "Creating archive: $ARCHIVE_NAME"
            tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
          fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true


@@ -1,22 +0,0 @@
-name: Close inactive issues
-on:
-  schedule:
-    - cron: "10 23 * * *"
-
-jobs:
-  close-issues:
-    runs-on: ubuntu-latest
-    permissions:
-      issues: write
-      pull-requests: write
-    steps:
-      - uses: actions/stale@v5
-        with:
-          stale-issue-message: ""
-          close-issue-message: "This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment."
-          days-before-issue-stale: 42
-          days-before-issue-close: 0
-          stale-issue-label: "stale"
-          days-before-pr-stale: -1
-          days-before-pr-close: -1
-          repo-token: ${{ secrets.GITHUB_TOKEN }}

.gitignore

@@ -1,38 +1,33 @@
-cache
-characters
-training/datasets
-extensions/silero_tts/outputs
-extensions/elevenlabs_tts/outputs
-extensions/sd_api_pictures/outputs
-extensions/multimodal/pipelines
-logs
-loras
-models
-presets
-repositories
-softprompts
-torch-dumps
-*pycache*
-*/*pycache*
-*/*/pycache*
-venv/
-.venv/
+/css
+/extensions
+/installer_files
+/repositories
+/user_data
+
+.chroma
+.DS_Store
+.eslintrc.js
+.idea
+.installer_state.json
+.venv
+venv
+.envrc
+.direnv
+.vs
 .vscode
-.idea/
 *.bak
 *.ipynb
 *.log
-settings.json
-settings.yaml
-notification.mp3
-img_bot*
-img_me*
-prompts/[0-9]*
-models/config-user.yaml
-.DS_Store
+*pycache*
+cert.pem
+key.pem
+package.json
+package-lock.json
 Thumbs.db
-.chroma
-installer_files
-/CMD_FLAGS.txt
+wandb
+
+# ignore user docker config and top level links to docker files
+/docker-compose.yaml
+/docker-compose.yml
+/Dockerfile
+.env
CMD_FLAGS.txt

@@ -1,3 +0,0 @@
-# Only used by the one-click installer.
-# Example:
-# --listen --api

Colab-TextGen-GPU.ipynb

@@ -0,0 +1,119 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"private_outputs": true,
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# oobabooga/text-generation-webui\n",
"\n",
"After running both cells, a public gradio URL will appear at the bottom in around 10 minutes. You can optionally generate an API link.\n",
"\n",
"* Project page: https://github.com/oobabooga/text-generation-webui\n",
"* Gradio server status: https://status.gradio.app/"
],
"metadata": {
"id": "MFQl6-FjSYtY"
}
},
{
"cell_type": "code",
"source": [
"#@title 1. Keep this tab alive to prevent Colab from disconnecting you { display-mode: \"form\" }\n",
"\n",
"#@markdown Press play on the music player that will appear below:\n",
"%%html\n",
"<audio src=\"https://oobabooga.github.io/silence.m4a\" controls>"
],
"metadata": {
"id": "f7TVVj_z4flw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 2. Launch the web UI\n",
"\n",
"#@markdown You can provide a direct GGUF link or a Hugging Face model URL.\n",
"\n",
"import os\n",
"from pathlib import Path\n",
"\n",
"os.environ.pop('PYTHONPATH', None)\n",
"os.environ.pop('MPLBACKEND', None)\n",
"\n",
"if Path.cwd().name != 'text-generation-webui':\n",
" print(\"\\033[1;32;1m\\n --> Installing the web UI. This will take a while, but after the initial setup, you can download and test as many models as you like.\\033[0;37;0m\\n\")\n",
"\n",
" !git clone https://github.com/oobabooga/text-generation-webui\n",
" %cd text-generation-webui\n",
"\n",
" # Install the project in an isolated environment\n",
" !GPU_CHOICE=A \\\n",
" LAUNCH_AFTER_INSTALL=FALSE \\\n",
" INSTALL_EXTENSIONS=FALSE \\\n",
" ./start_linux.sh\n",
"\n",
"# Parameters\n",
"model_url = \"https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf\" #@param {type:\"string\"}\n",
"branch = \"\" #@param {type:\"string\"}\n",
"command_line_flags = \"--load-in-4bit --use_double_quant\" #@param {type:\"string\"}\n",
"api = False #@param {type:\"boolean\"}\n",
"\n",
"if api:\n",
" for param in ['--api', '--public-api']:\n",
" if param not in command_line_flags:\n",
" command_line_flags += f\" {param}\"\n",
"\n",
"model_url = model_url.strip()\n",
"model_name = \"\"\n",
"if model_url != \"\":\n",
" if not model_url.startswith('http'):\n",
" model_url = 'https://huggingface.co/' + model_url\n",
"\n",
" branch = branch.strip()\n",
" if '/resolve/' in model_url:\n",
" model_name = model_url.split('?')[0].split('/')[-1]\n",
" !python download-model.py {model_url}\n",
" else:\n",
" url_parts = model_url.strip('/').split('/')\n",
" model_name = f\"{url_parts[-2]}_{url_parts[-1]}\"\n",
" if branch not in ['', 'main']:\n",
" model_name += f\"_{branch}\"\n",
" !python download-model.py {model_url} --branch {branch}\n",
" else:\n",
" !python download-model.py {model_url}\n",
"\n",
"# Start the web UI\n",
"cmd = f\"./start_linux.sh {command_line_flags} --share\"\n",
"if model_name != \"\":\n",
" cmd += f\" --model {model_name}\"\n",
"\n",
"!$cmd"
],
"metadata": {
"id": "LGQ8BiMuXMDG",
"cellView": "form"
},
"execution_count": null,
"outputs": []
}
]
}
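Outside Colab, the command that the notebook's second cell assembles reduces to an ordinary launch of the start script. A minimal local equivalent, assuming the GGUF file from the default `model_url` has already been downloaded (the model file name is illustrative):

```bash
# Local equivalent of the notebook's final assembled command.
# --share is only needed for a public Gradio URL, as on Colab;
# the model name below is illustrative.
./start_linux.sh --api --share --model Qwen3.5-9B-Q4_K_M.gguf
```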

README.md

@ -1,89 +1,142 @@
**Breaking change: WebUI now uses PyTorch 2.1.** <div align="center" markdown="1">
<sup>Special thanks to:</sup>
<br>
<br>
<a href="https://go.warp.dev/text-generation-webui">
<img alt="Warp sponsorship" width="400" src="https://raw.githubusercontent.com/warpdotdev/brand-assets/refs/heads/main/Github/Sponsor/Warp-Github-LG-02.png">
</a>
* For one-click installer users: If you encounter problems after updating, rerun the update script. If issues persist, delete the `installer_files` folder and use the start script to reinstall requirements. ### [Warp, built for coding with multiple AI agents](https://go.warp.dev/text-generation-webui)
* For manual installations, update PyTorch with the [provided command](https://github.com/oobabooga/text-generation-webui/#2-install-pytorch). [Available for macOS, Linux, & Windows](https://go.warp.dev/text-generation-webui)<br>
</div>
<hr>
# Text generation web UI # Text Generation Web UI
A Gradio web UI for Large Language Models. A Gradio web UI for running Large Language Models locally. 100% private and offline. Supports text generation, vision, tool-calling, training, image generation, and more.
Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) of text generation. [Try the Deep Reason extension](https://oobabooga.gumroad.com/l/deep_reason)
|![Image1](https://github.com/oobabooga/screenshots/raw/main/print_instruct.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/print_chat.png) | |![Image1](https://github.com/oobabooga/screenshots/raw/main/INSTRUCT-3.5.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/CHAT-3.5.png) |
|:---:|:---:| |:---:|:---:|
|![Image1](https://github.com/oobabooga/screenshots/raw/main/print_default.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/print_parameters.png) | |![Image1](https://github.com/oobabooga/screenshots/raw/main/DEFAULT-3.5.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/PARAMETERS-3.5.png) |
## Features ## Features
- **Multiple backends**: [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Switch between backends and models without restarting.
- **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
- **Vision (multimodal)**: Attach images to messages for visual understanding ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Multimodal-Tutorial)).
- **Tool-calling**: Models can call custom functions during chat: web search, page fetching, math, and more. Each tool is a single `.py` file, easy to create and extend ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Tool-Calling-Tutorial)).
- **OpenAI-compatible API**: Chat and Completions endpoints with tool-calling support. Use as a local drop-in replacement for the OpenAI API ([examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples)); a minimal client sketch appears after this list.
- **Training**: Fine-tune LoRAs on multi-turn chat or raw text datasets. Supports resuming interrupted runs ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/05-%E2%80%90-Training-Tab)).
- **Image generation**: A dedicated tab for `diffusers` models like **Z-Image-Turbo**. Features 4-bit/8-bit quantization and a persistent gallery with metadata ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Image-Generation-Tutorial)).
- **Easy setup**: [Portable builds](https://github.com/oobabooga/text-generation-webui/releases) (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or a one-click installer for the full feature set.
- 100% offline and private, with zero telemetry, external resources, or remote update requests.
- `instruct` mode for instruction-following (like ChatGPT), and `chat-instruct`/`chat` modes for talking to custom characters. Prompts are automatically formatted with Jinja2 templates.
- Edit messages, navigate between message versions, and branch conversations at any point.
- Free-form text generation in the Notebook tab without being limited to chat turns.
- Multiple sampling parameters and generation options for sophisticated text generation control.
- Aesthetic UI with dark and light themes.
- Syntax highlighting for code blocks and LaTeX rendering for mathematical expressions.
- Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
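
The OpenAI-compatible API can be exercised with any plain HTTP client. The sketch below is illustrative rather than an official example: it assumes the server was started with `--api`, that the API listens on its default port 5000, and that a model is already loaded (adjust the URL if you used `--api-port` or `--listen`).

```python
# Minimal chat request against the local OpenAI-compatible API.
# Assumptions: server started with --api, default port 5000, model loaded.
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "Write a haiku about local LLMs."}],
    "max_tokens": 200,
    "temperature": 0.7,
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```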
## How to install
#### ✅ Option 1: Portable builds (get started in 1 minute)
No installation needed: just download, unzip, and run. All dependencies are included.
Download from here: **https://github.com/oobabooga/text-generation-webui/releases**
- Builds are provided for Linux, Windows, and macOS, with options for CUDA, Vulkan, ROCm, and CPU-only.
- Compatible with GGUF (llama.cpp) models.
#### Option 2: Manual portable install with venv

A very fast setup that should work on any system with Python 3.9+:
```bash
# Clone the repository
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies (choose the appropriate file under requirements/portable for your hardware)
pip install -r requirements/portable/requirements.txt --upgrade

# Launch the server (basic command)
python server.py --portable --api --auto-launch

# When done working, deactivate
deactivate
```
#### Option 3: One-click installer
For users who need additional backends (ExLlamaV3, Transformers), training, image generation, or extensions (TTS, voice input, translation, etc.). Requires ~10 GB of disk space and downloads PyTorch.
1. Clone the repository, or [download its source code](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip) and extract it.
2. Run the startup script for your OS: `start_windows.bat`, `start_linux.sh`, or `start_macos.sh`.
3. When prompted, select your GPU vendor.
4. After installation, open `http://127.0.0.1:7860` in your browser.
To restart the web UI later, run the same `start_` script.
You can pass command-line flags directly (e.g., `./start_linux.sh --help`), or add them to `user_data/CMD_FLAGS.txt` (e.g., `--api` to enable the API).
To update, run the update script for your OS: `update_wizard_windows.bat`, `update_wizard_linux.sh`, or `update_wizard_macos.sh`.
To reinstall with a fresh Python environment, delete the `installer_files` folder and run the `start_` script again.
<details>
<summary>
One-click installer details
</summary>
### One-click-installer
The script uses Miniforge to set up a Conda environment in the `installer_files` folder.
If you ever need to install something manually in the `installer_files` environment, you can launch an interactive shell using the cmd script: `cmd_linux.sh`, `cmd_windows.bat`, or `cmd_macos.sh`.
* There is no need to run any of those scripts (`start_`, `update_wizard_`, or `cmd_`) as admin/root.
* To install requirements for extensions, it is recommended to use the update wizard script with the "Install/update extensions requirements" option. At the end, this script will install the main requirements for the project to make sure that they take precedence in case of version conflicts.
* For automated installation, you can use the `GPU_CHOICE`, `LAUNCH_AFTER_INSTALL`, and `INSTALL_EXTENSIONS` environment variables. For instance: `GPU_CHOICE=A LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh`.
</details>
<details>
<summary>
Manual full installation with conda or docker
</summary>
### Full installation with Conda
#### 0. Install Conda

https://github.com/conda-forge/miniforge

On Linux or WSL, Miniforge can be automatically installed with these two commands:
```
curl -sL "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" > "Miniforge3.sh"
bash Miniforge3.sh
```
For other platforms, download from: https://github.com/conda-forge/miniforge/releases/latest
#### 1. Create a new conda environment

```
conda create -n textgen python=3.13
conda activate textgen
```
@@ -91,330 +144,323 @@ conda activate textgen
| System | GPU | Command |
|--------|---------|---------|
| Linux/WSL | NVIDIA | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128` |
| Linux/WSL | CPU only | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cpu` |
| Linux | AMD | `pip3 install https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp313-cp313-linux_x86_64.whl` |
| MacOS + MPS | Any | `pip3 install torch==2.9.1` |
| Windows | NVIDIA | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128` |
| Windows | CPU only | `pip3 install torch==2.9.1` |

The up-to-date commands can be found here: https://pytorch.org/get-started/locally/.
If you need `nvcc` to compile some library manually, you will additionally need to install this:
```
conda install -y -c "nvidia/label/cuda-12.8.1" cuda
```
#### 3. Install the web UI

```
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements/full/<requirements file according to table below>
```
Requirements file to use:

| GPU | requirements file to use |
|--------|---------|
| NVIDIA | `requirements.txt` |
| AMD | `requirements_amd.txt` |
| CPU only | `requirements_cpu_only.txt` |
| Apple Intel | `requirements_apple_intel.txt` |
| Apple Silicon | `requirements_apple_silicon.txt` |
### Start the web UI

```
conda activate textgen
cd text-generation-webui
python server.py
```
Then browse to

`http://127.0.0.1:7860`
#### Manual install

The `requirements*.txt` above contain various wheels precompiled through GitHub Actions. If you wish to compile things manually, or if no suitable wheels exist for your hardware, use `requirements_nowheels.txt` and then install your desired loaders manually.
### Alternative: Docker
```
For NVIDIA GPU:
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
For AMD GPU:
ln -s docker/{amd/Dockerfile,amd/docker-compose.yml,.dockerignore} .
For Intel GPU:
ln -s docker/{intel/Dockerfile,intel/docker-compose.yml,.dockerignore} .
For CPU only:
ln -s docker/{cpu/Dockerfile,cpu/docker-compose.yml,.dockerignore} .

cp docker/.env.example .env

# Create logs/cache dirs:
mkdir -p user_data/logs user_data/cache

# Edit .env and set:
#   TORCH_CUDA_ARCH_LIST based on your GPU model
#   APP_RUNTIME_GID      your host user's group id (run `id -g` in a terminal)
#   BUILD_EXTENSIONS     optionally add a comma-separated list of extensions to build
# Edit user_data/CMD_FLAGS.txt and add in it the options you want to execute (like --listen --cpu)

docker compose up --build
```
* You need to have Docker Compose v2.17 or higher installed. See [this guide](https://github.com/oobabooga/text-generation-webui/wiki/09-%E2%80%90-Docker) for instructions.
* For additional docker files, check out [this repository](https://github.com/Atinoda/text-generation-webui-docker).
### Updating the requirements

From time to time, the `requirements*.txt` change. To update, use these commands:
```
conda activate textgen
cd text-generation-webui
pip install -r <requirements file that you have used> --upgrade
```
</details>
<details>
<summary>
List of command-line flags
</summary>
```txt
usage: server.py [-h] [--user-data-dir USER_DATA_DIR] [--multi-user] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--settings SETTINGS]
[--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--idle-timeout IDLE_TIMEOUT] [--image-model IMAGE_MODEL] [--image-model-dir IMAGE_MODEL_DIR] [--image-dtype {bfloat16,float16}]
[--image-attn-backend {flash_attention_2,sdpa}] [--image-cpu-offload] [--image-compile] [--image-quant {none,bnb-8bit,bnb-4bit,torchao-int8wo,torchao-fp4,torchao-float8wo}]
[--loader LOADER] [--ctx-size N] [--cache-type N] [--model-draft MODEL_DRAFT] [--draft-max DRAFT_MAX] [--gpu-layers-draft GPU_LAYERS_DRAFT] [--device-draft DEVICE_DRAFT]
[--ctx-size-draft CTX_SIZE_DRAFT] [--spec-type {none,ngram-mod,ngram-simple,ngram-map-k,ngram-map-k4v,ngram-cache}] [--spec-ngram-size-n SPEC_NGRAM_SIZE_N]
[--spec-ngram-size-m SPEC_NGRAM_SIZE_M] [--spec-ngram-min-hits SPEC_NGRAM_MIN_HITS] [--gpu-layers N] [--cpu-moe] [--mmproj MMPROJ] [--streaming-llm] [--tensor-split TENSOR_SPLIT]
[--row-split] [--no-mmap] [--mlock] [--no-kv-offload] [--batch-size BATCH_SIZE] [--ubatch-size UBATCH_SIZE] [--threads THREADS] [--threads-batch THREADS_BATCH] [--numa]
[--parallel PARALLEL] [--fit-target FIT_TARGET] [--extra-flags EXTRA_FLAGS] [--cpu] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16]
[--no-cache] [--trust-remote-code] [--force-safetensors] [--no_use_fast] [--attn-implementation IMPLEMENTATION] [--load-in-4bit] [--use_double_quant] [--compute_dtype COMPUTE_DTYPE]
[--quant_type QUANT_TYPE] [--gpu-split GPU_SPLIT] [--enable-tp] [--tp-backend TP_BACKEND] [--cfg-cache] [--listen] [--listen-port LISTEN_PORT] [--listen-host LISTEN_HOST] [--share]
[--auto-launch] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--ssl-keyfile SSL_KEYFILE] [--ssl-certfile SSL_CERTFILE] [--subpath SUBPATH] [--old-colors]
[--portable] [--api] [--public-api] [--public-api-id PUBLIC_API_ID] [--api-port API_PORT] [--api-key API_KEY] [--admin-key ADMIN_KEY] [--api-enable-ipv6] [--api-disable-ipv4]
[--nowebui] [--temperature N] [--dynatemp-low N] [--dynatemp-high N] [--dynatemp-exponent N] [--smoothing-factor N] [--smoothing-curve N] [--min-p N] [--top-p N] [--top-k N]
[--typical-p N] [--xtc-threshold N] [--xtc-probability N] [--epsilon-cutoff N] [--eta-cutoff N] [--tfs N] [--top-a N] [--top-n-sigma N] [--adaptive-target N] [--adaptive-decay N]
[--dry-multiplier N] [--dry-allowed-length N] [--dry-base N] [--repetition-penalty N] [--frequency-penalty N] [--presence-penalty N] [--encoder-repetition-penalty N]
[--no-repeat-ngram-size N] [--repetition-penalty-range N] [--penalty-alpha N] [--guidance-scale N] [--mirostat-mode N] [--mirostat-tau N] [--mirostat-eta N]
[--do-sample | --no-do-sample] [--dynamic-temperature | --no-dynamic-temperature] [--temperature-last | --no-temperature-last] [--sampler-priority N] [--dry-sequence-breakers N]
[--enable-thinking | --no-enable-thinking] [--reasoning-effort N] [--chat-template-file CHAT_TEMPLATE_FILE]
Text Generation Web UI
options:
-h, --help show this help message and exit
Basic settings:
--user-data-dir USER_DATA_DIR Path to the user data directory. Default: auto-detected.
--multi-user Multi-user mode. Chat histories are not saved or automatically loaded. Best suited for small trusted teams.
--model MODEL Name of the model to load by default.
--lora LORA [LORA ...] The list of LoRAs to load. If you want to load more than one LoRA, write the names separated by spaces.
--model-dir MODEL_DIR Path to directory with all the models.
--lora-dir LORA_DIR Path to directory with all the loras.
--model-menu Show a model menu in the terminal when the web UI is first launched.
--settings SETTINGS Load the default interface settings from this yaml file. See user_data/settings-template.yaml for an example. If you create a file called
user_data/settings.yaml, this file will be loaded by default without the need to use the --settings flag.
--extensions EXTENSIONS [EXTENSIONS ...] The list of extensions to load. If you want to load more than one extension, write the names separated by spaces.
--verbose Print the prompts to the terminal.
--idle-timeout IDLE_TIMEOUT Unload model after this many minutes of inactivity. It will be automatically reloaded when you try to use it again.
Image model:
--image-model IMAGE_MODEL Name of the image model to select on startup (overrides saved setting).
--image-model-dir IMAGE_MODEL_DIR Path to directory with all the image models.
--image-dtype {bfloat16,float16} Data type for image model.
--image-attn-backend {flash_attention_2,sdpa} Attention backend for image model.
--image-cpu-offload Enable CPU offloading for image model.
--image-compile Compile the image model for faster inference.
--image-quant {none,bnb-8bit,bnb-4bit,torchao-int8wo,torchao-fp4,torchao-float8wo}
Quantization method for image model.
Model loader:
--loader LOADER Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav3, TensorRT-LLM.
Context and cache:
--ctx-size, --n_ctx, --max_seq_len N Context size in tokens. 0 = auto for llama.cpp (requires gpu-layers=-1), 8192 for other loaders.
--cache-type, --cache_type N KV cache type; valid options: llama.cpp - fp16, q8_0, q4_0; ExLlamaV3 - fp16, q2 to q8 (can specify k_bits and v_bits separately, e.g. q4_q8).
Speculative decoding:
--model-draft MODEL_DRAFT Path to the draft model for speculative decoding.
--draft-max DRAFT_MAX Number of tokens to draft for speculative decoding.
--gpu-layers-draft GPU_LAYERS_DRAFT Number of layers to offload to the GPU for the draft model.
--device-draft DEVICE_DRAFT Comma-separated list of devices to use for offloading the draft model. Example: CUDA0,CUDA1
--ctx-size-draft CTX_SIZE_DRAFT Size of the prompt context for the draft model. If 0, uses the same as the main model.
--spec-type {none,ngram-mod,ngram-simple,ngram-map-k,ngram-map-k4v,ngram-cache}
Draftless speculative decoding type. Recommended: ngram-mod.
--spec-ngram-size-n SPEC_NGRAM_SIZE_N N-gram lookup size for ngram speculative decoding.
--spec-ngram-size-m SPEC_NGRAM_SIZE_M Draft n-gram size for ngram speculative decoding.
--spec-ngram-min-hits SPEC_NGRAM_MIN_HITS Minimum n-gram hits for ngram-map speculative decoding.
llama.cpp:
--gpu-layers, --n-gpu-layers N Number of layers to offload to the GPU. -1 = auto.
--cpu-moe Move the experts to the CPU (for MoE models).
--mmproj MMPROJ Path to the mmproj file for vision models.
--streaming-llm Activate StreamingLLM to avoid re-evaluating the entire prompt when old messages are removed.
--tensor-split TENSOR_SPLIT Split the model across multiple GPUs. Comma-separated list of proportions. Example: 60,40.
--row-split Split the model by rows across GPUs. This may improve multi-gpu performance.
--no-mmap Prevent mmap from being used.
--mlock Force the system to keep the model in RAM.
--no-kv-offload Do not offload the K, Q, V to the GPU. This saves VRAM but reduces the performance.
--batch-size BATCH_SIZE Maximum number of prompt tokens to batch together when calling llama-server. This is the application level batch size.
--ubatch-size UBATCH_SIZE Maximum number of prompt tokens to batch together when calling llama-server. This is the max physical batch size for computation (device level).
--threads THREADS Number of threads to use.
--threads-batch THREADS_BATCH Number of threads to use for batches/prompt processing.
--numa Activate NUMA task allocation for llama.cpp.
--parallel PARALLEL Number of parallel request slots. The context size is divided equally among slots. For example, to have 4 slots with 8192 context each, set
ctx_size to 32768.
--fit-target FIT_TARGET Target VRAM margin per device for auto GPU layers, comma-separated list of values in MiB. A single value is broadcast across all devices.
Default: 1024.
--extra-flags EXTRA_FLAGS Extra flags to pass to llama-server. Format: "flag1=value1,flag2,flag3=value3". Example: "override-tensor=exps=CPU"
Transformers/Accelerate:
--cpu Use the CPU to generate text. Warning: Training on CPU is extremely slow.
--cpu-memory CPU_MEMORY Maximum CPU memory in GiB. Use this for CPU offloading.
--disk If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk.
--disk-cache-dir DISK_CACHE_DIR Directory to save the disk cache to.
--load-in-8bit Load the model with 8-bit precision (using bitsandbytes).
--bf16 Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU.
--no-cache Set use_cache to False while generating text. This reduces VRAM usage slightly, but it comes at a performance cost.
--trust-remote-code Set trust_remote_code=True while loading the model. Necessary for some models.
--force-safetensors Set use_safetensors=True while loading the model. This prevents arbitrary code execution.
--no_use_fast Set use_fast=False while loading the tokenizer (it's True by default). Use this if you have any problems related to use_fast.
--attn-implementation IMPLEMENTATION Attention implementation. Valid options: sdpa, eager, flash_attention_2.
bitsandbytes 4-bit:
--load-in-4bit Load the model with 4-bit precision (using bitsandbytes).
--use_double_quant use_double_quant for 4-bit.
--compute_dtype COMPUTE_DTYPE compute dtype for 4-bit. Valid options: bfloat16, float16, float32.
--quant_type QUANT_TYPE quant_type for 4-bit. Valid options: nf4, fp4.
ExLlamaV3:
--gpu-split GPU_SPLIT Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: 20,7,7.
--enable-tp, --enable_tp Enable Tensor Parallelism (TP) to split the model across GPUs.
--tp-backend TP_BACKEND The backend for tensor parallelism. Valid options: native, nccl. Default: native.
--cfg-cache Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader.
Gradio:
--listen Make the web UI reachable from your local network.
--listen-port LISTEN_PORT The listening port that the server will use.
--listen-host LISTEN_HOST The hostname that the server will use.
--share Create a public URL. This is useful for running the web UI on Google Colab or similar.
--auto-launch Open the web UI in the default browser upon launch.
--gradio-auth GRADIO_AUTH Set Gradio authentication password in the format "username:password". Multiple credentials can also be supplied with "u1:p1,u2:p2,u3:p3".
--gradio-auth-path GRADIO_AUTH_PATH Set the Gradio authentication file path. The file should contain one or more user:password pairs in the same format as above.
--ssl-keyfile SSL_KEYFILE The path to the SSL certificate key file.
--ssl-certfile SSL_CERTFILE The path to the SSL certificate cert file.
--subpath SUBPATH Customize the subpath for gradio, use with reverse proxy
--old-colors Use the legacy Gradio colors, before the December/2024 update.
--portable Hide features not available in portable mode like training.
API:
--api Enable the API extension.
--public-api Create a public URL for the API using Cloudflare.
--public-api-id PUBLIC_API_ID Tunnel ID for named Cloudflare Tunnel. Use together with public-api option.
--api-port API_PORT The listening port for the API.
--api-key API_KEY API authentication key.
--admin-key ADMIN_KEY API authentication key for admin tasks like loading and unloading models. If not set, will be the same as --api-key.
--api-enable-ipv6 Enable IPv6 for the API
--api-disable-ipv4 Disable IPv4 for the API
--nowebui Do not launch the Gradio UI. Useful for launching the API in standalone mode.
API generation defaults:
--temperature N Temperature
--dynatemp-low N Dynamic temperature low
--dynatemp-high N Dynamic temperature high
--dynatemp-exponent N Dynamic temperature exponent
--smoothing-factor N Smoothing factor
--smoothing-curve N Smoothing curve
--min-p N Min P
--top-p N Top P
--top-k N Top K
--typical-p N Typical P
--xtc-threshold N XTC threshold
--xtc-probability N XTC probability
--epsilon-cutoff N Epsilon cutoff
--eta-cutoff N Eta cutoff
--tfs N TFS
--top-a N Top A
--top-n-sigma N Top N Sigma
--adaptive-target N Adaptive target
--adaptive-decay N Adaptive decay
--dry-multiplier N DRY multiplier
--dry-allowed-length N DRY allowed length
--dry-base N DRY base
--repetition-penalty N Repetition penalty
--frequency-penalty N Frequency penalty
--presence-penalty N Presence penalty
--encoder-repetition-penalty N Encoder repetition penalty
--no-repeat-ngram-size N No repeat ngram size
--repetition-penalty-range N Repetition penalty range
--penalty-alpha N Penalty alpha
--guidance-scale N Guidance scale
--mirostat-mode N Mirostat mode
--mirostat-tau N Mirostat tau
--mirostat-eta N Mirostat eta
--do-sample, --no-do-sample Do sample
--dynamic-temperature, --no-dynamic-temperature Dynamic temperature
--temperature-last, --no-temperature-last Temperature last
--sampler-priority N Sampler priority
--dry-sequence-breakers N DRY sequence breakers
--enable-thinking, --no-enable-thinking Enable thinking
--reasoning-effort N Reasoning effort
--chat-template-file CHAT_TEMPLATE_FILE Path to a chat template file (.jinja, .jinja2, or .yaml) to use as the default instruction template for API requests. Overrides the model's
built-in template.
```
</details>
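
As a complement to the flags above: with `--api --nowebui` the server runs headless, and responses can be streamed. The snippet below is a rough sketch under the assumption that the Chat Completions endpoint follows the OpenAI server-sent-events format (`data:` chunks terminated by `[DONE]`) on the default API port 5000, with a model loaded.

```python
# Streaming sketch: print tokens as OpenAI-style SSE chunks arrive.
# Assumptions: python server.py --api --nowebui, default port 5000, model loaded.
import json

import requests

url = "http://127.0.0.1:5000/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    "stream": True,
}

with requests.post(url, json=payload, stream=True, timeout=300) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip keep-alives and blank lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
print()
```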
## Downloading models
1. Download a GGUF model file from [Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads&search=gguf).
2. Place it in the `user_data/models` folder.
That's it. The UI will detect the file automatically. You can also download models from the command line with `python download-model.py organization/model` (use `--help` to see all the options).
To check what will fit your GPU, you can use the [VRAM Calculator](https://huggingface.co/spaces/oobabooga/accurate-gguf-vram-calculator).
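
If you prefer to script the download instead, one option is `huggingface_hub` (an assumption on my part, not the project's built-in mechanism; the repo id and filename below are just examples):

```python
# Sketch: fetch a GGUF file straight into user_data/models with huggingface_hub.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-chat-GGUF",  # example repository, pick any GGUF repo
    filename="llama-2-13b-chat.Q4_K_M.gguf",   # example quantized file within that repo
    local_dir="user_data/models",              # folder where the UI looks for GGUF models
)
```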
<details>
<summary>Other model types (Transformers, EXL3)</summary>
Models that consist of multiple files (like 16-bit Transformers models and EXL3 models) should be placed in a subfolder inside `user_data/models`:
```
text-generation-webui
└── user_data
    └── models
        └── Qwen_Qwen3-8B
            ├── config.json
            ├── generation_config.json
            ├── model-00001-of-00004.safetensors
            ├── ...
            ├── tokenizer_config.json
            └── tokenizer.json
```
These formats require the one-click installer (not the portable build).
</details>
## Documentation

https://github.com/oobabooga/text-generation-webui/wiki
## Contributing
If you would like to contribute to the project, check out the [Contributing guidelines](https://github.com/oobabooga/text-generation-webui/wiki/Contributing-guidelines).
## Community

https://www.reddit.com/r/Oobabooga/
## Acknowledgments

- In August 2023, [Andreessen Horowitz](https://a16z.com/) (a16z) provided a generous grant to encourage and support my independent work on this project. I am **extremely** grateful for their trust and recognition.
- This project was inspired by [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) and wouldn't exist without it.

@@ -1,112 +0,0 @@
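# Deleted file (filename not shown in this view): a legacy websocket streaming chat client for the old /api/v1/chat-stream endpoint.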
import asyncio
import html
import json
import sys
try:
import websockets
except ImportError:
print("Websockets package not found. Make sure it's installed.")
# For local streaming, the websockets are hosted without ssl - ws://
HOST = 'localhost:5005'
URI = f'ws://{HOST}/api/v1/chat-stream'
# For reverse-proxied streaming, the remote will likely host with ssl - wss://
# URI = 'wss://your-uri-here.trycloudflare.com/api/v1/stream'
async def run(user_input, history):
# Note: the selected defaults change from time to time.
request = {
'user_input': user_input,
'max_new_tokens': 250,
'auto_max_new_tokens': False,
'max_tokens_second': 0,
'history': history,
'mode': 'instruct', # Valid options: 'chat', 'chat-instruct', 'instruct'
'character': 'Example',
'instruction_template': 'Vicuna-v1.1', # Will get autodetected if unset
'your_name': 'You',
# 'name1': 'name of user', # Optional
# 'name2': 'name of character', # Optional
# 'context': 'character context', # Optional
# 'greeting': 'greeting', # Optional
# 'name1_instruct': 'You', # Optional
# 'name2_instruct': 'Assistant', # Optional
# 'context_instruct': 'context_instruct', # Optional
# 'turn_template': 'turn_template', # Optional
'regenerate': False,
'_continue': False,
'chat_instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',
# Generation params. If 'preset' is set to different than 'None', the values
# in presets/preset-name.yaml are used instead of the individual numbers.
'preset': 'None',
'do_sample': True,
'temperature': 0.7,
'top_p': 0.1,
'typical_p': 1,
'epsilon_cutoff': 0, # In units of 1e-4
'eta_cutoff': 0, # In units of 1e-4
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
'no_repeat_ngram_size': 0,
'num_beams': 1,
'penalty_alpha': 0,
'length_penalty': 1,
'early_stopping': False,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'grammar_string': '',
'guidance_scale': 1,
'negative_prompt': '',
'seed': -1,
'add_bos_token': True,
'truncation_length': 2048,
'ban_eos_token': False,
'custom_token_bans': '',
'skip_special_tokens': True,
'stopping_strings': []
}
async with websockets.connect(URI, ping_interval=None) as websocket:
await websocket.send(json.dumps(request))
while True:
incoming_data = await websocket.recv()
incoming_data = json.loads(incoming_data)
match incoming_data['event']:
case 'text_stream':
yield incoming_data['history']
case 'stream_end':
return
async def print_response_stream(user_input, history):
cur_len = 0
async for new_history in run(user_input, history):
cur_message = new_history['visible'][-1][1][cur_len:]
cur_len += len(cur_message)
print(html.unescape(cur_message), end='')
sys.stdout.flush() # If we don't flush, we won't see tokens in realtime.
if __name__ == '__main__':
user_input = "Please give me a step-by-step guide on how to plant a tree in my backyard."
# Basic example
history = {'internal': [], 'visible': []}
# "Continue" example. Make sure to set '_continue' to True above
# arr = [user_input, 'Surely, here is']
# history = {'internal': [arr], 'visible': [arr]}
asyncio.run(print_response_stream(user_input, history))

@@ -1,92 +0,0 @@
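# Deleted file (filename not shown in this view): a legacy blocking chat client for the old /api/v1/chat endpoint.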
import html
import json
import requests
# For local streaming, the websockets are hosted without ssl - http://
HOST = 'localhost:5000'
URI = f'http://{HOST}/api/v1/chat'
# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/chat'
def run(user_input, history):
request = {
'user_input': user_input,
'max_new_tokens': 250,
'auto_max_new_tokens': False,
'max_tokens_second': 0,
'history': history,
'mode': 'instruct', # Valid options: 'chat', 'chat-instruct', 'instruct'
'character': 'Example',
'instruction_template': 'Vicuna-v1.1', # Will get autodetected if unset
'your_name': 'You',
# 'name1': 'name of user', # Optional
# 'name2': 'name of character', # Optional
# 'context': 'character context', # Optional
# 'greeting': 'greeting', # Optional
# 'name1_instruct': 'You', # Optional
# 'name2_instruct': 'Assistant', # Optional
# 'context_instruct': 'context_instruct', # Optional
# 'turn_template': 'turn_template', # Optional
'regenerate': False,
'_continue': False,
'chat_instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',
# Generation params. If 'preset' is set to different than 'None', the values
# in presets/preset-name.yaml are used instead of the individual numbers.
'preset': 'None',
'do_sample': True,
'temperature': 0.7,
'top_p': 0.1,
'typical_p': 1,
'epsilon_cutoff': 0, # In units of 1e-4
'eta_cutoff': 0, # In units of 1e-4
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
'no_repeat_ngram_size': 0,
'num_beams': 1,
'penalty_alpha': 0,
'length_penalty': 1,
'early_stopping': False,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'grammar_string': '',
'guidance_scale': 1,
'negative_prompt': '',
'seed': -1,
'add_bos_token': True,
'truncation_length': 2048,
'ban_eos_token': False,
'custom_token_bans': '',
'skip_special_tokens': True,
'stopping_strings': []
}
response = requests.post(URI, json=request)
if response.status_code == 200:
result = response.json()['results'][0]['history']
print(json.dumps(result, indent=4))
print()
print(html.unescape(result['visible'][-1][1]))
if __name__ == '__main__':
user_input = "Please give me a step-by-step guide on how to plant a tree in my backyard."
# Basic example
history = {'internal': [], 'visible': []}
# "Continue" example. Make sure to set '_continue' to True above
# arr = [user_input, 'Surely, here is']
# history = {'internal': [arr], 'visible': [arr]}
run(user_input, history)

@@ -1,176 +0,0 @@
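# Deleted file (filename not shown in this view): a legacy test script that loads each available model via the old /api/v1/model endpoint and smoke-tests generation.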
#!/usr/bin/env python3
import requests
HOST = '0.0.0.0:5000'
def generate(prompt, tokens=200):
request = {'prompt': prompt, 'max_new_tokens': tokens}
response = requests.post(f'http://{HOST}/api/v1/generate', json=request)
if response.status_code == 200:
return response.json()['results'][0]['text']
def model_api(request):
response = requests.post(f'http://{HOST}/api/v1/model', json=request)
return response.json()
# print some common settings
def print_basic_model_info(response):
basic_settings = ['truncation_length', 'instruction_template']
print("Model: ", response['result']['model_name'])
print("Lora(s): ", response['result']['lora_names'])
for setting in basic_settings:
print(setting, "=", response['result']['shared.settings'][setting])
# model info
def model_info():
response = model_api({'action': 'info'})
print_basic_model_info(response)
# simple loader
def model_load(model_name):
return model_api({'action': 'load', 'model_name': model_name})
# complex loader
def complex_model_load(model):
def guess_groupsize(model_name):
if '1024g' in model_name:
return 1024
elif '128g' in model_name:
return 128
elif '32g' in model_name:
return 32
else:
return -1
req = {
'action': 'load',
'model_name': model,
'args': {
'loader': 'AutoGPTQ',
'bf16': False,
'load_in_8bit': False,
'groupsize': 0,
'wbits': 0,
# llama.cpp
'threads': 0,
'n_batch': 512,
'no_mmap': False,
'mlock': False,
'cache_capacity': None,
'n_gpu_layers': 0,
'n_ctx': 2048,
# RWKV
'rwkv_strategy': None,
'rwkv_cuda_on': False,
# b&b 4-bit
# 'load_in_4bit': False,
# 'compute_dtype': 'float16',
# 'quant_type': 'nf4',
# 'use_double_quant': False,
# "cpu": false,
# "auto_devices": false,
# "gpu_memory": null,
# "cpu_memory": null,
# "disk": false,
# "disk_cache_dir": "cache",
},
}
model = model.lower()
if '4bit' in model or 'gptq' in model or 'int4' in model:
req['args']['wbits'] = 4
req['args']['groupsize'] = guess_groupsize(model)
elif '3bit' in model:
req['args']['wbits'] = 3
req['args']['groupsize'] = guess_groupsize(model)
else:
req['args']['gptq_for_llama'] = False
if '8bit' in model:
req['args']['load_in_8bit'] = True
elif '-hf' in model or 'fp16' in model:
if '7b' in model:
req['args']['bf16'] = True # for 24GB
elif '13b' in model:
req['args']['load_in_8bit'] = True # for 24GB
elif 'gguf' in model:
# req['args']['threads'] = 16
if '7b' in model:
req['args']['n_gpu_layers'] = 100
elif '13b' in model:
req['args']['n_gpu_layers'] = 100
elif '30b' in model or '33b' in model:
req['args']['n_gpu_layers'] = 59 # 24GB
elif '65b' in model:
req['args']['n_gpu_layers'] = 42 # 24GB
elif 'rwkv' in model:
req['args']['rwkv_cuda_on'] = True
if '14b' in model:
req['args']['rwkv_strategy'] = 'cuda f16i8' # 24GB
else:
req['args']['rwkv_strategy'] = 'cuda f16' # 24GB
return model_api(req)
if __name__ == '__main__':
for model in model_api({'action': 'list'})['result']:
try:
resp = complex_model_load(model)
if 'error' in resp:
print(f"{model} FAIL Error: {resp['error']['message']}")
continue
else:
print_basic_model_info(resp)
ans = generate("0,1,1,2,3,5,8,13,", tokens=2)
if '21' in ans:
print(f"{model} PASS ({ans})")
else:
print(f"{model} FAIL ({ans})")
except Exception as e:
print(f"{model} FAIL Exception: {repr(e)}")
# 0,1,1,2,3,5,8,13, is the fibonacci sequence, the next number is 21.
# Some results below.
""" $ ./model-api-example.py
Model: 4bit_gpt4-x-alpaca-13b-native-4bit-128g-cuda
Lora(s): []
truncation_length = 2048
instruction_template = Alpaca
4bit_gpt4-x-alpaca-13b-native-4bit-128g-cuda PASS (21)
Model: 4bit_WizardLM-13B-Uncensored-4bit-128g
Lora(s): []
truncation_length = 2048
instruction_template = WizardLM
4bit_WizardLM-13B-Uncensored-4bit-128g PASS (21)
Model: Aeala_VicUnlocked-alpaca-30b-4bit
Lora(s): []
truncation_length = 2048
instruction_template = Alpaca
Aeala_VicUnlocked-alpaca-30b-4bit PASS (21)
Model: alpaca-30b-4bit
Lora(s): []
truncation_length = 2048
instruction_template = Alpaca
alpaca-30b-4bit PASS (21)
"""

@@ -1,86 +0,0 @@
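# Deleted file (filename not shown in this view): a legacy websocket streaming completion client for the old /api/v1/stream endpoint.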
import asyncio
import json
import sys
try:
import websockets
except ImportError:
print("Websockets package not found. Make sure it's installed.")
# For local streaming, the websockets are hosted without ssl - ws://
HOST = 'localhost:5005'
URI = f'ws://{HOST}/api/v1/stream'
# For reverse-proxied streaming, the remote will likely host with ssl - wss://
# URI = 'wss://your-uri-here.trycloudflare.com/api/v1/stream'
async def run(context):
# Note: the selected defaults change from time to time.
request = {
'prompt': context,
'max_new_tokens': 250,
'auto_max_new_tokens': False,
'max_tokens_second': 0,
# Generation params. If 'preset' is set to different than 'None', the values
# in presets/preset-name.yaml are used instead of the individual numbers.
'preset': 'None',
'do_sample': True,
'temperature': 0.7,
'top_p': 0.1,
'typical_p': 1,
'epsilon_cutoff': 0, # In units of 1e-4
'eta_cutoff': 0, # In units of 1e-4
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
'no_repeat_ngram_size': 0,
'num_beams': 1,
'penalty_alpha': 0,
'length_penalty': 1,
'early_stopping': False,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'grammar_string': '',
'guidance_scale': 1,
'negative_prompt': '',
'seed': -1,
'add_bos_token': True,
'truncation_length': 2048,
'ban_eos_token': False,
'custom_token_bans': '',
'skip_special_tokens': True,
'stopping_strings': []
}
async with websockets.connect(URI, ping_interval=None) as websocket:
await websocket.send(json.dumps(request))
yield context # Remove this if you just want to see the reply
while True:
incoming_data = await websocket.recv()
incoming_data = json.loads(incoming_data)
match incoming_data['event']:
case 'text_stream':
yield incoming_data['text']
case 'stream_end':
return
async def print_response_stream(prompt):
async for response in run(prompt):
print(response, end='')
sys.stdout.flush() # If we don't flush, we won't see tokens in realtime.
if __name__ == '__main__':
prompt = "In order to make homemade bread, follow these steps:\n1)"
asyncio.run(print_response_stream(prompt))

@@ -1,63 +0,0 @@
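# Deleted file (filename not shown in this view): a legacy blocking completion client for the old /api/v1/generate endpoint.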
import requests
# For local streaming, the websockets are hosted without ssl - http://
HOST = 'localhost:5000'
URI = f'http://{HOST}/api/v1/generate'
# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/generate'
def run(prompt):
request = {
'prompt': prompt,
'max_new_tokens': 250,
'auto_max_new_tokens': False,
'max_tokens_second': 0,
# Generation params. If 'preset' is set to different than 'None', the values
# in presets/preset-name.yaml are used instead of the individual numbers.
'preset': 'None',
'do_sample': True,
'temperature': 0.7,
'top_p': 0.1,
'typical_p': 1,
'epsilon_cutoff': 0, # In units of 1e-4
'eta_cutoff': 0, # In units of 1e-4
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
'no_repeat_ngram_size': 0,
'num_beams': 1,
'penalty_alpha': 0,
'length_penalty': 1,
'early_stopping': False,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'grammar_string': '',
'guidance_scale': 1,
'negative_prompt': '',
'seed': -1,
'add_bos_token': True,
'truncation_length': 2048,
'ban_eos_token': False,
'custom_token_bans': '',
'skip_special_tokens': True,
'stopping_strings': []
}
response = requests.post(URI, json=request)
if response.status_code == 200:
result = response.json()['results'][0]['text']
print(prompt + result)
if __name__ == '__main__':
prompt = "In order to make homemade bread, follow these steps:\n1)"
run(prompt)

@@ -1,8 +1,8 @@
#!/usr/bin/env bash
cd "$(dirname "${BASH_SOURCE[0]}")"
if [[ "$(pwd)" =~ " " ]]; then echo This script relies on Miniforge which can not be silently installed under a path with spaces. && exit; fi
# deactivate existing conda envs as needed to avoid conflicts
{ conda deactivate && conda deactivate && conda deactivate; } 2> /dev/null


@@ -2,7 +2,7 @@
 
 cd "$(dirname "${BASH_SOURCE[0]}")"
 
-if [[ "$(pwd)" =~ " " ]]; then echo This script relies on Miniconda which can not be silently installed under a path with spaces. && exit; fi
+if [[ "$(pwd)" =~ " " ]]; then echo This script relies on Miniforge which can not be silently installed under a path with spaces. && exit; fi
 
 # deactivate existing conda envs as needed to avoid conflicts
 { conda deactivate && conda deactivate && conda deactivate; } 2> /dev/null


@@ -4,7 +4,7 @@ cd /D "%~dp0"
 set PATH=%PATH%;%SystemRoot%\system32
 
-echo "%CD%"| findstr /C:" " >nul && echo This script relies on Miniconda which can not be silently installed under a path with spaces. && goto end
+echo "%CD%"| findstr /C:" " >nul && echo This script relies on Miniforge which can not be silently installed under a path with spaces. && goto end
 
 @rem fix failed install when installing to a separate drive
 set TMP=%cd%\installer_files
@@ -21,11 +21,12 @@ set INSTALL_ENV_DIR=%cd%\installer_files\env
 set PYTHONNOUSERSITE=1
 set PYTHONPATH=
 set PYTHONHOME=
+set PYTHONUTF8=1
 set "CUDA_PATH=%INSTALL_ENV_DIR%"
 set "CUDA_HOME=%CUDA_PATH%"
 
 @rem activate installer env
-call "%CONDA_ROOT_PREFIX%\condabin\conda.bat" activate "%INSTALL_ENV_DIR%" || ( echo. && echo Miniconda hook not found. && goto end )
+call "%CONDA_ROOT_PREFIX%\condabin\conda.bat" activate "%INSTALL_ENV_DIR%" || ( echo. && echo Miniforge hook not found. && goto end )
 
 @rem enter commands
 cmd /k "%*"


@@ -1,11 +0,0 @@
@echo off
cd /D "%~dp0"
set PATH=%PATH%;%SystemRoot%\system32
@rem sed -i 's/\x0D$//' ./wsl.sh converts newlines to unix format in the wsl script
call wsl -e bash -lic "sed -i 's/\x0D$//' ./wsl.sh; source ./wsl.sh cmd"
:end
pause


@@ -1,38 +0,0 @@
'''
Converts a transformers model to safetensors format and shards it.

This makes it faster to load (because of safetensors) and lowers its RAM usage
while loading (because of sharding).

Based on the original script by 81300:

https://gist.github.com/81300/fe5b08bff1cba45296a829b9d6b0f303
'''

import argparse
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser(formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=54))
parser.add_argument('MODEL', type=str, default=None, nargs='?', help="Path to the input model.")
parser.add_argument('--output', type=str, default=None, help='Path to the output folder (default: models/{model_name}_safetensors).')
parser.add_argument("--max-shard-size", type=str, default="2GB", help="Maximum size of a shard in GB or MB (default: %(default)s).")
parser.add_argument('--bf16', action='store_true', help='Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU.')
args = parser.parse_args()

if __name__ == '__main__':
    path = Path(args.MODEL)
    model_name = path.name

    print(f"Loading {model_name}...")
    model = AutoModelForCausalLM.from_pretrained(path, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if args.bf16 else torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(path)

    out_folder = args.output or Path(f"models/{model_name}_safetensors")

    print(f"Saving the converted model to {out_folder} with a maximum shard size of {args.max_shard_size}...")
    model.save_pretrained(out_folder, max_shard_size=args.max_shard_size, safe_serialization=True)
    tokenizer.save_pretrained(out_folder)
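For reference, this deleted converter was invoked from the repository root, for example: python convert-to-safetensors.py models/my-model --max-shard-size 2GB --bf16 (the model path is a placeholder). Without --output, the converted copy lands in models/my-model_safetensors, per the defaults visible in the argparse setup above.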

2 binary files changed (contents not shown).

css/chat_style-Dark.css (new file, 129 lines)

@@ -0,0 +1,129 @@
.message {
  display: grid;
  align-items: start;
  grid-template-columns: 60px minmax(0, 1fr);
  width: min(100%, calc(724px + 60px));
  padding-bottom: 22px;
  padding-top: 6px;
  font-size: 18px;
  font-family: Roboto, Arial, sans-serif; /* Modern font */
  line-height: 1.5;
}

.circle-you,
.circle-bot {
  background-color: #2b2b2b; /* Darker background for circles */
  border-radius: 50%; /* Perfect circle */
  border: 1px solid #4a90e2; /* Soft blue border */
  box-shadow: 0 4px 8px rgb(0 0 0 / 50%); /* Soft shadow for depth */
}

.circle-bot img,
.circle-you img {
  border-radius: 50%; /* Make images circular */
  width: 100%;
  height: 100%;
  object-fit: cover;
}

.circle-you, .circle-bot {
  width: 64px; /* Smaller size for modern look */
  height: 64px;
}

.text {
  padding-left: 12px; /* Reduced padding for a cleaner layout */
  color: #f0f0f0; /* Light text color for readability */
}

.text p {
  margin-top: 2px;
}

.username {
  padding-left: 10px;
  font-size: 20px;
  font-weight: bold;
  color: #e0e0e0; /* Light gray text */
  transition: color 0.3s ease; /* Smooth color transition */
}

.username:hover {
  color: #4a90e2; /* Blue color on hover */
}

.message-body {
  position: relative;
  border: 1px solid rgb(255 255 255 / 10%); /* Soft white border */
  border-radius: 8px; /* Slightly rounded corners */
  padding: 15px;
  background: #1e1e1e; /* Dark background */
  box-shadow: 0 4px 10px rgb(0 0 0 / 30%); /* Subtle shadow for depth */
  transition: background 0.3s ease; /* Smooth transition for background */
}

.message-body:hover {
  background: #252525; /* Slightly lighter on hover */
}

/* Adds 2 extra lines at the top and bottom of the message */
.message-body::before,
.message-body::after {
  content: "";
  position: absolute;
  left: 10px;
  right: 10px;
  height: 1px;
  background-color: rgb(255 255 255 / 5%); /* Faded lines for subtle separation */
}

.message-body::before {
  top: 4px;
}

.message-body::after {
  bottom: 4px;
}

.message-body img {
  max-width: 300px;
  max-height: 300px;
  border-radius: 10px; /* Rounded corners for images */
}

.message-body p {
  color: #e0e0e0 !important; /* Light color for text */
}

.message-body p em {
  color: #a6a6a6 !important; /* Softer gray for emphasized text */
}

@media screen and (width <= 688px) {
  .message {
    display: grid;
    align-items: start;
    grid-template-columns: 60px minmax(0, 1fr);
    padding-bottom: 25px;
    font-size: 15px;
    font-family: Roboto, Arial, sans-serif; /* Modern font */
    line-height: 1.5;
  }

  .circle-you, .circle-bot {
    width: 40px; /* Smaller size for mobile */
    height: 40px;
  }

  .text {
    padding-left: 10px; /* Reduced padding for mobile */
  }

  .message-body p {
    font-size: 14px !important;
  }

  .username {
    font-size: 18px; /* Smaller username for mobile */
  }
}


@@ -2,8 +2,11 @@
 .message {
   display: grid;
+  align-items: start;
   grid-template-columns: 60px minmax(0, 1fr);
-  padding-bottom: 28px;
+  width: min(100%, calc(724px + 60px + 90px));
+  padding-bottom: 21px;
+  padding-top: 7px;
   font-size: 18px;
   font-family: 'Noto Sans', Arial, sans-serif;
   line-height: 1.428571429;
@@ -25,15 +28,15 @@
 }
 
 .circle-you, .circle-bot {
-  /*You can set the size of the profile images here, but if you do, you have to also adjust the .text{padding-left: 90px} to a different number according to the width of the image which is right below here*/
+  /* You can set the size of the profile images here, but if you do, you have to also adjust the .text{padding-left: 90px} to a different number according to the width of the image which is right below here */
   width: 135px;
   height: 175px;
 }
 
 .text {
-  /*Change this to move the message box further left or right depending on the size of your profile pic*/
+  /* Change this to move the message box further left or right depending on the size of your profile pic */
   padding-left: 90px;
-  text-shadow: 2px 2px 2px rgb(0, 0, 0, 0.4);
+  text-shadow: 2px 2px 2px rgb(0 0 0 / 40%);
 }
 
 .text p {
@@ -44,37 +47,37 @@
   padding-left: 10px;
   font-size: 22px;
   font-weight: bold;
-  border-top: 1px solid rgb(51, 64, 90);
+  border-top: 1px solid rgb(51 64 90);
   padding: 3px;
 }
 
 .message-body {
   position: relative;
-  border-radius: 1rem;
-  border: 1px solid rgba(255, 255, 255, 0.459);
+  border: 1px solid rgb(255 255 255 / 45.9%);
   border-radius: 10px;
   padding: 10px;
   padding-top: 5px;
-  /*Message gradient background color - remove the line bellow if you don't want a background color or gradient*/
+  /* Message gradient background color - remove the line bellow if you don't want a background color or gradient */
   background: linear-gradient(to bottom, #171730, #1b263f);
 }
 
-/*Adds 2 extra lines at the top and bottom of the message*/
-.message-body:before,
-.message-body:after {
+/* Adds 2 extra lines at the top and bottom of the message */
+.message-body::before,
+.message-body::after {
   content: "";
   position: absolute;
   left: 10px;
   right: 10px;
   height: 1px;
-  background-color: rgba(255, 255, 255, 0.13);
+  background-color: rgb(255 255 255 / 13%);
 }
 
-.message-body:before {
+.message-body::before {
   top: 6px;
 }
 
-.message-body:after {
+.message-body::after {
   bottom: 6px;
 }
@@ -84,21 +87,21 @@
   border-radius: 20px;
 }
 
-.message-body p {
-  margin-bottom: 0 !important;
+.message-body p, .message-body li {
   font-size: 18px !important;
-  line-height: 1.428571429 !important;
-  color: rgb(243, 244, 246) !important;
-  text-shadow: 2px 2px 2px rgb(0, 0, 0);
+  color: rgb(243 244 246) !important;
+  text-shadow: 2px 2px 2px rgb(0 0 0);
+  font-weight: 500;
 }
 
 .message-body p em {
-  color: rgb(138, 138, 138) !important;
+  color: rgb(138 138 138) !important;
 }
 
-@media screen and (max-width: 688px) {
+@media screen and (width <= 688px) {
   .message {
     display: grid;
+    align-items: start;
     grid-template-columns: 60px minmax(0, 1fr);
     padding-bottom: 25px;
     font-size: 15px;
@@ -120,10 +123,10 @@
   }
 
   .text {
-    padding-left: 0px;
+    padding-left: 0;
   }
 
-  .message-body p {
+  .message-body p, .message-body li {
     font-size: 16px !important;
   }


@@ -16,6 +16,8 @@
 }
 
 .message {
-  padding-bottom: 30px;
+  padding-bottom: 1.5em;
+  padding-top: 0.5em;
   grid-template-columns: 70px minmax(0, 1fr);
+  width: min(100%, calc(724px + 70px));
 }


@@ -1,23 +1,31 @@
 .message {
   display: grid;
+  align-items: start;
   grid-template-columns: 60px minmax(0, 1fr);
-  padding-bottom: 25px;
+  width: min(100%, calc(724px + 60px));
+  padding-bottom: 1.5em;
+  padding-top: 0.5em;
   font-size: 15px;
   font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
-  line-height: 23px !important;
+  line-height: 22.5px !important;
+}
+
+.message-body {
+  margin-top: 3px;
+  font-size: 15px !important;
 }
 
 .circle-you {
   width: 50px;
   height: 50px;
-  background-color: rgb(238, 78, 59);
+  background-color: rgb(238 78 59);
   border-radius: 50%;
 }
 
 .circle-bot {
   width: 50px;
   height: 50px;
-  background-color: rgb(59, 78, 244);
+  background-color: rgb(59 78 244);
   border-radius: 50%;
 }
@@ -29,10 +37,6 @@
   object-fit: cover;
 }
 
-.text p {
-  margin-top: 5px;
-}
-
 .username {
   font-weight: bold;
 }
@@ -43,17 +47,15 @@
   border-radius: 20px;
 }
 
-.message-body p {
-  margin-bottom: 0 !important;
-  font-size: 15px !important;
-  line-height: 23px !important;
+.message-body p, .message-body li {
+  font-weight: 500;
 }
 
 .dark .message-body p em {
-  color: rgb(138, 138, 138) !important;
+  color: rgb(138 138 138) !important;
 }
 
 .message-body p em {
-  color: rgb(110, 110, 110) !important;
+  color: rgb(110 110 110) !important;
   font-weight: 500;
 }


@@ -1,5 +1,7 @@
 .message {
-  padding-bottom: 25px;
+  width: min(100%, calc(724px + 60px));
+  padding-bottom: 22px;
+  padding-top: 3px;
   font-size: 15px;
   font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
   line-height: 1.428571429;
@@ -8,14 +10,14 @@
 .circle-you {
   width: 50px;
   height: 50px;
-  background-color: rgb(238, 78, 59);
+  background-color: rgb(238 78 59);
   border-radius: 50%;
 }
 
 .circle-bot {
   width: 50px;
   height: 50px;
-  background-color: rgb(59, 78, 244);
+  background-color: rgb(59 78 244);
   border-radius: 50%;
   float: left;
   margin-right: 10px;
@@ -47,7 +49,7 @@
 .circle-you + .text {
   float: right;
-  background-color: rgb(0, 132, 255);
+  background-color: rgb(0 132 255);
   margin-right: 10px;
 }
@@ -59,8 +61,10 @@
   text-align: right;
 }
 
-.dark .circle-bot + .text div, .dark .circle-bot + .text * {
-  color: #000;
+.dark .circle-bot + .text div, .dark .circle-bot + .text *,
+.dark .chat .message .circle-bot + .text .message-body :is(h1, h2, h3, h4, h5, h6),
+.dark .chat .message .circle-bot + .text .message-body a {
+  color: #000 !important;
 }
 
 .text {
@@ -75,25 +79,29 @@
   font-weight: bold;
 }
 
-.message-body {
-}
-
 .message-body img {
   max-width: 300px;
   max-height: 300px;
   border-radius: 20px;
 }
 
-.message-body p {
-  margin-bottom: 0 !important;
+.message-body p, .message-body li {
   font-size: 15px !important;
-  line-height: 1.428571429 !important;
+  font-weight: 500;
 }
 
 .dark .message-body p em {
-  color: rgb(138, 138, 138) !important;
+  color: rgb(138 138 138) !important;
 }
 
 .message-body p em {
-  color: rgb(110, 110, 110) !important;
+  color: rgb(110 110 110) !important;
+}
+
+.editing-textarea {
+  width: max(30rem) !important;
+}
+
+.circle-you + .text .edit-control-button, .circle-you + .text .editing-textarea {
+  color: #000 !important;
 }


@@ -1,55 +1,97 @@
 .message {
-  padding-bottom: 25px;
+  display: block;
+  width: min(100%, 724px);
+  padding-top: 0;
+  padding-bottom: 21px;
   font-size: 15px;
   font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
   line-height: 1.428571429;
+  grid-template-columns: none;
 }
 
-.text-you {
-  background-color: #d9fdd3;
-  border-radius: 15px;
-  padding: 10px;
-  padding-top: 5px;
-  float: right;
+.circle-you, .circle-bot {
+  display: none;
 }
 
-.text-bot {
-  background-color: #f2f2f2;
-  border-radius: 15px;
-  padding: 10px;
-  padding-top: 5px;
+.text {
+  max-width: 65%;
+  border-radius: 18px;
+  padding: 12px 16px;
+  margin-bottom: 8px;
+  clear: both;
+  box-shadow: 0 1px 2px rgb(0 0 0 / 10%);
 }
 
-.dark .text-you {
-  background-color: #005c4b;
-  color: #111b21;
+.username {
+  font-weight: 600;
+  margin-bottom: 8px;
+  opacity: 0.65;
+  padding-left: 0;
 }
 
-.dark .text-bot {
-  background-color: #1f2937;
-  color: #111b21;
+/* User messages - right aligned, WhatsApp green */
+.circle-you + .text {
+  background-color: #d9fdd3;
+  float: right;
+  margin-left: auto;
+  margin-right: 8px;
 }
 
-.text-bot p, .text-you p {
-  margin-top: 5px;
+.circle-you + .text .username {
+  display: none;
+}
+
+/* Bot messages - left aligned, white */
+.circle-bot + .text {
+  background-color: #fff;
+  float: left;
+  margin-right: auto;
+  margin-left: 8px;
+  border: 1px solid #e5e5e5;
+}
+
+.circle-bot + .text .message-actions {
+  bottom: -25px !important;
+}
+
+/* Dark theme colors */
+.dark .circle-you + .text {
+  background-color: #144d37;
+  color: #e4e6ea;
+  box-shadow: 0 1px 2px rgb(0 0 0 / 30%);
+}
+
+.dark .circle-bot + .text {
+  background-color: #202c33;
+  color: #e4e6ea;
+  border: 1px solid #3c4043;
+  box-shadow: 0 1px 2px rgb(0 0 0 / 30%);
+}
+
+.dark .username {
+  opacity: 0.7;
 }
 
 .message-body img {
   max-width: 300px;
   max-height: 300px;
-  border-radius: 20px;
+  border-radius: 12px;
 }
 
-.message-body p {
-  margin-bottom: 0 !important;
+.message-body p, .message-body li {
   font-size: 15px !important;
-  line-height: 1.428571429 !important;
 }
 
 .dark .message-body p em {
-  color: rgb(138, 138, 138) !important;
+  color: rgb(170 170 170) !important;
 }
 
 .message-body p em {
-  color: rgb(110, 110, 110) !important;
+  color: rgb(100 100 100) !important;
 }
+
+/* Message actions positioning */
+.message-actions {
+  margin-top: 8px;
+}

css/highlightjs/github-dark.min.css (new vendored file, 111 lines)

@@ -0,0 +1,111 @@
html body gradio-app .gradio-container pre code.hljs {
display: block;
overflow-x: auto;
padding: 1em
}
html body gradio-app .gradio-container code.hljs {
padding: 3px 5px
}
/*!
Theme: GitHub Dark
Description: Dark theme as seen on github.com
Author: github.com
Maintainer: @Hirse
Updated: 2021-05-15
Outdated base version: https://github.com/primer/github-syntax-dark
Current colors taken from GitHub's CSS
*/
html body gradio-app .gradio-container .hljs {
color: #c9d1d9;
background: #0d1117
}
html body gradio-app .gradio-container .hljs-doctag,
html body gradio-app .gradio-container .hljs-keyword,
html body gradio-app .gradio-container .hljs-meta .hljs-keyword,
html body gradio-app .gradio-container .hljs-template-tag,
html body gradio-app .gradio-container .hljs-template-variable,
html body gradio-app .gradio-container .hljs-type,
html body gradio-app .gradio-container .hljs-variable.language_ {
color: #ff7b72
}
html body gradio-app .gradio-container .hljs-title,
html body gradio-app .gradio-container .hljs-title.class_,
html body gradio-app .gradio-container .hljs-title.class_.inherited__,
html body gradio-app .gradio-container .hljs-title.function_ {
color: #d2a8ff
}
html body gradio-app .gradio-container .hljs-attr,
html body gradio-app .gradio-container .hljs-attribute,
html body gradio-app .gradio-container .hljs-literal,
html body gradio-app .gradio-container .hljs-meta,
html body gradio-app .gradio-container .hljs-number,
html body gradio-app .gradio-container .hljs-operator,
html body gradio-app .gradio-container .hljs-selector-attr,
html body gradio-app .gradio-container .hljs-selector-class,
html body gradio-app .gradio-container .hljs-selector-id,
html body gradio-app .gradio-container .hljs-variable {
color: #79c0ff
}
html body gradio-app .gradio-container .hljs-meta .hljs-string,
html body gradio-app .gradio-container .hljs-regexp,
html body gradio-app .gradio-container .hljs-string {
color: #a5d6ff
}
html body gradio-app .gradio-container .hljs-built_in,
html body gradio-app .gradio-container .hljs-symbol {
color: #ffa657
}
html body gradio-app .gradio-container .hljs-code,
html body gradio-app .gradio-container .hljs-comment,
html body gradio-app .gradio-container .hljs-formula {
color: #8b949e
}
html body gradio-app .gradio-container .hljs-name,
html body gradio-app .gradio-container .hljs-quote,
html body gradio-app .gradio-container .hljs-selector-pseudo,
html body gradio-app .gradio-container .hljs-selector-tag {
color: #7ee787
}
html body gradio-app .gradio-container .hljs-subst {
color: #c9d1d9
}
html body gradio-app .gradio-container .hljs-section {
color: #1f6feb;
font-weight: 700
}
html body gradio-app .gradio-container .hljs-bullet {
color: #f2cc60
}
html body gradio-app .gradio-container .hljs-emphasis {
color: #c9d1d9;
font-style: italic
}
html body gradio-app .gradio-container .hljs-strong {
color: #c9d1d9;
font-weight: 700
}
html body gradio-app .gradio-container .hljs-addition {
color: #aff5b4;
background-color: #033a16
}
html body gradio-app .gradio-container .hljs-deletion {
color: #ffdcd7;
background-color: #67060c
}

css/highlightjs/github.min.css (new vendored file, 111 lines)

@@ -0,0 +1,111 @@
html body gradio-app .gradio-container pre code.hljs {
display: block;
overflow-x: auto;
padding: 1em
}
html body gradio-app .gradio-container code.hljs {
padding: 3px 5px
}
/*!
Theme: GitHub
Description: Light theme as seen on github.com
Author: github.com
Maintainer: @Hirse
Updated: 2021-05-15
Outdated base version: https://github.com/primer/github-syntax-light
Current colors taken from GitHub's CSS
*/
html body gradio-app .gradio-container .hljs {
color: #24292e;
background: #fff
}
html body gradio-app .gradio-container .hljs-doctag,
html body gradio-app .gradio-container .hljs-keyword,
html body gradio-app .gradio-container .hljs-meta .hljs-keyword,
html body gradio-app .gradio-container .hljs-template-tag,
html body gradio-app .gradio-container .hljs-template-variable,
html body gradio-app .gradio-container .hljs-type,
html body gradio-app .gradio-container .hljs-variable.language_ {
color: #d73a49
}
html body gradio-app .gradio-container .hljs-title,
html body gradio-app .gradio-container .hljs-title.class_,
html body gradio-app .gradio-container .hljs-title.class_.inherited__,
html body gradio-app .gradio-container .hljs-title.function_ {
color: #6f42c1
}
html body gradio-app .gradio-container .hljs-attr,
html body gradio-app .gradio-container .hljs-attribute,
html body gradio-app .gradio-container .hljs-literal,
html body gradio-app .gradio-container .hljs-meta,
html body gradio-app .gradio-container .hljs-number,
html body gradio-app .gradio-container .hljs-operator,
html body gradio-app .gradio-container .hljs-selector-attr,
html body gradio-app .gradio-container .hljs-selector-class,
html body gradio-app .gradio-container .hljs-selector-id,
html body gradio-app .gradio-container .hljs-variable {
color: #005cc5
}
html body gradio-app .gradio-container .hljs-meta .hljs-string,
html body gradio-app .gradio-container .hljs-regexp,
html body gradio-app .gradio-container .hljs-string {
color: #032f62
}
html body gradio-app .gradio-container .hljs-built_in,
html body gradio-app .gradio-container .hljs-symbol {
color: #e36209
}
html body gradio-app .gradio-container .hljs-code,
html body gradio-app .gradio-container .hljs-comment,
html body gradio-app .gradio-container .hljs-formula {
color: #6a737d
}
html body gradio-app .gradio-container .hljs-name,
html body gradio-app .gradio-container .hljs-quote,
html body gradio-app .gradio-container .hljs-selector-pseudo,
html body gradio-app .gradio-container .hljs-selector-tag {
color: #22863a
}
html body gradio-app .gradio-container .hljs-subst {
color: #24292e
}
html body gradio-app .gradio-container .hljs-section {
color: #005cc5;
font-weight: 700
}
html body gradio-app .gradio-container .hljs-bullet {
color: #735c0f
}
html body gradio-app .gradio-container .hljs-emphasis {
color: #24292e;
font-style: italic
}
html body gradio-app .gradio-container .hljs-strong {
color: #24292e;
font-weight: 700
}
html body gradio-app .gradio-container .hljs-addition {
color: #22863a;
background-color: #f0fff4
}
html body gradio-app .gradio-container .hljs-deletion {
color: #b31d28;
background-color: #ffeef0
}


@@ -0,0 +1 @@
.hljs-copy-wrapper{position:relative;overflow:hidden}.hljs-copy-wrapper:hover .hljs-copy-button,.hljs-copy-button:focus{transform:translateX(0)}.hljs-copy-button{position:absolute;transform:translateX(calc(100% + 1.125em));top:1em;right:1em;width:2rem;height:2rem;text-indent:-9999px;color:#fff;border-radius:.25rem;border:1px solid #ffffff22;background-color:#2d2b57;background-color:var(--hljs-theme-background);background-image:url('data:image/svg+xml;utf-8,<svg width="16" height="16" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M6 5C5.73478 5 5.48043 5.10536 5.29289 5.29289C5.10536 5.48043 5 5.73478 5 6V20C5 20.2652 5.10536 20.5196 5.29289 20.7071C5.48043 20.8946 5.73478 21 6 21H18C18.2652 21 18.5196 20.8946 18.7071 20.7071C18.8946 20.5196 19 20.2652 19 20V6C19 5.73478 18.8946 5.48043 18.7071 5.29289C18.5196 5.10536 18.2652 5 18 5H16C15.4477 5 15 4.55228 15 4C15 3.44772 15.4477 3 16 3H18C18.7956 3 19.5587 3.31607 20.1213 3.87868C20.6839 4.44129 21 5.20435 21 6V20C21 20.7957 20.6839 21.5587 20.1213 22.1213C19.5587 22.6839 18.7957 23 18 23H6C5.20435 23 4.44129 22.6839 3.87868 22.1213C3.31607 21.5587 3 20.7957 3 20V6C3 5.20435 3.31607 4.44129 3.87868 3.87868C4.44129 3.31607 5.20435 3 6 3H8C8.55228 3 9 3.44772 9 4C9 4.55228 8.55228 5 8 5H6Z" fill="white"/><path fill-rule="evenodd" clip-rule="evenodd" d="M7 3C7 1.89543 7.89543 1 9 1H15C16.1046 1 17 1.89543 17 3V5C17 6.10457 16.1046 7 15 7H9C7.89543 7 7 6.10457 7 5V3ZM15 3H9V5H15V3Z" fill="white"/></svg>');background-repeat:no-repeat;background-position:center;transition:background-color 200ms ease,transform 200ms ease-out}.hljs-copy-button:hover{border-color:#ffffff44}.hljs-copy-button:active{border-color:#ffffff66}.hljs-copy-button[data-copied="true"]{text-indent:0;width:auto;background-image:none}@media(prefers-reduced-motion){.hljs-copy-button{transition:none}}.hljs-copy-alert{clip:rect(0 0 0 0);clip-path:inset(50%);height:1px;overflow:hidden;position:absolute;white-space:nowrap;width:1px}


@@ -1,104 +0,0 @@
#parent #container {
  background-color: #eef2ff;
  padding: 17px;
}

#parent #container .reply {
  background-color: rgb(214, 218, 240);
  border-bottom-color: rgb(183, 197, 217);
  border-bottom-style: solid;
  border-bottom-width: 1px;
  border-image-outset: 0;
  border-image-repeat: stretch;
  border-image-slice: 100%;
  border-image-source: none;
  border-image-width: 1;
  border-left-color: rgb(0, 0, 0);
  border-left-style: none;
  border-left-width: 0px;
  border-right-color: rgb(183, 197, 217);
  border-right-style: solid;
  border-right-width: 1px;
  border-top-color: rgb(0, 0, 0);
  border-top-style: none;
  border-top-width: 0px;
  color: rgb(0, 0, 0);
  display: table;
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
  margin-bottom: 4px;
  margin-left: 0px;
  margin-right: 0px;
  margin-top: 4px;
  overflow-x: hidden;
  overflow-y: hidden;
  padding-bottom: 4px;
  padding-left: 2px;
  padding-right: 2px;
  padding-top: 4px;
}

#parent #container .number {
  color: rgb(0, 0, 0);
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
  width: 342.65px;
  margin-right: 7px;
}

#parent #container .op {
  color: rgb(0, 0, 0);
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
  margin-bottom: 8px;
  margin-left: 0px;
  margin-right: 0px;
  margin-top: 4px;
  overflow-x: hidden;
  overflow-y: hidden;
}

#parent #container .op blockquote {
  margin-left: 0px !important;
}

#parent #container .name {
  color: rgb(17, 119, 67);
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
  font-weight: 700;
  margin-left: 7px;
}

#parent #container .quote {
  color: rgb(221, 0, 0);
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
  text-decoration-color: rgb(221, 0, 0);
  text-decoration-line: underline;
  text-decoration-style: solid;
  text-decoration-thickness: auto;
}

#parent #container .greentext {
  color: rgb(120, 153, 34);
  font-family: arial, helvetica, sans-serif;
  font-size: 13.3333px;
}

#parent #container blockquote {
  margin: 0px !important;
  margin-block-start: 1em;
  margin-block-end: 1em;
  margin-inline-start: 40px;
  margin-inline-end: 40px;
  margin-top: 13.33px !important;
  margin-bottom: 13.33px !important;
  margin-left: 40px !important;
  margin-right: 40px !important;
}

#parent #container .message_4chan {
  color: black;
  border: none;
}


@@ -1,64 +1,97 @@
-.message {
-  display: grid;
-  grid-template-columns: 60px 1fr;
-  padding-bottom: 25px;
-  font-size: 15px;
-  font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
-  line-height: 22px;
+.chat {
+  background: transparent;
+  padding: 0;
+  padding-top: 0;
+}
+
+.chat > .messages:first-child {
+  padding-top: 0 !important;
+}
+
+.chat .message-body p, .chat .message-body li {
+  font-size: 1rem !important;
+  line-height: 28px !important;
+}
+
+.dark .chat .message-body :is(p,li,h1,h2,h3,h4,h5,h6),
+.dark .chat .message-body em:not(:is(h1,h2,h3,h4,h5,h6,b,strong) em),
+.dark .chat .message-body q:not(:is(h1,h2,h3,h4,h5,h6,b,strong) q) {
+  color: #d1d5db !important;
+}
+
+.chat .message-body :is(th, td),
+.prose hr {
+  border-color: #40404096 !important;
+}
+
+.dark .chat .message-body :is(th, td),
+.dark .prose hr {
+  border-color: rgb(255 255 255 / 30%) !important;
+}
+
+.chat .message-body :is(p, ul, ol) {
+  margin: 1.25em 0 !important;
+}
+
+.chat .message-body :is(p, ul, ol):first-child {
+  margin-top: 0 !important;
+}
+
+.chat .message-body :is(p, ul, ol):last-child {
+  margin-bottom: 0 !important;
+}
+
+.user-message, .assistant-message {
+  font-family: Inter, Helvetica, Arial, sans-serif;
+}
+
+.message:first-child {
+  padding-top: 0;
 }
 
 .username {
   display: none;
 }
 
-.message-body p {
-  font-size: 15px !important;
-  line-height: 22px !important;
-  margin-bottom: 1.25em !important;
+.chat .user-message {
+  background: #f3f4f6;
+  padding: 1.5rem 1rem;
+  padding-bottom: 2rem;
+  border-radius: 0;
+  border-bottom-right-radius: 0;
 }
 
-.chat .message-body ul, .chat .message-body ol {
-  margin-bottom: 1.25em !important;
-}
-
-.dark .message-body p em {
-  color: rgb(198, 202, 214) !important;
-}
-
-.message-body p em {
-  color: rgb(110, 110, 110) !important;
-}
-
-.gradio-container .chat .assistant-message {
-  padding: 15px;
-  border-radius: 20px;
-  background-color: #0000000f;
-  margin-top: 9px !important;
-  margin-bottom: 18px !important;
-}
-
-.gradio-container .chat .user-message {
-  padding: 15px;
-  border-radius: 20px;
-  margin-bottom: 9px !important;
-}
-
-.gradio-container .chat .assistant-message:last-child, .gradio-container .chat .user-message:last-child {
-  margin-bottom: 0px !important;
-}
-
-.dark .chat .assistant-message {
-  background-color: #1f2937;
+.chat .assistant-message {
+  padding: 1.5rem 1rem;
+  padding-bottom: 2rem;
+  border-radius: 0;
+  border: 0;
 }
 
 .dark .chat .user-message {
-  background-color: transparent;
+  background: var(--light-gray);
 }
 
-code {
-  background-color: white !important;
+.dark .chat .assistant-message {
+  background: transparent;
 }
 
-.dark code {
-  background-color: #0e1321 !important;
+.chat .user-message .text,
+.chat .assistant-message .text {
+  max-width: 724px;
+  margin-left: auto;
+  margin-right: auto;
+}
+
+/* Create space between two assistant messages in a row */
+.assistant-message + .assistant-message {
+  margin-top: 1.5rem;
+}
+
+pre > code {
+  background-color: #f3f4f6 !important;
+}
+
+.dark pre > code {
+  background-color: #1f2937 !important;
 }


@@ -1,33 +1,33 @@
-.container {
+.readable-container {
   max-width: 600px;
   margin-left: auto;
   margin-right: auto;
-  background-color: rgb(31, 41, 55);
+  background-color: rgb(31 41 55);
   padding: 3em;
   word-break: break-word;
   overflow-wrap: anywhere;
   color: #efefef !important;
 }
 
-.container p, .container li {
+.readable-container p, .readable-container li {
   font-size: 16px !important;
   color: #efefef !important;
   margin-bottom: 22px;
   line-height: 1.4 !important;
 }
 
-.container li > p {
+.readable-container li > p {
   display: inline !important;
 }
 
-.container code {
+.readable-container code {
   overflow-x: auto;
 }
 
-.container :not(pre) > code {
+.readable-container :not(pre) > code {
   white-space: normal !important;
 }
 
-.container .hoverable {
+.readable-container .hoverable {
   font-size: 14px;
 }

60 binary files changed (contents not shown).

css/katex/katex.min.css (new vendored file; diff suppressed because one or more lines are too long)

File diff suppressed because it is too large.


@@ -1,9 +1,3 @@
 .env
 Dockerfile
-/characters
-/loras
-/models
-/presets
-/prompts
-/softprompts
-/training
+/user_data

Some files were not shown because too many files have changed in this diff.