Commit graph

367 commits

Author SHA1 Message Date
oobabooga 8b8d39ec4e
Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga a5855c345c
Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835) 2025-04-07 21:42:33 -03:00
oobabooga 109de34e3b Remove the old --model-menu flag 2025-03-31 09:24:03 -07:00
oobabooga 0360f54ae8 UI: add a "Show after" parameter (to use with DeepSeek </think>) 2025-02-02 15:30:09 -08:00
oobabooga c832953ff7 UI: Activate auto_max_new_tokens by default 2025-01-14 05:59:55 -08:00
oobabooga d2f6c0f65f Update README 2025-01-10 13:25:40 -08:00
oobabooga c393f7650d Update settings-template.yaml, organize modules/shared.py 2025-01-10 13:22:18 -08:00
oobabooga 83c426e96b
Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga 7fe46764fb Improve the --help message about --tensorcores as well 2025-01-10 07:07:41 -08:00
oobabooga da6d868f58 Remove old deprecated flags (~6 months or more) 2025-01-09 16:11:46 -08:00
BPplays 619265b32c
add ipv6 support to the API (#6559) 2025-01-09 10:23:44 -03:00
oobabooga 91a8a87887 Remove obsolete code 2025-01-08 15:07:21 -08:00
oobabooga 7157257c3f
Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga 11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
oobabooga 60c93e0c66 UI: Set cache_type to fp16 by default 2024-12-17 19:44:20 -08:00
Diner Burger addad3c63e
Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga d769618591
Improved UI (#6575) 2024-12-17 00:47:41 -03:00
RandoInternetPreson 46996f6519
ExllamaV2 tensor parallelism to increase multi gpu inference speeds (#6356) 2024-09-28 00:26:03 -03:00
oobabooga e926c03b3d Add a --tokenizer-dir command-line flag for llamacpp_HF 2024-08-06 19:41:18 -07:00
oobabooga 9dcff21da9 Remove unnecessary shared.previous_model_name variable 2024-07-28 18:35:11 -07:00
oobabooga 7050bb880e UI: make n_ctx/max_seq_len/truncation_length numbers rather than sliders 2024-07-27 23:11:53 -07:00
oobabooga e6181e834a Remove AutoAWQ as a standalone loader
(it works better through transformers)
2024-07-23 15:31:17 -07:00
oobabooga f18c947a86 Update the tensorcores description 2024-07-22 18:06:41 -07:00
oobabooga aa809e420e Bump llama-cpp-python to 0.2.83, add back tensorcore wheels
Also add back the progress bar patch
2024-07-22 18:05:11 -07:00
oobabooga 11bbf71aa5
Bump back llama-cpp-python (#6257) 2024-07-22 16:19:41 -03:00
oobabooga 0f53a736c1 Revert the llama-cpp-python update 2024-07-22 12:02:25 -07:00
oobabooga a687f950ba Remove the tensorcores llama.cpp wheels
They are not faster than the default wheels anymore and they use a lot of space.
2024-07-22 11:54:35 -07:00
oobabooga e9d4bff7d0 Update the --tensor_split description 2024-07-20 22:04:48 -07:00
Alberto Cano a14c510afb
Customize the subpath for gradio, use with reverse proxy (#5106) 2024-07-20 19:10:39 -03:00
oobabooga aa7c14a463 Use chat-instruct mode by default 2024-07-19 21:43:52 -07:00
oobabooga e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
GralchemOz 8a39f579d8
transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga 577a8cd3ee
Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga bd7cc4234d
Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga 9f77ed1b98
--idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
oobabooga e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga 51fb766bea
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga 70845c76fb
Add back the max_updates_second parameter (#5937) 2024-04-26 10:14:51 -03:00
oobabooga 9b623b8a78
Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
oobabooga cbd65ba767
Add a simple min_p preset, make it the default (#5836) 2024-04-09 12:50:16 -03:00
oobabooga 168a0f4f67 UI: do not load the "gallery" extension by default 2024-04-06 12:43:21 -07:00
oobabooga d423021a48
Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga 2a92a842ce
Bump gradio to 4.23 (#5758) 2024-03-26 16:32:20 -03:00
oobabooga 28076928ac
UI: Add a new "User description" field for user personality/biography (#5691) 2024-03-11 23:41:57 -03:00
oobabooga 056717923f Document StreamingLLM 2024-03-10 19:15:23 -07:00
oobabooga afb51bd5d6
Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) 2024-03-09 00:25:33 -03:00
Bartowski 104573f7d4
Update cache_4bit documentation (#5649)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-03-07 13:08:21 -03:00
oobabooga 2ec1d96c91
Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
oobabooga 2174958362
Revert gradio to 3.50.2 (#5640) 2024-03-06 11:52:46 -03:00
oobabooga 63a1d4afc8
Bump gradio to 4.19 (#5522) 2024-03-05 07:32:28 -03:00
oobabooga a6730f88f7
Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga 76d28eaa9e
Add a menu for customizing the instruction template for the model (#5521) 2024-02-16 14:21:17 -03:00
oobabooga 080f7132c0
Revert gradio to 3.50.2 (#5513) 2024-02-15 20:40:23 -03:00
oobabooga 7123ac3f77
Remove "Maximum UI updates/second" parameter (#5507) 2024-02-14 23:34:30 -03:00
oobabooga acfbe6b3b3 Minor doc changes 2024-02-06 06:35:01 -08:00
oobabooga 8a6d9abb41 Small fixes 2024-02-06 06:26:27 -08:00
oobabooga 2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)"
This reverts commit cde000d478.
2024-02-06 06:21:36 -08:00
oobabooga 8c35fefb3b
Add custom sampler order support (#5443) 2024-02-06 11:20:10 -03:00
Forkoz 2a45620c85
Split by rows instead of layers for llama.cpp multi-gpu (#5435) 2024-02-04 23:36:40 -03:00
oobabooga cde000d478
Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
oobabooga e055967974
Add prompt_lookup_num_tokens parameter (#5296) 2024-01-17 17:09:36 -03:00
oobabooga 53dc1d8197 UI: Do not save unchanged settings to settings.yaml 2024-01-09 18:59:04 -08:00
oobabooga 2aad91f3c9
Remove deprecated command-line flags (#5131) 2023-12-31 02:07:48 -03:00
oobabooga 2734ce3e4c
Remove RWKV loader (#5130) 2023-12-31 02:01:40 -03:00
oobabooga 0e54a09bcb
Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga 8e397915c9
Remove --sdp-attention, --xformers flags (#5126) 2023-12-31 01:36:51 -03:00
oobabooga 8c60495878 UI: add "Maximum UI updates/second" parameter 2023-12-24 09:17:40 -08:00
oobabooga 2706149c65
Organize the CMD arguments by group (#5027) 2023-12-21 00:33:55 -03:00
oobabooga 9992f7d8c0 Improve several log messages 2023-12-19 20:54:32 -08:00
oobabooga de138b8ba6
Add llama-cpp-python wheels with tensor cores support (#5003) 2023-12-19 17:30:53 -03:00
oobabooga 0a299d5959
Bump llama-cpp-python to 0.2.24 (#5001) 2023-12-19 15:22:21 -03:00
oobabooga a23a004434 Update the example template 2023-12-18 17:47:35 -08:00
Water 674be9a09a
Add HQQ quant loader (#4888)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-12-18 21:23:16 -03:00
oobabooga f1f2c4c3f4
Add --num_experts_per_token parameter (ExLlamav2) (#4955) 2023-12-17 12:08:33 -03:00
oobabooga 3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga 1c531a3713 Minor cleanup 2023-12-12 13:25:21 -08:00
oobabooga 39d2fe1ed9
Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga 8c8825b777 Add QuIP# to README 2023-12-08 08:40:42 -08:00
oobabooga 2c5a1e67f9
Parameters: change max_new_tokens & repetition_penalty_range defaults (#4842) 2023-12-07 20:04:52 -03:00
oobabooga 98361af4d5
Add QuIP# support (#4803)
It has to be installed manually for now.
2023-12-06 00:01:01 -03:00
oobabooga 131a5212ce UI: update context upper limit to 200000 2023-12-04 15:48:34 -08:00
oobabooga be88b072e9 Update --loader flag description 2023-12-04 15:41:25 -08:00
Lounger 7c0a17962d
Gallery improvements (#4789) 2023-12-03 22:45:50 -03:00
oobabooga 2769a1fa25 Hide deprecated args from Session tab 2023-11-21 15:15:16 -08:00
oobabooga ef6feedeb2
Add --nowebui flag for pure API mode (#4651) 2023-11-18 23:38:39 -03:00
oobabooga 8f4f4daf8b
Add --admin-key flag for API (#4649) 2023-11-18 22:33:27 -03:00
oobabooga e0ca49ed9c
Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637)
* Update requirements*.txt

* Add back seed
2023-11-18 00:31:27 -03:00
oobabooga 9d6f79db74 Revert "Bump llama-cpp-python to 0.2.18 (#4611)"
This reverts commit 923c8e25fb.
2023-11-17 05:14:25 -08:00
oobabooga 13dc3b61da Update README 2023-11-16 19:57:55 -08:00
oobabooga 8b66d83aa9 Set use_fast=True by default, create --no_use_fast flag
This increases tokens/second for HF loaders.
2023-11-16 19:55:28 -08:00
oobabooga 923c8e25fb
Bump llama-cpp-python to 0.2.18 (#4611) 2023-11-16 22:55:14 -03:00
oobabooga 4aabff3728 Remove old API, launch OpenAI API with --api 2023-11-10 06:39:08 -08:00
oobabooga 6e2e0317af
Separate context and system message in instruction formats (#4499) 2023-11-07 20:02:58 -03:00
oobabooga af3d25a503 Disable logits_all in llamacpp_HF (makes processing 3x faster) 2023-11-07 14:35:48 -08:00
oobabooga ec17a5d2b7
Make OpenAI API the default API (#4430) 2023-11-06 02:38:29 -03:00
feng lui 4766a57352
transformers: add use_flash_attention_2 option (#4373) 2023-11-04 13:59:33 -03:00
Julien Chaumond fdcaa955e3
transformers: Add a flag to force load from safetensors (#4450) 2023-11-02 16:20:54 -03:00
oobabooga c0655475ae Add cache_8bit option 2023-11-02 11:23:04 -07:00
oobabooga 77abd9b69b Add no_flash_attn option 2023-11-02 11:08:53 -07:00