Commit graph

155 commits

Author SHA1 Message Date
oobabooga bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
Philipp Emanuel Weidmann 852c943769 DRY: A modern repetition penalty that reliably prevents looping (#5677) 2024-05-19 23:53:47 -03:00
oobabooga e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga 51fb766bea Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga 9b623b8a78 Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
oobabooga d423021a48 Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga 35da6b989d Organize the parameters tab (#5767) 2024-03-28 16:45:03 -03:00
oobabooga afb51bd5d6 Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) 2024-03-09 00:25:33 -03:00
oobabooga 2ec1d96c91 Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
kalomaze cfb25c9b3f Cubic sampling w/ curve param (#5551) 2024-03-03 13:22:21 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga a6730f88f7 Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga 2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)" 2024-02-06 06:21:36 -08:00
    This reverts commit cde000d478.
oobabooga 8c35fefb3b Add custom sampler order support (#5443) 2024-02-06 11:20:10 -03:00
oobabooga 9033fa5eee Organize the Model tab 2024-02-04 19:30:22 -08:00
Forkoz 2a45620c85 Split by rows instead of layers for llama.cpp multi-gpu (#5435) 2024-02-04 23:36:40 -03:00
oobabooga cde000d478 Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
kalomaze b6077b02e4 Quadratic sampling (#5403) 2024-02-04 00:20:02 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga 87dc421ee8 Bump exllamav2 to 0.0.12 (#5352) 2024-01-22 22:40:12 -03:00
oobabooga e055967974 Add prompt_lookup_num_tokens parameter (#5296) 2024-01-17 17:09:36 -03:00
oobabooga 372ef5e2d8 Fix dynatemp parameters always visible 2024-01-08 19:42:31 -08:00
oobabooga 29c2693ea0 dynatemp_low, dynatemp_high, dynatemp_exponent parameters (#5209) 2024-01-08 23:28:35 -03:00
oobabooga 0d07b3a6a1 Add dynamic_temperature_low parameter (#5198) 2024-01-07 17:03:47 -03:00
kalomaze 48327cc5c4 Dynamic Temperature HF loader support (#5174) 2024-01-07 10:36:26 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga 0e54a09bcb Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga c727a70572 Remove redundancy from modules/loaders.py 2023-12-20 19:18:07 -08:00
oobabooga de138b8ba6 Add llama-cpp-python wheels with tensor cores support (#5003) 2023-12-19 17:30:53 -03:00
oobabooga 0a299d5959 Bump llama-cpp-python to 0.2.24 (#5001) 2023-12-19 15:22:21 -03:00
oobabooga f6d701624c UI: mention that QuIP# does not work on Windows 2023-12-18 18:05:02 -08:00
Water 674be9a09a Add HQQ quant loader (#4888) 2023-12-18 21:23:16 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga f1f2c4c3f4 Add --num_experts_per_token parameter (ExLlamav2) (#4955) 2023-12-17 12:08:33 -03:00
oobabooga 3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga 62d59a516f Add trust_remote_code to all HF loaders 2023-12-08 06:29:26 -08:00
oobabooga 98361af4d5 Add QuIP# support (#4803) 2023-12-06 00:01:01 -03:00
    It has to be installed manually for now.
oobabooga 7fc9033b2e Recommend ExLlama_HF and ExLlamav2_HF 2023-12-04 15:28:46 -08:00
oobabooga 9940ed9c77 Sort the loaders 2023-11-29 15:13:03 -08:00
oobabooga a7670c31ca Sort 2023-11-28 18:43:33 -08:00
oobabooga 6e51bae2e0 Sort the loaders menu 2023-11-28 18:41:11 -08:00
oobabooga 0589ff5b12 Bump llama-cpp-python to 0.2.19 & add min_p and typical_p parameters to llama.cpp loader (#4701) 2023-11-21 20:59:39 -03:00
oobabooga e0ca49ed9c Bump llama-cpp-python to 0.2.18 (2nd attempt) (#4637) 2023-11-18 00:31:27 -03:00
    * Update requirements*.txt
    * Add back seed
oobabooga 9d6f79db74 Revert "Bump llama-cpp-python to 0.2.18 (#4611)" 2023-11-17 05:14:25 -08:00
    This reverts commit 923c8e25fb.
oobabooga 8b66d83aa9 Set use_fast=True by default, create --no_use_fast flag 2023-11-16 19:55:28 -08:00
    This increases tokens/second for HF loaders.
oobabooga 923c8e25fb Bump llama-cpp-python to 0.2.18 (#4611) 2023-11-16 22:55:14 -03:00
oobabooga 58c6001be9 Add missing exllamav2 samplers 2023-11-16 07:09:40 -08:00
oobabooga af3d25a503 Disable logits_all in llamacpp_HF (makes processing 3x faster) 2023-11-07 14:35:48 -08:00
feng lui 4766a57352 transformers: add use_flash_attention_2 option (#4373) 2023-11-04 13:59:33 -03:00
oobabooga aa5d671579 Add temperature_last parameter (#4472) 2023-11-04 13:09:07 -03:00
kalomaze 367e5e6e43 Implement Min P as a sampler option in HF loaders (#4449) 2023-11-02 16:32:51 -03:00
oobabooga c0655475ae Add cache_8bit option 2023-11-02 11:23:04 -07:00
tdrussell 72f6fc6923 Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (#4376) 2023-10-25 12:10:28 -03:00
oobabooga ef1489cd4d Remove unused parameter in AutoAWQ 2023-10-23 20:45:43 -07:00
tdrussell 4440f87722 Add additive_repetition_penalty sampler setting. (#3627) 2023-10-23 02:28:07 -03:00
oobabooga df90d03e0b Replace --mul_mat_q with --no_mul_mat_q 2023-10-22 12:23:03 -07:00
Johan 1d5a015ce7 Enable special token support for exllamav2 (#4314) 2023-10-21 01:54:06 -03:00
turboderp 8a98646a21 Bump ExLlamaV2 to 0.0.5 (#4186) 2023-10-05 19:12:22 -03:00
cal066 cc632c3f33 AutoAWQ: initial support (#3999) 2023-10-05 13:19:18 -03:00
oobabooga ae4ba3007f Add grammar to transformers and _HF loaders (#4091) 2023-10-05 10:01:36 -03:00
oobabooga b6fe6acf88 Add threads_batch parameter 2023-10-01 21:28:00 -07:00
jllllll 41a2de96e5 Bump llama-cpp-python to 0.2.11 2023-10-01 18:08:10 -05:00
oobabooga 56b5a4af74 exllamav2 typical_p 2023-09-28 20:10:12 -07:00
StoyanStAtanasov 7e6ff8d1f0 Enable NUMA feature for llama_cpp_python (#4040) 2023-09-26 22:05:00 -03:00
oobabooga d0d221df49 Add --use_fast option (closes #3741) 2023-09-25 12:19:43 -07:00
oobabooga 36c38d7561 Add disable_exllama to Transformers loader (for GPTQ LoRA training) 2023-09-24 20:03:11 -07:00
oobabooga 08cf150c0c Add a grammar editor to the UI (#4061) 2023-09-24 18:05:24 -03:00
oobabooga eb0b7c1053 Fix a minor UI bug 2023-09-24 07:17:33 -07:00
oobabooga b227e65d86 Add grammar to llama.cpp loader (closes #4019) 2023-09-24 07:10:45 -07:00
saltacc f01b9aa71f Add customizable ban tokens (#3899) 2023-09-15 18:27:27 -03:00
oobabooga 2f935547c8 Minor changes 2023-09-12 15:05:21 -07:00
oobabooga 18e6b275f3 Add alpha_value/compress_pos_emb to ExLlama-v2 2023-09-12 15:02:47 -07:00
oobabooga c2a309f56e Add ExLlamaV2 and ExLlamav2_HF loaders (#3881) 2023-09-12 14:33:07 -03:00
oobabooga ed86878f02 Remove GGML support 2023-09-11 07:44:00 -07:00
Ravindra Marella e4c3e1bdd2 Fix ctransformers model unload (#3711) 2023-08-27 10:53:48 -03:00
    Add missing comma in model types list
    Fixes marella/ctransformers#111
oobabooga 52ab2a6b9e Add rope_freq_base parameter for CodeLlama 2023-08-25 06:55:15 -07:00
oobabooga 3320accfdc Add CFG to llamacpp_HF (second attempt) (#3678) 2023-08-24 20:32:21 -03:00
oobabooga d6934bc7bc Implement CFG for ExLlama_HF (#3666) 2023-08-24 16:27:36 -03:00
cal066 e042bf8624 ctransformers: add mlock and no-mmap options (#3649) 2023-08-22 16:51:34 -03:00
oobabooga 7cba000421 Bump llama-cpp-python, +tensor_split by @shouyiwang, +mul_mat_q (#3610) 2023-08-18 12:03:34 -03:00
cal066 991bb57e43 ctransformers: Fix up model_type name consistency (#3567) 2023-08-14 15:17:24 -03:00
oobabooga ccfc02a28d Add the --disable_exllama option for AutoGPTQ (#3545 from clefever/disable-exllama) 2023-08-14 15:15:55 -03:00
Eve 66c04c304d Various ctransformers fixes (#3556) 2023-08-13 23:09:03 -03:00
    Co-authored-by: cal066 <cal066@users.noreply.github.com>
cal066 bf70c19603 ctransformers: move thread and seed parameters (#3543) 2023-08-13 00:04:03 -03:00
Chris Lefever 0230fa4e9c Add the --disable_exllama option for AutoGPTQ 2023-08-12 02:26:58 -04:00
oobabooga 2f918ccf7c Remove unused parameter 2023-08-11 11:15:22 -07:00
oobabooga 28c8df337b Add repetition_penalty_range to ctransformers 2023-08-11 11:04:19 -07:00
cal066 7a4fcee069 Add ctransformers support (#3313) 2023-08-11 14:41:33 -03:00
    Co-authored-by: cal066 <cal066@users.noreply.github.com>
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
    Co-authored-by: randoentity <137087500+randoentity@users.noreply.github.com>
oobabooga d8fb506aff Add RoPE scaling support for transformers (including dynamic NTK) 2023-08-08 21:25:48 -07:00
    https://github.com/huggingface/transformers/pull/24653
oobabooga 0af10ab49b Add Classifier Free Guidance (CFG) for Transformers/ExLlama (#3325) 2023-08-06 17:22:48 -03:00
oobabooga 87dab03dc0 Add the --cpu option for llama.cpp to prevent CUDA from being used (#3432) 2023-08-03 11:00:36 -03:00
oobabooga 32a2bbee4a Implement auto_max_new_tokens for ExLlama 2023-08-02 11:03:56 -07:00
oobabooga e931844fe2 Add auto_max_new_tokens parameter (#3419) 2023-08-02 14:52:20 -03:00
oobabooga 84297d05c4 Add a "Filter by loader" menu to the Parameters tab 2023-07-31 19:09:02 -07:00
oobabooga b17893a58f Revert "Add tensor split support for llama.cpp (#3171)" 2023-07-26 07:06:01 -07:00
    This reverts commit 031fe7225e.
Shouyi 031fe7225e Add tensor split support for llama.cpp (#3171) 2023-07-25 18:59:26 -03:00
oobabooga a07d070b6c Add llama-2-70b GGML support (#3285) 2023-07-24 16:37:03 -03:00
randoentity a69955377a [GGML] Support for customizable RoPE (#3083) 2023-07-17 22:32:37 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga 5e3f7e00a9 Create llamacpp_HF loader (#3062) 2023-07-16 02:21:13 -03:00
oobabooga e202190c4f lint 2023-07-12 11:33:25 -07:00
Gabriel Pena eedb3bf023 Add low vram mode on llama cpp (#3076) 2023-07-12 11:05:13 -03:00
Panchovix 10c8c197bf Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955) 2023-07-04 01:13:16 -03:00
oobabooga c52290de50 ExLlama with long context (#2875) 2023-06-25 22:49:26 -03:00
oobabooga 3ae9af01aa Add --no_use_cuda_fp16 param for AutoGPTQ 2023-06-23 12:22:56 -03:00
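
Several of the commits above add sampling parameters (min_p, temperature_last, dynatemp_*, presence_penalty/frequency_penalty) to the HF and llama.cpp loaders. As background only, here is a minimal sketch of the Min P filtering idea referenced in #4449 and #4701; it is an illustrative NumPy example with hypothetical names, not code from this repository.

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Illustrative Min P filter (sketch, not the repository's implementation):
    keep tokens whose probability is at least min_p times the probability of
    the most likely token, then renormalize the remainder."""
    threshold = min_p * probs.max()                       # cutoff scales with the top token
    filtered = np.where(probs >= threshold, probs, 0.0)   # zero out unlikely tokens
    return filtered / filtered.sum()                      # renormalize to a valid distribution

# Example: sample a next-token id from the filtered distribution
probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
next_id = np.random.choice(len(probs), p=min_p_filter(probs, min_p=0.1))
```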