Commit graph

128 commits

Author SHA1 Message Date
oobabooga 0ef1b8f8b4 Use ExLlamaV2 (instead of the HF one) for EXL2 models for now (it doesn't seem to have the "OverflowError" bug) 2025-04-17 05:47:40 -07:00
oobabooga 8b8d39ec4e Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga 5bcd2d7ad0 Add the top N-sigma sampler (#6796) 2025-03-14 16:45:11 -03:00
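
(A side note on the mechanics: top N-sigma keeps only tokens whose logits fall within n standard deviations of the maximum logit, making the cutoff scale-invariant. A minimal NumPy sketch based on the sampler's public description; the function and parameter names here are illustrative, not the repo's actual API:)

```python
import numpy as np

def top_n_sigma(logits: np.ndarray, n_sigma: float = 1.0) -> np.ndarray:
    """Keep only tokens within n_sigma standard deviations of the top logit,
    then renormalize. Sketch only; not the exact merged implementation."""
    threshold = logits.max() - n_sigma * logits.std()
    filtered = np.where(logits >= threshold, logits, -np.inf)
    exp = np.exp(filtered - logits.max())  # masked tokens get probability 0
    return exp / exp.sum()
```
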
oobabooga 83c426e96b Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga 7157257c3f Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga 11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
oobabooga 3967520e71 Connect XTC, DRY, smoothing_factor, and dynatemp to ExLlamaV2 loader (non-HF) 2025-01-04 16:25:06 -08:00
oobabooga 39a5c9a49c UI organization (#6618) 2024-12-29 11:16:17 -03:00
Diner Burger addad3c63e Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga 93c250b9b6 Add a UI element for enable_tp 2024-10-01 11:16:15 -07:00
Philipp Emanuel Weidmann 301375834e Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335) 2024-09-27 22:50:12 -03:00
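
(For context, the XTC mechanism as described in the PR: with probability xtc_probability, every token whose probability is at or above xtc_threshold is excluded except the least likely of them, nudging the model away from its most predictable continuations. A hedged sketch under those assumptions, not the exact implementation:)

```python
import numpy as np

def xtc(probs: np.ndarray, threshold: float = 0.1,
        probability: float = 0.5, rng=np.random) -> np.ndarray:
    """With the given probability, exclude all tokens above the threshold
    except the least probable of them, then renormalize. Sketch only."""
    if rng.random() >= probability:
        return probs
    above = np.flatnonzero(probs >= threshold)
    if len(above) < 2:
        return probs  # need at least two "top choices" before excluding any
    keep = above[np.argmin(probs[above])]  # the least likely top choice survives
    out = probs.copy()
    out[above] = 0.0
    out[keep] = probs[keep]
    return out / out.sum()
```
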
oobabooga e6181e834a Remove AutoAWQ as a standalone loader (it works better through transformers) 2024-07-23 15:31:17 -07:00
oobabooga aa809e420e Bump llama-cpp-python to 0.2.83, add back tensorcore wheels (also add back the progress bar patch) 2024-07-22 18:05:11 -07:00
oobabooga 11bbf71aa5 Bump back llama-cpp-python (#6257) 2024-07-22 16:19:41 -03:00
oobabooga 0f53a736c1 Revert the llama-cpp-python update 2024-07-22 12:02:25 -07:00
oobabooga a687f950ba Remove the tensorcores llama.cpp wheels (they are not faster than the default wheels anymore and they use a lot of space) 2024-07-22 11:54:35 -07:00
oobabooga e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
GralchemOz 8a39f579d8 transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga 4ea260098f llama.cpp: add 4-bit/8-bit kv cache options 2024-06-29 09:10:33 -07:00
oobabooga 577a8cd3ee Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga 536f8d58d4 Do not expose alpha_value to llama.cpp & rope_freq_base to transformers (to avoid confusion) 2024-06-23 22:09:24 -07:00
Forkoz 1d79aa67cf Fix flash-attn UI parameter to actually store true. (#6076) 2024-06-13 00:34:54 -03:00
oobabooga bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
Philipp Emanuel Weidmann 852c943769 DRY: A modern repetition penalty that reliably prevents looping (#5677) 2024-05-19 23:53:47 -03:00
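
(Mechanically, DRY looks for context suffixes that have occurred before and penalizes the token that previously followed them, with the penalty growing exponentially in the match length: multiplier * base^(match_length - allowed_length). A rough, unoptimized sketch under those assumptions; the real implementation is more efficient and also handles sequence breakers:)

```python
def dry_penalties(context: list[int], multiplier: float = 0.8,
                  base: float = 1.75, allowed_length: int = 2) -> dict[int, float]:
    """For each earlier position, measure how long a run ending there matches
    the current end of the context; if the match is long enough, penalize the
    token that followed it back then. Sketch only."""
    penalties: dict[int, float] = {}
    for i in range(1, len(context)):
        n = 0  # length of the matched suffix ending just before position i
        while n < i and context[i - 1 - n] == context[-1 - n]:
            n += 1
        if n >= allowed_length:
            tok = context[i]  # this token would extend the repetition
            penalty = multiplier * base ** (n - allowed_length)
            penalties[tok] = max(penalties.get(tok, 0.0), penalty)
    return penalties
```

(The returned values would be subtracted from the corresponding logits before sampling.)
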
oobabooga e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga 51fb766bea Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga 9b623b8a78 Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
oobabooga d423021a48 Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga 35da6b989d Organize the parameters tab (#5767) 2024-03-28 16:45:03 -03:00
oobabooga afb51bd5d6 Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) 2024-03-09 00:25:33 -03:00
oobabooga 2ec1d96c91 Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
kalomaze cfb25c9b3f Cubic sampling w/ curve param (#5551) (co-authored by oobabooga) 2024-03-03 13:22:21 -03:00
oobabooga a6730f88f7 Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga 2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)" (this reverts commit cde000d478) 2024-02-06 06:21:36 -08:00
oobabooga 8c35fefb3b Add custom sampler order support (#5443) 2024-02-06 11:20:10 -03:00
oobabooga 9033fa5eee Organize the Model tab 2024-02-04 19:30:22 -08:00
Forkoz 2a45620c85 Split by rows instead of layers for llama.cpp multi-gpu (#5435) 2024-02-04 23:36:40 -03:00
oobabooga cde000d478 Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
kalomaze b6077b02e4 Quadratic sampling (#5403) (co-authored by oobabooga) 2024-02-04 00:20:02 -03:00
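
(The quadratic transform behind smoothing_factor, as commonly described: each logit is replaced by a quadratic function of its distance from the maximum logit, concentrating probability on the top tokens as the factor grows; the later "cubic" variant adds a curve parameter on top. A sketch assuming that form, which may differ in detail from the merged code:)

```python
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float) -> np.ndarray:
    """Penalize each logit by its squared distance from the top logit.
    Sketch of the commonly described transform, not the exact merged code."""
    max_logit = logits.max()
    return -smoothing_factor * (logits - max_logit) ** 2 + max_logit
```
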
oobabooga 87dc421ee8 Bump exllamav2 to 0.0.12 (#5352) 2024-01-22 22:40:12 -03:00
oobabooga e055967974 Add prompt_lookup_num_tokens parameter (#5296) 2024-01-17 17:09:36 -03:00
oobabooga 372ef5e2d8 Fix dynatemp parameters always visible 2024-01-08 19:42:31 -08:00
oobabooga 29c2693ea0 dynatemp_low, dynatemp_high, dynatemp_exponent parameters (#5209) 2024-01-08 23:28:35 -03:00
oobabooga 0d07b3a6a1 Add dynamic_temperature_low parameter (#5198) 2024-01-07 17:03:47 -03:00
kalomaze 48327cc5c4 Dynamic Temperature HF loader support (#5174) (co-authored by oobabooga) 2024-01-07 10:36:26 -03:00
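
(The dynamic temperature idea behind these commits: pick a sampling temperature between dynatemp_low and dynatemp_high according to the normalized entropy of the token distribution, so confident predictions are sampled coldly and uncertain ones more freely, with dynatemp_exponent shaping the mapping. A hedged sketch of that mapping; the exact formula in the PR may differ:)

```python
import numpy as np

def dynamic_temperature(logits: np.ndarray, dynatemp_low: float,
                        dynatemp_high: float, dynatemp_exponent: float = 1.0) -> float:
    """Map normalized distribution entropy to a temperature in
    [dynatemp_low, dynatemp_high]. Sketch only."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))  # entropy of a uniform distribution
    normalized = (entropy / max_entropy) ** dynatemp_exponent
    return dynatemp_low + (dynatemp_high - dynatemp_low) * normalized
    # The caller would then sample from softmax(logits / temperature).
```
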
oobabooga 0e54a09bcb Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga c727a70572 Remove redundancy from modules/loaders.py 2023-12-20 19:18:07 -08:00
oobabooga de138b8ba6 Add llama-cpp-python wheels with tensor cores support (#5003) 2023-12-19 17:30:53 -03:00
oobabooga 0a299d5959 Bump llama-cpp-python to 0.2.24 (#5001) 2023-12-19 15:22:21 -03:00