Commit graph

234 commits

Author SHA1 Message Date
oobabooga 4039999be5 Autodetect llamacpp_HF loader when tokenizer exists 2024-02-16 09:29:26 -08:00
oobabooga b2b74c83a6 Fix Qwen1.5 in llamacpp_HF 2024-02-15 19:04:19 -08:00
oobabooga d47182d9d1 llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) 2024-02-14 00:28:51 -03:00
oobabooga 4e34ae0587 Minor logging improvements 2024-02-06 08:22:08 -08:00
oobabooga 8ee3cea7cb Improve some log messages 2024-02-06 06:31:27 -08:00
oobabooga 2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)" 2024-02-06 06:21:36 -08:00
This reverts commit cde000d478.
oobabooga cde000d478 Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
sam-ngu c0bdcee646 added trust_remote_code to deepspeed init loaderClass (#5237) 2024-01-26 11:10:57 -03:00
oobabooga 89e7e107fc Lint 2024-01-09 16:27:50 -08:00
oobabooga 94afa0f9cf Minor style changes 2024-01-01 16:00:22 -08:00
oobabooga 2734ce3e4c Remove RWKV loader (#5130) 2023-12-31 02:01:40 -03:00
oobabooga 0e54a09bcb Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga 8e397915c9 Remove --sdp-attention, --xformers flags (#5126) 2023-12-31 01:36:51 -03:00
Yiximail afc91edcb2 Reset the model_name after unloading the model (#5051) 2023-12-22 22:18:24 -03:00
oobabooga f0f6d9bdf9 Add HQQ back & update version 2023-12-20 07:46:09 -08:00
This reverts commit 2289e9031e.
oobabooga fadb295d4d Lint 2023-12-19 21:36:57 -08:00
oobabooga fb8ee9f7ff Add a specific error if HQQ is missing 2023-12-19 21:32:58 -08:00
oobabooga 9992f7d8c0 Improve several log messages 2023-12-19 20:54:32 -08:00
Water 674be9a09a Add HQQ quant loader (#4888) 2023-12-18 21:23:16 -03:00
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga 3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga 39d2fe1ed9 Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga 2a335b8aa7 Cleanup: set shared.model_name only once 2023-12-08 06:35:23 -08:00
oobabooga 98361af4d5 Add QuIP# support (#4803) 2023-12-06 00:01:01 -03:00
It has to be installed manually for now.
oobabooga 77d6ccf12b Add a LOADER debug message while loading models 2023-11-30 12:00:32 -08:00
oobabooga 8b66d83aa9 Set use_fast=True by default, create --no_use_fast flag 2023-11-16 19:55:28 -08:00
This increases tokens/second for HF loaders.
oobabooga a85ce5f055 Add more info messages for truncation / instruction template 2023-11-15 16:20:31 -08:00
oobabooga 883701bc40 Alternative solution to 025da386a0 2023-11-15 16:04:02 -08:00
Fixes an error.
oobabooga 8ac942813c Revert "Fix CPU memory limit error (issue #3763) (#4597)" 2023-11-15 16:01:54 -08:00
This reverts commit 025da386a0.
oobabooga e6f44d6d19 Print context length / instruction template to terminal when loading models 2023-11-15 16:00:51 -08:00
Andy Bao 025da386a0 Fix CPU memory limit error (issue #3763) (#4597) 2023-11-15 20:27:20 -03:00
get_max_memory_dict() was not properly formatting shared.args.cpu_memory
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga 2358706453 Add /v1/internal/model/load endpoint (tentative) 2023-11-07 20:58:06 -08:00
oobabooga ec17a5d2b7 Make OpenAI API the default API (#4430) 2023-11-06 02:38:29 -03:00
feng lui 4766a57352 transformers: add use_flash_attention_2 option (#4373) 2023-11-04 13:59:33 -03:00
Julien Chaumond fdcaa955e3 transformers: Add a flag to force load from safetensors (#4450) 2023-11-02 16:20:54 -03:00
oobabooga 839a87bac8 Fix is_ccl_available & is_xpu_available imports 2023-10-26 20:27:04 -07:00
Abhilash Majumder 778a010df8 Intel Gpu support initialization (#4340) 2023-10-26 23:39:51 -03:00
oobabooga ef1489cd4d Remove unused parameter in AutoAWQ 2023-10-23 20:45:43 -07:00
oobabooga 8ea554bc19 Check for torch.xpu.is_available() 2023-10-16 12:53:40 -07:00
oobabooga b88b2b74a6 Experimental Intel Arc transformers support (untested) 2023-10-15 20:51:11 -07:00
oobabooga f63361568c Fix safetensors kwarg usage in AutoAWQ 2023-10-10 19:03:09 -07:00
oobabooga fae8062d39 Bump to latest gradio (3.47) (#4258) 2023-10-10 22:20:49 -03:00
cal066 cc632c3f33 AutoAWQ: initial support (#3999) 2023-10-05 13:19:18 -03:00
oobabooga 87ea2d96fd Add a note about RWKV loader 2023-09-26 17:43:39 -07:00
oobabooga d0d221df49 Add --use_fast option (closes #3741) 2023-09-25 12:19:43 -07:00
oobabooga 63de9eb24f Clean up the transformers loader 2023-09-24 20:26:26 -07:00
oobabooga 36c38d7561 Add disable_exllama to Transformers loader (for GPTQ LoRA training) 2023-09-24 20:03:11 -07:00
oobabooga 13ac55fa18 Reorder some functions 2023-09-19 13:51:57 -07:00
oobabooga f0ef971edb Remove obsolete warning 2023-09-18 12:25:10 -07:00
Johan fdcee0c215 Allow custom tokenizer for llamacpp_HF loader (#3941) 2023-09-15 12:38:38 -03:00
oobabooga c2a309f56e Add ExLlamaV2 and ExLlamav2_HF loaders (#3881) 2023-09-12 14:33:07 -03:00
oobabooga 9331ab4798 Read GGUF metadata (#3873) 2023-09-11 18:49:30 -03:00
oobabooga ed86878f02 Remove GGML support 2023-09-11 07:44:00 -07:00
jllllll 4a999e3bcd Use separate llama-cpp-python packages for GGML support 2023-08-26 10:40:08 -05:00
oobabooga 83640d6f43 Replace ggml occurences with gguf 2023-08-26 01:06:59 -07:00
cal066 960980247f ctransformers: gguf support (#3685) 2023-08-25 11:33:04 -03:00
oobabooga 52ab2a6b9e Add rope_freq_base parameter for CodeLlama 2023-08-25 06:55:15 -07:00
cal066 7a4fcee069 Add ctransformers support (#3313) 2023-08-11 14:41:33 -03:00
Co-authored-by: cal066 <cal066@users.noreply.github.com>
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
Co-authored-by: randoentity <137087500+randoentity@users.noreply.github.com>
oobabooga d8fb506aff Add RoPE scaling support for transformers (including dynamic NTK) 2023-08-08 21:25:48 -07:00
https://github.com/huggingface/transformers/pull/24653
oobabooga 65aa11890f Refactor everything (#3481) 2023-08-06 21:49:27 -03:00
oobabooga 75c2dd38cf Remove flexgen support 2023-07-25 15:15:29 -07:00
appe233 89e0d15cf5 Use 'torch.backends.mps.is_available' to check if mps is supported (#3164) 2023-07-17 21:27:18 -03:00
oobabooga 5e3f7e00a9 Create llamacpp_HF loader (#3062) 2023-07-16 02:21:13 -03:00
oobabooga e202190c4f lint 2023-07-12 11:33:25 -07:00
FartyPants 9b55d3a9f9 More robust and error prone training (#3058) 2023-07-12 15:29:43 -03:00
oobabooga 5ac4e4da8b Make --model work with argument like models/folder_name 2023-07-08 10:22:54 -07:00
Xiaojian "JJ" Deng ff45317032 Update models.py (#3020) 2023-07-05 21:40:43 -03:00
Hopefully fixed error with "ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported."
oobabooga 8705eba830 Remove universal llama tokenizer support 2023-07-04 19:43:19 -07:00
Instead replace it with a warning if the tokenizer files look off
FartyPants 33f56fd41d Update models.py to clear LORA names after unload (#2951) 2023-07-03 17:39:06 -03:00
oobabooga f0fcd1f697 Sort some imports 2023-06-25 01:44:36 -03:00
Panchovix 5646690769 Fix some models not loading on exllama_hf (#2835) 2023-06-23 11:31:02 -03:00
LarryVRH 580c1ee748 Implement a demo HF wrapper for exllama to utilize existing HF transformers decoding. (#2777) 2023-06-21 15:31:42 -03:00
ThisIsPIRI def3b69002 Fix loading condition for universal llama tokenizer (#2753) 2023-06-18 18:14:06 -03:00
oobabooga 9f40032d32 Add ExLlama support (#2444) 2023-06-16 20:35:38 -03:00
oobabooga 7ef6a50e84 Reorganize model loading UI completely (#2720) 2023-06-16 19:00:37 -03:00
oobabooga 00b94847da Remove softprompt support 2023-06-06 07:42:23 -03:00
oobabooga f276d88546 Use AutoGPTQ by default for GPTQ models 2023-06-05 15:41:48 -03:00
oobabooga 3578dd3611 Change a warning message 2023-05-29 22:40:54 -03:00
Luis Lopez 9e7204bef4 Add tail-free and top-a sampling (#2357) 2023-05-29 21:40:01 -03:00
Forkoz 60ae80cf28 Fix hang in tokenizer for AutoGPTQ llama models. (#2399) 2023-05-28 23:10:10 -03:00
oobabooga 361451ba60 Add --load-in-4bit parameter (#2320) 2023-05-25 01:14:13 -03:00
oobabooga cd3618d7fb Add support for RWKV in Hugging Face format 2023-05-23 02:07:28 -03:00
oobabooga e116d31180 Prevent unwanted log messages from modules 2023-05-21 22:42:34 -03:00
oobabooga 05593a7834 Minor bug fix 2023-05-20 23:22:36 -03:00
oobabooga 9d5025f531 Improve error handling while loading GPTQ models 2023-05-19 11:20:08 -03:00
oobabooga ef10ffc6b4 Add various checks to model loading functions 2023-05-17 16:14:54 -03:00
oobabooga abd361b3a0 Minor change 2023-05-17 11:33:43 -03:00
oobabooga 21ecc3701e Avoid a name conflict 2023-05-17 11:23:13 -03:00
oobabooga 1a8151a2b6 Add AutoGPTQ support (basic) (#2132) 2023-05-17 11:12:12 -03:00
oobabooga 7584d46c29 Refactor models.py (#2113) 2023-05-16 19:52:22 -03:00
oobabooga 4e66f68115 Create get_max_memory_dict() function 2023-05-15 19:38:27 -03:00
oobabooga 2eeb27659d Fix bug in --cpu-memory 2023-05-12 06:17:07 -03:00
oobabooga 3316e33d14 Remove unused code 2023-05-10 11:59:59 -03:00
oobabooga 3913155c1f Style improvements (#1957) 2023-05-09 22:49:39 -03:00
Wesley Pyburn a2b25322f0 Fix trust_remote_code in wrong location (#1953) 2023-05-09 19:22:10 -03:00
EgrorBs d3ea70f453 More trust_remote_code=trust_remote_code (#1899) 2023-05-07 23:48:20 -03:00
oobabooga 97a6a50d98 Use oasst tokenizer instead of universal tokenizer 2023-05-04 15:55:39 -03:00
Mylo bd531c2dc2 Make --trust-remote-code work for all models (#1772) 2023-05-04 02:01:28 -03:00
oobabooga 9c77ab4fc2 Improve some warnings 2023-05-03 22:06:46 -03:00
oobabooga 95d04d6a8d Better warning messages 2023-05-03 21:43:17 -03:00
Ahmed Said fbcd32988e added no_mmap & mlock parameters to llama.cpp and removed llamacpp_model_alternative (#1649) 2023-05-02 18:25:28 -03:00
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>