Commit graph

1575 commits

Author SHA1 Message Date
oobabooga 0ef1b8f8b4 Use ExLlamaV2 (instead of the HF one) for EXL2 models for now
It doesn't seem to have the "OverflowError" bug
2025-04-17 05:47:40 -07:00
oobabooga 682c78ea42 Add back detection of GPTQ models (closes #6841) 2025-04-11 21:00:42 -07:00
oobabooga 4ed0da74a8 Remove the obsolete 'multimodal' extension 2025-04-09 20:09:48 -07:00
oobabooga 598568b1ed Revert "UI: remove the streaming cursor"
This reverts commit 6ea0206207.
2025-04-09 16:03:14 -07:00
oobabooga 297a406e05 UI: smoother chat streaming
This removes the throttling associated with gr.Textbox that made words appear in chunks rather than one at a time
2025-04-09 16:02:37 -07:00
oobabooga 6ea0206207 UI: remove the streaming cursor 2025-04-09 14:59:34 -07:00
oobabooga 8b8d39ec4e
Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga bf48ec8c44 Remove an unnecessary UI message 2025-04-07 17:43:41 -07:00
oobabooga a5855c345c
Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835) 2025-04-07 21:42:33 -03:00
oobabooga 109de34e3b Remove the old --model-menu flag 2025-03-31 09:24:03 -07:00
oobabooga 758c3f15a5 Lint 2025-03-14 20:04:43 -07:00
oobabooga 5bcd2d7ad0
Add the top N-sigma sampler (#6796) 2025-03-14 16:45:11 -03:00
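The top N-sigma sampler keeps only tokens whose logits fall within n standard deviations of the maximum logit. A minimal sketch of that idea, with illustrative names rather than the exact code merged in #6796:

```python
import torch

def top_n_sigma_filter(logits: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    """Mask out tokens whose logits fall more than n standard
    deviations below the maximum logit."""
    threshold = logits.max() - n * logits.std()
    return logits.masked_fill(logits < threshold, float("-inf"))
```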
oobabooga 26317a4c7e Fix jinja2 error while loading c4ai-command-a-03-2025 2025-03-14 10:59:05 -07:00
Kelvie Wong 16fa9215c4
Fix OpenAI API with new param (show_after), closes #6747 (#6749)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-02-18 12:01:30 -03:00
oobabooga dba17c40fc Make transformers 4.49 functional 2025-02-17 17:31:11 -08:00
SamAcctX f28f39792d
update deprecated deepspeed import for transformers 4.46+ (#6725) 2025-02-02 20:41:36 -03:00
oobabooga c6f2c2fd7e UI: style improvements 2025-02-02 15:34:03 -08:00
oobabooga 0360f54ae8 UI: add a "Show after" parameter (to use with DeepSeek </think>) 2025-02-02 15:30:09 -08:00
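The "Show after" parameter hides everything the model emits before a marker such as DeepSeek's </think>. A hypothetical helper sketching the idea, not the repository's actual implementation:

```python
def apply_show_after(text: str, marker: str = "</think>") -> str:
    """Return only the part of the reply after the marker, if present."""
    _, sep, after = text.partition(marker)
    return after.lstrip() if sep else text
```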
oobabooga f01cc079b9 Lint 2025-01-29 14:00:59 -08:00
oobabooga 75ff3f3815 UI: Mention common context length values 2025-01-25 08:22:23 -08:00
FP HAM 71a551a622
Add strftime_now to Jinja to satisfy Llama 3.1 and 3.2 (and Granite) (#6692) 2025-01-24 11:37:20 -03:00
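Llama 3.1/3.2 (and Granite) chat templates call a strftime_now() helper that stock Jinja2 does not provide, so the template environment has to expose one. A sketch of that wiring, assuming a sandboxed Jinja2 environment; not the repository's exact code:

```python
from datetime import datetime

from jinja2.sandbox import ImmutableSandboxedEnvironment

def strftime_now(fmt: str) -> str:
    """Format the current time, as Llama 3.x templates expect."""
    return datetime.now().strftime(fmt)

env = ImmutableSandboxedEnvironment()
env.globals["strftime_now"] = strftime_now
template = env.from_string("Today is {{ strftime_now('%d %b %Y') }}")
print(template.render())
```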
oobabooga 0485ff20e8 Workaround for convert_to_markdown bug 2025-01-23 06:21:40 -08:00
oobabooga 39799adc47 Add a helpful error message when llama.cpp fails to load the model 2025-01-21 12:49:12 -08:00
oobabooga 5e99dded4e UI: add "Continue" and "Remove" buttons below the last chat message 2025-01-21 09:05:44 -08:00
oobabooga 0258a6f877 Fix the Google Colab notebook 2025-01-16 05:21:18 -08:00
oobabooga 1ef748fb20 Lint 2025-01-14 16:44:15 -08:00
oobabooga f843cb475b UI: update a help message 2025-01-14 08:12:51 -08:00
oobabooga c832953ff7 UI: Activate auto_max_new_tokens by default 2025-01-14 05:59:55 -08:00
Underscore 53b838d6c5
HTML: Fix quote pair RegEx matching for all quote types (#6661) 2025-01-13 18:01:50 -03:00
oobabooga c85e5e58d0 UI: move the new morphdom code to a .js file 2025-01-13 06:20:42 -08:00
oobabooga facb4155d4 Fix morphdom leaving ghost elements behind 2025-01-11 20:57:28 -08:00
oobabooga a0492ce325
Optimize syntax highlighting during chat streaming (#6655) 2025-01-11 21:14:10 -03:00
mamei16 f1797f4323
Unescape backslashes in html_output (#6648) 2025-01-11 18:39:44 -03:00
oobabooga 1b9121e5b8 Add a "refresh" button below the last message, add a missing file 2025-01-11 12:42:25 -08:00
oobabooga a5d64b586d
Add a "copy" button below each message (#6654) 2025-01-11 16:59:21 -03:00
oobabooga 3a722a36c8
Use morphdom to make chat streaming 1902381098231% faster (#6653) 2025-01-11 12:55:19 -03:00
oobabooga d2f6c0f65f Update README 2025-01-10 13:25:40 -08:00
oobabooga c393f7650d Update settings-template.yaml, organize modules/shared.py 2025-01-10 13:22:18 -08:00
oobabooga 83c426e96b
Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga 7fe46764fb Improve the --help message about --tensorcores as well 2025-01-10 07:07:41 -08:00
oobabooga da6d868f58 Remove old deprecated flags (~6 months or more) 2025-01-09 16:11:46 -08:00
oobabooga f3c0f964a2 Lint 2025-01-09 13:18:23 -08:00
oobabooga 3020f2e5ec UI: improve the info message about --tensorcores 2025-01-09 12:44:03 -08:00
oobabooga c08d87b78d Make the huggingface loader more readable 2025-01-09 12:23:38 -08:00
BPplays 619265b32c
add ipv6 support to the API (#6559) 2025-01-09 10:23:44 -03:00
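IPv6 support for a local API server mostly comes down to binding a socket with the right address family. A minimal sketch of the general technique; the actual change in #6559 may differ:

```python
import socket

def make_listener(host: str, port: int) -> socket.socket:
    """Bind a listening socket, using AF_INET6 for IPv6 hosts like '::1'."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```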
oobabooga 5c89068168 UI: add an info message for the new Static KV cache option 2025-01-08 17:36:30 -08:00
nclok1405 b9e2ded6d4
Added UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2025-01-08 21:17:31 -03:00
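A UnicodeDecodeError while streaming llama.cpp output typically means a multi-byte UTF-8 character was split across two chunks. The standard workaround is an incremental decoder that buffers incomplete sequences; a sketch of that general technique, not necessarily what #6040 does:

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")

def decode_chunk(chunk: bytes) -> str:
    """Decode a streamed byte chunk, buffering any trailing
    incomplete UTF-8 sequence instead of raising."""
    return decoder.decode(chunk)
```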
oobabooga 91a8a87887 Remove obsolete code 2025-01-08 15:07:21 -08:00
oobabooga 7157257c3f
Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
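A --torch-compile flag would typically just gate a call to torch.compile() on the loaded model, trading longer startup for faster forward passes. A hedged sketch assuming a standard transformers model; the flag's exact wiring in the repo may differ:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model = torch.compile(model)  # what a --torch-compile flag would enable
```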
oobabooga 11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
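Transformers exposes the static KV cache through generate()'s cache_implementation argument: the cache is pre-allocated at a fixed size instead of growing token by token, which also makes the decode step compilable. A small usage sketch assuming that API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
# "static" pre-allocates the KV cache at a fixed size
output = model.generate(**inputs, max_new_tokens=20,
                        cache_implementation="static")
print(tokenizer.decode(output[0]))
```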
oobabooga 3967520e71 Connect XTC, DRY, smoothing_factor, and dynatemp to ExLlamaV2 loader (non-HF) 2025-01-04 16:25:06 -08:00
oobabooga 049297fa66 UI: reduce the size of CSS sent to the UI during streaming 2025-01-04 14:09:36 -08:00
oobabooga 0e673a7a42 UI: reduce the size of HTML sent to the UI during streaming 2025-01-04 11:40:24 -08:00
mamei16 9f24885bd2
Sane handling of markdown lists (#6626) 2025-01-04 15:41:31 -03:00
oobabooga 4b3e1b3757 UI: add a "Search chats" input field 2025-01-02 18:46:40 -08:00
oobabooga b8fc9010fa UI: fix orjson.JSONDecodeError error on page reload 2025-01-02 16:57:04 -08:00
oobabooga 75f1b5ccde UI: add a "Branch chat" button 2025-01-02 16:24:18 -08:00
Petr Korolev 13c033c745
Fix CUDA error on MPS backend during API request (#6572)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-01-02 00:06:11 -03:00
oobabooga 725639118a UI: Use a tab length of 2 for lists (rather than 4) 2025-01-01 13:53:50 -08:00
oobabooga 7b88724711
Make responses start faster by removing unnecessary cleanup calls (#6625) 2025-01-01 18:33:38 -03:00
oobabooga 64853f8509 Reapply a necessary change that I removed from #6599 (thanks @mamei16!) 2024-12-31 14:43:22 -08:00
mamei16 e953af85cd
Fix newlines in the markdown renderer (#6599)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2024-12-31 01:04:02 -03:00
oobabooga 39a5c9a49c
UI organization (#6618) 2024-12-29 11:16:17 -03:00
oobabooga 0490ee620a UI: increase the threshold for a <li> to be considered long (some more) 2024-12-19 16:51:34 -08:00
oobabooga 89888bef56 UI: increase the threshold for a <li> to be considered long 2024-12-19 14:38:36 -08:00
oobabooga 2acec386fc UI: improve the streaming cursor 2024-12-19 14:08:56 -08:00
oobabooga e2fb86e5df UI: further improve the style of lists and headings 2024-12-19 13:59:24 -08:00
oobabooga c48e4622e8 UI: update a link 2024-12-18 06:28:14 -08:00
oobabooga b27f6f8915 Lint 2024-12-17 20:13:32 -08:00
oobabooga b051e2c161 UI: improve a margin for readability 2024-12-17 19:58:21 -08:00
oobabooga 60c93e0c66 UI: Set cache_type to fp16 by default 2024-12-17 19:44:20 -08:00
oobabooga ddccc0d657 UI: minor change to log messages 2024-12-17 19:39:00 -08:00
oobabooga 3030c79e8c UI: show progress while loading a model 2024-12-17 19:37:43 -08:00
Diner Burger addad3c63e
Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga c43ee5db11 UI: very minor color change 2024-12-17 07:59:55 -08:00
oobabooga d769618591
Improved UI (#6575) 2024-12-17 00:47:41 -03:00
oobabooga 350758f81c UI: Fix the history upload event 2024-11-19 20:34:53 -08:00
oobabooga d01293861b Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-11-18 10:15:36 -08:00
oobabooga 3d19746a5d UI: improve HTML rendering for lists with sub-lists 2024-11-18 10:14:09 -08:00
mefich 1c937dad72
Filter whitespaces in downloader fields in model tab (#6518) 2024-11-18 12:01:40 -03:00
PIRI e1061ba7e3
Make token bans work again on HF loaders (#6488) 2024-10-24 15:24:02 -03:00
oobabooga 2468cfd8bb Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-10-14 13:25:27 -07:00
oobabooga bb62e796eb Fix locally compiled llama-cpp-python failing to import 2024-10-14 13:24:13 -07:00
oobabooga c9a9f63d1b Fix llama.cpp loader not being random (thanks @reydeljuego12345) 2024-10-14 13:07:07 -07:00
PIRI 03a2e70054
Fix temperature_last when temperature not in sampler priority (#6439) 2024-10-09 11:25:14 -03:00
oobabooga 49dfa0adaf Fix the "save preset" event 2024-10-01 11:20:48 -07:00
oobabooga 93c250b9b6 Add a UI element for enable_tp 2024-10-01 11:16:15 -07:00
oobabooga cca9d6e22d Lint 2024-10-01 10:21:06 -07:00
oobabooga 4d9ce586d3 Update llama_cpp_python_hijack.py, fix llamacpp_hf 2024-09-30 14:49:21 -07:00
oobabooga bbdeed3cf4 Make sampler priority high if unspecified 2024-09-29 20:45:27 -07:00
Manuel Schmid 0f90a1b50f
Do not set value for histories in chat when --multi-user is used (#6317) 2024-09-29 01:08:55 -03:00
oobabooga c61b29b9ce Simplify the warning when flash-attn fails to import 2024-09-28 20:33:17 -07:00
oobabooga b92d7fd43e Add warnings for when AutoGPTQ, TensorRT-LLM, or HQQ are missing 2024-09-28 20:30:24 -07:00
oobabooga 7276dca933 Fix a typo 2024-09-27 20:28:17 -07:00
RandoInternetPreson 46996f6519
ExllamaV2 tensor parallelism to increase multi-GPU inference speeds (#6356) 2024-09-28 00:26:03 -03:00
Philipp Emanuel Weidmann 301375834e
Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335) 2024-09-27 22:50:12 -03:00
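As described in #6335, XTC fires with a configured probability; when it does, it finds all tokens whose probability meets a threshold and, if at least two qualify, excludes all of them except the least likely, forcing the model off its most predictable continuations. A simplified single-sequence sketch with illustrative parameter names:

```python
import torch

def xtc_filter(logits: torch.Tensor, threshold: float = 0.1,
               probability: float = 0.5) -> torch.Tensor:
    """Exclude every above-threshold token except the least
    probable of them, with the given probability."""
    if torch.rand(1).item() >= probability:
        return logits
    probs = torch.softmax(logits, dim=-1)
    above = probs >= threshold
    if above.sum() < 2:
        return logits  # nothing to exclude unless two tokens qualify
    # keep only the least probable of the above-threshold tokens
    keep = torch.argmin(probs.masked_fill(~above, float("inf")))
    above[keep] = False
    return logits.masked_fill(above, float("-inf"))
```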
oobabooga 5c918c5b2d Make it possible to sort DRY 2024-09-27 15:40:48 -07:00
oobabooga 7424f789bf
Fix the sampling monkey patch (and add more options to sampler_priority) (#6411) 2024-09-27 19:03:25 -03:00
oobabooga bba5b36d33 Don't import PEFT unless necessary 2024-09-03 19:40:53 -07:00