Author | Commit | Message | Date
oobabooga | bc55feaf3e | Improve host header validation in local mode | 2025-04-26 15:42:17 -07:00
oobabooga | 3a207e7a57 | Improve the --help formatting a bit | 2025-04-26 07:31:04 -07:00
oobabooga | 6acb0e1bee | Change a UI description | 2025-04-26 05:13:08 -07:00
oobabooga | cbd4d967cc | Update a --help message | 2025-04-26 05:09:52 -07:00
oobabooga | 763a7011c0 | Remove an ancient/obsolete migration check | 2025-04-26 04:59:05 -07:00
oobabooga | d9de14d1f7 | Restructure the repository (#6904) | 2025-04-26 08:56:54 -03:00
oobabooga | d4017fbb6d | ExLlamaV3: Add kv cache quantization (#6903) | 2025-04-25 21:32:00 -03:00
oobabooga | d4b1e31c49 | Use --ctx-size to specify the context size for all loaders (old flags are still recognized as alternatives) | 2025-04-25 16:59:03 -07:00
oobabooga | faababc4ea | llama.cpp: Add a prompt processing progress bar | 2025-04-25 16:42:30 -07:00
oobabooga | 877cf44c08 | llama.cpp: Add StreamingLLM (--streaming-llm) | 2025-04-25 16:21:41 -07:00
oobabooga | d35818f4e1 | UI: Add a collapsible thinking block to messages with <think> steps (#6902) | 2025-04-25 18:02:02 -03:00
oobabooga | 98f4c694b9 | llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server | 2025-04-25 07:32:51 -07:00
oobabooga | 5861013e68 | Merge remote-tracking branch 'refs/remotes/origin/dev' into dev | 2025-04-24 20:36:20 -07:00
oobabooga | a90df27ff5 | UI: Add a greeting when the chat history is empty | 2025-04-24 20:33:40 -07:00
oobabooga | ae1fe87365 | ExLlamaV2: Add speculative decoding (#6899) | 2025-04-25 00:11:04 -03:00
Matthew Jenkins | 8f2493cc60 | Prevent llamacpp defaults from locking up consumer hardware (#6870) | 2025-04-24 23:38:57 -03:00
oobabooga | 93fd4ad25d | llama.cpp: Document the --device-draft syntax | 2025-04-24 09:20:11 -07:00
oobabooga | f1b64df8dd | EXL2: add another torch.cuda.synchronize() call to prevent errors | 2025-04-24 09:03:49 -07:00
oobabooga | c71a2af5ab | Handle CMD_FLAGS.txt in the main code (closes #6896) | 2025-04-24 08:21:06 -07:00
oobabooga | bfbde73409 | Make 'instruct' the default chat mode | 2025-04-24 07:08:49 -07:00
oobabooga | e99c20bcb0 | llama.cpp: Add speculative decoding (#6891) | 2025-04-23 20:10:16 -03:00
oobabooga | 9424ba17c8 | UI: show only part 00001 of multipart GGUF models in the model menu | 2025-04-22 19:56:42 -07:00
oobabooga | 25cf3600aa | Lint | 2025-04-22 08:04:02 -07:00
oobabooga | 39cbb5fee0 | Lint | 2025-04-22 08:03:25 -07:00
oobabooga | 008c6dd682 | Lint | 2025-04-22 08:02:37 -07:00
oobabooga | 78aeabca89 | Fix the transformers loader | 2025-04-21 18:33:14 -07:00
oobabooga | 8320190184 | Fix the exllamav2_HF and exllamav3_HF loaders | 2025-04-21 18:32:23 -07:00
oobabooga | 15989c2ed8 | Make llama.cpp the default loader | 2025-04-21 16:36:35 -07:00
oobabooga | 86c3ed3218 | Small change to the unload_model() function | 2025-04-20 20:00:56 -07:00
oobabooga | fe8e80e04a | Merge remote-tracking branch 'refs/remotes/origin/dev' into dev | 2025-04-20 19:09:27 -07:00
oobabooga | ff1c00bdd9 | llama.cpp: set the random seed manually | 2025-04-20 19:08:44 -07:00
Matthew Jenkins | d3e7c655e5 | Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862) | 2025-04-20 23:06:24 -03:00
oobabooga | e243424ba1 | Fix an import | 2025-04-20 17:51:28 -07:00
oobabooga | 8cfd7f976b | Revert "Remove the old --model-menu flag" (reverts commit 109de34e3b) | 2025-04-20 13:35:42 -07:00
oobabooga | b3bf7a885d | Fix ExLlamaV2_HF and ExLlamaV3_HF after ae02ffc605 | 2025-04-20 11:32:48 -07:00
oobabooga | ae02ffc605 | Refactor the transformers loader (#6859) | 2025-04-20 13:33:47 -03:00
oobabooga | 6ba0164c70 | Lint | 2025-04-19 17:45:21 -07:00
oobabooga | 5ab069786b | llama.cpp: add back the two encode calls (they are harmless now) | 2025-04-19 17:38:36 -07:00
oobabooga | b9da5c7e3a | Use 127.0.0.1 instead of localhost for faster llama.cpp on Windows | 2025-04-19 17:36:04 -07:00
oobabooga | 9c9df2063f | llama.cpp: fix unicode decoding (closes #6856) | 2025-04-19 16:38:15 -07:00
oobabooga | ba976d1390 | llama.cpp: avoid two 'encode' calls | 2025-04-19 16:35:01 -07:00
oobabooga | ed42154c78 | Revert "llama.cpp: close the connection immediately on 'Stop'" (reverts commit 5fdebc554b) | 2025-04-19 05:32:36 -07:00
oobabooga | 5fdebc554b | llama.cpp: close the connection immediately on 'Stop' | 2025-04-19 04:59:24 -07:00
oobabooga | 6589ebeca8 | Revert "llama.cpp: new optimization attempt" (reverts commit e2e73ed22f) | 2025-04-18 21:16:21 -07:00
oobabooga | e2e73ed22f | llama.cpp: new optimization attempt | 2025-04-18 21:05:08 -07:00
oobabooga | e2e90af6cd | llama.cpp: don't include --rope-freq-base in the launch command if null | 2025-04-18 20:51:18 -07:00
oobabooga | 9f07a1f5d7 | llama.cpp: new attempt at optimizing the llama-server connection | 2025-04-18 19:30:53 -07:00
oobabooga | f727b4a2cc | llama.cpp: close the connection properly when generation is cancelled | 2025-04-18 19:01:39 -07:00
oobabooga | b3342b8dd8 | llama.cpp: optimize the llama-server connection | 2025-04-18 18:46:36 -07:00
oobabooga | 2002590536 | Revert "Attempt at making the llama-server streaming more efficient." (reverts commit 5ad080ff25) | 2025-04-18 18:13:54 -07:00