Commit graph

59 commits

Author SHA1 Message Date
oobabooga 8be798e15f llama.cpp: Fix stderr deadlock while loading some multimodal models 2025-08-24 12:20:05 -07:00
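
The stderr deadlock fixed in 8be798e15f is the classic subprocess pitfall: if the server's stderr pipe is never drained, the OS buffer fills and the child blocks. A minimal sketch of the usual remedy, draining llama-server's stderr on a background thread; the command line here is only illustrative, not the one the webui actually builds:

```python
import subprocess
import threading

def _drain_stderr(proc):
    # Keep reading stderr so the child never blocks on a full pipe buffer.
    for line in iter(proc.stderr.readline, b''):
        print(line.decode('utf-8', errors='replace'), end='')

# Hypothetical launch command, for illustration only.
proc = subprocess.Popen(
    ['llama-server', '--model', 'model.gguf', '--port', '8080'],
    stderr=subprocess.PIPE,
)
threading.Thread(target=_drain_stderr, args=(proc,), daemon=True).start()
```
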
oobabooga 7fe8da8944 Minor simplification after f247c2ae62 2025-08-22 14:42:56 -07:00
altoiddealer 57f6e9af5a Set multimodal status during Model Loading (#7199) 2025-08-13 16:47:27 -03:00
oobabooga 8d7b88106a Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)" 2025-08-12 13:20:16 -07:00
    This reverts commit d8fcc71616.
oobabooga d8fcc71616 mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp) 2025-08-11 18:02:33 -07:00
oobabooga e6447cd24a mtmd: Update the llama-server request 2025-08-11 17:42:35 -07:00
oobabooga 0e3def449a llama.cpp: --swa-full to llama-server when streaming-llm is checked 2025-08-11 15:17:25 -07:00
oobabooga b62c8845f3 mtmd: Fix /chat/completions for llama.cpp 2025-08-11 12:01:59 -07:00
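
The mtmd commits above (e6447cd24a, b62c8845f3) concern how images are attached to the request sent to llama-server. A hedged sketch of a multimodal request in the OpenAI-compatible chat format, assuming llama-server is running with a multimodal projector and accepts images as base64 data URIs; the exact payload the webui sends may differ:

```python
import base64
import requests

with open('example.png', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('ascii')

payload = {
    'messages': [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            # The image travels inline as a data URI rather than a remote URL.
            {'type': 'image_url', 'image_url': {'url': f'data:image/png;base64,{image_b64}'}},
        ],
    }],
    'max_tokens': 200,
}
r = requests.post('http://127.0.0.1:8080/v1/chat/completions', json=payload)
print(r.json()['choices'][0]['message']['content'])
```
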
oobabooga 2f90ac9880 Move the new image_utils.py file to modules/ 2025-08-09 21:41:38 -07:00
oobabooga d86b0ec010 Add multimodal support (llama.cpp) (#7027) 2025-08-10 01:27:25 -03:00
oobabooga 609c3ac893 Optimize the end of generation with llama.cpp 2025-06-15 08:03:27 -07:00
oobabooga 18bd78f1f0 Make the llama.cpp prompt processing messages shorter 2025-06-10 14:03:25 -07:00
oobabooga f8f23b5489 Simplify the llama.cpp stderr filter code 2025-06-06 22:25:13 -07:00
oobabooga 45f823ddf6 Print \n after the llama.cpp progress bar reaches 1.0 2025-06-06 22:23:34 -07:00
oobabooga 2db7745cbd Show llama.cpp prompt processing on one line instead of many lines 2025-06-01 22:12:24 -07:00
oobabooga e4d3f4449d API: Fix a regression 2025-05-16 13:02:27 -07:00
oobabooga 5534d01da0 Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) 2025-05-16 00:07:37 -03:00
oobabooga 62c774bf24 Revert "New attempt" 2025-05-13 06:42:25 -07:00
    This reverts commit e7ac06c169.
oobabooga e7ac06c169 New attempt 2025-05-10 19:20:04 -07:00
oobabooga 9ea2a69210 llama.cpp: Add --no-webui to the llama-server command 2025-05-08 10:41:25 -07:00
oobabooga c4f36db0d8 llama.cpp: remove tfs (it doesn't get used) 2025-05-06 08:41:13 -07:00
oobabooga d1c0154d66 llama.cpp: Add top_n_sigma, fix typical_p in sampler priority 2025-05-06 06:38:39 -07:00
oobabooga b817bb33fd Minor fix after df7bb0db1f 2025-05-05 05:00:20 -07:00
oobabooga b7a5c7db8d llama.cpp: Handle short arguments in --extra-flags 2025-05-04 07:14:42 -07:00
oobabooga 4c2e3b168b llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails) 2025-05-03 06:51:20 -07:00
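
4c2e3b168b adds retries around a request that occasionally fails. A generic sketch of that pattern, assuming the logits are fetched from llama-server over HTTP; the URL, payload, and retry counts are placeholders, not the webui's actual values:

```python
import time
import requests

def post_with_retries(url, payload, attempts=5, delay=1.0):
    last_exc = None
    for _ in range(attempts):
        try:
            response = requests.post(url, json=payload, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as exc:
            last_exc = exc
            time.sleep(delay)  # back off briefly before trying again
    raise last_exc
```
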
oobabooga b950a0c6db Lint 2025-04-30 20:02:10 -07:00
oobabooga a6c3ec2299 llama.cpp: Explicitly send cache_prompt = True 2025-04-30 15:24:07 -07:00
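
a6c3ec2299 makes the request opt in to prompt caching explicitly. A minimal /completion payload showing the field, assuming a local llama-server on the default port; everything besides cache_prompt is illustrative:

```python
import requests

payload = {
    'prompt': 'Once upon a time',
    'n_predict': 64,
    'cache_prompt': True,  # reuse KV-cache entries for the shared prompt prefix
}
response = requests.post('http://127.0.0.1:8080/completion', json=payload)
print(response.json().get('content', ''))
```
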
oobabooga 1ee0acc852 llama.cpp: Make --verbose print the llama-server command 2025-04-28 15:56:25 -07:00
oobabooga c6c2855c80 llama.cpp: Remove the timeout while loading models (closes #6907) 2025-04-27 21:22:21 -07:00
oobabooga 7b80acd524 Fix parsing --extra-flags 2025-04-26 18:40:03 -07:00
oobabooga 234aba1c50 llama.cpp: Simplify the prompt processing progress indicator 2025-04-26 17:33:47 -07:00
    The progress bar was unreliable
oobabooga d4b1e31c49 Use --ctx-size to specify the context size for all loaders 2025-04-25 16:59:03 -07:00
    Old flags are still recognized as alternatives.
oobabooga faababc4ea llama.cpp: Add a prompt processing progress bar 2025-04-25 16:42:30 -07:00
oobabooga 877cf44c08 llama.cpp: Add StreamingLLM (--streaming-llm) 2025-04-25 16:21:41 -07:00
oobabooga 98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
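
98f4c694b9, together with the later b7a5c7db8d and 7b80acd524, deals with turning a user-supplied --extra-flags string into llama-server arguments. A sketch under the assumption of a comma-separated name=value syntax, with single-character names getting one dash and longer names two; the webui's real parser may differ:

```python
def parse_extra_flags(extra_flags: str) -> list[str]:
    args = []
    for item in extra_flags.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition('=')
        prefix = '-' if len(name) == 1 else '--'  # short vs. long argument
        args.append(prefix + name)
        if value:
            args.append(value)
    return args

# e.g. parse_extra_flags('ctk=q8_0,fa') -> ['--ctk', 'q8_0', '--fa']
```
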
oobabooga e99c20bcb0 llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
Matthew Jenkins d3e7c655e5 Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862) 2025-04-20 23:06:24 -03:00
oobabooga 5ab069786b llama.cpp: add back the two encode calls (they are harmless now) 2025-04-19 17:38:36 -07:00
oobabooga b9da5c7e3a Use 127.0.0.1 instead of localhost for faster llama.cpp on Windows 2025-04-19 17:36:04 -07:00
oobabooga 9c9df2063f llama.cpp: fix unicode decoding (closes #6856) 2025-04-19 16:38:15 -07:00
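
The unicode fix in 9c9df2063f is about multi-byte UTF-8 characters being split across streamed chunks. A small sketch of the standard remedy, an incremental decoder that buffers incomplete sequences between chunks (the chunk source here is hypothetical):

```python
import codecs

decoder = codecs.getincrementaldecoder('utf-8')(errors='ignore')

def decode_chunks(chunks):
    # chunks is any iterable of bytes objects, e.g. a streamed HTTP response.
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text

# ''.join(decode_chunks([b'caf\xc3', b'\xa9'])) == 'café'
```
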
oobabooga ba976d1390 llama.cpp: avoid two 'encode' calls 2025-04-19 16:35:01 -07:00
oobabooga ed42154c78 Revert "llama.cpp: close the connection immediately on 'Stop'" 2025-04-19 05:32:36 -07:00
    This reverts commit 5fdebc554b.
oobabooga 5fdebc554b llama.cpp: close the connection immediately on 'Stop' 2025-04-19 04:59:24 -07:00
oobabooga 6589ebeca8 Revert "llama.cpp: new optimization attempt" 2025-04-18 21:16:21 -07:00
    This reverts commit e2e73ed22f.
oobabooga e2e73ed22f llama.cpp: new optimization attempt 2025-04-18 21:05:08 -07:00
oobabooga e2e90af6cd llama.cpp: don't include --rope-freq-base in the launch command if null 2025-04-18 20:51:18 -07:00
oobabooga 9f07a1f5d7 llama.cpp: new attempt at optimizing the llama-server connection 2025-04-18 19:30:53 -07:00
oobabooga f727b4a2cc llama.cpp: close the connection properly when generation is cancelled 2025-04-18 19:01:39 -07:00
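
f727b4a2cc and the surrounding commits revolve around tearing down the streaming connection when the user presses Stop. A rough sketch assuming a requests-based stream and a threading.Event as the stop signal; the endpoint and names are illustrative:

```python
import requests

def stream_completion(url, payload, stop_event):
    with requests.post(url, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if stop_event.is_set():
                response.close()  # drop the connection so the server stops generating
                break
            if line:
                yield line.decode('utf-8')
```
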
oobabooga b3342b8dd8 llama.cpp: optimize the llama-server connection 2025-04-18 18:46:36 -07:00
oobabooga 2002590536 Revert "Attempt at making the llama-server streaming more efficient." 2025-04-18 18:13:54 -07:00
    This reverts commit 5ad080ff25.