oobabooga
5fa709a3f4
llama.cpp server: use port+5 offset and suppress "No parser definition detected" logs
2026-03-06 18:52:34 -08:00
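The port-handling commits above (port+5 offset here, fixed port 5001 further down instead of a random port) suggest probing a preferred port and walking forward until a free one is found. A minimal sketch of that idea; the function name, probe count, and bind-based check are assumptions, not the actual implementation:

```python
import socket

def find_llama_port(preferred=5001, max_tries=10):
    # Probe the preferred port first, then the ones after it,
    # returning the first that can actually be bound.
    for offset in range(max_tries):
        port = preferred + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # in use; try the next offset
    raise RuntimeError("no free port found for llama-server")
```

Binding (rather than connecting) avoids a race where a port looks closed but cannot be claimed; the socket is released immediately so llama-server can take it.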
oobabooga
93ebfa2b7e
Fix llama-server output filter for new log format
2026-03-06 02:38:13 -03:00
oobabooga
66fb79fe15
llama.cpp: Add --fit-target param
2026-03-06 01:55:48 -03:00
oobabooga
e2548f69a9
Make user_data configurable: add --user-data-dir flag, auto-detect ../user_data
...
If --user-data-dir is not set, auto-detect: use ../user_data when
./user_data doesn't exist, making it easy to share user data across
portable builds by placing it one folder up.
2026-03-05 19:31:10 -08:00
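The fallback described in the commit body above can be sketched as follows; the function name and exact checks are hypothetical, only the decision rule (explicit flag wins, else prefer ./user_data, else fall back to ../user_data when it exists) comes from the commit message:

```python
from pathlib import Path

def resolve_user_data_dir(cli_value=None):
    # An explicit --user-data-dir value is used as-is.
    if cli_value:
        return Path(cli_value)
    local = Path("user_data")
    shared = Path("..") / "user_data"
    # Auto-detect: fall back to ../user_data only when ./user_data
    # is absent, so several portable builds placed side by side can
    # share one user_data folder one level up.
    if not local.exists() and shared.exists():
        return shared
    return local
```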
oobabooga
9824c82cb6
API: Add parallel request support for llama.cpp and ExLlamaV3
2026-03-05 16:49:58 -08:00
oobabooga
69fa4dd0b1
llama.cpp: allow ctx_size=0 for auto context via --fit
2026-03-04 19:33:20 -08:00
oobabooga
fbfcd59fe0
llama.cpp: Use -1 instead of 0 for auto gpu_layers
2026-03-04 19:21:45 -08:00
oobabooga
da3010c3ed
tiny improvements to llama_cpp_server.py
2026-03-04 15:54:37 -08:00
Sense_wang
7bf15ad933
fix: replace bare except clauses with except Exception ( #7400 )
2026-03-04 18:06:17 -03:00
oobabooga
cdf0e392e6
llama.cpp: Reorganize speculative decoding UI and use recommended ngram-mod defaults
2026-03-04 12:05:08 -08:00
oobabooga
65de4c30c8
Add adaptive-p sampler and n-gram speculative decoding support
2026-03-04 09:41:29 -08:00
oobabooga
f4d787ab8d
Delegate GPU layer allocation to llama.cpp's --fit
2026-03-04 06:37:50 -08:00
oobabooga
8a3d866401
Fix temperature_last having no effect in llama.cpp server sampler order
2026-03-04 06:10:51 -08:00
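The fix above concerns sampler ordering: with temperature_last set, temperature must be applied after truncation samplers rather than wherever it sits in the priority list. A sketch of that reordering, assuming samplers are represented as a list of names (the representation and function name are guesses):

```python
def order_samplers(priority, temperature_last):
    # priority: sampler names in their configured order,
    # e.g. ["temperature", "top_k", "top_p"].
    if not temperature_last:
        return list(priority)
    # Move temperature to the end so it runs after
    # truncation samplers like top_k/top_p.
    reordered = [s for s in priority if s != "temperature"]
    reordered.append("temperature")
    return reordered
```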
oobabooga
c54e8a2b3d
Try to spawn llama.cpp on port 5001 instead of random port
2026-01-28 08:23:55 -08:00
GodEmperor785
400bb0694b
Add slider for --ubatch-size for llama.cpp loader, change defaults for better MoE performance (#7316)
2025-11-21 16:56:02 -03:00
oobabooga
0d4eff284c
Add a --cpu-moe option for llama.cpp
2025-11-19 05:23:43 -08:00
mamei16
308e726e11
log error when llama-server request exceeds context size (#7263)
2025-10-12 23:00:11 -03:00
oobabooga
f3829b268a
llama.cpp: Always pass --flash-attn on
2025-09-02 12:12:17 -07:00
oobabooga
13876a1ee8
llama.cpp: Remove the --flash-attn flag (it's always on now)
2025-08-30 20:28:26 -07:00
oobabooga
3ad5970374
Make the llama.cpp --verbose output less verbose
2025-08-25 17:43:21 -07:00
oobabooga
8be798e15f
llama.cpp: Fix stderr deadlock while loading some multimodal models
2025-08-24 12:20:05 -07:00
oobabooga
7fe8da8944
Minor simplification after f247c2ae62
2025-08-22 14:42:56 -07:00
altoiddealer
57f6e9af5a
Set multimodal status during Model Loading (#7199)
2025-08-13 16:47:27 -03:00
oobabooga
8d7b88106a
Revert "mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)"
...
This reverts commit d8fcc71616.
2025-08-12 13:20:16 -07:00
oobabooga
d8fcc71616
mtmd: Fail early if images are provided but the model doesn't support them (llama.cpp)
2025-08-11 18:02:33 -07:00
oobabooga
e6447cd24a
mtmd: Update the llama-server request
2025-08-11 17:42:35 -07:00
oobabooga
0e3def449a
llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked
2025-08-11 15:17:25 -07:00
oobabooga
b62c8845f3
mtmd: Fix /chat/completions for llama.cpp
2025-08-11 12:01:59 -07:00
oobabooga
2f90ac9880
Move the new image_utils.py file to modules/
2025-08-09 21:41:38 -07:00
oobabooga
d86b0ec010
Add multimodal support (llama.cpp) (#7027)
2025-08-10 01:27:25 -03:00
oobabooga
609c3ac893
Optimize the end of generation with llama.cpp
2025-06-15 08:03:27 -07:00
oobabooga
18bd78f1f0
Make the llama.cpp prompt processing messages shorter
2025-06-10 14:03:25 -07:00
oobabooga
f8f23b5489
Simplify the llama.cpp stderr filter code
2025-06-06 22:25:13 -07:00
oobabooga
45f823ddf6
Print \n after the llama.cpp progress bar reaches 1.0
2025-06-06 22:23:34 -07:00
oobabooga
2db7745cbd
Show llama.cpp prompt processing on one line instead of many lines
2025-06-01 22:12:24 -07:00
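The progress-display commits above (single-line prompt processing, printing \n once the bar reaches 1.0) amount to redrawing one line with a carriage return and terminating it only at completion. A minimal sketch of that pattern; the message text and function name are illustrative, not the project's actual code:

```python
import sys

def show_progress(fraction):
    # Redraw the prompt-processing progress on one line: \r returns
    # the cursor to the start instead of emitting a new line.
    sys.stdout.write(f"\rPrompt processing: {fraction:.0%}")
    # Emit the newline only once the bar reaches 1.0, so later
    # output starts on a fresh line.
    if fraction >= 1.0:
        sys.stdout.write("\n")
    sys.stdout.flush()
```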
oobabooga
e4d3f4449d
API: Fix a regression
2025-05-16 13:02:27 -07:00
oobabooga
5534d01da0
Estimate the VRAM for GGUF models + autoset gpu-layers (#6980)
2025-05-16 00:07:37 -03:00
oobabooga
62c774bf24
Revert "New attempt"
...
This reverts commit e7ac06c169.
2025-05-13 06:42:25 -07:00
oobabooga
e7ac06c169
New attempt
2025-05-10 19:20:04 -07:00
oobabooga
9ea2a69210
llama.cpp: Add --no-webui to the llama-server command
2025-05-08 10:41:25 -07:00
oobabooga
c4f36db0d8
llama.cpp: remove tfs (it doesn't get used)
2025-05-06 08:41:13 -07:00
oobabooga
d1c0154d66
llama.cpp: Add top_n_sigma, fix typical_p in sampler priority
2025-05-06 06:38:39 -07:00
oobabooga
b817bb33fd
Minor fix after df7bb0db1f
2025-05-05 05:00:20 -07:00
oobabooga
b7a5c7db8d
llama.cpp: Handle short arguments in --extra-flags
2025-05-04 07:14:42 -07:00
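The commit above makes --extra-flags accept llama-server's short flags (like -ngl) alongside long ones. One plausible way to do that, assuming a comma-separated key=value format; the length cutoff for "short" and the function name are guesses, not the actual implementation:

```python
def build_extra_flags(extra_flags):
    # Assumed input format: "ngl=99,no-mmap" -> ["-ngl", "99", "--no-mmap"]
    args = []
    for item in extra_flags.split(","):
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
        else:
            key, value = item, None
        # Heuristic: short keys (3 chars or fewer, matching llama.cpp
        # short options like -m, -fa, -ngl) get one dash, long keys two.
        args.append(("-" if len(key) <= 3 else "--") + key)
        if value is not None:
            args.append(value)
    return args
```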
oobabooga
4c2e3b168b
llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails)
2025-05-03 06:51:20 -07:00
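A retry loop like the one the commit above describes can be sketched generically; the function name, attempt count, and delay are assumptions, and the actual fetch is abstracted as a callable:

```python
import time

def get_logits_with_retry(fetch, attempts=5, delay=0.05):
    # The logits fetch can fail intermittently, so retry a few
    # times with a short pause before giving up.
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the real error
            time.sleep(delay)
```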
oobabooga
b950a0c6db
Lint
2025-04-30 20:02:10 -07:00
oobabooga
a6c3ec2299
llama.cpp: Explicitly send cache_prompt = True
2025-04-30 15:24:07 -07:00
oobabooga
1ee0acc852
llama.cpp: Make --verbose print the llama-server command
2025-04-28 15:56:25 -07:00
oobabooga
c6c2855c80
llama.cpp: Remove the timeout while loading models (closes #6907)
2025-04-27 21:22:21 -07:00
oobabooga
7b80acd524
Fix parsing --extra-flags
2025-04-26 18:40:03 -07:00