llama-cli inference batching
by Ciro Santilli, 2025-08-08
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option, but it is not clear what it actually does for llama-cli.
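As a minimal sketch, this is how the option would be passed on the command line. The model path and prompt are hypothetical placeholders (-m and -p are standard llama.cpp flags), and whether --parallel has any observable effect on a plain single-prompt llama-cli run is exactly the open question above:

    # Hypothetical invocation: request 4 parallel sequences.
    # Model path is a placeholder; point it at a local GGUF file.
    ./llama-cli \
      -m models/model.Q4_K_M.gguf \
      --parallel 4 \
      -p "Hello"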
Bibliography:
github.com/ggml-org/llama.cpp/discussions/3222
www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
Related, for llama-server:
www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver