llama-cli inference batching
Ciro Santilli (@cirosantilli)
2025-08-08
Tags: LLM inference batching
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8, llama-cli has a --parallel option, but it is not clear what the flag actually does for llama-cli (as opposed to llama-server).
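One way to probe the flag empirically would be to time identical runs with and without it. Below is a minimal sketch under stated assumptions: llama-cli is on the PATH and model.gguf is a placeholder for a real local GGUF model; -m, -p and -n are standard llama-cli flags, while the effect of --parallel here is precisely the open question.

    import subprocess
    import time

    # Placeholder values: point MODEL at a real local GGUF file.
    MODEL = "model.gguf"
    PROMPT = "Explain inference batching in one paragraph."

    def timed_run(extra_args):
        # -m, -p and -n are standard llama-cli flags; extra_args lets us
        # toggle --parallel, whose effect on llama-cli is what we are testing.
        args = ["llama-cli", "-m", MODEL, "-p", PROMPT, "-n", "128"] + extra_args
        start = time.perf_counter()
        subprocess.run(args, capture_output=True, text=True, check=True)
        return time.perf_counter() - start

    baseline = timed_run([])
    parallel = timed_run(["--parallel", "4"])
    print(f"baseline: {baseline:.2f}s, --parallel 4: {parallel:.2f}s")

If the two timings are indistinguishable, the flag may only matter for llama-server, which the last bibliography entry below discusses.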
Bibliography:
https://github.com/ggml-org/llama.cpp/discussions/3222
https://www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
https://www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
Related, for llama-server:
https://www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver