OurBigBook About$ Donate
 Sign in Sign up

 llama-cli inference batching

by Ciro Santilli, 2025-08-08
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option, but it is not clear what effect it has on llama-cli.
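For concreteness, a minimal sketch of the flags involved; model.gguf and the prompt are placeholders, and the assumption that --parallel behaves as documented for llama-server (number of parallel sequences) has not been verified for llama-cli:

  # -b/--batch-size: how many prompt tokens are processed per decode call.
  # This is prompt-processing batching, not multi-request batching.
  llama-cli -m model.gguf -p "Once upon a time" -n 128 -b 512

  # --parallel/-np: number of parallel sequences to decode. Documented
  # mainly for llama-server; its effect on llama-cli is unclear (see above).
  llama-cli -m model.gguf -p "Once upon a time" -n 128 --parallel 4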
Bibliography:
  • github.com/ggml-org/llama.cpp/discussions/3222
  • www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
  • www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
  • related for server:
    • www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver