OurBigBook About$ Donate
 Sign in Sign up

llama-cli inference batching

Ciro Santilli (@cirosantilli), 2025-08-08
Tags: LLM inference batching
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option, but it is not clear what it actually does.
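
A minimal sketch of how one might exercise the option (untested; --model and --prompt are standard llama-cli flags, while the effect of --parallel on a single-prompt run is exactly what remains unclear):

  # Hypothetical invocation: request 4 parallel sequences.
  # The flag is accepted as of the commit above, but what it does
  # to a single-prompt llama-cli run is unverified.
  llama-cli \
    --model ./models/model.gguf \
    --prompt "Hello" \
    --parallel 4
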
Bibliography:
  • github.com/ggml-org/llama.cpp/discussions/3222
  • www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
  • www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
  • related for server (see the sketch after this list):
    • www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver
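
For the server, batching is easier to observe, since parallelism comes from concurrent HTTP requests. A minimal sketch, assuming llama-server's defaults (port 8080, the native /completion endpoint) and assuming the same --parallel flag sets the number of server slots:

  # Assumed server start, with room for 4 parallel sequences:
  #   llama-server -m ./models/model.gguf --parallel 4
  # Fire 4 concurrent completion requests and wait for them all:
  for i in 1 2 3 4; do
    curl -s http://localhost:8080/completion \
      -H 'Content-Type: application/json' \
      -d '{"prompt": "Hello", "n_predict": 16}' &
  done
  wait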
