OurBigBook About$ Donate
 Sign in Sign up

 llama-cli inference batching

by Ciro Santilli, 2025-08-08
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option, but it is not clear what effect it has on llama-cli.
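For concreteness, a minimal sketch of the flags involved; model.gguf and the prompt are placeholders, and the assumption that --parallel behaves as documented for llama-server (number of parallel sequences) has not been verified for llama-cli:

  # -b/--batch-size: how many prompt tokens are processed per decode call.
  # This is prompt-processing batching, not multi-request batching.
  llama-cli -m model.gguf -p "Once upon a time" -n 128 -b 512

  # --parallel/-np: number of parallel sequences to decode. Documented
  # mainly for llama-server; its effect on llama-cli is unclear (see above).
  llama-cli -m model.gguf -p "Once upon a time" -n 128 --parallel 4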
Bibliography:
  • github.com/ggml-org/llama.cpp/discussions/3222
  • www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
  • www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
  • related for server:
    • www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver