OurBigBook About$ Donate
 Sign in Sign up

llama-cli inference batching

Ciro Santilli (@cirosantilli), 2025-08-08
Tags: LLM inference batching
As of llama.cpp commit 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option, but it is not clear what it actually does.
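
A minimal sketch of how one might exercise the option (untested; --model and --prompt are standard llama-cli flags, while the effect of --parallel on a single-prompt run is exactly what remains unclear):

  # Hypothetical invocation: request 4 parallel sequences.
  # The flag is accepted as of the commit above, but what it does
  # to a single-prompt llama-cli run is unverified.
  llama-cli \
    --model ./models/model.gguf \
    --prompt "Hello" \
    --parallel 4
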
Bibliography:
  • github.com/ggml-org/llama.cpp/discussions/3222
  • www.reddit.com/r/LocalLLaMA/comments/12aj0ze/what_is_batchsize_in_llamacpp_also_known_as_n/
  • www.reddit.com/r/LocalLLaMA/comments/12gtanv/batch_queries/
  • related for server (see the sketch after this list):
    • www.reddit.com/r/LocalLLaMA/comments/1f19t2l/parallel_requests_using_llamaserver
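
For the server, batching is easier to observe, since parallelism comes from concurrent HTTP requests. A minimal sketch, assuming llama-server's defaults (port 8080, the native /completion endpoint) and assuming the same --parallel flag sets the number of server slots:

  # Assumed server start, with room for 4 parallel sequences:
  #   llama-server -m ./models/model.gguf --parallel 4
  # Fire 4 concurrent completion requests and wait for them all:
  for i in 1 2 3 4; do
    curl -s http://localhost:8080/completion \
      -H 'Content-Type: application/json' \
      -d '{"prompt": "Hello", "n_predict": 16}' &
  done
  wait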
