llama-cli (source code)

= llama-cli
{c}

A <CLI> front-end for <llama.cpp>.

A decent test command, as of <llama.cpp> commit 79e0b68c178656bb0632cb8602d2940b755077f8, tested on <Ubuntu 25.04>:
``
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
make -j
cd bin
time ./llama-cli \
  --no-display-prompt \
  --single-turn \
  --temp 0 \
  -c 16384 \
  -cnv \
  -m ~/Downloads/Llama-3.1-Tulu-3-8B-Q8_0.gguf \
  -n 1000 \
  -ngl 100 \
  -p 'What is quantum field theory?' \
  -t 10 |
tee output.txt
``
The output is deterministic thanks to `--temp 0`.
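Since `--temp 0` disables random sampling, determinism can be checked by running the same command twice and diffing the outputs. A minimal sketch with a hypothetical `run_twice_and_diff` helper, demonstrated here on `echo` as a quick stand-in for the actual llama-cli invocation:

```shell
# Run a command twice and check that both runs produce identical output.
run_twice_and_diff() {
  "$@" > /tmp/run1.txt
  "$@" > /tmp/run2.txt
  if diff -q /tmp/run1.txt /tmp/run2.txt > /dev/null; then
    echo deterministic
  else
    echo non-deterministic
  fi
}

# Stand-in demonstration; in practice pass the full ./llama-cli command instead:
run_twice_and_diff echo 'What is quantum field theory?'
```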

Also, on <Ciro Santilli's hardware/P14s>, this command ran about 2x faster on GPU via Vulkan, at 18 tokens/s for the 1000 generated tokens, than on CPU; the CPU run can be reproduced by removing `-ngl 100`.
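As a sanity check on those numbers, 18 tokens/s over 1000 tokens implies roughly 56 s of GPU generation time, and about twice that on CPU at half the speed. A quick back-of-the-envelope calculation:

```shell
# Back-of-the-envelope timing from the reported 18 tokens/s over 1000 tokens.
tokens=1000
gpu_tps=18
awk -v t="$tokens" -v s="$gpu_tps" \
  'BEGIN { printf "GPU: %.1f s, CPU (2x slower): %.1f s\n", t / s, 2 * t / s }'
```

This ignores prompt processing time, which `time` also counts, so the measured wall clock will be somewhat higher.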