= llama-cli
{c}
A <CLI> front-end for <llama.cpp>.
A decent test command:
``
time ./llama-cli \
--no-display-prompt \
--single-turn \
--temp 0 \
-c 16384 \
-cnv \
-m Llama-3.1-Tulu-3-8B-Q8_0.gguf \
-n 1000 \
-ngl 100 \
-p 'What is quantum field theory?' \
-t 10 |
tee output.txt \
;
``
but it failed to be deterministic across runs despite `--temp 0`. On <Ciro Santilli's hardware/P14s>, this ran about 2x faster on GPU via Vulkan, at 18 tokens/s for the 1000 tokens, than on CPU, which can be tried by removing the `-ngl 100` option.
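One simple way to check the determinism claim is to run the exact same command twice and diff the outputs. A minimal sketch, assuming the command above has been saved into a hypothetical `run.sh` wrapper that still tees to `output.txt`:
``
# Run the test command twice and keep both outputs.
./run.sh && cp output.txt output-run-1.txt
./run.sh && cp output.txt output-run-2.txt
# Identical files would indicate deterministic generation at --temp 0.
diff output-run-1.txt output-run-2.txt && echo identical || echo differ
``
The same wrapper can be reused for the CPU vs GPU comparison: drop `-ngl 100` from it and compare the `time` results of the two variants.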