A decent test command as of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8, tested on Ubuntu 25.04:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
make -j
cd bin
time ./llama-cli \
  --no-display-prompt \
  --single-turn \
  --temp 0 \
  -c 16384 \
  -cnv \
  -m ~/Downloads/Llama-3.1-Tulu-3-8B-Q8_0.gguf \
  -n 1000 \
  -ngl 100 \
  -p 'What is quantum field theory?' \
  -t 10 |
  tee output.txt
The output was deterministic due to --temp 0. Also, on a P14s this command ran 2x faster on the GPU via Vulkan, at 18 tokens/s for 1000 tokens, than on the CPU. A CPU-only run can be obtained by removing -ngl 100.
As of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8 there is also a --parallel option, but I'm not sure what it does.
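As a rough sanity check of the speed numbers, assuming the CPU run was exactly half as fast (about 9 tokens/s), generating 1000 tokens takes about:

```latex
t_{\mathrm{GPU}} = \frac{1000\ \text{tokens}}{18\ \text{tokens/s}} \approx 55.6\ \text{s},
\qquad
t_{\mathrm{CPU}} = \frac{1000\ \text{tokens}}{9\ \text{tokens/s}} \approx 111.1\ \text{s}
```

The wall-clock time printed by time will be somewhat larger than this, since it also covers model loading and prompt processing.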
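A minimal sketch for repeating the experiments above: it runs the same --temp 0 generation twice with -ngl 100 to check that the outputs are byte-identical, and once without -ngl for the CPU timing. It assumes it is run from llama.cpp/build/bin with the model at the same path as above; run_llama and same_output are hypothetical helper names, not part of llama.cpp. Note that, as far as I know, the Vulkan backend is off by default and has to be enabled at configure time, e.g. with cmake -DGGML_VULKAN=ON .. instead of the plain cmake .. above.

```shell
#!/usr/bin/env bash
# Sketch: repeat the test command to check determinism and compare
# GPU (-ngl 100) against CPU-only speed.
# Assumes: run from llama.cpp/build/bin, model at the path used above.
set -u

MODEL=~/Downloads/Llama-3.1-Tulu-3-8B-Q8_0.gguf

# Hypothetical helper: run_llama OUTFILE [EXTRA_FLAGS...]
# Runs the same --temp 0 generation as above and writes the text to OUTFILE.
run_llama() {
  local out=$1
  shift
  ./llama-cli \
    --no-display-prompt \
    --single-turn \
    --temp 0 \
    -c 16384 \
    -cnv \
    -m "$MODEL" \
    -n 1000 \
    -p 'What is quantum field theory?' \
    -t 10 \
    "$@" > "$out"
}

# Hypothetical helper: same_output FILE1 FILE2
# Exit status 0 iff the two files are byte-for-byte identical.
same_output() {
  cmp -s "$1" "$2"
}

# Only touch the model if the binary is present in the current directory.
if [[ -x ./llama-cli ]]; then
  time run_llama gpu1.txt -ngl 100   # GPU offload, as in the command above
  time run_llama gpu2.txt -ngl 100   # identical second run
  time run_llama cpu.txt             # no -ngl: CPU only
  if same_output gpu1.txt gpu2.txt; then
    echo "GPU runs identical"
  else
    echo "GPU runs differ"
  fi
fi
```

Passing -ngl 100 as an extra argument keeps a single command definition for both the GPU and CPU runs, so the two timings differ only in offloading.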