askubuntu.com/questions/1461564/install-llama-cpp-locally has some tutorials for Ubuntu. There was no nicely pre-packaged one for Ubuntu 25.04, but the build worked at commit 79e0b68c178656bb0632cb8602d2940b755077f8. In particular, it exposed Vulkan support before Ollama did (github.com/ollama/ollama/pull/5059), and it did seem to work, making use of my AMD GPU.
A decent test command:
time ./llama-cli \
--no-display-prompt \
--single-turn \
--temp 0 \
-c 16384 \
-cnv \
-m Llama-3.1-Tulu-3-8B-Q8_0.gguf \
-n 1000 \
-ngl 100 \
-p 'What is quantum field theory?' \
-t 10 |
tee output.txt \
;
but the output failed to be deterministic despite --temp 0. On a P14s, this ran at 18 tokens/s for 1000 tokens on the GPU via Vulkan, 2x faster than on the CPU. The CPU run can be reproduced by removing -ngl 100.
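One way to check the determinism observation more systematically is to run the same command twice and compare the outputs byte for byte. The following is only a minimal sketch, assuming a POSIX shell with mktemp and cmp available; is_deterministic is a hypothetical helper name, not something provided by llama.cpp.

```shell
# is_deterministic is a hypothetical helper, not part of llama.cpp.
# It runs the given command twice and compares the two stdout captures.
is_deterministic() {
  det_a=$(mktemp) || return 2
  det_b=$(mktemp) || return 2
  # Discard stderr so that logs and timing lines do not affect the comparison.
  "$@" > "$det_a" 2> /dev/null
  "$@" > "$det_b" 2> /dev/null
  if cmp -s "$det_a" "$det_b"; then
    echo same
    det_status=0
  else
    echo different
    det_status=1
  fi
  rm -f "$det_a" "$det_b"
  return "$det_status"
}
```

Passing the llama-cli invocation from the test command as arguments (everything after time and before the pipe to tee) should print different if the two generations diverge, and same otherwise. Comparing only stdout is a design choice: the timing information differs between runs regardless of the generated text.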