ollama-expect Updated 2025-07-19 Created 2025-05-21
Usage:
./ollama-expect <model> <prompt>
e.g.:
./ollama-expect llama3.2 'What is quantum field theory?'
This generates 100 tokens for the given prompt with the given model.
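The fixed 100-token budget can also be reproduced against Ollama's HTTP API directly. A minimal sketch: the /api/generate endpoint and the num_predict option are from Ollama's documented API, while the build_request helper name is my own:

```python
import json

# Ollama's default local endpoint (assumption: stock install, default port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt, num_tokens=100):
    """Build the JSON body for Ollama's /api/generate endpoint.

    options.num_predict caps the number of generated tokens,
    mirroring the script's fixed 100-token budget."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_tokens},
    })

body = build_request("llama3.2", "What is quantum field theory?")
print(body)
# Then POST it, e.g. with urllib.request, sending a
# Content-Type: application/json header to OLLAMA_URL.
```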
Benchmarks:
llama.cpp Updated 2025-07-16 Created 2025-07-16
This appears to be the backend library of Ollama.
They have a CLI front-end named llama-cli.
askubuntu.com/questions/1461564/install-llama-cpp-locally has some tutorials for Ubuntu. There was no nicely pre-packaged version for Ubuntu 25.04, but the build worked at commit 79e0b68c178656bb0632cb8602d2940b755077f8.
In particular, llama.cpp exposed Vulkan support before Ollama did: github.com/ollama/ollama/pull/5059. It did seem to work, making use of my AMD GPU.
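A sketch of such a Vulkan-enabled build, assuming llama.cpp's GGML_VULKAN CMake option and that the Vulkan development packages are already installed (the commit is the one noted above):

```shell
# Sketch of a Vulkan-enabled llama.cpp build on Ubuntu.
# Assumes Vulkan headers and the shader compiler are present,
# e.g. libvulkan-dev and glslc.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 79e0b68c178656bb0632cb8602d2940b755077f8
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"
# The CLI front-end then lands in build/bin/:
./build/bin/llama-cli -m <model.gguf> -p 'What is quantum field theory?'
```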