ollama-expect Created 2025-05-21 Updated 2025-07-19
Usage:
./ollama-expect <model> <prompt>
e.g.:
./ollama-expect llama3.2 'What is quantum field theory?'
This generates 100 tokens for the given prompt with the given model.
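The script itself is not shown here; a minimal sketch of what such a wrapper could look like, assuming it drives ollama's HTTP API on its default port 11434 (the `ollama_generate` name and the curl-based approach are illustrative assumptions, not the actual script):

```shell
# Hypothetical sketch of an ollama-expect-style wrapper, assuming the
# ollama server's /api/generate endpoint; num_predict caps output at
# 100 tokens, matching the behavior described above.
ollama_generate() {
  model=$1
  prompt=$2
  payload=$(printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_predict":100}}' \
    "$model" "$prompt")
  curl -s http://localhost:11434/api/generate -d "$payload"
}

# usage: ollama_generate llama3.2 'What is quantum field theory?'
```

Note this assumes prompts contain no characters needing JSON escaping; a real script would escape them.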
Benchmarks:
llama-cli Created 2025-07-16 Updated 2025-08-08
A CLI front-end for llama.cpp.
A decent test command as of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8:
time ./llama-cli \
  --no-display-prompt \
  --single-turn \
  --temp 0 \
  -c 16384 \
  -cnv \
  -m Llama-3.1-Tulu-3-8B-Q8_0.gguf \
  -n 1000 \
  -ngl 100 \
  -p 'What is quantum field theory?' \
  -t 10 |
  tee output.txt \
;
but output was not deterministic despite --temp 0. On a P14s, this ran about 2x faster on the GPU via Vulkan (18 tokens/s over 1000 tokens) than on the CPU, which can be selected by removing -ngl 100.
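The determinism failure above can be checked mechanically by running a command twice and comparing its output; `check_deterministic` below is a hypothetical helper, not part of these notes:

```shell
# check_deterministic CMD [ARGS...]: run a command twice, capture
# stdout from each run, and report whether the two runs matched.
check_deterministic() {
  "$@" > /tmp/det_run1.$$ 2>/dev/null
  "$@" > /tmp/det_run2.$$ 2>/dev/null
  if cmp -s /tmp/det_run1.$$ /tmp/det_run2.$$; then
    echo deterministic
  else
    echo nondeterministic
  fi
  rm -f /tmp/det_run1.$$ /tmp/det_run2.$$
}

# trivially deterministic command, for illustration:
check_deterministic echo hello
```

Applying it to the llama-cli invocation above (with --no-display-prompt so only generated text is compared) would surface the nondeterminism described.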