Source: cirosantilli/ollama

= Ollama
{c}
{tag=Good}

https://github.com/jmorganca/ollama

<Ollama> is a highly automated open source wrapper that makes it very easy to run multiple <Open weight LLM models> either on <CPU> or <GPU>.

Its README alone is of great value, serving as a fantastic list of the most popular <Open weight LLM models> in existence.

Install with:
``
curl https://ollama.ai/install.sh | sh
``
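After installation, the client version can be checked with (assuming the install script put `ollama` on the `PATH`, which it did in my tests):
``
ollama --version
``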

The below was tested on Ollama 0.1.14 from December 2023.

Download <llama2 7B> and open a prompt:
``
ollama run llama2
``

On <Ciro Santilli's Hardware/P14s> it runs on <CPU> and generates a few tokens per second, which is quite usable for a quick interactive play.

As mentioned at https://github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md models are downloaded under `/usr/share/ollama/.ollama/models/`, and <ncdu> confirms:
``
--- /usr/share/ollama ----------------------------------
    3.6 GiB [###########################] /.ollama
    4.0 KiB [                           ]  .bashrc
    4.0 KiB [                           ]  .profile
    4.0 KiB [                           ]  .bash_logout
``
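The downloaded models and their sizes can also be listed directly through the CLI (this requires a working Ollama install):
``
ollama list
``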

We can also do it non-interactively with:
``
/bin/time ollama run llama2 'What is quantum field theory?'
``
which gave me:
``
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
``
but note that outputs differ between runs because a random seed is used by default. The ability to set the seed was apparently added in https://github.com/ollama/ollama/issues/2773[], but <Ciro Santilli> hasn't figured out how to set it from the command line.
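One possible workaround, assuming the `PARAMETER` mechanism from Ollama's Modelfile documentation applies to `seed`, is to derive a model with a fixed seed (untested sketch):
``
# Modelfile: derived model with a fixed seed
FROM llama2
PARAMETER seed 42
``
and then build and run it with `ollama create llama2-seeded -f Modelfile` followed by `ollama run llama2-seeded`.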

Some other quick benchmarks from <Amazon EC2 GPU>: on a <g4dn.xlarge> instance which had an <Nvidia Tesla T4>:
``
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
``
and on an <Nvidia A10G> in a <g5.xlarge> instance:
``
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
``

So it's not too bad: with a GPU you can generate a small article in about 10 seconds.
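For a rough comparison, the elapsed times above (2:06 on the P14s CPU, ~17 s on the T4, ~9.6 s on the A10G) imply the following speedups over the CPU run. Take these with a grain of salt: each is a single run, and the random seed affects the output length:
``
# Back-of-the-envelope speedups from the single-run elapsed times above.
elapsed_s = {
    "P14s CPU": 126.32,   # 2:06.32
    "Tesla T4": 16.91,
    "A10G": 9.59,
}
cpu = elapsed_s["P14s CPU"]
for machine, t in elapsed_s.items():
    print(f"{machine}: {cpu / t:.1f}x vs CPU")
``
which gives roughly 7.5x for the T4 and 13.2x for the A10G.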

It tends to babble quite a lot by default, but eventually decides to stop.