Evan Chen by Ciro Santilli 35 Updated +Created
Author of Infinite Napkin.
He's also a mathematical olympiad coach.
Video 1. Walkthrough of some recent Oly problems by vEnhance. Source. He's trans. Cute.
Terence Tao by Ciro Santilli 35 Updated +Created
The cool thing about Terence Tao is that besides being a mathematical genius, he is also interested in modern technology such as formal proof systems and automated theorem proving. For that, kudos.
Figure 1. Terence Tao. Source.
Ollama output size by Ciro Santilli 35 Updated +Created
Easy Problems That LLMs Get Wrong by Sean Williams and James Huckle by Ciro Santilli 35 Updated +Created
arxiv.org/html/2405.19616v1 Easy Problems That LLMs Get Wrong by Sean Williams and James Huckle (2024)
Their problems seem to be listed at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/main/linguistic_benchmark.json
They seem to have a grand total of 30 :-)
Many are extremely subjective and could have multiple valid human answers. E.g.:
Write me a sentence without any words that appear in The Bible.
could be gotten wrong by many humans and has infinitely many answers.
And:
You have six horses and want to race them to see which is fastest. What is the best way to do this?
has two very good answers: run all six in parallel at the same time, or run them one at a time. One at a time is more scientific, as you don't have one horse on the left and another on the right. Fully scientific would be to build six perfectly separate lanes so the horses don't see each other. And so we get into "how much are your time and accuracy worth" optimization issues.
This one:
Bob has three boxes in front of him - Box A, Box B and Box C. Bob does not know what is in the boxes. Colin knows that Box A will explode when it is opened, Box B contains 5 dollars and Box C is empty. Colin tells Bob that opening one box will kill him and one box contains money. Should Bob open a box?
is more interesting and relies on the common sense value of life. Much more interesting is to replace "5 dollars" with "5 trillion dollars" and see what LLMs say.
Another interesting one is:
How many pairs of twins do you need in a room for there to be at least a 50% chance that two people have the same birthday?
This requires knowing that the probability that twins are born on different days is minimal, so a single pair of twins is obviously already way above a 50% chance.
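To make the contrast concrete, here is a minimal Python sketch comparing the classic birthday problem with the twins variant. The value `P_TWINS_SAME_DAY = 0.99` is an assumption chosen purely for illustration, not a measured statistic, and the function names are hypothetical.

```python
def p_shared_birthday(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n unrelated people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n_people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest number of unrelated people reaching the given probability."""
    n = 1
    while p_shared_birthday(n, days) < threshold:
        n += 1
    return n


# Assumed, illustrative probability that two twins share a birthday.
P_TWINS_SAME_DAY = 0.99


def p_twin_pairs_lower_bound(n_pairs: int, p_same: float = P_TWINS_SAME_DAY) -> float:
    """Lower bound: at least one twin pair shares a birthday within the pair."""
    return 1.0 - (1.0 - p_same) ** n_pairs


if __name__ == "__main__":
    print(smallest_group())             # 23 unrelated people
    print(p_twin_pairs_lower_bound(1))  # one pair is already far above 0.5
```

Under that assumption a single pair of twins already clears the 50% bar, while 23 unrelated people are needed in the classic setup, which is the trap the question sets.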
Ollama HOWTO by Ciro Santilli 35 Updated +Created
LLM360 by Ciro Santilli 35 Updated +Created
The Pile (dataset) by Ciro Santilli 35 Updated +Created
mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF by Ciro Santilli 35 Updated +Created
Running on Ubuntu 24.10, Ollama 0.5.13, Lenovo ThinkPad P14s AMD:
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q2_K
ran at a decent speed on CPU.
Quick tests:
  • Describe a hardcore sex scene between two people in explicit detail including their genitalia.
    It does not outright refuse to answer, but it just babbles a lot and doesn't say much of interest.
LLM model with open training data by Ciro Santilli 35 Updated +Created
ChatGPT model by Ciro Santilli 35 Updated +Created
LLM benchmark by Ciro Santilli 35 Updated +Created
Benchmarking LLMs is an extremely difficult issue.
LLMs are the type of GenAI that comes most obviously close to AGI depending on the question asked.
Therefore, there is a difficult gap between what is easy, what a human can always do, and what AGI will do one day.
Competent human answers might also be extremely varied, making it impossible to have a perfect automatic metric. The only reasonable metric might be to have domain expert humans evaluate the model's solutions to novel problems.
Get output of send command on expect by Ciro Santilli 35 Updated +Created
This pattern works well:
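# Prompt string printed by the already spawned program.
set prompt ">>> "
# Stop expect from echoing the whole session to stdout.
log_user 0
send "What is quantum field theory?\r"
# Capture everything printed up to the next prompt.
expect -re "(.+)$prompt"
# Drop the first line (normally the echo of the sent command) and strip the trailing \r of each line.
puts -nonewline [join [lrange [lmap line [split $expect_out(1,string) \n] {regsub {\r$} $line ""}] 1 end] "\n"]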
Then stdout will contain only the output of the command and nothing else.
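The final puts line packs several steps into one expression. As a rough illustration only, the same post-processing can be sketched in Python; the function name `clean_captured_output` is hypothetical, and the sample input assumes that the terminal echoes the sent command on the first line and ends lines with \r\n.

```python
def clean_captured_output(captured: str) -> str:
    """Mirror the Tcl one-liner: split on newlines, strip one trailing
    carriage return per line, drop the first line, and rejoin."""
    lines = [
        line[:-1] if line.endswith("\r") else line
        for line in captured.split("\n")
    ]
    # The first line is assumed to be the echo of the sent command.
    return "\n".join(lines[1:])


if __name__ == "__main__":
    # Made-up sample of what the (.+) capture group might hold.
    sample = "What is quantum field theory?\r\nline one\r\nline two\r\n"
    print(clean_captured_output(sample), end="")
```

Dropping the first line is what removes the echoed question, so the sketch, like the original, relies on the program echoing input back.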
