"Autoformalization" refers to automatically converting a traditional human readable mathematical proof to a formal proof.
By Princeton people.
This one is about solving GitHub issues. It appears to contain 2,294 real-world GitHub issues and their corresponding pull requests.
The dataset appears to be at: huggingface.co/datasets/princeton-nlp/SWE-bench in Parquet format.
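A minimal sketch of loading it with the Hugging Face datasets library. The split name and field names are assumptions based on a quick look at the dataset page, so double check them:

```python
# Minimal sketch: load SWE-bench from the Hugging Face Hub.
# The "test" split and field names are assumptions; check the dataset page.
from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench", split="test")
print(len(ds))                           # number of issue/PR pairs
print(ds[0]["problem_statement"][:200])  # the GitHub issue text
print(ds[0]["patch"][:200])              # the corresponding pull request diff
```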
We define an "Inscription service" as a website that easily allows you to pay to have arbitrary data inscribed on a blockchain.
The most notable early example of such a service on the Bitcoin blockchain was cryptograffiti.info.
We define an "inscription system" as a software or service that facilitates the creation of inscriptions.
Author of Infinite Napkin.
He's also a mathematical olympiad coach.
The cool thing about Terence Tao is that besides being a mathematical genius, he is also interested in modern technology such as formal proof systems and automated theorem proving. For that, kudos.
TODO: haven't managed to get deterministic outputs. Attempted:
/set parameter seed 0
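The /set parameter syntax suggests the ollama interactive prompt. Assuming that, a hedged sketch of the same attempt through ollama's HTTP API; the model name is a placeholder, and a fixed seed plus zero temperature is the usual determinism recipe:

```python
# Sketch: request a (hopefully) deterministic completion from a local
# ollama server via its /api/generate endpoint.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",  # placeholder model name
        "prompt": "Say something random.",
        "stream": False,
        "options": {"seed": 0, "temperature": 0},
    }).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["response"])
```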
arxiv.org/html/2405.19616v1 Easy Problems That LLMs Get Wrong by Sean Williams and James Huckle (2024)
Their problems seem to be listed at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/main/linguistic_benchmark.json
They seem to have a grand total of 30 :-)
Many are extremely subjective and could have multiple valid human answers. E.g. the following could be gotten wrong by many humans and has infinitely many answers:
Write me a sentence without any words that appear in The Bible.
And this one:
You have six horses and want to race them to see which is fastest. What is the best way to do this?
has two very good answers: run all six in parallel at the same time, or run them one at a time. One at a time is more scientific, as you don't have one horse on the left and one on the right. Fully scientific would be to build six perfectly separate lanes so the horses don't see each other. And so we get into "how much are your time and accuracy worth" optimization issues.
This one:
Bob has three boxes in front of him - Box A, Box B and Box C. Bob does not know what is in the boxes. Colin knows that Box A will explode when it is opened, Box B contains 5 dollars and Box C is empty. Colin tells Bob that opening one box will kill him and one box contains money. Should Bob open a box?
is more interesting and relies on the common sense value of life. Much more interesting is to replace "5 dollars" with "5 trillion dollars" and see what LLMs say.
Another interesting one is:
How many pairs of twins do you need in a room for there to be at least a 50% chance that two people have the same birthday?
This requires knowing that twins are almost always born on the same day, so a single pair of twins already gives way above a 50% chance.
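For contrast, a quick sanity check of the classic birthday problem with independent birthdays, which is what the riddle baits the model into computing (the well-known answer is 23 people):

```python
# Classic birthday problem: smallest number of people with independent,
# uniform birthdays for a >= 50% chance that two share a birthday.
p_distinct = 1.0
n = 0
while 1.0 - p_distinct < 0.5:
    n += 1
    p_distinct *= (365 - (n - 1)) / 365
print(n)  # 23

# With twins the riddle collapses: a pair of twins shares a birthday,
# so a single pair already gives a (near) 100% chance.
```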
Answers from specific LLMs to some of the problems can be seen e.g. at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/9e1f52b0dc5c79f8cef52b40aab9ffb0ceafbd5c/2024-04-28-Paper-Benchmark/llm_outputs/final_answers-claude-3-opus.csv