AI Mathematical Olympiad 2025-11-30
Not too exciting because the problems stay at high school olympiad knowledge level, but respectable.
- Every problem has one final integer answer. Also, unlike Project Euler and like the IMO, only limited computation is required, i.e. you are not expected to do full blown program generation to reach a final answer, which makes this even less exciting.
FrontierMath Created 2025-02-11 Updated 2025-11-21
Paper: arxiv.org/abs/2411.04872
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website is unable to state clearly:
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination
The expected answer output for all problems is one single SymPy expression, which is a cool approach: it allows not only large integers as in Project Euler, but also irrational expressions, e.g. "An optimization problem in BMO space" from the sample problems has an irrational expression as its answer. Of course, when the output is not an integer, this raises questions of simplification equivalence. Also, like Project Euler, solutions essentially expect you to write and execute code.
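FrontierMath's actual grader is not public, but a minimal SymPy sketch of what such an equivalence check could look like is below; the expressions are made up for illustration:

```python
import sympy

# Two ways of writing the same irrational constant. These particular values
# are invented for illustration; FrontierMath's real answers are private.
submitted = (sympy.sqrt(2) + 1)**2
reference = 3 + 2*sympy.sqrt(2)

# Naive structural comparison fails even though the values are equal:
print(submitted == reference)  # False

# Checking that the difference simplifies to zero handles this case:
print(sympy.simplify(submitted - reference) == 0)  # True
```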
The most interesting aspect of this benchmark is the difficulty. Mathematical olympiad coach Evan Chen comments:
Problems in [the International Mathematical Olympiad] typically require creative insight while avoiding complex implementation and specialized knowledge [but for FrontierMath] they keep the first requirement, but outright invert the second and third requirement
ORCA Benchmark Created 2025-11-19 Updated 2025-11-30
This one doesn't seem too exciting to be honest, but it might be useful. The sample question expects the correct answer down to the cent:
53892.27
It should be noted that Project Euler also has such "precision matters" problems.
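ORCA's grading code is not something I've seen, so this is just a minimal sketch of what a "correct to the cent" check could look like, reusing the sample answer quoted above:

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical "correct to the cent" check. ORCA's real grader is not public;
# 53892.27 is just the sample answer quoted above.
def matches_to_the_cent(candidate: float, expected: str = "53892.27") -> bool:
    # Quantize to two decimal places before comparing, so that tiny floating
    # point noise like 53892.269999999 still counts as correct.
    rounded = Decimal(str(candidate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return rounded == Decimal(expected)

print(matches_to_the_cent(53892.269999999))  # True
print(matches_to_the_cent(53892.26))         # False
```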
Project Euler Created 2025-03-20 Updated 2025-10-14
They don't have an actual online judge system: all problems simply have an integer or floating point solution, and they just check that you have found that value.
The only metric that matters is who solved the problem first after publication, e.g. projecteuler.net/fastest=454. The "language" in which problems were solved is just whatever the user put in their profile; they can't actually verify it.
Project Euler problems typically involve finding (and ideally proving) a lemma that makes computation of the solution feasible without brute force, and then using it. As such, they live in the intersection of mathematics and computer science.
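As a toy illustration of that pattern (my own example, not an official solution), problem 1 asks for the sum of all multiples of 3 or 5 below 1000: brute force works at that size, but the inclusion-exclusion lemma gives a closed form that scales to arbitrarily large limits:

```python
# Project Euler problem 1: sum of all multiples of 3 or 5 below a limit.
# Toy illustration of brute force vs. a lemma-based closed form.

def brute_force(limit: int) -> int:
    # O(limit): fine for 1000, hopeless for limit ~ 10**18.
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)

def closed_form(limit: int) -> int:
    # Lemma: the multiples of k below limit sum to k * m * (m + 1) / 2 where
    # m = (limit - 1) // k, and inclusion-exclusion removes the multiples of
    # 15 counted twice. O(1) regardless of limit.
    def s(k: int) -> int:
        m = (limit - 1) // k
        return k * m * (m + 1) // 2
    return s(3) + s(5) - s(15)

assert brute_force(1000) == closed_form(1000) == 233168
print(closed_form(10**18))  # instant, where brute force would not be
```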
List of just the solution values:
Code solutions by individuals:
Basically no one ever had the patience to solve them all. What we need is a collaborative solution.
Problems are under CC BY-NC-SA: projecteuler.net/copyright
Once you solve a problem, you can then access its "private" forum thread, e.g. projecteuler.net/thread=950, where people post a bunch of code solutions.
How problems are chosen:
projecteuler.net says it started as a subsection of mathschallenge.net, and in 2006 moved to its own domain. WhoisXMLAPI WHOIS history says it was registered via domainmonster.com but details are anonymous. TODO: sample problem on mathschallenge.net on the Wayback Machine? Likely wouldn't reveal much anyway though, as there is no attribution to problem authors on that site.
www.hackerrank.com/contests/projecteuler/challenges holds challenges with an actual judge and sometimes multiple test cases, so just printing the final solution number is not enough.
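For example, the HackerRank version of problem 1 reads the limit from standard input for several test cases instead of fixing it at 1000, so a hardcoded answer cannot pass. A sketch, assuming an input format of one line with the test count followed by one limit per line:

```python
import sys

# Sketch of a judged, multi-test-case version of problem 1: the limit is read
# from stdin instead of being hardcoded, so printing one precomputed number is
# not enough. Input format assumed: first line T, then T lines each with N.

def solve(limit: int) -> int:
    # Same inclusion-exclusion closed form as above, so huge limits are fine.
    def s(k: int) -> int:
        m = (limit - 1) // k
        return k * m * (m + 1) // 2
    return s(3) + s(5) - s(15)

def main() -> None:
    data = sys.stdin.read().split()
    t = int(data[0])
    for n in map(int, data[1:1 + t]):
        print(solve(n))

if __name__ == "__main__":
    main()
```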
Project Euler as an AI benchmark Created 2025-03-24 Updated 2025-10-14
The beauty of Project Euler is that it would serve both as an AI code generation benchmark and as an AI math benchmark!
Updates
Getting banned from Project Euler Created 2025-10-27 Updated 2025-11-05
I have been banned from Project Euler for life, and cannot log in to my previous account projecteuler.net/profile/cirosantilli.pn
The ban happened within 12 hours of me publishing a solution to Project Euler problem 961 (github.com/lucky-bai/projecteuler-solutions/pull/94), which was one-shotted by a free GPT-5 account, as MathArena had alerted me was possible: matharena.ai/?comp=euler--euler&task=4&model=GPT-5+%28high%29&run=1
The problem leaderboard shows that several people solved the problem within minutes of its release, so almost certainly with an LLM.
The "secret club" mentality is their only blemish, and incompatible with open science.
They should also make sure that LLMs can't one-shot their future problems BEFORE publishing them!