Paper: arxiv.org/abs/2411.04872
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website is unable to clearly state: the problems are secret, so yeah, fuck off.
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination.
The expected answer for every problem is a single, possibly ridiculously large, integer, which is kind of a cool approach. Similar to Project Euler in that aspect.
The most interesting aspect of this benchmark is the difficulty. Mathematical olympiad coach Evan Chen comments: [ref]
Problems in [the International Mathematical Olympiad] typically require creative insight while avoiding complex implementation and specialized knowledge [but for FrontierMath] they keep the first requirement, but outright invert the second and third requirement
Project Euler doesn't have an actual online judge system; all problems simply have an integer or floating-point solution, and the site just checks that you've found the value.
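To make that checking model concrete, here is a minimal sketch of what "just check the value" amounts to; the dictionary and function names are hypothetical, and the only real datum is the well-known published answer to Problem 1:

```python
# Hypothetical sketch: the "judge" is just a lookup of one stored value per
# problem; no code is ever run. 233168 is the well-known published answer to
# Problem 1, used here only as sample data.
STORED_ANSWERS = {1: 233168}

def is_correct(problem_id: int, submitted_value: int) -> bool:
    return STORED_ANSWERS.get(problem_id) == submitted_value

print(is_correct(1, 233168))  # True
print(is_correct(1, 42))      # False
```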
The only metric that matters is who solved the problem first after publication, e.g. projecteuler.net/fastest=454. The "language" in which problems were solved is just whatever the user put in their profile; the site can't actually confirm it.
Project Euler problems typically involve finding (or proving) a lemma and then using it to make computation of the solution feasible without brute force. As such, they live in the intersection of mathematics and computer science.
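A minimal sketch of that flavour, using the classic Problem 1 (sum of all multiples of 3 or 5 below 1000): the "lemma" is the arithmetic-series formula combined with inclusion-exclusion, which replaces an O(N) brute-force loop with an O(1) computation. The function names are mine, not part of any official solution:

```python
# Hypothetical illustration: multiples of k below n form an arithmetic series,
# so their sum is k * m * (m + 1) / 2 with m = (n - 1) // k, and
# inclusion-exclusion removes the double-counted multiples of 15.

def sum_of_multiples_below(k: int, n: int) -> int:
    """Sum of the positive multiples of k strictly below n, in O(1)."""
    m = (n - 1) // k
    return k * m * (m + 1) // 2

def euler1(n: int = 1000) -> int:
    return (sum_of_multiples_below(3, n)
            + sum_of_multiples_below(5, n)
            - sum_of_multiples_below(15, n))

assert euler1(10) == 23  # the worked example given in the problem statement
print(euler1())          # 233168, without ever looping over the numbers
```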
List of just the solution values:
Code solutions by individuals: basically no one has ever had the patience to solve them all. What we need is a collaborative solution.
Problems are under CC BY-NC-SA: projecteuler.net/copyright
How problems are chosen:
projecteuler.net says it started as a subsection of mathschallenge.net, and in 2006 moved to its own domain. WhoisXMLAPI WHOIS history says it was registered by domainmonster.com, but details are anonymous. TODO: sample problem on mathschallenge.net on Wayback Machine? Likely wouldn't reveal much anyway, as there is no attribution to problem authors on that site.
www.hackerrank.com/contests/projecteuler/challenges holds challenges with an actual judge and sometimes multiple test cases, so just printing the final solution number is not enough.
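A sketch of what such a parametrized solution could look like, assuming (the input format here is an assumption, not verified against the contest) that the judged version of Problem 1 asks for the same sum for several values of N read from standard input, with N large enough that the closed form above is required:

```python
# Hypothetical sketch of a judged, multi-test-case version of Problem 1:
# read T test cases from stdin and answer each one, instead of printing
# a single hardcoded number.
import sys

def sum_multiples_of_3_or_5_below(n: int) -> int:
    total = 0
    # Inclusion-exclusion with the arithmetic-series closed form, so even
    # very large n (far beyond what a brute-force loop could handle) is instant.
    for k, sign in ((3, 1), (5, 1), (15, -1)):
        m = (n - 1) // k
        total += sign * k * m * (m + 1) // 2
    return total

def main() -> None:
    data = sys.stdin.read().split()
    t = int(data[0])
    for i in range(1, t + 1):
        print(sum_multiples_of_3_or_5_below(int(data[i])))

if __name__ == "__main__":
    main()
```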
The beauty of Project Euler is that it would serve both as an AI code generation benchmark and as an AI math benchmark!