{c}
{tag=Closed source benchmark}
{tag=OpenAI project}
{title2=2024}

https://epoch.ai/frontiermath

Paper: https://arxiv.org/abs/2411.04872

https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website fails to state clearly:
> The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination
So yeah, fuck off.

The expected answer for each problem is a single, possibly ridiculously large, integer, which is kind of a cool approach, similar to <Project Euler> in that respect.
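
A single-integer answer makes grading trivial: an exact match, with no partial credit and no judging of free-form proofs. Below is a minimal sketch of such a checker. The function name `check_answer` and the toy problem are made up for illustration and are not taken from the actual FrontierMath harness, which is not public:

```python
def check_answer(submitted: str, expected: int) -> bool:
    """Return True only if the submitted text is exactly the expected integer.

    Hypothetical helper, not part of any real benchmark harness.
    """
    try:
        # Python integers are arbitrary precision, so huge answers compare exactly.
        return int(submitted.strip()) == expected
    except ValueError:
        # Anything that is not a plain integer literal (floats, prose) is wrong.
        return False


# Toy stand-in for a benchmark problem: "compute 2**200 + 1".
expected = 2**200 + 1

print(check_answer(str(2**200 + 1), expected))  # exact match
print(check_answer("1.6e60", expected))         # approximation is rejected
```

Because only exact equality counts, an approximate or lucky guess is essentially never accepted when the answer space is this large.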

The most interesting aspect of this benchmark is the difficulty. <Mathematical olympiad> coach <Evan Chen> comments:https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/{ref}
> Problems in \[the <International Mathematical Olympiad>\] typically require creative insight while avoiding complex implementation and specialized knowledge \[but for <FrontierMath>\] they keep the first requirement, but outright invert the second and third requirement


 FrontierMath

ID: frontiermath