FrontierMath by Ciro Santilli
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ states what the official website fails to state clearly:
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination.
So yeah, fuck off.