FrontierMath Updated +Created
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website is unable to clearly state out:
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination
So yeah, fuck off.