MathArena

This project evaluates various LLMs on problems from math competitions.

How they "ensure" that models are not contaminated:

By evaluating models as soon as new problems are released, we effectively eliminate the risk of contamination.
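
The idea in that quote is a date check: a problem counts as clean for a model only if it was published after the model was released. Below is a minimal sketch of that check. It is not MathArena's actual code; the `Problem` and `Model` classes, the `uncontaminated` helper, and all names and dates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    competition: str  # hypothetical competition label
    released: date    # date the problem became public


@dataclass(frozen=True)
class Model:
    name: str         # hypothetical model label
    released: date    # date the model became available


def uncontaminated(problems, model):
    """Keep only problems published strictly after the model's release date."""
    return [p for p in problems if p.released > model.released]


# Illustrative data only; these are not real competitions or models.
problems = [
    Problem("older-contest", date(2024, 2, 1)),
    Problem("newer-contest", date(2025, 2, 6)),
]
model = Model("hypothetical-model", date(2024, 12, 1))

eligible = uncontaminated(problems, model)
print([p.competition for p in eligible])
```

Note that this only rules out contamination from the exact problem text. It does not say anything about whether similar problems were already in the training data.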

Most of their problems come from olympiads that require only high school knowledge, so they are completely irrelevant for 2025 LLMs.

MathArena Apex

A subset of problems that they curate from competitions.
