AI Math benchmark Created 2025-02-11 Updated 2025-07-16
This section is about benchmarks designed to test mathematical reasoning.
LLM benchmark Created 2025-03-20 Updated 2025-07-16
Benchmarking LLMs is an extremely difficult issue.
Therefore, there is a difficult gap between what is easy, what a human can always do, and what AGI will do one day.