AGI-complete in general? Obviously. But still, a lot can be done. See e.g.:
- The Busy Beaver Challenge deciders, which prove non-halting for entire families of Turing machines (a minimal example is sketched below)
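To make "decider" concrete, here is a minimal sketch of the simplest decider family, the cyclers: simulate a machine and, if an exact configuration ever repeats, it provably never halts. The machine encoding and names are made up for this sketch, not the Challenge's actual format:

```python
def decide_cycler(transitions, step_limit=10_000):
    """transitions: (state, read_symbol) -> (write_symbol, move, next_state),
    with move in {-1, +1}; next_state None means halt. Blank symbol is 0."""
    tape = {}            # sparse tape: only non-blank cells are stored
    state, head = "A", 0
    seen = set()
    for _ in range(step_limit):
        config = (state, head, frozenset(tape.items()))
        if config in seen:
            return "never halts (cycler)"  # exact configuration repeated
        seen.add(config)
        write, move, state = transitions[(state, tape.get(head, 0))]
        if write == 0:
            tape.pop(head, None)  # keep tape canonical: never store blanks
        else:
            tape[head] = write
        head += move
        if state is None:
            return "halts"
    return "undecided within step limit"

# A 2-state machine that shuttles between two cells forever:
shuttle = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, +1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, -1, "A"),
}
print(decide_cycler(shuttle))  # never halts (cycler)
```

The Challenge's real deciders go much further (e.g. translated cyclers, which catch machines that repeat while drifting along the tape), but the idea is the same: one program that settles a whole family of machines at once.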
This section is about benchmarks designed to test mathematical reasoning.
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website is unable to clearly state:

> The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination

So yeah, fuck off.
The expected answer to every problem is a single, possibly ridiculously large, integer.
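Presumably this is what makes fully automated grading trivial: checking a submission is just an exact comparison of arbitrary-precision integers, no proof checking or natural-language parsing. A hypothetical harness (names are mine, not FrontierMath's actual code):

```python
def grade(submitted: str, reference: int) -> bool:
    """Accept iff the submission parses as exactly the reference integer.
    Python ints are arbitrary precision, so huge answers are no problem."""
    try:
        return int(submitted.strip()) == reference
    except ValueError:
        return False  # not even an integer

print(grade("340282366920938463463374607431768211456", 2**128))  # True
print(grade("close enough", 42))                                 # False
```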