This section is about benchmarks designed to test mathematical reasoning.

AGI-complete in general? Obviously. But still, a lot can be done. See e.g.:
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website fails to state clearly:
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination
So yeah, fuck off.
The expected answer to every problem is a single, possibly ridiculously large, integer.
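Since every answer is a bare integer, automated grading reduces to exact arbitrary-precision comparison. Here's a minimal sketch of such a checker in Python (the `grade` function and its interface are hypothetical, not FrontierMath's actual harness); Python's built-in integers are arbitrary-precision, so even the "ridiculously large" values compare exactly with no overflow:

```
def grade(submission: str, expected: int) -> bool:
    """Hypothetical exact-match grader: True iff the submission
    parses to exactly the expected integer."""
    try:
        return int(submission.strip()) == expected
    except ValueError:
        # Anything that is not a plain integer is simply wrong.
        return False

big = 10**100 + 7
assert grade(str(big), big)           # exact match passes
assert not grade(str(big + 1), big)   # off by one on a huge integer still fails
assert not grade("about 1e100", big)  # non-integer output fails
```

Exact matching on a huge integer leaves essentially no room for lucky guessing or partial credit, which is presumably the point of the format.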
