Paper: arxiv.org/html/2509.26076v1
Apparently the process also includes human review. Newbs. Just require Lean solutions and be done with it... They do address this in the paper's "Formal math benchmarks" section, but still meh. Review must be fully automated, none of that asking humans bullshit.
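To make the point concrete, here is a minimal sketch of what a machine-checkable answer could look like in Lean 4. The theorem name `toy_add_comm` and the statement are made up for illustration and are not from the paper. If the file compiles, the proof is accepted and no human reviewer is involved.

```lean
-- Hypothetical toy statement, not taken from the paper.
-- Lean's kernel checks the proof term; successful compilation means the proof is valid.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A real benchmark problem would obviously be much harder to state and prove, but the acceptance check would be the same: it either type-checks or it doesn't.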
Required Characteristics
Requires genuine insight: Not solvable by routine application of known algorithms
Example problem: