Paper: arxiv.org/html/2509.26076v1
Apparently it also uses human review as part of the process. Newbs. Just require Lean solutions and be done with it... They do address this in the paper's "Formal math benchmarks" section, but still, meh. Review must be fully automated, none of that asking-humans bullshit.
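Every problem listed here has a single integer "Numerical solution", so fully automated grading of the final answer can be as simple as an exact integer comparison. The sketch below is a minimal illustration under that assumption. `check_answer` is a hypothetical helper, not something taken from the paper, and it only checks the final number, not the reasoning behind it.

```python
def check_answer(submitted: str, expected: int) -> bool:
    """Exact-match grading of a single integer answer (hypothetical helper)."""
    # Strip surrounding whitespace and common digit separators before parsing.
    cleaned = submitted.strip().replace(",", "").replace("_", "")
    try:
        return int(cleaned) == expected
    except ValueError:
        # Anything that does not parse as a plain integer is counted as wrong.
        return False


if __name__ == "__main__":
    print(check_answer("44754029", 44754029))     # True
    print(check_answer("44,754,029", 44754029))   # True
    print(check_answer("4.4754029e7", 44754029))  # False: not an integer literal
```

No human is involved: the answer either matches the stored integer or it doesn't.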
Required characteristics:
- Requires genuine insight: not solvable by routine application of known algorithms
Example problem:
Numerical solution:
13313751171933973557517973175
Earliest known public leak:
Numerical solution:
427278142
Earliest known public leak:
Notes:
- MathArena claims Gemini 3 solved it: matharena.ai/?comp=euler--euler
- x.com/roanoke_gal/status/1997322744594125081 claims GPT-5.1 Pro solved it.
Numerical solution:
3575508
Earliest known public leak:
Numerical solution:
33626723890930
Earliest known public leak:
Numerical solution:
44754029
Earliest known public leak: x.com/cirosantilli/status/1990363555309490585