IMProofBench 2026-03-05
Paper: arxiv.org/html/2509.26076v1
Apparently the process also includes human review. Newbs. Just require Lean solutions and be done with it... They do address this in the paper's "Formal math benchmarks" section, but still meh. Review must be fully automated, none of that asking-humans bullshit.
Required Characteristics
Requires genuine insight: Not solvable by routine application of known algorithms
Example problem:
First Proof 2026-03-05
Incubator economy 2026-03-05
DARPA project 2026-02-08
Project Euler pen and paper solutions 2026-02-08
Project Euler problem 99 solution 2026-02-08
Project Euler problem 982 solution 2026-02-08
Project Euler problem 981 solution 2026-02-08
Project Euler problem 980 solution 2026-02-08
Project Euler problem 98 solution 2026-02-08
Project Euler problem 979 solution 2026-02-08
Project Euler problem 978 solution 2026-02-08
Project Euler problem 977 solution 2026-02-08
Project Euler problem 976 solution 2026-02-08
Project Euler problem 975 solution 2026-02-08
Project Euler problem 974 solution 2026-02-08
Numerical solution:
13313751171933973557517973175
Earliest known public leak:
Project Euler problem 973 solution 2026-02-08
Numerical solution:
427278142
Earliest known public leak:
Notes:
- matharena.ai/?comp=euler--euler (MathArena) claims Gemini 3 solved it.
- x.com/roanoke_gal/status/1997322744594125081 claims GPT-5.1 Pro solved it.
Project Euler problem 972 solution 2026-02-08
Numerical solution:
3575508
Earliest known public leak:
Project Euler problem 971 solution 2026-02-08
Numerical solution:
33626723890930
Earliest known public leak:
Project Euler problem 970 solution 2026-02-08
Numerical solution:
44754029
Earliest known public leak: x.com/cirosantilli/status/1990363555309490585