
= IMProofBench
{c}
{tag=Closed source benchmark}
{title2=2025}

https://improofbench.math.ethz.ch/

Paper: https://arxiv.org/html/2509.26076v1

Apparently it also uses human review as part of the grading process. Newbs. Just require Lean solutions and be done with it... They do address this in the "Formal math benchmarks" section of the paper, but still, meh. Review must be fully automated, none of that asking-humans bullshit.

From: https://improofbench.math.ethz.ch/guidelines/

> Required Characteristics
>
> * PhD-level difficulty: Suitable for qualifying exams, research papers, or advanced seminars
> * Requires genuine insight: Not solvable by routine application of known algorithms
> * Clear proof-based main question: Answer should be a complete mathematical argument, not just a number
> * 2-3 unique-answer subquestions: Enable automated evaluation (e.g., "Is the statement true for n=5?", "What is the rank of this group?")
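The point of the unique-answer subquestions is that they can be checked by exact comparison, no human in the loop. The guidelines don't say how IMProofBench actually compares answers, so the following is only a hypothetical sketch of such a checker: `normalize` and `grade` are made-up names, and the normalization rules (numbers compared as exact rationals, everything else as case-insensitive text) are my own assumption.

```python
from fractions import Fraction


def normalize(answer):
    """Hypothetical canonical form: exact rational if numeric, else lowercase text."""
    text = answer.strip().rstrip(".").lower()
    try:
        # Fraction parses "9", "1/2", "0.5", "7e-6" exactly, so "1/2" == "0.5".
        return Fraction(text)
    except ValueError:
        # Non-numeric answer: collapse internal whitespace.
        return " ".join(text.split())


def grade(submitted, reference):
    """Hypothetical scorer: fraction of subquestions whose normalized answers match."""
    if len(submitted) != len(reference):
        raise ValueError("one answer per subquestion expected")
    correct = sum(normalize(s) == normalize(r) for s, r in zip(submitted, reference))
    return correct / len(reference)
```

Exact matching like this only works because each subquestion has a single canonical answer; the proof in the main question obviously can't be scored this way.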

Example problem:

> Example 1: Stable Graphs
>
> Main question: Find a closed formula for the number $N(g)$ of stable graphs of genus $g$ with no legs and precisely 3 edges, for all $g \ge 2$.
>
> Subquestions:
>
> * What is $N(3)$?
> * What is $N(8)$?
> * What is $N(1000)$?
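To make the example concrete, here is a brute-force sketch that counts such graphs for small $g$, assuming the usual moduli-of-curves definition, which the guidelines don't spell out: a stable graph is a connected multigraph (loops and multi-edges allowed) with a genus $g_v \ge 0$ on each vertex, total genus $h^1 + \sum_v g_v$, and stability $2 g_v - 2 + n_v > 0$ at every vertex, where a loop adds 2 to the valence $n_v$. The function names are hypothetical. It enumerates every labelled graph with at most 4 vertices and deduplicates by taking the minimum over all vertex relabellings.

```python
from itertools import combinations_with_replacement, permutations, product


def is_connected(num_vertices, edges):
    """True if the multigraph on vertices 0..num_vertices-1 is connected."""
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for a, b in edges:
            for u, w in ((a, b), (b, a)):
                if u == v and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == num_vertices


def canonical_form(genera, edges):
    """Lexicographically smallest (genera, sorted edges) over all vertex relabellings."""
    n, best = len(genera), None
    for perm in permutations(range(n)):
        new_genera = [0] * n
        for v, g_v in enumerate(genera):
            new_genera[perm[v]] = g_v
        new_edges = tuple(sorted(
            (min(perm[a], perm[b]), max(perm[a], perm[b])) for a, b in edges))
        key = (tuple(new_genera), new_edges)
        if best is None or key < best:
            best = key
    return best


def count_stable_graphs(g, num_edges=3):
    """Brute-force count of stable graphs of genus g with no legs and num_edges edges."""
    found = set()
    # A connected graph with num_edges edges has at most num_edges + 1 vertices.
    for n in range(1, num_edges + 2):
        h1 = num_edges - n + 1  # first Betti number of the underlying graph
        rest = g - h1  # genus that has to sit on the vertices
        if rest < 0:
            continue
        pairs = [(i, j) for i in range(n) for j in range(i, n)]
        for edges in combinations_with_replacement(pairs, num_edges):
            if not is_connected(n, edges):
                continue
            valence = [0] * n
            for a, b in edges:
                valence[a] += 1
                valence[b] += 1  # a loop (a == b) contributes 2
            for genera in product(range(rest + 1), repeat=n):
                if sum(genera) != rest:
                    continue
                # Stability condition 2 g_v - 2 + n_v > 0 at every vertex.
                if all(2 * gv - 2 + val > 0 for gv, val in zip(genera, valence)):
                    found.add(canonical_form(genera, edges))
    return len(found)
```

This is fine for checking small cases like `count_stable_graphs(3)`, but looping over all genus labellings makes it hopeless for $g = 1000$, which is presumably why that subquestion is there: you actually need the closed formula.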