Apparently also has human review as part of the process. Newbs. Just require Lean solutions and be done with it... They do address it in a section of the paper "Formal math benchmarks" but still meh. Review must be fully automated, none of that asking humans bullshit.
Required Characteristics
PhD-level difficulty: Suitable for qualifying exams, research papers, or advanced seminars
Requires genuine insight: Not solvable by routine application of known algorithms
Clear proof-based main question: Answer should be a complete mathematical argument, not just a number
2-3 unique-answer subquestions: Enable automated evaluation (e.g., "Is the statement true for n=5?", "What is the rank of this group?")
Example problem:
Example 1: Stable Graphs
Main question: Find a closed formula for the number of stable graphs of genus with no legs and precisely 3 edges, for all .
Subquestions:
  • What is ?
  • What is ?
  • What is ?

Articles by others on the same topic (0)

There are currently no matching articles.