Closed AI math benchmark Created 2025-11-19 Updated 2025-12-13
Maths benchmarks usually accept only a right or wrong answer, and good sample problems are costly to come up with. For these reasons, even more than in other areas of benchmarking, some maths benchmarks have adopted private test data sets.
The situation is kind of sad: ideally we should have open data sets, and only use them to test models that were trained exclusively on data published before the problem publish date.
However this is not practical for the following reasons:
Giotto.ai Created 2025-04-24 Updated 2025-07-16
www.giotto.ai/
Their website doesn't clearly explain their technology as of 2025.
"At Giotto.ai, our technology is designed to bridge the gap between current AI capabilities and the promise of Artificial General Intelligence (AGI)."
They claim to have done some work on ARC-AGI, which is cool, but there are no clear references to what they did or whether anything about it is public.
NDEA Created 2025-03-28 Updated 2025-07-16
Unofficial ARC-AGI problem set 2025-12-13
This section is about unofficial ARC-AGI-like problem sets.
These are interesting from both a:
github.com/neoneye/arc-dataset-collection contains a fantastic collection of such datasets, with visualization at: neoneye.github.io/arc/
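As a rough sketch of how one might read a task from such a dataset, the snippet below assumes the files follow the same JSON layout as the official ARC tasks: a top-level "train" and "test" list, each holding "input" and "output" grids of small integers. That layout is an assumption about the collection, and the inline task and the helper names `load_task` and `grid_shape` are made up for illustration.

```python
import json

# Tiny made-up task in the assumed ARC-style layout.
TASK_JSON = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input": [[1, 1, 0]], "output": [[0, 0, 1]]}
  ]
}
"""

def load_task(text):
    """Parse a task and check that it has the expected top-level keys."""
    task = json.loads(text)
    for key in ("train", "test"):
        if key not in task:
            raise ValueError(f"missing {key!r} section")
    return task

def grid_shape(grid):
    """Return (rows, columns) of a grid stored as a list of lists."""
    return (len(grid), len(grid[0]) if grid else 0)

task = load_task(TASK_JSON)
for pair in task["train"]:
    print(grid_shape(pair["input"]), "->", grid_shape(pair["output"]))
```

Real files would be read with `open(path)` and passed to the same function; checking the shapes first is a cheap way to spot datasets that deviate from the assumed layout.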
Updates ARC-AGI-2 Created 2025-10-18 Updated 2025-10-21
I've created a quick fork of ARC-DSL, which defines a hand-crafted domain-specific language (DSL) approach to help solve ARC-AGI problems.
I basically just merged outstanding pull requests on the original repo that were needed to make things run.
It would be cool to see if those rules also solve ARC-AGI-2 problems well, but I'm too lazy to try it now.
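To give a feel for the general idea of a hand-crafted DSL, here is a minimal sketch: a few small grid primitives, a way to chain them into a program, and a check of that program against example pairs. The names `flip_horizontal`, `transpose`, `chain` and `solves` are hypothetical and are not taken from ARC-DSL, whose actual primitives may look quite different.

```python
from typing import Callable, Iterable, Tuple

# Grids are immutable tuples of rows so they can be compared directly.
Grid = Tuple[Tuple[int, ...], ...]
Step = Callable[[Grid], Grid]

def flip_horizontal(grid: Grid) -> Grid:
    """Mirror every row left to right."""
    return tuple(tuple(reversed(row)) for row in grid)

def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return tuple(zip(*grid))

def chain(*steps: Step) -> Step:
    """Build a program that applies the given steps in order."""
    def program(grid: Grid) -> Grid:
        for step in steps:
            grid = step(grid)
        return grid
    return program

def solves(program: Step, examples: Iterable[Tuple[Grid, Grid]]) -> bool:
    """True if the program maps every example input to its output."""
    return all(program(inp) == out for inp, out in examples)

# A 90 degree clockwise rotation expressed with the two primitives.
rotate_clockwise = chain(transpose, flip_horizontal)

# Made-up example pair: the output is the input rotated clockwise.
examples = [(((1, 2), (3, 4)), ((3, 1), (4, 2)))]
print(solves(rotate_clockwise, examples))
```

Checking whether a set of hand-written programs carries over to another problem set would then amount to running `solves` for each program against that set's example pairs.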
ARC-AGI-2 is a very interesting benchmark which mixes some symbolic and other visual elements, and is readily solvable by non-expert humans, but has so far resisted transformers to a large degree.
Part of me would like to focus more on less visual aspects of AI, but it is still of interest.
It is funny how many early (semi-)retired fintech/bigtech bros are interested in the project; I saw several of them on the forums.
I must confess I'd be tempted too if I were in that position. Maybe in 15 years' time for me, the way things are looking.
Kudos to these people who do something cool and open when they don't need money: www.reddit.com/r/Fire/comments/15x4w7r/comment/jx7dn16/ This is also the case of Jimmy Wales from Wikipedia, for example, who used to work in finance.