Ciro Santilli @cirosantilli 37

 Articles (11k) Discussions (27) Comments (64) Follows  Received likes Files

New Updated  Top  Announced  A-Z  Liked  Followed

Ciro Santilli 37 2025-12-02

The questions are available to anyone behind a Hugging Face login / in a password-protected .zip, but you have to promise not to post them online. Lol. Either do the thing or don't.

Project Euler problem 958 by

Ciro Santilli 37 2025-12-02

projecteuler.net/problem=958

Numerical solution:

367554579311

Earliest known public leak: github.com/lucky-bai/projecteuler-solutions/issues/93

Programs:

euler/958.py

LiveBench by

Ciro Santilli 37 2025-12-02

livebench.ai

The math category was almost saturated as of the 2025 release, so meh:

modified questions based on high school math competitions from the past 11 months, as well as harder versions of AMPS questions

Project Euler problem 948 by

Ciro Santilli 37 Created 2025-12-01 Updated 2025-12-02

projecteuler.net/problem=948

Numerical solution:

1033654680825334184

Earliest known public leak: github.com/lucky-bai/projecteuler-solutions/issues/87

Programs:

euler/948.py

Poetiq by

Ciro Santilli 37 2025-12-01

poetiq.ai/

In 2025 they announced huge improvements on ARC-AGI-2, but they only tested on the public dataset, so the potential for contamination is overwhelming.

Ubuntu 25.10 bug by

Ciro Santilli 37 2025-11-30

askubuntu.com/questions/1560258/how-to-prevent-ubuntu-25-10-from-waking-up-from-suspend-when-i-close-the-laptop?noredirect=1&lq=1

Ubuntu 25.10 by

Ciro Santilli 37 2025-11-30

Project Euler problem 972 by

Ciro Santilli 37 2025-11-30

projecteuler.net/problem=972

Numerical solution:

Earliest known public leak:

Programs:

euler/972.py

AI Mathematical Olympiad by

Ciro Santilli 37 2025-11-30

aimoprize.com

Not too exciting because it only requires high-school-level olympiad knowledge, but respectable.

www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-3/overview is round 3.
Every problem has one final integer answer:
In this competition, every ground-truth label is an integer between 0 and 99999
Non-integer results such as square roots are simply rounded to produce an integer, as in this example they mention:
$\lfloor 10^{4} \sqrt{2} \rfloor = 14142$
Also, unlike Project Euler and like the IMO, only limited computation is required, i.e. you are not expected to do full-blown program generation to reach a final answer, which makes this even less exciting.
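A minimal Python sketch of how an integer answer like the √2 one above can be computed exactly instead of going through floating point. It assumes the intended rounding is the floor (for √2 rounding and truncation give the same result anyway), and `scaled_sqrt_answer` is a hypothetical helper name, not anything from the competition:

```python
import math


def scaled_sqrt_answer(n: int, scale_exp: int) -> int:
    """Return floor(10**scale_exp * sqrt(n)) using only exact integer arithmetic."""
    # floor(10^k * sqrt(n)) == floor(sqrt(n * 10^(2k))), which math.isqrt computes exactly
    return math.isqrt(n * 10 ** (2 * scale_exp))


answer = scaled_sqrt_answer(2, 4)
# AIMO ground-truth labels must be integers between 0 and 99999
assert 0 <= answer <= 99999
print(answer)  # prints 14142
```

Using `math.isqrt` rather than `math.sqrt` keeps the result exact even for large scale factors, where a float square root would start losing digits.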

Physics Derivation Graph by

Ciro Santilli 37 2025-11-30

PhysLean by

Ciro Santilli 37 2025-11-30

physlean.com/

Formalization of physics project by

Ciro Santilli 37 2025-11-30

Formalization of physics by

Ciro Santilli 37 2025-11-30

Formalization of X by

Ciro Santilli 37 2025-11-30

This section is about formalization efforts of specific fields of mathematics.

Project Euler problem 971 by

Ciro Santilli 37 2025-11-23

projecteuler.net/problem=971

Numerical solution:

33626723890930

Earliest known public leak:

Programs:

euler/971.py

Project Euler problem 970 by

Ciro Santilli 37 2025-11-19

projecteuler.net/problem=970

Numerical solution:

44754029

Earliest known public leak: x.com/cirosantilli/status/1990363555309490585

Programs:

euler/970.py

ORCA Benchmark by

Ciro Santilli 37 Created 2025-11-19 Updated 2025-11-30

arxiv.org/abs/2511.02589

This one doesn't seem too exciting to be honest, but it might be useful. Sample question:

If I deposit $50,000 at 5% APR, compounded weekly, what will my balance be after 18 months?

and it expects the correct answer down to the cent:

53892.27
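A quick sanity check of that expected answer, as a minimal Python sketch. It assumes weekly compounding means 52 periods per year, so that 18 months is 1.5 years = 78 periods; `compound_balance` is a hypothetical helper, not something provided by the benchmark:

```python
def compound_balance(principal: float, apr: float, periods_per_year: int, years: float) -> float:
    """Balance after compounding `apr` `periods_per_year` times a year for `years` years."""
    # Assumption: a whole number of compounding periods, here 52 * 1.5 = 78 weeks
    periods = round(periods_per_year * years)
    return principal * (1 + apr / periods_per_year) ** periods


balance = compound_balance(50_000, 0.05, 52, 1.5)
print(f"{balance:.2f}")  # prints 53892.27
```

The unrounded value is about 53892.2655, so it only lands on .27 once you round to the cent, which is exactly the kind of detail such a benchmark checks.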

It should be noted that Project Euler has such "precision matters" problems.

Closed AI math benchmark by

Ciro Santilli 37 Created 2025-11-19 Updated 2025-11-30

Even more than in other areas of benchmarking, in maths, where an answer is simply right or wrong and good sample problems are costly to come up with, some benchmarks have adopted private test data sets.

The situation is kind of sad, in that ideally we should have open data sets and only test models that were trained exclusively on data published before the problems' publication date.

However, this is not practical, for the following reasons:

some of the best models are closed source and don't have reproducible training with a specified cutoff
having a private test set allows you to automatically check answers from untrusted sources. If they get the answers right, they are onto something, and you don't even need to check their methodology
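The second point can be illustrated with a grader that only ever reveals a score, never the answer key. This is a hypothetical Python sketch assuming answers are exact-match values keyed by problem ID; `grade` and the sample problem IDs are made up for illustration:

```python
def grade(submission: dict, private_answers: dict) -> float:
    """Fraction of held-out problems answered correctly; the answer key itself is never returned."""
    if not private_answers:
        raise ValueError("empty answer key")
    # Exact-match comparison: missing submissions count as wrong
    correct = sum(1 for pid, ans in private_answers.items() if submission.get(pid) == ans)
    return correct / len(private_answers)


# Hypothetical held-out answer key and an untrusted submission
private_answers = {"p1": 14142, "p2": 42, "p3": 7}
submission = {"p1": 14142, "p2": 41, "p3": 7}
print(f"{grade(submission, private_answers):.2f}")  # prints 0.67
```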

Perhaps the ideal scenario, therefore, is what ARC-AGI has done: publish a sizeable public dataset that you believe is highly representative of the difficulty of the private test data, while at the same time holding out some private test data. A half/half split seems reasonable.

This way, reproducible models can reliably test themselves on the open data, while the closed data is used in the cases where the open data can't be, e.g. closed models without a specified training cutoff.
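A minimal sketch of such a half/half split in Python, assuming problems are just opaque IDs and that a uniform random shuffle with a fixed seed is good enough to keep both halves similar in difficulty. `split_public_private` is a hypothetical helper, not ARC-AGI's actual procedure:

```python
import random


def split_public_private(problems: list, public_fraction: float = 0.5, seed: int = 0):
    """Deterministically shuffle problems and split them into a public and a held-out private set."""
    shuffled = list(problems)
    # Fixed seed so the split is reproducible by whoever maintains the benchmark
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * public_fraction)
    return shuffled[:cut], shuffled[cut:]


public, private = split_public_private([f"problem-{i}" for i in range(10)])
print(len(public), len(private))  # prints 5 5
```

A random split only makes the halves representative on average; with few problems you might prefer to stratify by some difficulty label, if one exists.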

List of math AI benchmarks by

Ciro Santilli 37 2025-11-19

3D-printed firearm by

Ciro Santilli 37 2025-11-18

Video 1. 3D Printed Guns Are Easy To Make And Impossible To Stop by VICE News (2018)
