BigCodeBench
Their most interesting subset, the -hard one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.
The tests make free use of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses ftplib. Kind of cool.
HumanEval
The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz these researchers.
To get a quick overview of the problems with jq, after extracting the file with gunzip HumanEval.jsonl.gz:
jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
The first two problems are:
==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
So we understand that the input is an empty function with a docstring, and you have to fill in the function body.
The paper also shows that there can be other defined functions besides the one you have to implement.
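To make the "fill the function body" idea concrete, here is a minimal sketch of how a completion could be checked. The field names (prompt, entry_point, test) are the ones found in HumanEval.jsonl, and as far as I understand the test field defines a check(candidate) function that gets called on the entry point. The toy problem, build_program and passes are made up for illustration and are not the official harness, which runs the code in a sandbox with timeouts rather than a bare exec.

```python
# Toy stand-in for one HumanEval.jsonl record. The field names match the
# dataset, but this problem is invented for illustration only.
PROMPT = '''def helper(x: int) -> int:
    return x


def add(a: int, b: int) -> int:
    """ Return the sum of a and b.
    >>> add(1, 2)
    3
    """
'''

TEST = '''def check(candidate):
    assert candidate(1, 2) == 3
    assert candidate(-1, 1) == 0
'''

problem = {
    "task_id": "Toy/0",
    "entry_point": "add",
    "prompt": PROMPT,
    "test": TEST,
}


def build_program(problem: dict, completion: str) -> str:
    """Glue prompt + model completion + tests into one Python program."""
    return (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + "\n"
        + f"check({problem['entry_point']})\n"
    )


def passes(problem: dict, completion: str) -> bool:
    """Run the assembled program; any exception counts as a failure.

    WARNING: exec() on untrusted model output is unsafe. This is only
    acceptable here because the completions are hand-written.
    """
    try:
        exec(build_program(problem, completion), {})
    except Exception:
        return False
    return True


if __name__ == "__main__":
    # The completion is only the function body, indented to match the prompt.
    print(passes(problem, "    return helper(a) + helper(b)\n"))
    print(passes(problem, "    return a - b\n"))
```

Note that the completion can call helper, which is defined in the prompt but is not the function being implemented.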
Can AI code
Image segmentation
AI code generation framework that tries to run code
  • OpenAI's GPT-4-turbo can generate and run Python code if it detects that the prompt would be better answered by Python, e.g. maths
Fastest gun in the West problem
Reddit toplevel comments are drowned out by comment replies
This is the fatal flaw of Reddit as a Q&A website. If you are not early in replying to the thread, your comment very quickly disappears under the replies to other comments. This greatly amplifies the fastest gun in the West problem.
UVa Online Judge
Project Euler
They don't have an actual online judge system: all problems simply have an integer or floating point solution, and they just check that you've found the value.
The only metric that matters is who solved the problem first after publication, e.g.: projecteuler.net/fastest=454. The "language" in which problems were solved is just whatever the user put in their profile; the site can't actually confirm that.
Project Euler problems typically involve finding or proving and then using a lemma that makes computation of the solution feasible without brute force. As such, they live in the intersection of mathematics and computer science.
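As a tiny illustration of that "lemma instead of brute force" pattern, consider Problem 1, which asks for the sum of all multiples of 3 or 5 below 1000. The sketch below compares the obvious loop with a closed form based on the arithmetic series formula and inclusion-exclusion. The function names are made up for this example, and this is just one possible way to write it.

```python
def sum_multiples_brute(limit: int) -> int:
    """O(limit): test every number below limit."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def sum_multiples_of(k: int, limit: int) -> int:
    """Sum of the positive multiples of k below limit.

    There are m = (limit - 1) // k of them, and
    k + 2k + ... + mk = k * m * (m + 1) / 2.
    """
    m = (limit - 1) // k
    return k * m * (m + 1) // 2


def sum_multiples_closed(limit: int) -> int:
    """O(1): inclusion-exclusion, multiples of 15 are counted twice otherwise."""
    return (
        sum_multiples_of(3, limit)
        + sum_multiples_of(5, limit)
        - sum_multiples_of(15, limit)
    )


if __name__ == "__main__":
    print(sum_multiples_brute(1000))   # 233168
    print(sum_multiples_closed(1000))  # 233168
    # The closed form stays instant where the loop would never finish.
    print(sum_multiples_closed(10**18))
```

Later problems need much less obvious lemmas, but the shape is the same: the math turns an infeasible loop into a feasible computation.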
Code solutions by individuals: Basically no one ever had the patience to solve them all. What we need is a collaborative solution.
projecteuler.net says it started as a subsection in mathschallenge.net, and in 2006 moved to its own domain. WhoisXMLAPI WHOIS history says it was registered by domainmonster.com but details are anonymous. TODO: sample problem on mathschallenge.net on Wayback Machine? Likely wouldn't reveal much anyways though as there is no attribution to problem authors on that site.
www.hackerrank.com/contests/projecteuler/challenges holds challenges with an actual judge and sometimes multiple test cases so just printing the final solution number is not enough.
Exercism
Necessary evil
Allen Institute for AI
Programming problem collection website
The Alan Turing Institute
Inria
They do some really fun hardcore mathy stuff over there!
Ciro Santilli interned at Inria Centre at Université Côte d'Azur in the early 2010s. It was a disaster, largely his own fault, but also due to our broken educational system. But they do have awesome things as well.
Applied mathematics research institute
@cirosantilli/_file/lenet
This is a small fork of activatedgeek/LeNet-5 by Ciro Santilli adding better integration and automation for:
Install on Ubuntu 24.10:
sudo apt install protobuf-compiler
cd lenet
virtualenv -p python3 .venv
. .venv/bin/activate
pip install -r requirements-python-3-12.txt
Download and extract MNIST, train, test accuracy, and generate the ONNX file lenet.onnx:
./train.py
Extract MNIST images as PNG:
./extract_pngs.py
Infer some individual images using the ONNX:
./infer.py data/MNIST/png/test/0/*.png
Draw on a GUI and see live inference using the ONNX:
./draw.py
TODO: the following are missing for this to work:
