Their most interesting subset, the -hard one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.
The tests make free use of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses ftplib. Kind of cool.
They even test graph plotting? huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11 How do they evaluate that?
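Presumably they don't compare rendered pixels, but instead assert on the returned matplotlib objects. A guessed sketch of what such a test could look like, not an actual BigCodeBench test:
import unittest

import matplotlib
matplotlib.use('Agg')  # headless backend: evaluation needs no display
import matplotlib.pyplot as plt

def task_func(data):
    # Toy stand-in for a BigCodeBench-style plotting task.
    fig, ax = plt.subplots()
    ax.plot(data)
    ax.set_title('data')
    return ax

class TestPlot(unittest.TestCase):
    def test_plot(self):
        ax = task_func([1, 2, 3])
        # Check properties of the returned Axes object rather than pixels.
        self.assertEqual(ax.get_title(), 'data')
        self.assertEqual(len(ax.lines), 1)
        self.assertEqual(list(ax.lines[0].get_ydata()), [1, 2, 3])

if __name__ == '__main__':
    unittest.main()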
The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz. These researchers...
To get a quick overview of the problems with jq, after decompressing with gunzip --keep HumanEval.jsonl.gz:
jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
The first two problems are:
==== HumanEval/0 has_close_elements
from typing import List

def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
==== HumanEval/1 separate_paren_groups
from typing import List

def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
So we understand that it takes as input an empty function with a docstring, and you have to fill in the function body.
The paper also shows that there can be other defined functions besides the one you have to implement.
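The shape is something like this (a made-up illustration, not a verbatim dataset entry): a fully implemented helper followed by the stub to be completed, shown here already solved:
def poly_eval(xs, x):
    # Helper given fully implemented in the prompt.
    return sum(coeff * x**i for i, coeff in enumerate(xs))

def derivative_at(xs, x):
    """ Evaluate the derivative of the polynomial with coefficients xs at x.
    >>> derivative_at([1, 2, 3], 0.0)
    2.0
    """
    # This body is what the model has to fill in.
    return sum(i * coeff * x**(i - 1) for i, coeff in enumerate(xs) if i > 0)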
It appears to contain only a rather small number of newly created problems?
- OpenAI's GPT-4-turbo can generate and run Python code if it detects that the prompt would be better answered by Python, e.g. for maths questions
Reddit toplevel comments are drowned out by comment replies
This is the fatal flaw of Reddit as a Q&A website. If you don't reply to the thread early, your comment very quickly disappears under replies to other comments. This greatly amplifies the fastest gun in the West problem.
They don't have an actual online judge system, all problems simply have an integer or floating point solution and they just check that you've found the value.
The only metric that matters is who solved the problem first after publication, e.g.: projecteuler.net/fastest=454. The "language" in which problems were solved is just whatever the user put in their profile; they can't actually confirm it.
Project Euler problems typically involve finding or proving and then using a lemma that makes computation of the solution feasible without brute force. As such, they live in the intersection of mathematics and computer science.
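For example, the classic first problem (sum of the multiples of 3 or 5 below a bound) is trivial to brute force at the published bound, but the lemma, here just the arithmetic progression sum plus inclusion-exclusion, stays instant at bounds where brute force is infeasible. A minimal sketch of our own, not an official solution:
#!/usr/bin/env python3

def brute_force(n):
    # O(n): fine for n = 1000, hopeless for n = 10**18.
    return sum(i for i in range(n) if i % 3 == 0 or i % 5 == 0)

def sum_multiples(k, n):
    # Lemma: k + 2k + ... + mk = k * m * (m + 1) / 2 with m = (n - 1) // k.
    m = (n - 1) // k
    return k * m * (m + 1) // 2

def closed_form(n):
    # Inclusion-exclusion: multiples of 15 would otherwise be counted twice.
    return sum_multiples(3, n) + sum_multiples(5, n) - sum_multiples(15, n)

assert brute_force(1000) == closed_form(1000)
print(closed_form(10**18))  # instant, while brute force would take years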
List of just the solution values:
Code solutions by individuals: basically no one ever had the patience to solve them all. What we need is a collaborative solution.
Problems are under CC BY-NC-SA: projecteuler.net/copyright
How problems are chosen:
projecteuler.net says it started as a subsection of mathschallenge.net, and in 2006 moved to its own domain. WhoisXMLAPI WHOIS history says it was registered by domainmonster.com but the details are anonymous. TODO: find a sample problem on mathschallenge.net on the Wayback Machine? It likely wouldn't reveal much anyway though, as there is no attribution to problem authors on that site.
www.hackerrank.com/contests/projecteuler/challenges holds challenges with an actual judge and sometimes multiple test cases so just printing the final solution number is not enough.
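So for the judged variant of a problem like the one sketched above, a solution has to read the bound per test case. A sketch assuming the common HackerRank input format of T followed by one n per line:
import sys

def sum_multiples(k, n):
    # Sum of the positive multiples of k strictly below n, in O(1).
    m = (n - 1) // k
    return k * m * (m + 1) // 2

data = sys.stdin.read().split()
t = int(data[0])
for n in map(int, data[1:t + 1]):
    print(sum_multiples(3, n) + sum_multiples(5, n) - sum_multiples(15, n))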
They do some really fun hardcore mathy stuff over there!
Ciro Santilli interned at Inria Centre at Université Côte d'Azur in the early 2010s. It was a disaster, largely his own fault, but also due to our broken educational system. But they do have awesome things as well.
This is a small fork of activatedgeek/LeNet-5 by Ciro Santilli adding better integration and automation for:
- extracting MNIST images as PNG
- ONNX CLI inference taking any image files as input
- a Python tkinter GUI that lets you draw and see inference live
- running on GPU
Install on Ubuntu 24.10:
sudo apt install protobuf-compiler
cd lenet
virtualenv -p python3 .venv
. .venv/bin/activate
pip install -r requirements-python-3-12.txt
Download and extract MNIST, train, test accuracy, and generate the ONNX lenet.onnx:
./train.py
Extract MNIST images as PNG:
./extract_pngs.py
Infer some individual images using the ONNX:
./infer.py data/MNIST/png/test/0/*.png
Draw on a GUI and see live inference using the ONNX:
./draw.py
TODO: the following are missing for this to work:
- start a background task. This we know how to do: stackoverflow.com/questions/1198262/tkinter-locks-python-when-an-icon-is-loaded-and-tk-mainloop-is-in-a-thread/79502287#79502287
- get bytes from the canvas: all methods are ugly: stackoverflow.com/questions/9886274/how-can-i-convert-canvas-content-to-an-image
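A minimal sketch of how those two pieces could fit together, assuming Pillow and Ghostscript are installed (Canvas.postscript is the "ugly" route from the second link; infer() is a hypothetical stand-in for the actual ONNX call in draw.py):
#!/usr/bin/env python3
import io
import queue
import threading
import tkinter as tk

from PIL import Image  # pip install pillow; EPS decoding needs Ghostscript

jobs = queue.Queue()

def infer(image):
    # Hypothetical placeholder for the real ONNX Runtime inference.
    return image.size

def worker():
    # Background task: consume canvas snapshots and run inference on them.
    while True:
        image = jobs.get()
        print('inferred:', infer(image))

def snapshot(event=None):
    # Canvas to bytes: render to PostScript, then decode it with PIL.
    eps = canvas.postscript(colormode='gray')
    image = Image.open(io.BytesIO(eps.encode())).convert('L')
    jobs.put(image)

root = tk.Tk()
canvas = tk.Canvas(root, width=280, height=280, bg='white')
canvas.pack()
canvas.bind('<B1-Motion>',
            lambda e: canvas.create_oval(e.x - 8, e.y - 8, e.x + 8, e.y + 8,
                                         fill='black'))
canvas.bind('<ButtonRelease-1>', snapshot)
threading.Thread(target=worker, daemon=True).start()
root.mainloop()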