{c}
{title2=2024}

* https://github.com/bigcode-project/bigcodebench
* https://bigcode-bench.github.io/
* https://arxiv.org/abs/2406.15877

Their most interesting subset, the `-hard` one, appears to be available at https://huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.

The tests make free use of the <Python standard library> as well as major third-party libraries, e.g. https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses `ftplib`. Kind of cool.
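A task like the `ftplib` one presumably can't talk to a real server during evaluation. Here is a minimal sketch of how such a task could still be tested offline, assuming the test patches `ftplib.FTP` with `unittest.mock`. This is an assumption, not checked against the dataset's actual tests, and `list_ftp_dir` plus the host and credentials are hypothetical:

```python
import ftplib
import unittest
from unittest import mock


def list_ftp_dir(host: str, user: str, password: str, path: str) -> list[str]:
    """Hypothetical task: return the entry names of a remote FTP directory."""
    ftp = ftplib.FTP(host)
    try:
        ftp.login(user=user, passwd=password)
        ftp.cwd(path)
        return ftp.nlst()
    finally:
        # Always close the session, even if login or cwd fails.
        ftp.quit()


class TestListFtpDir(unittest.TestCase):
    # Patch the class on the ftplib module, which is where list_ftp_dir looks it up.
    @mock.patch("ftplib.FTP")
    def test_lists_entries_without_network(self, mock_ftp_cls):
        # The patched class returns a MagicMock instance; script its nlst() reply.
        mock_ftp = mock_ftp_cls.return_value
        mock_ftp.nlst.return_value = ["a.txt", "b.csv"]

        result = list_ftp_dir("ftp.example.com", "user", "secret", "/pub")

        # Check both the returned value and how the FTP API was driven.
        self.assertEqual(result, ["a.txt", "b.csv"])
        mock_ftp_cls.assert_called_once_with("ftp.example.com")
        mock_ftp.login.assert_called_once_with(user="user", passwd="secret")
        mock_ftp.cwd.assert_called_once_with("/pub")
        mock_ftp.quit.assert_called_once()
```

Run it with `python -m unittest` on the file. The nice property is that both the return value and the sequence of library calls get checked, with no network at all.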

They even test graph plotting, e.g. https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11. How is that evaluated?
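My guess, not verified against the dataset's actual tests: rather than comparing images, a test can inspect the matplotlib `Axes` object the solution returns (title, labels, plotted data). A minimal sketch with a hypothetical `plot_squares` task:

```python
import unittest

import matplotlib

matplotlib.use("Agg")  # headless backend: no display needed to build the figure
import matplotlib.pyplot as plt


def plot_squares(n: int):
    """Hypothetical task: plot y = x**2 for x in 0..n-1 and return the Axes."""
    xs = list(range(n))
    fig, ax = plt.subplots()
    ax.plot(xs, [x * x for x in xs])
    ax.set_title("Squares")
    ax.set_xlabel("x")
    ax.set_ylabel("x^2")
    return ax


class TestPlotSquares(unittest.TestCase):
    def test_axes_properties(self):
        ax = plot_squares(5)
        self.addCleanup(plt.close, ax.figure)  # free the figure after the test
        # Check plot metadata rather than rendered pixels.
        self.assertEqual(ax.get_title(), "Squares")
        self.assertEqual(ax.get_xlabel(), "x")
        self.assertEqual(ax.get_ylabel(), "x^2")
        lines = ax.get_lines()
        self.assertEqual(len(lines), 1)
        # The plotted data can be read back from the Line2D object.
        self.assertEqual(list(lines[0].get_xdata()), [0, 1, 2, 3, 4])
        self.assertEqual(list(lines[0].get_ydata()), [0, 1, 4, 9, 16])
```

Checking metadata like this is robust to fonts, DPI and backend differences, but it does mean the task prompt has to pin down exact titles and labels for the test to be fair.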

