Their most interesting subset, the -hard one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.
The tests make free use of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses ftplib. Kind of cool.
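A minimal sketch of how a test can exercise ftplib code without a real FTP server, assuming the tests rely on unittest.mock (I have not checked that this is exactly what that row does). The function list_remote_files, the host name and the file names are hypothetical, made up for illustration and not taken from the dataset:

```python
import ftplib
import unittest
from unittest.mock import patch


def list_remote_files(host, directory):
    """Hypothetical task function: list the file names in an FTP directory."""
    ftp = ftplib.FTP(host)
    ftp.login()
    ftp.cwd(directory)
    names = ftp.nlst()
    ftp.quit()
    return names


class TestListRemoteFiles(unittest.TestCase):
    # Replace ftplib.FTP with a mock so no network connection is made.
    @patch("ftplib.FTP")
    def test_lists_files(self, mock_ftp_cls):
        mock_ftp = mock_ftp_cls.return_value
        mock_ftp.nlst.return_value = ["a.txt", "b.txt"]

        result = list_remote_files("ftp.example.invalid", "/pub")

        self.assertEqual(result, ["a.txt", "b.txt"])
        mock_ftp_cls.assert_called_once_with("ftp.example.invalid")
        mock_ftp.cwd.assert_called_once_with("/pub")


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestListRemoteFiles)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs the single test. The point is that the test only checks which FTP calls were made and what was returned, so it can run in a sandbox with no network.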
They even test graph plotting: huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11 How do they evaluate that?
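One plausible answer, as a sketch only: if the task function returns the matplotlib Axes object, a test can inspect the object's properties instead of comparing pixels. This is an assumption about the approach, not something verified against that row. The function plot_squares and the helper check_plot are hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed

import matplotlib.pyplot as plt
from matplotlib.axes import Axes


def plot_squares(n):
    """Hypothetical task function: plot i**2 for i in range(n), return the Axes."""
    fig, ax = plt.subplots()
    xs = list(range(n))
    ax.plot(xs, [x * x for x in xs])
    ax.set_title("Squares")
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    return ax


def check_plot(ax, n):
    """Hypothetical checker: assert on the Axes object rather than on an image."""
    assert isinstance(ax, Axes)
    assert ax.get_title() == "Squares"
    line = ax.get_lines()[0]
    # Compare the plotted data points with the expected values.
    assert list(line.get_xdata()) == list(range(n))
    assert list(line.get_ydata()) == [x * x for x in range(n)]
```

For example, check_plot(plot_squares(4), 4) passes silently. Checks like these cover titles, labels and data, but say nothing about how the figure actually looks.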