The ultimate high level of which is of course to program with:which is basically the goal of artificial general intelligence, especially according to The Employment Test definition of AGI.
The term has not always had that sense:sums it up.
automatic programming has always been a euphemism for programming in a higher-level language than was then available to the programmer
Bibliography:
Blog post: deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Whitepaper: storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf
Basically they require users to hand-code a metric and provide a program skeleton with some parts of the code marked to be replaced, and then the system focuses on modifying the code regions in question to optimize the metric.
All the novel results they announced were in constraint satisfaction problems or optimization problem. Their results are still awesome, but it's not very different from AlphaGo style things.
Bibliography:
The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz These researchers.
To get a quick overview of the problems with jq:
jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
The first two problems are:so we understand that it takes as input an empty function with a docstring and you have to fill the function body.
==== HumanEval/0 has_close_elements
from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:
""" Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
False
>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
True
"""
==== HumanEval/1 separate_paren_groups
from typing import List
def separate_paren_groups(paren_string: str) -> List[str]:
""" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
separate those group into separate strings and return the list of those.
Separate groups are balanced (each open brace is properly closed) and not nested within each other
Ignore any spaces in the input string.
>>> separate_paren_groups('( ) (( )) (( )( ))')
['()', '(())', '(()())']
"""
The paper also shows that there can be other defined functions besides the one you have to implement.
Their most interesting subset, the
-hard
one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.The tests make free usage of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses FTPlib. Kind of cool.
They even test graph plotting? huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11 How does it evaluate?
By Princeton people.
This one aims to solve GitHub issues. It appears to contain 2,294 real-world GitHub issues and their corresponding pull requests
The dataset appears to be at: huggingface.co/datasets/princeton-nlp/SWE-bench in Parquet format.
Tasks from Upwork.
Articles by others on the same topic
There are currently no matching articles.