Ciro Santilli @cirosantilli 37

 Tagged: OpenAI project

FrontierMath Created 2025-02-11 Updated 2025-07-16

arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website fails to state clearly:

The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination

So yeah, fuck off.

The expected answer output for all problems is just one single, possibly ridiculously large, integer, which is kind of a cool approach. Similar to Project Euler in that aspect.
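
One nice consequence of integer-only answers is that grading can be a plain exact match. The following is a minimal sketch of that idea, not the benchmark's actual grader, which is not described here. The names `normalize_integer` and `is_correct` are made up for illustration. It compares decimal strings rather than calling `int()`, because recent CPython versions limit the number of digits accepted by default in string to integer conversion, which matters if the answer really is ridiculously large.

```python
import re

# Optional sign followed by ASCII digits only.
_INT_RE = re.compile(r"[+-]?[0-9]+")


def normalize_integer(text: str) -> str:
    """Return a canonical decimal string for an integer, or raise ValueError."""
    text = text.strip()
    if not _INT_RE.fullmatch(text):
        raise ValueError("not an integer")
    sign = "-" if text.startswith("-") else ""
    # Drop the sign and leading zeros; an all-zero string collapses to "0".
    digits = text.lstrip("+-").lstrip("0") or "0"
    if digits == "0":
        sign = ""  # treat "-0" and "0" as the same answer
    return sign + digits


def is_correct(submitted: str, expected: str) -> bool:
    """Exact-match check (hypothetical): malformed submissions count as wrong."""
    try:
        return normalize_integer(submitted) == normalize_integer(expected)
    except ValueError:
        return False
```

For example, `is_correct(" 0042\n", "42")` is true while `is_correct("4.2", "42")` is false: there is no partial credit and no fuzzy parsing of prose answers.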

The most interesting aspect of this benchmark is the difficulty. Mathematical olympiad coach Evan Chen comments:^[ref]

Problems in [the International Mathematical Olympiad] typically require creative insight while avoiding complex implementation and specialized knowledge [but for FrontierMath] they keep the first requirement, but outright invert the second and third requirement


HumanEval Created 2025-03-20 Updated 2025-07-16


The tests are stored as a gzip file inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz These researchers.

To get a quick overview of the problems with jq, after decompressing the file to HumanEval.jsonl:

jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
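
A roughly equivalent overview can be sketched in Python, which reads the gzip directly and so skips the decompression step. This is only a sketch: the helper names `iter_problems`, `format_problem` and `print_overview` are made up here, and the only fields assumed are the three used by the jq command above (`task_id`, `entry_point`, `prompt`).

```python
import gzip
import json
from typing import Iterator


def iter_problems(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def format_problem(problem: dict) -> str:
    """Mirror the jq template: a header line followed by the prompt."""
    return f"==== {problem['task_id']} {problem['entry_point']}\n{problem['prompt']}"


def print_overview(path: str) -> None:
    """Print every problem in the file at `path`."""
    for problem in iter_problems(path):
        print(format_problem(problem))
```

Calling `print_overview("HumanEval.jsonl.gz")` from the directory holding the downloaded file should produce the same kind of listing as the jq command.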

The first two problems are:

==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """

So the input is an empty function with a docstring, and the task is to fill in the function body.
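
To make "fill the function body" concrete, here is a minimal sketch that glues a candidate body onto the HumanEval/0 prompt and loads the result. It is not the official evaluation harness. `COMPLETION` is a hand-written stand-in for model output, and `load_candidate` is a made-up helper name. Note that `exec` runs arbitrary code, so anything like this needs sandboxing when the body comes from an actual model.

```python
from typing import Callable

# The HumanEval/0 prompt exactly as listed above.
PROMPT = '''from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
'''

# Hypothetical completion: only the body, indented to sit inside the function.
COMPLETION = '''    ordered = sorted(numbers)
    return any(b - a < threshold for a, b in zip(ordered, ordered[1:]))
'''


def load_candidate(prompt: str, completion: str, entry_point: str) -> Callable:
    """Execute prompt + completion and return the function named entry_point."""
    namespace: dict = {}
    exec(prompt + completion, namespace)  # unsafe for untrusted code
    return namespace[entry_point]


candidate = load_candidate(PROMPT, COMPLETION, "has_close_elements")
# Check the two examples given in the docstring.
assert candidate([1.0, 2.0, 3.0], 0.5) is False
assert candidate([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) is True
```

The body sorts the list first because the closest pair of numbers is always adjacent after sorting, so only neighbours need comparing.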

The paper also shows that there can be other defined functions besides the one you have to implement.


OpenAI Whisper Created 2024-08-15 Updated 2025-07-16


SWE-Lancer Created 2025-03-20 Updated 2025-07-16


Tasks from Upwork.
