Uncensored LLM Created 2025-03-20 Updated 2025-07-16
LLM benchmark Created 2025-03-20 Updated 2025-07-16
Benchmarking LLMs is an extremely difficult issue.
Therefore, there is a difficult gap between what is easy, what a human can always do, and what AGI will do one day.
Get output of send command on expect Created 2025-03-20 Updated 2025-07-16
This pattern works well, after which stdout will contain only the output of the command and nothing else:
set prompt ">>> "
log_user 0
send "What is quantum field theory?\r"
expect -re "(.+)$prompt"
puts -nonewline [join [lrange [lmap line [split $expect_out(1,string) \n] {regsub {\r$} $line ""}] 1 end] "\n"]
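The final puts line packs several steps into one expression: split the captured text on newlines, strip one trailing carriage return from each line, drop the first line (presumably the echo of the command that was sent), and join the rest back together. A minimal Python sketch of the same post-processing, given only to make the steps explicit; the function name and the sample capture are made up for illustration:

```python
def clean_captured_output(captured: str) -> str:
    """Mirror the Tcl one-liner above on an already captured string."""
    # split $expect_out(1,string) \n
    lines = captured.split("\n")
    # regsub {\r$} $line "": remove a single trailing carriage return
    lines = [line[:-1] if line.endswith("\r") else line for line in lines]
    # lrange ... 1 end: skip the first line, assumed to be the echoed command
    return "\n".join(lines[1:])


# Hypothetical capture: a terminal that echoes the command and uses \r\n.
captured = "What is quantum field theory?\r\nIt is a framework.\r\nSecond line.\r\n"
cleaned = clean_captured_output(captured)
print(cleaned, end="")
```

If the program does not echo the command back, the first line would be real output, and the slice should start at 0 instead of 1.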
Bibliography:
- unix.stackexchange.com/questions/239161/get-the-output-from-expect-script-in-a-variable/792645#792645
- stackoverflow.com/questions/45210358/expect-output-only-stdout-of-the-command-and-nothing-else/79517903#79517903
- stackoverflow.com/questions/57975853/how-to-read-the-send-command-output-in-expect-script: the title is wrong, the OP apparently wants the exit status, not stdout
RetinaNet Created 2025-03-20 Updated 2025-07-16
You Only Look Once Created 2025-03-20 Updated 2025-07-16
You can get some really sweet pre-trained versions of this, typically trained on the COCO dataset.
AlexNet Created 2025-03-20 Updated 2025-07-16
Expect HOWTO Created 2025-03-20 Updated 2025-07-16
Expect Created 2025-03-20 Updated 2025-07-16
List of convolutional neural networks Created 2025-03-20 Updated 2025-07-16
Chromium sometimes freezes due to autofill on omnibox Created 2025-03-20 Updated 2025-07-16
This has happened a few times a day on Ubuntu 24.10 and Chromium 133. It has also been happening in previous versions of Ubuntu and Chromium.
As Ciro Santilli starts typing on the omnibox, sometimes the window freezes and the dreaded "is not responding" window shows up.
The only somewhat similar reports that Ciro Santilli could find as of 2025:
Value of life Created 2025-03-20 Updated 2025-07-16
Chromium bug Created 2025-03-20 Updated 2025-07-16
SWE-Lancer Created 2025-03-20 Updated 2025-07-16
BigCodeBench Created 2025-03-20 Updated 2025-07-16
Their most interesting subset, the -hard one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.
The tests make free usage of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses ftplib. Kind of cool.
They even test graph plotting? huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11 How does it evaluate?
HumanEval Created 2025-03-20 Updated 2025-07-16
The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz These researchers.
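Since the file is gzipped JSON Lines, it can also be read without unpacking it first. A minimal Python sketch, assuming only the task_id, entry_point and prompt fields that the overview relies on; the function names are made up here, and the in-memory record is a made-up stand-in for the real file:

```python
import gzip
import io
import json


def iter_problems(source):
    """Yield one dict per line of a gzipped JSON Lines file.

    source is a path such as "HumanEval.jsonl.gz" or a binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def overview(problem: dict) -> str:
    """Format one problem as a header line followed by its prompt."""
    return f"==== {problem['task_id']} {problem['entry_point']}\n{problem['prompt']}"


# Made-up record, compressed in memory so that the sketch runs without the repo.
record = {"task_id": "Demo/0", "entry_point": "add", "prompt": "def add(a, b):\n"}
sample = gzip.compress((json.dumps(record) + "\n").encode("utf-8"))
problems = list(iter_problems(io.BytesIO(sample)))
print(overview(problems[0]))
```

With the real file, `iter_problems("HumanEval.jsonl.gz")` would take the place of the in-memory sample.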
To get a quick overview of the problems with jq:
jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
The first two problems are shown below. From them we understand that the task takes as input an empty function with a docstring, and you have to fill in the function body:
==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
The paper also shows that there can be other defined functions besides the one you have to implement.
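The "fill in the function body" setup can be sketched by concatenating the prompt with a candidate body and executing the result. This is only an illustration under stated assumptions: the completion below is a hand-written stand-in for model output, run_candidate is a made-up helper, and the two docstring examples are used as checks in place of the benchmark's own separate tests.

```python
# Prompt for HumanEval/0, as printed by the jq overview.
PROMPT = '''from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
'''

# Hypothetical completion: only the body, indented to sit inside the function.
COMPLETION = '''    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False
'''


def run_candidate(prompt: str, completion: str, entry_point: str):
    """Execute prompt + completion and return the resulting function.

    exec on untrusted model output is unsafe; anything beyond a toy demo
    should run it in a sandbox.
    """
    namespace = {}
    exec(prompt + completion, namespace)
    return namespace[entry_point]


candidate = run_candidate(PROMPT, COMPLETION, "has_close_elements")
print(candidate([1.0, 2.0, 3.0], 0.5))                 # False
print(candidate([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3))  # True
```

Because the whole prompt is executed, any helper functions that a prompt defines before the target function would be available to the completion as well.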
Can AI code Created 2025-03-20 Updated 2025-07-16
Image segmentation Created 2025-03-20 Updated 2025-07-16
AI code generation benchmark Created 2025-03-20 Updated 2025-07-16
AI code generation framework that tries to run code Created 2025-03-20 Updated 2025-07-16
Fastest gun in the West problem Created 2025-03-20 Updated 2025-07-16