LLM benchmark Created 2025-03-20 Updated 2025-07-16
Benchmarking LLMs is extremely difficult.
Depending on the question asked, LLMs are the type of GenAI that comes closest to AGI.
There is therefore a wide gap between what is easy for them, what a competent human can always do, and what AGI will one day do.
Competent human answers can also be extremely varied, making a perfect automatic metric impossible. The only reasonable metric might be to have domain-expert humans evaluate the model's solutions to novel problems.
Get output of send command on expect Created 2025-03-20 Updated 2025-07-16
This pattern works well:
# Assumes the interactive program was already spawned, e.g. an LLM REPL
# such as: spawn ollama run llama3 (hypothetical, any ">>> " REPL works)
set prompt ">>> "
# Suppress echoing of the interaction to stdout.
log_user 0
send "What is quantum field theory?\r"
# Capture everything up to the next prompt.
expect -re "(.+)$prompt"
# Strip trailing \r from each line and drop the first line, which is the echoed command.
puts -nonewline [join [lrange [lmap line [split $expect_out(1,string) \n] {regsub {\r$} $line ""}] 1 end] "\n"]
Then stdout will contain only the output of the command and nothing else.
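For comparison, a sketch of the same pattern in Python with the pexpect library rather than Tcl Expect (assumptions: pexpect is installed, and the spawned command is a hypothetical LLM REPL whose prompt is ">>> "):
import pexpect

PROMPT = ">>> "
# Hypothetical REPL command; generous timeout since LLMs can be slow.
child = pexpect.spawn("ollama run llama3", encoding="utf-8", timeout=120)
# Wait for the first prompt before sending anything.
child.expect_exact(PROMPT)
child.sendline("What is quantum field theory?")
# Everything printed before the next prompt lands in child.before.
child.expect_exact(PROMPT)
# Drop the first line, which is the echoed command, like the Tcl version.
print("\n".join(child.before.splitlines()[1:]))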
You Only Look Once Created 2025-03-20 Updated 2025-07-16
You can get some really sweet pre-trained versions of this, typically trained on the COCO dataset.
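As a minimal sketch of using one of those pre-trained models, assuming the third-party ultralytics Python package and its COCO-pretrained yolov8n.pt weights (both assumptions; other YOLO releases work similarly):
from ultralytics import YOLO

# Downloads COCO-pretrained weights on first use.
model = YOLO("yolov8n.pt")
# Run detection on any test image.
results = model("bus.jpg")
for box in results[0].boxes:
    # Print class name and confidence for each detected object.
    print(results[0].names[int(box.cls)], float(box.conf))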
AlexNet Created 2025-03-20 Updated 2025-07-16
Became notable for performing extremely well on the ImageNet challenge (ILSVRC) starting in 2012.
It is also notable for being one of the first architectures to make successful use of GPU training rather than CPU training.
Expect HOWTO Created 2025-03-20 Updated 2025-07-16
Expect Created 2025-03-20 Updated 2025-07-16
List of convolutional neural networks Created 2025-03-20 Updated 2025-07-16
Value of life Created 2025-03-20 Updated 2025-07-16
Chromium bug Created 2025-03-20 Updated 2025-07-16
HumanEval Created 2025-03-20 Updated 2025-07-16
The test problems are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz
To get a quick overview of the problems with jq, after extracting the gzip:
jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl
The first two problems are:
==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
So we understand that the model takes as input an empty function with a docstring, and has to fill in the function body.
The paper also shows that there can be other defined functions besides the one you have to implement.
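The evaluation flow, as sketched in the repo's README, is to generate completions for every task, dump them to a JSONL file, and score that file with the repo's evaluate_functional_correctness script; generate_one_completion below is a hypothetical stand-in for your model call:
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt):
    # Hypothetical: call your LLM here and return only the function body.
    raise NotImplementedError

# task_id -> problem dict with "prompt", "entry_point", etc.
problems = read_problems()
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)
# Then score with: evaluate_functional_correctness samples.jsonl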
Image segmentation Created 2025-03-20 Updated 2025-07-16
AI code generation framework that tries to run code Created 2025-03-20 Updated 2025-07-16
  • OpenAI's GPT-4-turbo can generate and run Python code if it detects that the prompt would be better answered by Python, e.g. for maths questions.
Fastest gun in the West problem Created 2025-03-20 Updated 2025-07-16
