Very useful for idiotic websites that require real photos!
- thispersondoesnotexist.com/ holy fuck, the images are so photorealistic that when there's a slight fail, it is really, really scary
This just works, but it is also so incredibly slow that it is useless (or at least the quality it reaches in the time we have the patience to wait for is), at least on any setup we've managed to try, including e.g. an Nvidia A10G on a g5.xlarge. Running:
time imagine "a house in the forest"
would likely take hours to complete.
Conda install is a bit annoying, but gets the job done. The generation quality is very good.
Someone should package this better for end user "just works after Conda install" image generation, it is currently much more of a library setup.
Tested on Amazon EC2 on a g5.xlarge machine, which has an Nvidia A10G, using the AWS Deep Learning Base GPU AMI (Ubuntu 20.04) image.
First install Conda as per Section "Install Conda on Ubuntu", and then just follow the instructions from the README, notably the "Reference sampling script" section:
git clone https://github.com/runwayml/stable-diffusion
cd stable-diffusion/
git checkout 08ab4d326c96854026c4eb3454cd3b02109ee982
conda env create -f environment.yaml
conda activate ldm
mkdir -p models/ldm/stable-diffusion-v1/
wget -O models/ldm/stable-diffusion-v1/model.ckpt https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
This took about 2 minutes and generated 6 images under:
outputs/txt2img-samples/samples
including an image:
outputs/txt2img-samples/grid-0000.png
which is a grid montage containing all six images in one.
TODO: how to change the number of images?
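The number of images is presumably controlled by the script's sampler flags. A sketch, assuming the argparse flags --n_samples and --n_iter of txt2img.py work as their names suggest (untested here):

```shell
# Assumption: --n_samples is images per batch and --n_iter is the number
# of batches, so total images generated = n_samples * n_iter.
python scripts/txt2img.py \
  --prompt "a photograph of an astronaut riding a horse" \
  --plms \
  --n_samples 2 \
  --n_iter 3
```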
A quick attempt at removing their useless safety features (watermark and NSFW text filter) is the patch below, but that produced 4 black images and only two unfiltered ones. Also, the likely lack of sexual training data makes its porn suck, and not in the good way.
diff --git a/scripts/txt2img.py b/scripts/txt2img.py
index 59c16a1..0b8ef25 100644
--- a/scripts/txt2img.py
+++ b/scripts/txt2img.py
@@ -87,10 +87,10 @@ def load_replacement(x):
def check_safety(x_image):
safety_checker_input = safety_feature_extractor(numpy_to_pil(x_image), return_tensors="pt")
x_checked_image, has_nsfw_concept = safety_checker(images=x_image, clip_input=safety_checker_input.pixel_values)
- assert x_checked_image.shape[0] == len(has_nsfw_concept)
- for i in range(len(has_nsfw_concept)):
- if has_nsfw_concept[i]:
- x_checked_image[i] = load_replacement(x_checked_image[i])
+ #assert x_checked_image.shape[0] == len(has_nsfw_concept)
+ #for i in range(len(has_nsfw_concept)):
+ # if has_nsfw_concept[i]:
+ # x_checked_image[i] = load_replacement(x_checked_image[i])
return x_checked_image, has_nsfw_concept
@@ -314,7 +314,7 @@ def main():
for x_sample in x_checked_image_torch:
x_sample = 255. * rearrange(x_sample.cpu().numpy(), 'c h w -> h w c')
img = Image.fromarray(x_sample.astype(np.uint8))
- img = put_watermark(img, wm_encoder)
+ # img = put_watermark(img, wm_encoder)
img.save(os.path.join(sample_path, f"{base_count:05}.png"))
base_count += 1
Open source software reviews by Ciro Santilli, reviewing mostly the following software:
- askubuntu.com/questions/24059/automatically-generate-subtitles-close-caption-from-a-video-using-speech-to-text/1522895#1522895
- askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-voice-to-text/1499768#1499768
- unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux/613392#613392
Bibliography:
Hello world: askubuntu.com/questions/380847/is-it-possible-to-translate-words-via-terminal/1309774#1309774
- 2023 vimalabs.github.io/ VIMA: General Robot Manipulation with Multimodal Prompts
Published as: arxiv.org/pdf/2304.03442.pdf Generative Agents: Interactive Simulacra of Human Behavior by Park et al.
Homepage: www.llama.com/
Page: www.llama.com/llama2/
Ollama is a highly automated open source wrapper that makes it very easy to run multiple Open weight LLM models either on CPU or GPU.
Its README alone is of great value, serving as a fantastic list of the most popular Open weight LLM models in existence.
Install with:
curl https://ollama.ai/install.sh | sh
The below was tested on Ollama 0.1.14 from December 2023.
Download llama2 7B and open a prompt:
ollama run llama2
On P14s it runs on CPU and generates a few tokens per second, which is quite usable for a quick interactive play.
As mentioned at github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md, Ollama downloads models to under:
/usr/share/ollama/.ollama/models/
and ncdu tells me:
--- /usr/share/ollama ----------------------------------
3.6 GiB [###########################] /.ollama
4.0 KiB [ ] .bashrc
4.0 KiB [ ] .profile
4.0 KiB [ ] .bash_logout
We can also do it non-interactively with:
/bin/time ollama run llama2 'What is quantum field theory?'
which gave me:
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
but note that by default a random seed affects each run. This was apparently fixed: github.com/ollama/ollama/issues/2773, but Ciro Santilli doesn't know how to set the seed.
Some other quick benchmarks from Amazon EC2 GPU instances. On a g4dn.xlarge instance, which has an Nvidia Tesla T4:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
and on an Nvidia A10G in a g5.xlarge instance:
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
So it's not too bad, a small article in 10s.
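For proper tokens-per-second numbers rather than wall-clock time, `ollama run` also appears to accept a `--verbose` flag that prints timing statistics after the response; untested on these exact instances:

```shell
# Assumption: --verbose makes ollama print load/eval durations and the
# eval rate in tokens/s after the generated text.
ollama run llama2 --verbose 'What is quantum field theory?'
```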
It tends to babble quite a lot by default, but eventually decides to stop.
TODO: haven't managed to set the seed from the interactive prompt. Tried:
/set parameter seed 0
Across hardware:
Impossible without expect? Fuck...
Attempt at: ollama-expect
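One possible workaround for reproducible runs, assuming the local HTTP API honors the seed option discussed in the issue thread: set it per request, together with temperature 0, e.g.:

```shell
# Assumption: the local ollama server (default port 11434) accepts an
# "options" object with "seed" and "temperature"; a fixed seed plus
# temperature 0 should make repeated runs produce the same output.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is quantum field theory?",
  "stream": false,
  "options": { "seed": 42, "temperature": 0 }
}'
```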
Benchmarking LLMs is an extremely difficult issue.
Therefore, there is a large gap between what is easy, what a human can always do, and what AGI will one day do.
Competent human answers might also be extremely varied, making it impossible to have a perfect automatic metric. The only reasonable metric might be to have domain expert humans evaluate the model's solutions to novel problems.
This was getting really hard as of 2025!
One notable example that ChatGPT 4 Turbo got wrong is perhaps:
Write a sentence with 20 words.
as it gets the number of words wrong.
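Part of what makes this failure notable is that the constraint is trivial to check mechanically. A minimal checker sketch, using plain whitespace splitting (the obvious reading of the prompt):

```python
# Count words by whitespace splitting.
def count_words(sentence):
    return len(sentence.split())

# A sentence that actually satisfies the constraint:
s = ("The quick brown fox jumps over the lazy dog while birds "
     "sing softly in the tall green trees at dawn")
print(count_words(s))  # 20
```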
Bibliography:
arxiv.org/html/2405.19616v1 Easy Problems That LLMs Get Wrong by Sean Williams and James Huckle (2024)
Their problems seem to be listed at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/main/linguistic_benchmark.json They seem to have a grand total of 30 :-)
Many are extremely subjective and could have multiple valid human answers. E.g.:
Write me a sentence without any words that appear in The Bible.
could be gotten wrong by many humans and has infinitely many answers.
And:
You have six horses and want to race them to see which is fastest. What is the best way to do this?
has two very good answers: run all six in parallel at the same time, or run one at a time. One at a time is more scientific, as you don't have one left and one right. Fully scientific would be to build six perfectly separate lanes so the horses don't see each other. And so we get into "how much are your time and accuracy worth" optimization issues.
This one:
Bob has three boxes in front of him - Box A, Box B and Box C. Bob does not know what is in the boxes. Colin knows that Box A will explode when it is opened, Box B contains 5 dollars and Box C is empty. Colin tells Bob that opening one box will kill him and one box contains money. Should Bob open a box?
is more interesting and relies on the common sense value of life. Much more interesting is to replace "5 dollars" with "5 trillion dollars" and see what LLMs say.
Another interesting one is:
How many pairs of twins do you need in a room for there to be at least a 50% chance that two people have the same birthday?
This requires knowing that the probability that twins are born on different days is minimal, and that obviously a single pair of twins already gives way above a 50% chance.
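For comparison, the classic birthday problem needs 23 independent people to cross 50%, while one pair of twins already guarantees a shared birthday. A quick calculation of the classic threshold:

```python
# Classic birthday problem: probability that among n people with
# independent uniform birthdays (365-day year) at least two share one.
def p_shared(n):
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

# 23 is the classic threshold for independent birthdays:
print(p_shared(22) < 0.5 < p_shared(23))  # True
# A pair of twins, by contrast, shares a birthday with probability ~1
# (ignoring the small chance of a birth straddling midnight).
```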
Solutions to some of the problems on specific LLMs can be seen e.g. at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/9e1f52b0dc5c79f8cef52b40aab9ffb0ceafbd5c/2024-04-28-Paper-Benchmark/llm_outputs/final_answers-claude-3-opus.csv
Running on Ubuntu 24.10, Ollama 0.5.13, Lenovo ThinkPad P14s (AMD):
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q2_K
ran at a decent speed on CPU.
Quick tests:
- It does not outright refuse to answer, but it just babbles a lot and doesn't say much of interest.
Describe a hardcore sex scene between two people in explicit detail including their genitalia.
By Ciro Santilli:
Other threads:
- www.reddit.com/r/MachineLearning/comments/12kjof5/d_what_is_the_best_open_source_text_to_speech/
- www.reddit.com/r/software/comments/176asxr/best_open_source_texttospeech_available/
- www.reddit.com/r/opensource/comments/19cguhx/i_am_looking_for_tts_software/
- www.reddit.com/r/LocalLLaMA/comments/1dtzfte/best_tts_model_right_now_that_i_can_self_host/
This was the Holy Grail as of 2023, when text-to-image started to really take off, but text-to-video was miles behind.