The third one from Cambridge after:
Some good mentions at: Video "Where is Anatomy Encoded in Living Systems? by Michael Levin (2022)".
As of 2023, working with DNS data is just going through a mish-mash of closed datasets/expensive APIs.
We really need some open data in that area.
- opendata.stackexchange.com/questions/1951/dataset-of-domain-names
- opendata.stackexchange.com/questions/2110/domain-name-system-record-a-database
- webmasters.stackexchange.com/questions/33395/find-the-ip-address-of-expired-domains/142751#142751
- superuser.com/questions/686195/how-to-find-the-last-ip-used-for-an-expired-domain-name/1793224#1793224
École Polytechnique student culture by
Ciro Santilli 35 Updated 2025-02-26 +Created 1970-01-01
This is going to be the most important application of generative AI. Especially if we ever achieve good text-to-video.
Image generators plus human ranking:
- pornpen.ai/ a bit too restrictive. Girl laying down. Girl sitting. Penis or no penis. But relatively good at it
- civitai.tv/. How to reach it: civitai.tv/tag/nun/2/
www.pornhub.com/view_video.php?viewkey=ph63c71351edece: Heavenly Bodies Part 1: Sister's Mary First Act. Pornhub title: "AI generated Hentai Story: Sexy Nun alternative World(Isekai) Stable Diffusion" Interesting concept, slide-narrated over visual novel. The question is how they managed to keep face consistency across images.
The fundamental insight of Git design is: a SHA represents not only current state, but also the full history due to the Merkle tree implementation, see notably:
This means that you will always notice if you are overwriting history on the remote, even if you are developing from two separate local computers (or more commonly, two people on two different local computers), and you will therefore never lose any work accidentally.
It is very hard to achieve that without the Merkle tree.
Consider for example the most naive approach possible of marking versions with consecutive numbers:
- Local 1:
- 0: root commit
- 1: commit 1
- 2: commit 2 by local 1
- Local 2:
- 0: root commit
- 1: commit 1
- 2: commit 2 by local 2
- 3: commit 3 by local 2
- Remote
- 0: root commit
- 1: commit 1
If Local 1 were to push to Remote first, how could Local 2 notice that when it tries to push itself? The naive method of just checking "does Remote have commit 2?" does not work, because Local 2 has a different version of commit 2 than Local 1.
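The same scenario can be sketched with a toy hash chain in Python. This is only an illustration under simplifying assumptions: real Git hashes full commit objects (tree, parents, author, dates, message), while the hypothetical `commit_id`, `build_history` and `can_push` helpers below hash just the parent ID plus a content string. The point it shows is that once an ID depends on the parent ID, the two different "commit 2" versions get different IDs, and a push can be checked by asking whether the remote tip appears in the local history.

```python
import hashlib

def commit_id(parent_id, content):
    """Toy commit ID: depends on the content AND on the parent's ID,
    so it implicitly pins the entire history before it."""
    data = f"parent {parent_id}\n{content}".encode()
    return hashlib.sha1(data).hexdigest()

def build_history(contents):
    """Return the commit IDs of a linear history, root first."""
    ids = []
    parent = ""  # the root commit has no parent
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids

def can_push(local_ids, remote_tip):
    """Toy fast-forward check: pushing is safe only if the remote tip
    is already part of the local history."""
    return remote_tip in local_ids

shared = ["root commit", "commit 1"]
local1 = build_history(shared + ["commit 2 by local 1"])
local2 = build_history(shared + ["commit 2 by local 2", "commit 3 by local 2"])
remote = build_history(shared)

# Both locals agree on the shared prefix, but their "commit 2" IDs differ.
assert local1[:2] == local2[:2] == remote
assert local1[2] != local2[2]

# Local 1 pushes first: the remote tip (commit 1) is in its history.
assert can_push(local1, remote[-1])
remote = local1

# Local 2 now tries to push: the remote tip is unknown to it, so the
# push is refused instead of silently overwriting Local 1's work.
assert not can_push(local2, remote[-1])
```

With consecutive numbers, both locals would call their third commit "2" and the conflict would be invisible. With chained hashes the mismatch shows up as soon as the IDs are compared.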
Subset generators:
- github.com/mf1024/ImageNet-datasets-downloader generates on download, very good. As per github.com/mf1024/ImageNet-Datasets-Downloader/issues/14 counts go over the limit due to bad multithreading. Also unfortunately it does not start with a subset of 1k.
- github.com/BenediktAlkin/ImageNetSubsetGenerator
Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full ImageNet1k dataset, which is huge!
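Once the full dataset is on disk, carving out a small subset locally is conceptually simple. The following is a minimal sketch, not how either of the tools above works: it assumes a directory layout with one subdirectory per class containing the image files, and `make_subset` and its parameters are hypothetical names chosen for illustration.

```python
import random
import shutil
from pathlib import Path

def make_subset(src, dst, num_classes, images_per_class, seed=0):
    """Copy a random subset of classes and images from src to dst.

    Assumes src contains one subdirectory per class, each holding
    image files. Returns the sorted list of chosen class names.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    src, dst = Path(src), Path(dst)
    class_dirs = sorted(p for p in src.iterdir() if p.is_dir())
    chosen = rng.sample(class_dirs, min(num_classes, len(class_dirs)))
    for class_dir in chosen:
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        picked = rng.sample(images, min(images_per_class, len(images)))
        out_dir = dst / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for image in picked:
            shutil.copy2(image, out_dir / image.name)
    return sorted(p.name for p in chosen)
```

For example, `make_subset("ILSVRC/Data/CLS-LOC/train", "imagenet-subset", 10, 100)` would copy 100 images for each of 10 classes, assuming the training images were extracted to that path (the path is an assumption about the archive layout, so adjust it to whatever the extraction produced).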
The official page: www.image-net.org/challenges/LSVRC/index.php points to a download link on Kaggle: www.kaggle.com/competitions/imagenet-object-localization-challenge/data. Kaggle says that the size is 167.62 GB!
To download from Kaggle, create an API token on kaggle.com, which downloads a kaggle.json file, then:
mkdir -p ~/.kaggle
mv ~/down/kaggle.json ~/.kaggle
python3 -m pip install kaggle
kaggle competitions download -c imagenet-object-localization-challenge
The download speed is heavily limited by the server and the download takes A LOT of hours. Also, the tool does not seem able to pick up where you stopped last time.
Another download location appears to be: huggingface.co/datasets/imagenet-1k on Hugging Face, but you have to log in due to their license terms. Once you log in you have a very basic data explorer available: huggingface.co/datasets/imagenet-1k/viewer/default/train.