This is the most important thing to understand Git!
You must:
- be able to visualize the commit tree
- understand how each git command modifies the commit DAG
It's not a tree, it's actually a DAG by Ciro Santilli 35 Updated 2024-12-15 +Created 1970-01-01
Every tree is a directed acyclic graph.
But not every directed acyclic graph is a tree.
Example of a tree (and therefore also a DAG):Convention in this presentation: arrows implicitly point up, just like in a and so on.
5
|
4 7
| |
3 6
|/
2
|
1
git log
, i.e.:- 1 is parent of 2
- 2 is parent of 3 and 6
- 3 is parent of 4
Example of a DAG that is not a tree:This is not a tree because there are two ways to reach 7:
7
|\
4 6
| |
3 5
|/
2
|
1
- 2, 3, 4, 7
- 2, 5, 6, 7
But we often say "tree" intead of "DAG" in the context of Git because DAG sounds ugly.
Example of a graph that is not a DAG:This one is not acyclic because there is a cycle 2, 3, 4, 5, 2.
6
^
|
3->4
^ |
| v
2<-5
^
|
1
Because a Git commit can have more than 1 parent due to merge commits when you do:
git merge
It can even have more than 2, there's no limit. Although that is not so common (with good reason, 2 is already one too many): softwareengineering.stackexchange.com/questions/314215/can-a-git-commit-have-more-than-2-parents/377903#377903
There are two ways to organize a project:
- linear history
- branched history: history with merge commits
Some people like merges, but they are ugly and stupid. Rebase instead and keep linear history.
Linear history:
5 master
|
4
|
3
|
2
|
1 first commit
Branched history:
7 master
|\
| \
6 \
|\ \
| | |
3 4 5
| | |
| / /
|/ /
2 /
| /
1/ first commit
Here commits 6 and 7 are the so called "merge commits":
- they have multiple parents:
- 6 has parents 3 and 4
- 7 has parents 5 and 6
- they are useless and don't contain any real information
Which type of tree do you think will be easier to understand and maintain?
????
????????????
You may disconnect now if you still like branched history.
Generate a minimal test repo. You should get in the habit of doing this to test stuff out.
#!/usr/bin/env bash
mkdir git-tips
cd git-tips
git init
for i in 1 2 3 4 5; do
echo $i > f
git add f
git commit -m $i
done
git checkout HEAD~2
git checkout -b my-feature
for i in 6 7; do
echo $i > f
git add f
git commit -m $i
done
Option 1)
git commit
. Doh!!!Option 2)
git rebase
. Basically allows you to do arbitrary modifications to the tree. The most important ones are: Oh, but there are 2 trees: local and remote by Ciro Santilli 35 Updated 2024-12-15 +Created 1970-01-01
Oh but there are usually 2 trees: local and remote.
So you also have to learn how to observe and modify and sync with the remote tree!
But basically:to update the remote tree. And then you can use it exactly like any other branch, except you prefix them with the remote (usually
git fetch
origin/*
), e.g.:origin/master
is the latest fetch of the remote version ofmaster
origin/my-feature
is the latest fetch of the remote version ofmy-feature
GNU Core Utils command line utility by Ciro Santilli 35 Updated 2024-12-15 +Created 1970-01-01
Non-POSIX only here.
The best open source implementation as of 2020 seems to be: Mozilla rr.
An imagenet10 subset by fast.ai.
Size of full sized image version: 1.5 GB.
ImageNet Large Scale Visual Recognition Challenge dataset by Ciro Santilli 35 Updated 2024-12-15 +Created 1970-01-01
Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.
Contains 1,281,167 images and exactly 1k categories which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it
www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:
The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.
image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs.There is a bug on that page however towards the middle:and there is one missing label if we ignore that dummy
n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier
n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel
href=
line. A thinkg of beauty!Also the lines are not sorted by synset, if we do then the first three lines are:
n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:therefore numbered on the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php
n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky
gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs.therefore numbered on sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php
{0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
The official line numbering in-benchmark-data can be seen at
LOC_synset_mapping.txt
, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txtn01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:
- train: 1,281,167 images, 145.7 GB zipped
- validation: 50,000 images, 6.67 GB zipped
- test: 100,000 images, 13.5 GB zipped
There are unlisted articles, also show them or only show them.