Git object by Ciro Santilli 35 Updated +Created
Understand the commit tree
This is the most important thing to understand Git!
You must:
  • be able to visualize the commit tree
  • understand how each git command modifies the commit DAG
It's not a tree, it's actually a DAG
Every tree is a directed acyclic graph (DAG). But not every directed acyclic graph is a tree.
Example of a tree (and therefore also a DAG):
5
|
4 7
| |
3 6
|/
2
|
1
Convention in this presentation: arrows implicitly point up, just like in a git log, i.e.:
  • 1 is parent of 2
  • 2 is parent of 3 and 6
  • 3 is parent of 4
and so on.
Example of a DAG that is not a tree:
7
|\
4 6
| |
3 5
|/
2
|
1
This is not a tree because there are two ways to reach 7 from 2:
  • 2, 3, 4, 7
  • 2, 5, 6, 7
But we often say "tree" instead of "DAG" in the context of Git because DAG sounds ugly.
Example of a graph that is not a DAG:
6
^
|
3->4
^  |
|  v
2<-5
^
|
1
This one is not acyclic because there is a cycle 2, 3, 4, 5, 2.
Why is Git a DAG?
Because a Git commit can have more than one parent. This happens with merge commits, which are created when you do:
git merge
A commit can even have more than 2 parents: there is no limit. That is not so common though (with good reason, 2 is already one too many): softwareengineering.stackexchange.com/questions/314215/can-a-git-commit-have-more-than-2-parents/377903#377903
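To see the multiple parents concretely, here is a minimal sketch that builds a throwaway repo, forces a real merge, and prints the merge commit followed by its parents. The temporary directory, the placeholder identity, the branch name side and the file names are all assumptions made up for this illustration.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo in a temporary directory (hypothetical location for this sketch).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo base > base.txt; git add base.txt; git commit -q -m base
main_branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on config

# One commit on a side branch...
git checkout -q -b side
echo side > side.txt; git add side.txt; git commit -q -m side

# ...and one on the original branch, so the histories diverge.
git checkout -q "$main_branch"
echo main > main.txt; git add main.txt; git commit -q -m main

# Both branches moved on, so this cannot fast-forward: a merge commit is created.
git merge -q --no-edit side

# Print the merge commit hash followed by the hashes of its parents.
parents_line="$(git rev-list --parents -n 1 HEAD)"
echo "$parents_line"
```

The output line contains three hashes: the merge commit itself and then its two parents.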
Linear history vs branching
There are two ways to organize a project:
  • linear history
  • branched history: history with merge commits
Some people like merges, but they are ugly and stupid. Rebase instead and keep linear history.
Linear history:
5 master
|
4
|
3
|
2
|
1 first commit
Branched history:
7   master
|\
| \
6  \
|\  \
| |  |
3 4  5
| |  |
| /  /
|/  /
2  /
| /
1/  first commit
Here commits 6 and 7 are the so-called "merge commits":
  • they have multiple parents:
    • 6 has parents 3 and 4
    • 7 has parents 5 and 6
  • they are useless and don't contain any real information
Which type of tree do you think will be easier to understand and maintain?
????
????????????
You may disconnect now if you still like branched history.
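The "rebase instead" advice can be sketched as follows, under the assumption of a feature branch that diverged from the main branch without conflicting changes. The temporary directory, the placeholder identity, the commit helper and the file names a.txt and b.txt are hypothetical, picked only for this illustration.

```shell
#!/usr/bin/env bash
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

# Hypothetical helper: write message $1 into file $2 and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

commit 1 a.txt
main_branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on config

git checkout -q -b my-feature
commit 2 b.txt        # feature work, touches a different file

git checkout -q "$main_branch"
commit 3 a.txt        # meanwhile the main branch moves on

# Replay my-feature on top of the main branch instead of merging.
git checkout -q my-feature
git rebase -q "$main_branch"

# Fast-forward the main branch: no merge commit is needed.
git checkout -q "$main_branch"
git merge -q --ff-only my-feature

git log --graph --oneline
merge_count="$(git rev-list --merges --count HEAD)"
echo "merge commits: $merge_count"
```

The resulting history is a single straight line of three commits, with no merge commits in it.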
How to visualize the commit tree
Generate a minimal test repo. You should get in the habit of doing this to test stuff out.
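#!/usr/bin/env bash
# Stop at the first failing command, e.g. if git-tips already exists.
set -e

mkdir git-tips
cd git-tips
git init

# Create five commits on the default branch, each one overwriting file f.
for i in 1 2 3 4 5; do
  echo "$i" > f
  git add f
  git commit -m "$i"
done

# Go back to commit 3 (two commits before the tip) and branch off from there.
git checkout HEAD~2
git checkout -b my-feature

# Add two more commits on my-feature.
for i in 6 7; do
  echo "$i" > f
  git add f
  git commit -m "$i"
done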
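To actually look at the tree of such a test repo from the terminal, one option is git log with its graph flags. The sketch below rebuilds the same commit shape in a temporary directory so that it runs on its own. The temporary directory, the placeholder identity and the commit helper are assumptions for this illustration only.

```shell
#!/usr/bin/env bash
set -e

# Rebuild the same shape as the test repo in a throwaway directory.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

# Hypothetical helper: overwrite file f with $1 and commit with message $1.
commit() { echo "$1" > f; git add f; git commit -q -m "$1"; }

for i in 1 2 3 4 5; do commit "$i"; done
git checkout -q -b my-feature HEAD~2
for i in 6 7; do commit "$i"; done

# Show every branch as an ASCII graph, one line per commit,
# with branch names printed next to the commits they point to.
graph="$(git log --graph --oneline --all --decorate)"
echo "$graph"
```

The --all flag matters here: without it only the history of the current branch is shown and commits 4 and 5 would be hidden.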
How to modify the commit tree
Option 1) git commit. Doh!!!
Option 2) git rebase. Basically allows you to do arbitrary modifications to the tree. The most important ones are:
Oh, but there are 2 trees: local and remote
Oh but there are usually 2 trees: local and remote.
So you also have to learn how to observe and modify and sync with the remote tree!
But basically:
git fetch
to update your local copy of the remote tree. You can then use the remote branches exactly like any other branch, except that you prefix them with the remote name (usually origin/*), e.g.:
  • origin/master is the latest fetch of the remote version of master
  • origin/my-feature is the latest fetch of the remote version of my-feature
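A minimal sketch of what git fetch does to origin/master, assuming a bare repository on local disk standing in for the remote server. The directory names remote.git, alice and bob, the placeholder identity and the commit helper are all hypothetical, and the branch name is forced to master only so that it matches the names used above.

```shell
#!/usr/bin/env bash
set -e

work="$(mktemp -d)"
cd "$work"

# A bare repository on local disk stands in for the remote server.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: empty commit in repo $1 with message $2.
commit() {
  git -C "$1" -c user.name="Test User" -c user.email="test@example.com" \
    commit -q --allow-empty -m "$2"
}

# alice creates the first commit and pushes it.
git init -q alice
git -C alice symbolic-ref HEAD refs/heads/master
git -C alice remote add origin "$work/remote.git"
commit alice 1
git -C alice push -q origin master

# bob clones: his master and origin/master both point at commit 1.
git clone -q remote.git bob

# alice pushes a second commit that bob does not know about yet.
commit alice 2
git -C alice push -q origin master

cd bob
before="$(git rev-parse origin/master)"
git fetch -q          # updates origin/master, leaves local master alone
after="$(git rev-parse origin/master)"
local_master="$(git rev-parse master)"

echo "origin/master before fetch: $before"
echo "origin/master after fetch:  $after"
echo "local master:               $local_master"

# Commits that are on the remote but not yet on local master.
git log --oneline master..origin/master
```

After the fetch, origin/master has moved to the second commit while the local master still points at the first one.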
Merge conflicts
gitk
Figure 1.
gitk 2.34.1 running on Ubuntu 22.04 with a simple repository.
GNOME desktop
GNU Core Utils command line utility
Non-POSIX only here.
GDB reverse debugging
The best open source implementation as of 2020 seems to be: Mozilla rr.
Classiq
High level human brain structure
Human Connectome Project
ImageNet 2015
Imagenette
An imagenet10 subset by fast.ai.
Size of full sized image version: 1.5 GB.
ImageNet Large Scale Visual Recognition Challenge dataset
Contains 1,281,167 training images and exactly 1k categories, which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it
www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:
The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.
image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs.
n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier
However, there is a bug on that page towards the middle:
n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel
and there is one missing label if we ignore that dummy href= line. A thing of beauty!
Also, the lines are not sorted by synset. If we sort them, the first three lines are:
n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:
n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky
therefore numbered in the exact same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php
gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs.
{0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
therefore numbered in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php
The official numbering used in the benchmark data can be seen in LOC_synset_mapping.txt, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
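The relation between sorted synset order and the 0-based benchmark IDs described above can be sketched with standard command line tools. This is only an illustration on the six synset lines quoted in this article, not on the full 1k list. The first three sorted entries match the real IDs 0, 1 and 2, but the indices printed for the remaining three are just positions within this small sample and not real benchmark IDs.

```shell
#!/usr/bin/env bash
set -e

# Six synset lines quoted in this article, starting in the unsorted order
# of the browse-synsets page. This is a sample, not the full label list.
sample='n02119789 kit fox, Vulpes macrotis
n02100735 English setter
n02096294 Australian terrier
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias'

# Sort by WordNet ID, then prefix each line with a 0-based index.
# LC_ALL=C makes the sort order byte-wise and independent of the locale.
numbered="$(printf '%s\n' "$sample" | LC_ALL=C sort | awk '{ printf "%d: %s\n", NR - 1, $0 }')"
echo "$numbered"
```

Running the same pipeline on the full LOC_synset_mapping.txt would presumably just reproduce its line order, since that file is described above as already being in sorted order.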
huggingface.co/datasets/imagenet-1k also has some useful metrics on the split:
  • train: 1,281,167 images, 145.7 GB zipped
  • validation: 50,000 images, 6.67 GB zipped
  • test: 100,000 images, 13.5 GB zipped
