Git Internals
Source code control and beyond
by Scott Chacon
Understanding Git
In this section,
we will go over what Git was built for and how it works,
hopefully laying the groundwork to properly understand what it is doing when we run the commands.
The first commit message for the Git project was
‘Initial revision of “git”, the information manager from hell’ – Linus, 4/7/05
When I learned Git,
as many people do,
I learned it in the context of other SCMs I had used
– Subversion or CVS.
I have come to believe that this is a horrible way to learn Git.
I felt far more comfortable with it when I stopped thinking of ‘git add’ as sort of like ‘svn add’
and instead understood what it was actually doing.
Then I discovered new and interesting ways to use what is really a very powerful and cool toolset.
So, let’s see what it’s doing behind the scenes first.
What is Git?
Git is a stupid content tracker.
That is probably the best description of it
– don’t think of it in a ‘like (insert favorite SCM system), but...’ context,
but more like a really interesting file system.
Git tracks content
– files and directories.
It is at its heart a collection of simple tools that implement a tree history storage
and directory content management system.
It is simply used as an SCM, not really designed as one.
“In many ways you can just see git as a filesystem
— it’s content-addressable,
and it has a notion of versioning,
but I really really designed it coming at the problem from the viewpoint of a
filesystem person
(hey, kernels is what I do),
and I actually have absolutely zero interest in creating a traditional SCM system.”
– Linus (http://marc.info/?l=linux-kernel&m=111314792424707)
When most SCMs store a new version of a project,
they store the code delta or diff.
When Git stores a new version of a project,
it stores a new tree – a bunch of blobs of content
and a collection of pointers that can be expanded back out into a full directory of files and subdirectories.
If you want a diff between two versions,
Git doesn’t add up all the deltas;
it simply looks at the two trees and runs a new diff on them.
This is what fundamentally allows the system to be easily distributed
– it doesn’t have to figure out how to apply a complex series of deltas;
it simply transfers the directories and content that one user has and that another user lacks and is requesting.
It is efficient about it
– it only stores identical files and directories once and it can compress and transfer its content using delta-compressed packfiles
– but in concept, it is a very simple beast.
Git is at its heart very stupid-simple.
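To make the "content-addressable filesystem" idea concrete, here is a minimal sketch in Python. The blob ID calculation (SHA-1 over a "blob <size>" header, a NUL byte, and the content) matches what Git has traditionally done for file content. Everything else is an assumption for illustration: ObjectStore, snapshot and diff_snapshots are hypothetical names, and a flat path-to-ID dictionary stands in for Git's nested tree objects.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git's SHA-1 object format assigns to this file content."""
    # Git hashes a short header ("blob", the size in bytes, a NUL byte)
    # followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy in-memory object store (hypothetical; not Git's on-disk format)."""

    def __init__(self) -> None:
        self.objects = {}  # object ID -> content bytes

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content hashes to the same ID, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid


def snapshot(store: ObjectStore, files: dict) -> dict:
    """Store every file and return a flat 'tree': path -> object ID."""
    return {path: store.put(data) for path, data in files.items()}


def diff_snapshots(old: dict, new: dict) -> tuple:
    """Compare two snapshots directly; no chain of deltas is replayed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified


if __name__ == "__main__":
    store = ObjectStore()
    v1 = snapshot(store, {"README": b"hello\n", "COPY": b"hello\n"})
    v2 = snapshot(store, {"README": b"hello, world\n",
                          "COPY": b"hello\n",
                          "NEW": b"x\n"})
    print(len(store.objects))       # distinct contents across both versions
    print(diff_snapshots(v1, v2))   # (added, removed, modified) paths
```

Note that two files with the same content share one stored object, and that comparing versions only needs the two path-to-ID mappings, not the history between them.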
Focus and Design
There are a number of areas that the developers of Git,
including and especially Linus,
have focused on in conceiving and building Git.
There may be a lot of things that Git is not good at,
but these are the things it is very good at.
Non-Linear Development
Git is optimized for cheap and efficient branching and merging.
It is built for many people to work on a project simultaneously,
with individual developers maintaining multiple branches
that are merged,
branched and re-merged constantly.
Because of this, branching is incredibly cheap and merging is incredibly easy.
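One reason branching can be this cheap is that a Git branch is just a name pointing at a commit, so creating one does not copy any files or history. The Python sketch below illustrates that idea only. ToyRepo and its methods are hypothetical, and the commit ID scheme is made up for the example; real Git hashes a full commit object.

```python
import hashlib


class ToyRepo:
    """Toy model of commits and branch pointers (hypothetical, not Git's API)."""

    def __init__(self) -> None:
        self.commits = {}  # commit ID -> (parent IDs, message)
        self.refs = {}     # branch name -> commit ID

    def commit(self, branch: str, message: str, extra_parents: tuple = ()) -> str:
        tip = self.refs.get(branch)
        parents = ((tip,) if tip else ()) + tuple(extra_parents)
        # Made-up ID scheme for illustration only.
        cid = hashlib.sha1(repr((parents, message)).encode("utf-8")).hexdigest()
        self.commits[cid] = (parents, message)
        self.refs[branch] = cid  # the branch pointer moves to the new commit
        return cid

    def branch(self, new_name: str, from_branch: str) -> None:
        # Creating a branch copies one ID; no content or history is duplicated.
        self.refs[new_name] = self.refs[from_branch]

    def merge(self, into: str, other: str) -> str:
        # A merge is recorded as a commit with two parents.
        return self.commit(into, f"merge {other} into {into}",
                           extra_parents=(self.refs[other],))


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit("master", "first")
    repo.branch("topic", "master")   # no new commits are created here
    repo.commit("topic", "work on topic")
    repo.merge("master", "topic")
    print(len(repo.commits))         # first, work on topic, merge
```

The sketch leaves out how file content is merged, which is where the real work happens; it only shows why creating and moving branches costs almost nothing.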
Distributed Development
Git is built to make distributed development simple.
No repository is special or central in Git
– each clone is basically equal and could generally replace any other one at any time.
It works completely offline
or with hundreds of remote repositories that can push to and/or fetch from each other over several simple and standard protocols.
Efficiency
Git is very efficient.
Compared to many popular SCM systems,
it seems downright unbelievably fast.
Most operations are local,
which reduces unnecessary network overhead.
Repositories are generally packed very efficiently,
which often leads to surprisingly small repo sizes.
The Ruby on Rails Git repository download,
which includes the full history of the project – every version of every file,
weighs in at around 13M,
which is not even twice the size of a single checkout of the project (~9M).
The Subversion server repository for the same project is about 115M.
Git is also efficient in its network operations
– the common Git transfer protocols send only packed versions of the objects that have changed.
It also won’t try to transfer content twice,
so if you have the same file under two different names,
it will only transfer the content once.
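The "never transfer content twice" behavior follows naturally once content is identified by its hash. Here is a rough Python sketch of the idea, assuming each side can be summarized as a set of object IDs. The names build_store and objects_to_send are hypothetical. Real Git negotiates over commits and ships a delta-compressed packfile, which this sketch does not attempt to model.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """ID for file content in Git's SHA-1 object format."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def build_store(files: dict) -> dict:
    """Map object ID -> content for a set of files given as path -> bytes."""
    # Files with identical content collapse into a single entry.
    return {blob_id(data): data for data in files.values()}


def objects_to_send(sender_store: dict, receiver_ids: set) -> dict:
    """Return only the objects the sender has and the receiver lacks."""
    return {oid: data for oid, data in sender_store.items()
            if oid not in receiver_ids}


if __name__ == "__main__":
    sender = build_store({
        "a.txt": b"same content\n",
        "copy_of_a.txt": b"same content\n",  # same content, different name
        "new.txt": b"something new\n",
    })
    receiver_has = {blob_id(b"same content\n")}
    to_send = objects_to_send(sender, receiver_has)
    print(len(sender))   # two objects for three files
    print(len(to_send))  # only the object the receiver is missing
```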
A Toolkit Design
Git is not really a single binary,
but a collection of dozens of small specialized programs,
which is sometimes annoying to people trying to learn Git,
but is pretty cool when you want to do anything non-standard with it.
Git is less a program and more a toolkit that can be combined and chained to do new and interesting things.
For a long time,
Git was just the raw toolkit, and the project to wrap those tools into a user-friendly SCM was called Cogito.
That project has since been abandoned as Git itself became easier to use.
The tools can be more or less divided into two major camps,
often referred to as the porcelain and the plumbing.
The plumbing is not really meant to be used by people on the command line,
but rather to do simple things flexibly; programs and scripts combine these tools into porcelain programs.
The porcelain programs are largely what we will be focusing on in this book
– the user-oriented interfaces for doing SCM-type things,
hiding the low-level fun.