If you want to master Git, don’t worry about learning the commands.
Rather, understand the conceptual model behind Git.
At the highest level, Git can be thought of as a database, a file system, a backing store or a content tracker. We generally call it the Git repository. This chapter walks you through the objects directory to show what actually happens behind the scenes.
Git was initially a toolkit for a version control system rather than a full user-friendly VCS. It has a number of sub-commands that do low-level work and were designed to be chained together (Unix style) or called from scripts.
To gain access to the inner workings of Git, and to understand how and why Git does what it does, we need to become familiar with these low-level commands.
Git basically has two types of commands.
The ones we use often, like add, commit, push, pull and branch, are referred to as porcelain commands, or simply high-level commands. These commands are more user-friendly.
The other type, some of which we will explore here, are referred to as plumbing commands, or simply low-level commands.
The .git Directory
Almost everything that Git stores and manipulates in a project lives in a special directory in the root of the project, called the .git directory. This directory is created when you run the git init command in a new or existing directory. If you don't see a .git directory after it is created, that is because it is usually hidden.
Try it out yourself:
Open a terminal window in a new directory and execute the following:
$ git init
This will initialize a .git directory (also called the Git repository).
To view what's inside the .git directory, open a terminal in the directory where you just ran git init and type the following commands:
$ cd .git
$ ls -F1
What are these files and folders in the .git directory?
| Files and Folders | Descriptions |
|---|---|
| hooks/ | Contains client-side or server-side hook scripts. |
| info/ | Keeps a global exclude file for ignored patterns that we don't want to track in a .gitignore file. |
| objects/ | Stores different types of objects (blobs, trees, commits, annotated tags). |
| refs/ | Stores pointers into commit objects in the data (branches, tags, remotes etc). |
| config | Contains project-specific configuration options. |
| description | Used only by the GitWeb program. |
| HEAD | Points to the branch that we have currently checked out. |
| index | Stores staging area information. |
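To make the HEAD entry a little more concrete, here is a small sketch that creates a throwaway repository and prints the HEAD file. The temporary directory and the variable names are just for illustration, and the branch name you see depends on your Git version and configuration.

```shell
# Sketch: inspect HEAD in a throwaway repository (paths are illustrative).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# HEAD is a plain text file holding a symbolic reference to the current branch.
head_content=$(cat .git/HEAD)
echo "$head_content"
```

The output is a single line starting with "ref: refs/heads/", followed by the name of the branch that is checked out.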
It is important to note that everything that is stored in Git is checksummed before it is even saved. Later on it is referred to by that checksum.
This functionality is built into Git at the lowest levels and is integral to its philosophy. The mechanism that is used by Git for checksumming is called a SHA-1 hash.
A SHA-1 hash is a 40-character string composed of hexadecimal characters (0-9 and a-f). Git calculates it from the contents of whatever it is storing, whether that is a file, a directory structure, or any other piece of data.
If you want to know what a SHA-1 hash looks like, try this out in your terminal :
$ echo "Tomatoes" | git hash-object --stdin
ffaf36274dc89b722770000a4da67e1337c837a1
The git hash-object command that you just ran is one of the low-level plumbing commands that Git uses extensively.
In fact, Git stores everything not under a file name but under these generated hash values.
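If you are curious where the hash comes from, here is a rough sketch that reproduces a blob hash without calling Git at all. For a blob, Git hashes a short header ("blob", a space, the content size in bytes, and a NUL byte) followed by the content itself. The helper name git_blob_id is made up for this example, and the sketch assumes the sha1sum tool from GNU coreutils is installed (on macOS, shasum may be the closest equivalent).

```shell
# Hypothetical helper: compute the ID Git would assign to a blob read from stdin.
git_blob_id() {
  tmp=$(mktemp)
  cat > "$tmp"                                  # buffer stdin so we can measure it
  size=$(wc -c < "$tmp" | tr -d ' ')            # content size in bytes
  # Hash the header "blob <size>\0" followed by the raw content.
  { printf 'blob %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
  rm -f "$tmp"
}

blob_id=$(echo "Tomatoes" | git_blob_id)
echo "$blob_id"
```

The printed value should be the same one that git hash-object --stdin gives you for the same input, which is a handy way to convince yourself that the hash depends only on the content and not on any file name.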
The Objects Directory
In a real scenario, you would be working with files and directories, but for now, let's experiment with simple text data.
We are going to look at how objects are created and stored in Git by trying it out ourselves. Let's start:
- Open up a terminal in an empty project containing just a fresh Git repository.
- At the same time, open Git's objects directory in a file browser, to see what happens as we try things out.
- Now execute the following command in your terminal:
$ echo "Tomatoes" | git hash-object --stdin -w
ffaf36274dc89b722770000a4da67e1337c837a1
That command takes the string Tomatoes, stores it in Git's database (the -w flag tells hash-object to write the object instead of only computing its hash) and returns the SHA-1 hash that references that object.
Your objects directory will now have a new object and will probably look something like this:
objects/
├── ff
│   └── af36274dc89b722770000a4da67e1337c837a1
├── info
└── pack
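Git uses the first two characters of the hash value to name a directory and the remaining 38 characters to name a file inside it. That file holds the object's contents in zlib-compressed form. (Delta compression only comes into play later, when Git packs objects into packfiles.)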
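The naming rule is simple enough to sketch in a few lines of shell. The variable names here are only for illustration, and the hash is the one from our Tomatoes example.

```shell
# Sketch: derive the loose-object path from a hash (variable names are illustrative).
hash=ffaf36274dc89b722770000a4da67e1337c837a1

dir=$(printf '%s' "$hash" | cut -c1-2)    # first two characters -> directory name
file=$(printf '%s' "$hash" | cut -c3-)    # remaining characters -> file name

object_path=".git/objects/$dir/$file"
echo "$object_path"
```

Splitting objects across directories like this keeps any single directory from ending up with a huge number of files.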
You can perform a quick check to see what’s inside the objects directory with the following command:
$ find .git/objects -type f
.git/objects/ff/af36274dc89b7 ...
Notice the ff / af36274dc89b722770000a4da67e1337c837a1 that was recently created.
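To see the difference that the -w flag makes, you could try something like the following sketch in a throwaway repository. The temporary directory and the counter variables are assumptions made for the example.

```shell
# Sketch: compare hash-object with and without -w in a throwaway repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Without -w, hash-object only prints the hash; nothing is stored.
echo "Tomatoes" | git hash-object --stdin
count_before=$(find .git/objects -type f | wc -l | tr -d ' ')

# With -w, the same hash is printed and the object is also written to disk.
echo "Tomatoes" | git hash-object --stdin -w
count_after=$(find .git/objects -type f | wc -l | tr -d ' ')

echo "object files before -w: $count_before, after -w: $count_after"
```

Both calls print the same hash, but only the second one adds a file under .git/objects.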
To get back the content that this hash value is referencing :
$ git show ffaf36274dc89
Tomatoes
Note: You don't have to type the full hash. The first few characters, like ffaf36, are enough as long as they identify a single object in the repository.
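Here is a small sketch of how a short prefix maps back to the full hash, using git rev-parse in a throwaway repository. The six-character prefix length and the variable names are arbitrary choices for the example.

```shell
# Sketch: resolve an abbreviated hash back to the full object name.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

full=$(echo "Tomatoes" | git hash-object --stdin -w)
short=$(printf '%s' "$full" | cut -c1-6)   # keep only the first six characters

# rev-parse expands an unambiguous prefix to the full 40-character name.
resolved=$(git rev-parse "$short")
echo "$short -> $resolved"
```

In a large repository a short prefix may match more than one object, in which case Git asks for more characters.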
The Object Model
Git's objects directory is the storehouse for essentially 4 types of objects. I will cover these object types in more detail in a later blog post, but for now let's focus on how a simple piece of data is stored in Git.
Blob: a bunch of bytes that could contain any kind of data (source code, executable files, images, text etc).
Tree: similar to a filesystem directory that refers to other directories and files.
Git's trees, however, refer to other Git trees and Git blobs.
Commit:
- Includes information about who made the change, e.g. the name and email address of the person.
- A pointer to the Git tree object that represents the repository at the time the commit was made.
- The parent commit of this commit (so we can easily find out the situation at the previous commit).
Tag:
- Includes the name of the tag (e.g. V1.0)
- A tag message (e.g. 'Releasing V1.0')
- The commit that the tag refers to (e.g. ffa625)
- Information about the tagger (e.g. name, email)
The short string that we added to Git's database previously is stored as a blob in the objects directory. Any kind of data, like source code, files, images or executables, is stored as a blob.
Here is how you can check the type of object that the SHA-1 hash is referencing –
$ git cat-file -t ffaf36
blob
This states that the hash value points to a blob type object.
git cat-file is a very powerful command for inspecting Git objects. It can perform various other operations, e.g. getting the content referenced by a hash value.
Let's see what our SHA-1 hash holds:
$ git cat-file -p ffaf36
Tomatoes
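As a quick recap of the cat-file options, the following sketch runs three of them against a freshly written blob in a throwaway repository. The -s option, which prints the content size in bytes, is not used elsewhere in this post; the variable names are only for illustration.

```shell
# Sketch: query type, size and content of one object with git cat-file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
id=$(echo "Tomatoes" | git hash-object --stdin -w)

obj_type=$(git cat-file -t "$id")   # object type
obj_size=$(git cat-file -s "$id")   # content size in bytes (8 letters plus a newline)
obj_body=$(git cat-file -p "$id")   # pretty-printed content

echo "type=$obj_type size=$obj_size body=$obj_body"
```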
Working With Files
Let's try writing to a file this time.
The following line of code will create a file called potatoBasket.txt with some dummy content.
$ echo "Here are some potatoes" > potatoBasket.txt
So far, Git doesn't know about this newly created file.
Next, we will add this file to be indexed by Git (add to staging area).
$ git add potatoBasket.txt
This is the point where Git starts tracking your new file and its content.
If you are working with a fresh Git repository, this is when the index file gets created for the first time.
Like I said earlier, everything that is given to Git is checksummed with a SHA-1 hash. Let's see what is inside the objects directory now:
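If you want to watch the index appear, a sketch like this one should do it. It checks for .git/index before and after git add in a throwaway repository, then lists the staged entries with git ls-files --stage. The variable names are made up for the example.

```shell
# Sketch: the index file appears on the first git add.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo "Here are some potatoes" > potatoBasket.txt

# A fresh repository has no index file yet.
if [ -e .git/index ]; then index_before=yes; else index_before=no; fi

git add potatoBasket.txt
if [ -e .git/index ]; then index_after=yes; else index_after=no; fi

# Each staged entry lists: file mode, blob hash, stage number and path.
staged=$(git ls-files --stage)
echo "index before add: $index_before, after add: $index_after"
echo "$staged"
```

Notice that the staged entry already carries a blob hash, which means the blob was written at git add time, before any commit.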
$ find .git/objects -type f
.git/objects/6b/32223d943d83b05ce71... <- You see that ?
.git/objects/ff/af36274dc89b7227700...
You now have two blob type objects in your objects directory including the Tomatoes that we had added previously.
Let's check what this new object is referencing:
$ git cat-file -t 6b32
blob <- Hmm.. So your new object is a blob !!
$ git cat-file -p 6b32
Here are some potatoes <- Remember this content that you added?
Previously, we added the file to be indexed. Now it's time to commit our changes and actually save them to Git's database:
$ git commit -m "I added some potatoes in the basket"
[master (root-commit) d5f0785] I added some potatoes in the basket
1 file changed, 1 insertion(+)
create mode 100644 potatoBasket.txt
At this point, if you re-check the objects in your objects directory, you will find two more objects that just got created.
- One is a tree, which stores the directory listing
- The other is a commit, which stores the commit message along with the author, the committer and a pointer to that tree
$ find .git/objects -type f
.git/objects/04/a199f76aeb040 ... <- New object
.git/objects/6b/32223d943d83b ...
.git/objects/d5/f0785abb85c9c ... <- New object
.git/objects/ff/af36274dc89b7 ...
$ git cat-file -t 04a199
tree
$ git cat-file -t d5f078
commit
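We have not looked inside the tree object yet, so here is a sketch that does. It rebuilds the same one-file commit in a throwaway repository and prints the tree that the commit points to. The user name and email passed with -c are placeholders so the commit works without any global configuration.

```shell
# Sketch: print the tree object behind the latest commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo "Here are some potatoes" > potatoBasket.txt
git add potatoBasket.txt
# Placeholder identity, used only for this throwaway commit.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "I added some potatoes in the basket"

# HEAD^{tree} names the tree object that the latest commit points to.
tree_listing=$(git cat-file -p 'HEAD^{tree}')
blob_id=$(git hash-object potatoBasket.txt)
echo "$tree_listing"
```

The listing has one line per entry: the file mode, the object type, the hash, and the file name. The hash on that line is the blob that git add created earlier, which is how the tree ties a file name to its content.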
So, it seems the hash value with the prefix d5f078 … is our commit. Let's see what's inside it:
$ git cat-file -p d5f0
tree 04a199f76aeb040e ...
author Ozesh email@example.com 1560254070 +0545
committer Ozesh firstname.lastname@example.org 1560254070 +0545
Finally, let's add a tag:
$ git tag -a thats-all-for-now -m "I am tired"
$ git cat-file -t a93fff <- Do a quick check of the new object
tag
$ git cat-file -p a93ff <- Check what this hash is holding
object d5f0785abb85c9c116 ...
type commit
tag thats-all-for-now
tagger Ozesh email@example.com 1560255614 +0545
I am tired
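If you are wondering how to find the hash of the tag object in your own repository, one way is to ask git rev-parse for it. The sketch below rebuilds the commit and the tag in a throwaway repository; the identity passed with -c is a placeholder, and the variable names are only for illustration.

```shell
# Sketch: look up an annotated tag object and the commit it points to.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo "Here are some potatoes" > potatoBasket.txt
git add potatoBasket.txt
# Placeholder identity, used only for this throwaway commit and tag.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "I added some potatoes in the basket"
git -c user.name="Example" -c user.email="example@example.com" \
    tag -a thats-all-for-now -m "I am tired"

# The tag name resolves to the tag object itself...
tag_id=$(git rev-parse thats-all-for-now)
tag_type=$(git cat-file -t "$tag_id")
# ...while ^{commit} peels the tag down to the commit it refers to.
tagged_commit=$(git rev-parse 'thats-all-for-now^{commit}')
head_commit=$(git rev-parse HEAD)

echo "$tag_id is a $tag_type pointing at $tagged_commit"
```

An annotated tag is a separate object with its own hash, so the tag hash and the commit hash are different values.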
That wraps up our overview of what Git stored for the one single file that we committed.