Git Internals – Analyzing the Objects Directory

If you want to master Git, don’t worry about learning the commands.
Rather, understand the conceptual model behind Git.

At the highest level, Git can be thought of as a database, a file system, a backing store or a content tracker. We generally call it the git repository. This chapter intends to help you walk through the objects directory and understand what actually happens behind the scenes.

Background

Git was initially a toolkit for a version control system rather than a full user-friendly VCS. It has a number of sub-commands that do low-level work and were designed to be chained together (Unix Style) or called from a script.

To gain access to the inner workings of Git, and to understand how and why Git does what it does, we need to become familiar with these low-level commands.

Git Commands

Git basically has two types of commands.

The ones we use most often, such as add, commit, push, pull and branch, are referred to as porcelain commands, or simply high-level commands. These commands are more user-friendly.

The other type, some of which we will be exploring here, are referred to as plumbing commands, or simply low-level commands.
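As a rough preview of how plumbing commands chain together, here is a minimal sketch that does by hand roughly what git add and git commit do. The temporary directory, the file name and the "Example" identity are placeholders for illustration; the objects it creates are explained in the rest of this post.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

echo "Here are some potatoes" > potatoBasket.txt

# Roughly what `git add` does: store the content, then record it in the index.
blob=$(git hash-object -w potatoBasket.txt)
git update-index --add --cacheinfo 100644 "$blob" potatoBasket.txt

# Roughly what `git commit` does: snapshot the index as a tree,
# wrap the tree in a commit, then point the current branch at it.
tree=$(git write-tree)
commit=$(echo "I added some potatoes in the basket" |
  git -c user.name="Example" -c user.email="a@b.com" commit-tree "$tree")
branch=$(git symbolic-ref HEAD)
git update-ref "$branch" "$commit"

git log --oneline
```

Each step prints or returns a hash that the next step consumes, which is what "designed to be chained together" looks like in practice.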

The .git Directory

Almost everything that Git stores and manipulates in a project is located in a special directory in the root of the project, called the .git directory. This directory is created when you run the git init command in a new or existing directory. If you don't see a .git directory after it is created, that is because it is usually hidden (on Unix-like systems, names starting with a dot are not listed by default).

Try it out yourself:

Open a terminal window in a new directory and execute the following:

$ git init

This will initialize a .git directory (also called the Git repository), which will look something like this:

  • .git/
    • hooks/
    • info/
    • objects/
    • refs/
    • config
    • description
    • HEAD

To see what's inside the .git directory, open a terminal in the directory where you just ran git init and type the following commands:

$ cd .git
$ ls -F1

What are these files and folders in the .git directory?

  • hooks/ : Contains client-side or server-side hook scripts.
  • info/ : Keeps a global exclude file for ignored patterns that we don't want to track in a .gitignore file.
  • objects/ : Stores the different types of objects (blobs, trees, commits, annotated tags).
  • refs/ : Stores pointers into commit objects in the data (branches, tags, remotes etc).
  • config : Contains project-specific configuration options.
  • description : Used only by the GitWeb program.
  • HEAD : Points to the branch that we currently have checked out.
  • index : Stores staging area information. This file does not exist in a fresh repository; it is created the first time something is staged.
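A few of these entries are plain text and can be inspected directly. The following is a small sketch, assuming a throwaway repository in a temporary directory; the branch name it prints depends on your Git configuration.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

# Ask Git where the repository directory is.
git_dir=$(git rev-parse --git-dir)
echo "$git_dir"

# HEAD is a small text file that names the current branch.
head_content=$(cat "$git_dir/HEAD")
echo "$head_content"

# The same information through a plumbing command.
git symbolic-ref HEAD

# Nothing has been staged yet, so there is no index file.
if [ -f "$git_dir/index" ]; then
  echo "index exists"
else
  echo "no index yet"
fi
```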

Git Integrity

It is important to note that everything that is stored in Git is checksummed before it is even saved. Later on it is referred to by that checksum.

This functionality is built into Git at the lowest levels and is integral to its philosophy. The mechanism that is used by Git for checksumming is called a SHA-1 hash.

A SHA-1 hash is a 40-character string composed of hexadecimal characters (0-9 and a-f). It is calculated from the contents of a file or directory structure in Git, so the same content always produces the same hash.

If you want to know what a SHA-1 hash looks like, try this out in your terminal :

$ echo "Tomatoes" | git hash-object --stdin
ffaf36274dc89b722770000a4da67e1337c837a1

The git hash-object command that you just ran is one of the low-level (plumbing) commands that Git uses extensively.

In fact, Git stores everything not by file name but by these hash values.
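If you are curious where the value comes from: for a blob, Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content itself. Here is a minimal sketch that rebuilds the hash by hand and compares it with git hash-object. It assumes the sha1sum tool from GNU coreutils is available (on macOS, shasum is the usual substitute).

```shell
set -e
content='Tomatoes'

# Size in bytes, including the trailing newline that echo adds.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>\0<content>" rather than the bare content.
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } |
  sha1sum | cut -d ' ' -f 1)

# The plumbing command should print the same value.
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual:   $manual"
echo "from git: $from_git"
```

Both lines should show the same 40-character value, because the hash depends only on the object type, the size and the bytes of the content.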

The Objects Directory

In a real scenario, you would be working with files and directories, but for now, let's experiment with simple text data.

What we are going to look at is how objects are created and stored in Git, by trying it out ourselves. Let's start:

Steps:

  1. Open a terminal in an empty project containing just a fresh Git repository.
  2. At the same time, open Git's objects directory in a file browser (to see what happens as we try things out).
  3. Now execute the following command in your terminal:
$ echo "Tomatoes" | git hash-object --stdin -w
ffaf36274dc89b722770000a4da67e1337c837a1

That command takes the string Tomatoes, stores it in Git's database, and returns the SHA-1 hash that references the new object.

Your objects directory will now have a new object and will probably look something like this:

objects/
├── ff/
│   └── af36274dc89b722770000a4da67e1337c837a1
├── info/
└── pack/

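Git uses the first two characters of the hash value as a directory name and the remaining 38 characters as the name of a file inside it. The file holds the object's contents compressed with zlib (delta compression only comes into play later, when Git packs objects into packfiles).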
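To see the directory/file split for yourself, here is a small sketch that derives the on-disk path from a hash. It assumes a throwaway repository that uses the default loose-object storage.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

hash=$(echo "Tomatoes" | git hash-object --stdin -w)

# First two characters name the directory, the rest name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"

ls -l "$path"

# The file on disk is compressed, so read it back through Git instead of cat.
git cat-file -p "$hash"
```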

You can perform a quick check to see what’s inside the objects directory with the following command:

$ find .git/objects -type f
.git/objects/ff/af36274dc89b7 ...

Notice the ff/af36274dc89b722770000a4da67e1337c837a1 file that was just created.

To get back the content that this hash value references:

$ git show ffaf36274dc89
Tomatoes

Note: If you don't want to type the full hash, you can type just the first few characters, like ffaf36, as long as the prefix is at least four characters long and matches only one object.
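Git can also translate between short and full hashes for you. The sketch below assumes a throwaway repository containing only the Tomatoes blob, so a six-character prefix is certain to be unambiguous there.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

hash=$(echo "Tomatoes" | git hash-object --stdin -w)

# Take a six-character prefix and ask Git to expand it again.
short=$(printf '%s' "$hash" | cut -c1-6)
full=$(git rev-parse "$short")
echo "$short -> $full"

# Or let Git choose a short prefix that is currently unique.
git rev-parse --short "$hash"
```

In a large repository a prefix that works today can become ambiguous later, so scripts are better off storing the full hash.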

The Object Model

Git's objects directory is the storehouse for essentially four types of objects. I will cover these object types in more detail in a later blog post. For now, let's focus on how a simple piece of data is stored in Git.

Objects and their descriptions:

  • blobs: A bunch of bytes that could contain any kind of data (source code, executable files, images, text etc).

  • trees: Similar to a filesystem directory that refers to other directories and files. Git's trees, however, refer to other Git trees and Git blobs.

  • commits:
     - Information about who made the change, e.g. the name and email address of the person
     - A pointer to the Git tree object that represents the project at the time of the commit
     - The parent commit(s) of this commit, if any (so we can easily find out the situation at the previous commit)
     - The commit message

  • tags:
     - The name of the tag (e.g. V1.0)
     - A tag message (e.g. 'Releasing V1.0')
     - The commit that the tag refers to (e.g. ffa625)
     - Information about the tagger (e.g. name, email)

The short string that we added to Git's database previously is stored as a blob in the objects directory. Any kind of data, such as source code, images or executables, is stored as a blob.

Here is how you can check the type of object that the SHA-1 hash references:

$ git cat-file -t ffaf36
blob

This states that the hash value points to a blob object.

git cat-file is a very powerful command for inspecting Git objects. It can perform various other operations, for example printing the content referenced by a hash value.

Let's see what our SHA-1 hash holds:

$ git cat-file -p ffaf36
Tomatoes
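Besides -t and -p, git cat-file has a few other switches that are handy when poking around. Here is a short sketch, again assuming a throwaway repository that holds only the Tomatoes blob.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

hash=$(echo "Tomatoes" | git hash-object --stdin -w)

type=$(git cat-file -t "$hash")   # object type
size=$(git cat-file -s "$hash")   # content size in bytes
echo "type=$type size=$size"

# -e prints nothing; its exit status says whether the object exists.
if git cat-file -e "$hash"; then
  echo "object exists"
fi

# --batch-check reads hashes from stdin and prints "<hash> <type> <size>".
echo "$hash" | git cat-file --batch-check
```

The size is 9 rather than 8 because echo appends a newline to Tomatoes.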

Working With Files

Let's try writing to a file this time.

The following command creates a file called potatoBasket.txt with some dummy content.

$ echo "Here are some potatoes" > potatoBasket.txt

Until now, Git doesn't know about this newly created file.

Next, we will add this file to Git's index (the staging area).

$ git add potatoBasket.txt

This is the point where Git starts tracking your new file and its content: git add writes the content into the objects directory as a blob and records it in the index.

If you are working with a fresh Git repository, this is when the index file gets created for the first time.
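You can look at what the index recorded with git ls-files. The sketch below assumes a throwaway repository and checks that the hash stored in the index is the hash of the file's content.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

echo "Here are some potatoes" > potatoBasket.txt
git add potatoBasket.txt

# Each index entry is printed as "<mode> <hash> <stage><TAB><path>".
staged=$(git ls-files --stage)
echo "$staged"

# The hash in the index is simply the hash of the file's content.
indexed_hash=$(echo "$staged" | cut -d ' ' -f 2)
file_hash=$(git hash-object potatoBasket.txt)
echo "index: $indexed_hash"
echo "file:  $file_hash"
```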

As I said earlier, everything that is given to Git is checksummed with a SHA-1 hash. Let's see what is inside the objects directory now:

$ find .git/objects -type f
.git/objects/6b/32223d943d83b05ce71...   <- You see that ?
.git/objects/ff/af36274dc89b7227700...

You now have two blob objects in your objects directory, including the Tomatoes blob that we added previously.

Let's check what this new object references:

$ git cat-file -t 6b32
blob   <- Hmm.. So your new object is a blob !!

$ git cat-file -p 6b32
Here are some potatoes <- Remember this content that you added?
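Notice that the blob holds only the content, not the file name. One way to convince yourself is to stage two files with identical content and count the objects. In this sketch anotherBasket.txt is a made-up second file, and the repository is a throwaway one.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

echo "Here are some potatoes" > potatoBasket.txt
cp potatoBasket.txt anotherBasket.txt   # hypothetical second file, same content
git add potatoBasket.txt anotherBasket.txt

# Both index entries point at the same blob hash.
git ls-files --stage

# Only one object file was written for the two files.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "objects on disk: $object_count"
```

The file names live in the index for now, and later in tree objects, which is why identical content is stored only once.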

Previously, we added the file to the index. Now it's time to commit our changes, which records the staged snapshot in Git's database:

$ git commit -m "I added some potatoes in the basket"
[master (root-commit) d5f0785] I added some potatoes in the basket
1 file changed, 1 insertion(+)
create mode 100644 potatoBasket.txt

At this point, if you re-check your objects directory, you will find two more objects that just got created:

  • One is a tree, which stores the directory listing
  • The other is the commit, which stores the commit message and metadata

$ find .git/objects -type f
.git/objects/04/a199f76aeb040 ...   <- New object
.git/objects/6b/32223d943d83b ...
.git/objects/d5/f0785abb85c9c ...   <- New object
.git/objects/ff/af36274dc89b7 ...

$ git cat-file -t 04a199
tree

$ git cat-file -t d5f078
commit

So, it seems the hash value with the prefix d5f078 … is our commit. Let's see what's inside it:

$ git cat-file -p d5f0
tree 04a199f76aeb040e ...
author Ozesh <a@b.com> 1560254070 +0545
committer Ozesh <a@b.com> 1560254070 +0545

I added some potatoes in the basket
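The tree line in a commit is what links it to the files. Here is a small sketch that follows that link from the commit to the tree and on to the blob, assuming a throwaway repository and a placeholder "Example" identity.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

echo "Here are some potatoes" > potatoBasket.txt
git add potatoBasket.txt
git -c user.name="Example" -c user.email="a@b.com" \
  commit -q -m "I added some potatoes in the basket"

# Resolve the tree that the commit points to.
tree=$(git rev-parse 'HEAD^{tree}')
git cat-file -t "$tree"

# A tree lists "<mode> <type> <hash><TAB><name>" for each entry.
git cat-file -p "$tree"

# Resolve the blob behind the file name and print its content.
blob=$(git rev-parse HEAD:potatoBasket.txt)
content=$(git cat-file -p "$blob")
echo "$content"
```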

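Finally, let's add an annotated tag. It is stored as an object too; here its hash starts with a93fff (git rev-parse thats-all-for-now prints the full value):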

$ git tag -a thats-all-for-now -m "I am tired"

$ git cat-file -t a93fff   <- Do a quick check of the new object
tag

$ git cat-file -p a93ff    <- Check what this hash is holding
object d5f0785abb85c9c116 ...
type commit
tag thats-all-for-now
tagger Ozesh <a@b.com> 1560255614 +0545

I am tired
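If you are wondering how to find the tag object's hash without searching the objects directory, the tag name itself resolves to it. This is a minimal sketch, assuming a throwaway repository, a placeholder "Example" identity and the default file-based ref storage.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

echo "Here are some potatoes" > potatoBasket.txt
git add potatoBasket.txt
git -c user.name="Example" -c user.email="a@b.com" \
  commit -q -m "I added some potatoes in the basket"
git -c user.name="Example" -c user.email="a@b.com" \
  tag -a thats-all-for-now -m "I am tired"

# The tag name resolves to the tag object itself...
tag_obj=$(git rev-parse thats-all-for-now)
git cat-file -t "$tag_obj"

# ...and peeling it gives the commit the tag points at.
target=$(git rev-parse 'thats-all-for-now^{commit}')
echo "tag object: $tag_obj"
echo "commit:     $target"

# The ref under refs/tags is a text file holding the tag object's hash.
ref_content=$(cat .git/refs/tags/thats-all-for-now)
echo "$ref_content"
```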

Overview

This is an overview of that one single file that we committed to Git.

[Figure: Overall Flow]
