This article intends to turn back the pages of history and help you observe how Git came into existence. It also intends to give you an overview of the implementation choices made while designing Git.
Below are the repository statistics that I took from Open Hub on June 4, 2019.
Git: 896,099 repositories (70%); Subversion (nearest competitor): 324,611 repositories (25%).
Some of the goals that were set up during the initial development of Git were:
- Simple design
- Strong support for non-linear development (thousands of parallel branches)
- Fully distributed
- Able to handle large projects like the Linux Kernel efficiently (speed and data size)
Since its birth in 2005, Git has evolved and matured to be easy to use while retaining all of these original qualities.
However, what makes Git unique is the way it stores and thinks about data in comparison to all the prior version control systems.
Git doesn’t track files; it tracks content.
We shall take a look at that in a while. However, first, let’s turn back the pages and see how it came to be.
Git was created primarily to serve as the source control management system for the Linux Kernel.
The story began in 1991
(After the creation of the Linux Kernel, an open source software project for the Linux operating system.) Back in those days (from 1991 to 2002), changes to the kernel software were passed around as patches and tarballs (archived files).
Later on, in 2002
The Linux kernel project started using a distributed version control system called BitKeeper. Even though BitKeeper was proprietary software, BitMover and Larry McVoy allowed it to be used freely for open source projects. The only restrictions were that you were not supposed to reverse engineer it or try to create a competing product.
However, in April 2005, it happened.
An Australian computer programmer named Andrew Tridgell tried to produce free software (now known as SourcePuller) that interoperated with BitKeeper source code repositories. In response, BitMover took action and eventually revoked free use of the BitKeeper license. Thus, the relationship between the community that developed the Linux Kernel and the commercial company that developed BitKeeper broke down.
Git came into existence due to an urgent need for a working version control system for the Linux Kernel with a workflow like that of its predecessor, BitKeeper. Much of its design was inspired by BitKeeper and by another distributed VCS called Monotone.
Linus Torvalds, the creator of Git, described the tool as ‘the stupid content tracker’. Its design is essentially a combination of Torvalds's experience with Linux in maintaining a large distributed development project and his intimate knowledge of file system performance from the same project.
The design criteria included the following things:
- Applying patches should take no more than 3 seconds.
- Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision.
- Support a distributed workflow – like BitKeeper
- Include robust safeguards against corruption, either accidental or malicious
Below are some characteristics that Git retains to this day.
These are the implementation choices that were influenced by Torvalds's experience developing the Linux Kernel and by the version control system that the kernel used at the time.
Git supports rapid branching and merging and includes specific tools for visualizing and navigating a non-linear development history.
Git gives each developer a local copy of the full development history, and changes are copied from one such repository to another.
Repositories are published via HTTP, FTP, rsync (removed in Git 2.8.0) or a Git protocol over either a plain socket or ssh (Secure Shell).
According to performance tests done by Mozilla, Git was an order of magnitude faster than some other version control systems.
Fetching version history from a locally stored repository can be one hundred times faster than fetching it from a remote server.
The Git history is stored in such a way that the ID of a particular version (a commit in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed.
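The way a commit ID depends on everything before it can be sketched in a few lines of Python. This is a simplified model, not Git's actual implementation. It assumes SHA-1 (the hash Git has used by default) and hashes a `<type> <size>` header, a NUL byte, and the body, which is how Git names its objects. The commit body here is stripped down (no author, committer, or timestamp), a blob ID stands in for a real tree ID, and the helper names `object_id`, `commit_id`, and `build_history` are made up for this illustration.

```python
import hashlib
from typing import List, Optional, Tuple


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 over a '<kind> <size>' header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree_id: str, parent_id: Optional[str], message: str) -> str:
    """ID of a simplified commit (assumption: no author or date fields)."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        # The parent's ID is part of the hashed text, which chains the history.
        lines.append(f"parent {parent_id}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode())


def build_history(snapshots: List[Tuple[bytes, str]]) -> List[str]:
    """Return commit IDs, oldest first, for (content, message) pairs."""
    ids: List[str] = []
    parent: Optional[str] = None
    for content, message in snapshots:
        # Stand-in for a tree: the ID of the project's single file.
        tree = object_id("blob", content)
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


original = build_history([(b"v1\n", "first"), (b"v2\n", "second"), (b"v3\n", "third")])
tampered = build_history([(b"v1!\n", "first"), (b"v2\n", "second"), (b"v3\n", "third")])

# Compare the two histories commit by commit.
changed = [a != b for a, b in zip(original, tampered)]
print(changed)
```

Only the oldest snapshot was edited, yet every entry in `changed` is `True`: each commit hashes its parent's ID, so the change propagates to all later IDs. The same function also shows the "content, not files" idea, since two blobs with identical bytes always get the same ID regardless of file name.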
Git was designed as a set of programs written in C and several shell scripts that provide wrappers around those programs.
Although most of those scripts have since been rewritten in C for speed and portability, the design remains, and it is easy to chain the components together.
Git has a well-defined model of an incomplete merge, and it has multiple algorithms for completing it, culminating in telling the user that it is unable to complete the merge automatically and that manual editing is needed.
When backing out changes or aborting operations, useless dangling objects are left behind in the object database. Git automatically performs garbage collection when enough loose objects accumulate in the repository.
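The idea of a dangling object can be illustrated with a small reachability sketch. It is a toy model under stated assumptions: objects are a plain dictionary mapping a hypothetical ID to the IDs it points to, and anything not reachable from a ref counts as garbage. Real `git gc` does more than this (for example, it also honors the reflog and a grace period before pruning), and the function names below are invented for the example.

```python
from typing import Dict, Iterable, List, Set


def reachable(objects: Dict[str, List[str]], refs: Iterable[str]) -> Set[str]:
    """Collect every object ID that can be reached from the given refs."""
    seen: Set[str] = set()
    stack = list(refs)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


def collect_garbage(objects: Dict[str, List[str]], refs: Iterable[str]) -> Dict[str, List[str]]:
    """Return a copy of the object store without unreachable objects."""
    live = reachable(objects, refs)
    return {oid: children for oid, children in objects.items() if oid in live}


# Hypothetical object store: commits (c*) point to a parent and a tree (t*).
objects = {
    "c1": ["t1"],
    "c2": ["c1", "t2"],
    "c3": ["c1", "t3"],  # left over from an aborted operation; no ref points here
    "t1": [],
    "t2": [],
    "t3": [],
}
refs = ["c2"]  # a single branch pointing at commit c2

kept = collect_garbage(objects, refs)
dangling = sorted(set(objects) - set(kept))
print(dangling)
```

Here `c3` and its tree `t3` are the dangling objects: nothing reachable from the branch refers to them, so a collector is free to drop them.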
Git stores each newly created object as a separate, individually compressed file, which takes up a lot of space. To address this, Git packs a large number of objects, delta-compressed against one another, into a single file (or network byte stream) called a packfile.
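A rough way to see why packing similar objects together pays off is to compare compressing each object on its own with compressing them as one stream. The sketch below is only an analogy: it uses plain zlib (which Git also uses for object compression) over concatenated data, not Git's packfile format or its delta algorithm, and the sample "versions" are invented test data.

```python
import zlib
from typing import List


def loose_size(objects: List[bytes]) -> int:
    """Total bytes when every object is compressed on its own."""
    return sum(len(zlib.compress(obj)) for obj in objects)


def packed_size(objects: List[bytes]) -> int:
    """Bytes when all objects are compressed together as one stream."""
    return len(zlib.compress(b"".join(objects)))


# Ten revisions of the same file that differ only in their last line.
base = "\n".join(
    f"line {i}: the quick brown fox jumps over the lazy dog" for i in range(50)
)
versions = [(base + f"\nrevision {n}\n").encode() for n in range(10)]

print("loose: ", loose_size(versions))
print("packed:", packed_size(versions))
```

Because the revisions share almost all of their content, the combined stream comes out smaller than the sum of the individually compressed objects. Delta compression in a packfile exploits the same redundancy more directly by storing one object as a set of differences against another.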