Why Distributed Version Control Matters to You, Today


Jan 20, 2008

If Linus's talk about git made you feel like a moron, rest assured you're not alone. Distributed version control is one of the most poorly explained topics in software today. Plenty of people say that you should use it, but nobody has done a great job of explaining why. Here's my take.

WTF is Linus talking about?

First, let's talk about exactly what distributed version control is. The most common approach to describing a decentralized system is to present the reader with an image like this one, taken from the Mercurial page called UnderstandingMercurial:

If that image makes any sense to you (and you're new to distributed version control), congratulations. The rest of us need a clearer description.

The best place to start is with the differences between centralized and decentralized version control. With a centralized system like Subversion or CVS, there is a single copy of the repository; it typically resides on a server somewhere. When a developer works with the code, they receive something called a "working copy" from the central repository. The working copy contains enough information to interact with the central server, but does not contain the revision history for the project, nor the branches or tags (though the branches and tags can often be requested explicitly).

With a decentralized system, the opposite is true. Instead of checking out a working copy, the developer works with the entire repository, including the entire revision history of the project and all the branches and tags. The copy that the developer receives is identical to the repository they fetched it from. Commits, branches, and tags occur locally, on the copy of the repository that the developer made. Changes can then be pushed to or pulled from a public repository somewhere, another developer's repository, or anywhere else that a repository exists.
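To make that concrete, here is a minimal sketch using git, run entirely on one machine. The directory names, developer names, and file are all made up for illustration. It shows that a clone arrives with the full history, and that a commit made in the clone never touches the repository it came from.

```shell
#!/bin/sh
# Illustrative sketch: every path, name, and file here is made up.
set -e
demo=$(mktemp -d)
cd "$demo"

# Create an "original" repository with two commits of history.
git init -q original
(
  cd original
  git config user.name "Alice"
  git config user.email "alice@example.com"
  echo "first line" > notes.txt
  git add notes.txt
  git commit -q -m "Add notes"
  echo "second line" >> notes.txt
  git commit -q -a -m "Extend notes"
)

# Cloning copies the whole repository, history included.
git clone -q original copy
history_in_copy=$(cd copy && git rev-list --count HEAD)

# This commit is recorded in the copy only; the original is not contacted.
(
  cd copy
  git config user.name "Bob"
  git config user.email "bob@example.com"
  echo "third line" >> notes.txt
  git commit -q -a -m "Commit made locally in the copy"
)
copy_after_commit=$(cd copy && git rev-list --count HEAD)
original_untouched=$(cd original && git rev-list --count HEAD)

echo "copy started with $history_in_copy commits, now has $copy_after_commit"
echo "original still has $original_untouched"
```

With Subversion, the equivalent commit would have needed a round trip to the server; here the clone is a complete repository in its own right.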

So?

Usually, when you want to get a build or source distribution of open source code, you head to the project's main website. The project's repository is closely guarded, with commit rights only awarded to a select few. Anybody wishing to make changes to the source must first check out a working copy from version control, submit a patch, and hope that it is accepted.

When every developer has a full copy of the repository on their machine, the hierarchy of open source projects is all but eliminated. Any developer who wishes to work on the source code can clone the repository, commit as much code as they want to it, receive the changes from any other developer's cloned repository, and publish their work for other developers to use, or pull back into their own repositories. In a decentralized system, it doesn't matter who has the "...keys to the source repository..." (it actually says that on the rails core team page — take a look for yourself). If the original author continues to maintain the best version of the code, great; if not, users of that code can begin to pull from whoever does have the best version.
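Here is a rough sketch of that idea in git, again on a single machine. The developers (Carol and Dave), the repository names, and the file are hypothetical. Dave pulls Carol's fix straight from her clone, and the original repository is never involved.

```shell
#!/bin/sh
# Illustrative sketch: the repositories, developers, and file are all made up.
set -e
demo=$(mktemp -d)
cd "$demo"

# The original author's repository.
git init -q original
(
  cd original
  git config user.name "Original Author"
  git config user.email "author@example.com"
  echo "version 1" > plugin.txt
  git add plugin.txt
  git commit -q -m "Initial release"
)

# Two other developers each clone it.
git clone -q original carol
git clone -q original dave

# Carol fixes something and commits to her own clone.
(
  cd carol
  git config user.name "Carol"
  git config user.email "carol@example.com"
  echo "version 1, with a fix" > plugin.txt
  git commit -q -a -m "Fix compatibility problem"
)

# Dave pulls directly from Carol's clone; the original never sees the change.
(
  cd dave
  git config user.name "Dave"
  git config user.email "dave@example.com"
  git remote add carol "$demo/carol"
  git pull -q --ff-only carol HEAD
)

dave_latest=$(cd dave && git log -1 --format=%s)
original_latest=$(cd original && git log -1 --format=%s)
echo "dave:     $dave_latest"
echo "original: $original_latest"
```

Nobody needed commit rights anywhere: Carol committed to her own repository, and Dave chose to pull from it.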

Really!

Entirely theoretical software articles are lousy, so I always try to provide examples from real software; this article is no exception.

Many (maybe most) rails plugins are inactive. They were created to scratch an itch, published, kept up to date for a few months, and then left with no maintainer. Since rails plugins largely reside in subversion repositories, nobody without commit access can continue development without losing the project's entire revision history and going to the trouble of setting up a public svn server.

Markaby is no exception to that rule. When I tried to use it in a recent project, I found that it was incompatible with rails 2.0.2. According to markaby's subversion logs, the last change was November 24th, 2007, a few weeks before that version of rails was released. Luckily, I was able to find a ticket in the rails trac with instructions on how to hack a fix into the plugin. That solution worked great for me, but wouldn't work for a user who didn't know their way around plugins. Normally, without commit access, I wouldn't be able to offer my fix publicly.

Not so with a distributed system! I was able to use git-svn to pull Markaby into a git repository. I've published my changes, and now anybody can grab a working version of Markaby by typing the following into a command prompt:

  $ git clone git@github.com:giraffesoft/markaby.git

Even cooler than that, somebody with commit access can grab the changes, and push them back into subversion, including commit messages, and everything! Anybody who clones the git repository can still pull changes from _why's subversion repository if it ever becomes active again. If not, development can continue anywhere, and be done by anybody. That's the beauty of decentralization.
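For the curious, the git-svn round trip described above can be sketched as three small shell functions. The function names and their arguments are my own invention for illustration. The git-svn subcommands they wrap (clone, rebase, dcommit) are real, but this is a sketch of the general workflow, not a record of the exact commands used for Markaby.

```shell
#!/bin/sh
# Sketch of a git-svn round trip. The function names and arguments are
# hypothetical; the git-svn subcommands they call are real.

# Import a Subversion repository, with its history, into a new git repository.
# $1: Subversion URL, $2: directory to create.
import_from_svn() {
  git svn clone "$1" "$2"
}

# Run inside a repository created by import_from_svn: fetch new Subversion
# revisions and replay any local commits on top of them.
update_from_svn() {
  git svn rebase
}

# Run inside a repository created by import_from_svn: send each local git
# commit to Subversion as its own revision, keeping its commit message.
# This step needs Subversion commit access.
push_to_svn() {
  git svn dcommit
}
```

Usage would look something like `import_from_svn <svn-url> markaby`, where `<svn-url>` stands in for the address of the project's subversion repository.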