How Git ate the world
Take a step back to 1995. Your two options for SCM are CVS and ClearCase. CVS is free and, feature-wise, worth every penny. ClearCase is incredibly expensive but powerful: it can handle real merges (up to a 64-way merge!), global development teams, and software projects with multiple modules.
Now Perforce enters the picture. It isn’t free, but it’s much cheaper than ClearCase. It’s not as powerful as ClearCase, but it’s relatively fast and gets the job done. And that’s the recipe for a successful commercial SCM product. Indeed, a few years ago, with ClearCase slowly fading away and Subversion stagnating, Perforce seemed ripe for wider adoption.
Fast-forward to the present. Git is now the top SCM tool for software developers. What happened?
Git is distributed: every developer has the full history of their code repository locally. This makes the initial clone of the repository slower (unless you are using Smart Mirroring), but it makes subsequent operations such as commit, blame, diff, merge, and log dramatically faster, because they run against local data instead of a remote server.
Perforce, for the most part, requires a connection to the server just to see the history of changes, and that single central server becomes a bottleneck as teams and projects get bigger. Viewing history (p4 changes), creating a tag (p4 label or p4 tag), making a branch (p4 integ), and even making a file writable in your workspace (p4 edit) all require a round trip to the server, and most of these commands also write to the server’s database. That load adds up quickly when thousands of users share the same server.
Perforce no longer publishes its pricing, but licenses are known to cost several hundred dollars per user up front, plus a percentage of that for annual renewals. For larger teams, Perforce can also require fairly expensive hardware for that big central server.
Git itself is open source and completely free. Bitbucket Server, which offers technical support and on-premises installation, costs a fraction of what Perforce does.
Take a team of 50 developers. Bitbucket would cost $600 per year compared to tens of thousands of dollars for Perforce. That adds up to a lot of free lunches for hard-working hackers.
Putting aside all the bells and whistles, an SCM tool is fundamentally about collaboration: letting a team of developers work on a shared set of software files. Git offers simple and computationally inexpensive branching, which opens the door to a variety of cool workflows. Task branching, Git Flow, forked repositories: there’s a fast and easy workflow for any type of team, from open source to professional development, aided by powerful code review and collaboration tools.
Git also makes it easy to collaborate across company boundaries, a common requirement in cross-functional development. Even when network access to a shared Git repository isn’t possible, Git’s patch and bundle tools (git format-patch, git am, and git bundle) make sharing data simple.
Perforce, on the other hand, maintains branching records on a per-file basis, whereas Git tracks branches on a per-commit basis. What does this mean? Well, for starters, it creates an awful lot of metadata in the Perforce database every time you make a branch. That contributes to performance problems in larger deployments, to the extent that many Perforce administrators restrict branch creation.
Consider that for a moment: every time you want to make a task branch to try out a new feature, you’ve got to go and ask permission. If you can’t make task branches, you might check unstable code into the main branch, or just wait until you’re “done” before committing at all. You give up CI/CD on your task branches and the ability to track granular work in progress. The end result is lost productivity: developers either live with clumsier workflows or start using Git on the side and then manually merge their work back into Perforce.
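To make the per-file versus per-commit difference concrete, here’s a deliberately simplified toy model in Python. It is not Perforce’s actual database schema or Git’s actual object store; the class names, the 10,000-file project, and the commit ID are all hypothetical. It only counts how many metadata records a new branch adds under each approach.

```python
from dataclasses import dataclass, field


@dataclass
class PerFileDepot:
    """Hypothetical depot that records branch history for every file (toy model)."""
    files: list[str]
    integration_records: list[tuple[str, str]] = field(default_factory=list)

    def branch(self, source: str, target: str) -> int:
        """Create a branch; return how many metadata records it added."""
        added = 0
        for path in self.files:
            # One "branched from" record per file in the source branch.
            self.integration_records.append((f"//{source}/{path}", f"//{target}/{path}"))
            added += 1
        return added


@dataclass
class PerCommitRepo:
    """Hypothetical repository where a branch is a named pointer to a commit (toy model)."""
    refs: dict[str, str]

    def branch(self, source: str, target: str) -> int:
        """Create a branch; return how many metadata records it added."""
        # The new branch simply points at the same commit as its source.
        self.refs[target] = self.refs[source]
        return 1


files = [f"src/module_{i}.c" for i in range(10_000)]
depot = PerFileDepot(files)
repo = PerCommitRepo(refs={"main": "3f2a9c1"})

print(depot.branch("main", "task-42"))  # 10000 records
print(repo.branch("main", "task-42"))   # 1 record
```

The cost of a per-file branch grows with the size of the project, while a per-commit branch stays constant. In real Git, creating a branch writes a single small ref pointing at a commit, which is why task branches are effectively free.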
Besides being expensive, Perforce branches aren’t conducive to the type of workflow most developers prefer. Perforce branches are shared, so there is no such thing as a private task branch with periodic rebasing. And Perforce’s merge algorithms are overly complicated, with entire articles written about how to merge files that were renamed or had their attributes modified.
And sharing code between Perforce servers? You’re back to sharing tar files with no common history. Perforce’s data model thinks of software history as being unique to a single server, compared to Git’s easy ability to clone and share history everywhere.
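Part of what makes Git history so portable is that every object is named by a hash of its content. This short Python sketch reproduces how Git computes the ID of a file’s contents (a “blob”) in the default SHA-1 object format; repositories created with Git’s newer SHA-256 object format use a different hash. Identical content gets an identical ID in every clone, so two repositories can work out which history they already share without consulting a central server.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to file contents in a SHA-1 repository.

    Git hashes a header ("blob", a space, the size in bytes, and a NUL byte)
    followed by the raw bytes, so the ID depends only on the content,
    never on which server or clone stores it.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The result should match what git hash-object prints for a file containing the same bytes. Commits and trees are named the same way, which is what lets any clone verify and exchange history with any other.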
Mind share and community
Setting commercial products aside, why did Git beat Mercurial and other worthy open source competitors? Momentum has some value, of course, and Git has it. Git was created by Linus Torvalds to solve the distributed development challenges of the Linux kernel project, and it is now the standard SCM tool for Linux, Android, OpenStack, and most other significant open source projects. It’s what all the cool kids are using, so if you’re a hiring manager, you can probably assume that a new engineer can (and will want to) work with Git without extensive training.
And, of course, you have the full power of a vibrant open source community standing behind Git. Git is evolving rapidly to solve real-world problems, with major new features like Git LFS arriving on the scene. You can contribute your own code to the Git project if there’s a bug you want to see fixed, and you’ll never be locked into a commercial product with a roadmap and pace set by a single company. Just look at the range of Git client programs available: several powerful desktop GUIs, Windows Explorer integration, and plugins for nearly every IDE and developer tool.
GUIs and developer tools
In Git’s early days, GUI and tool support was somewhat lacking. This was a stumbling block for users who prefer a visual interface for interacting with their Git repositories. Non-technical collaborators such as game artists were particularly disenfranchised, and Perforce’s Windows Explorer plugin was a hit with this audience.
But thankfully those days are past. GUIs like Sourcetree offer a point-and-click experience, and there are a multitude of shell integrations for Git. Bitbucket provides code review, pull requests and merging, forking, online code browsing, and a plethora of other collaboration tools. Indeed, everyone from data scientists to creative agencies is organizing communities around the open collaboration that Git and Bitbucket make possible.
Game developers are special
That said, what has kept some communities, such as game developers and researchers working with huge data sets, from jumping on the bandwagon? It all boils down to the type of data and the complexity of the project organization.
Game developers, particularly artists, need to work with large binary objects like textures and audio assets. Data scientists may have massive data sets comprising billions of event samples.
That poses two problems for Git.
- These files can’t be merged, so a centralized locking mechanism is handy, and Perforce offers one. (Note, however, that even a centralized server only offers locking on a single branch, so relying on this feature implies a very restricted workflow.)
- These files cause Git to slow down as the size of the repository grows.
The repository size problem is largely addressed by Git LFS, an extension that lets Git handle large files while delegating the actual file storage elsewhere.
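Here’s a minimal Python sketch of the small text “pointer” that Git LFS commits in place of a large file, following the published Git LFS pointer format (version, oid, and size keys). The texture variable is a hypothetical stand-in for a binary asset. The sketch only shows what lands in the Git repository; uploading the real bytes to an LFS server is handled by the git lfs client and is out of scope here.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the text pointer Git LFS commits in place of a large file.

    Git itself only versions these three short lines: the spec version,
    the SHA-256 object ID of the content, and its size in bytes.
    """
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


texture = bytes(8 * 1024 * 1024)  # hypothetical stand-in for an 8 MiB binary asset
pointer = lfs_pointer(texture)
print(pointer)
print(f"{len(texture):,} bytes stored outside Git; {len(pointer)} bytes committed")
```

Because the pointer stays a few hundred bytes at most no matter how big the asset gets, cloning and browsing history stay fast. In day-to-day use you never write pointers by hand; running something like git lfs track "*.psd" tells the client which files to handle this way.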
The problem of file locking bears examination on two fronts. From a software configuration management perspective, Git LFS has a superior breed of file locking on the roadmap. Git LFS will help coordinate locking binary files across multiple branches with an algorithm that makes sure you’re working on the latest version, no matter which branch you’re on. That opens up branching workflows to users working with large binary files, compared to Perforce’s single-branch locking model.
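The difference between single-branch locking and cross-branch locking can be sketched as a toy model. This is not the Git LFS locking algorithm or its API; the classes, the asset path, and the version labels (“v6”, “v7”) are hypothetical. It contrasts a lock scoped to one branch with a single lock per file that spans all branches and is only granted to someone who already has the latest version.

```python
from dataclasses import dataclass, field


@dataclass
class PerBranchLocks:
    """Toy model: a lock is scoped to one branch (single-branch locking)."""
    held: dict[tuple[str, str], str] = field(default_factory=dict)

    def acquire(self, user: str, branch: str, path: str) -> bool:
        owner = self.held.get((branch, path))
        if owner is not None and owner != user:
            return False
        self.held[(branch, path)] = user
        return True


@dataclass
class CrossBranchLocks:
    """Toy model: one lock per path across all branches, granted only to
    someone who already has the newest version of the file."""
    latest: dict[str, str]  # path -> newest version ID on any branch
    held: dict[str, str] = field(default_factory=dict)

    def acquire(self, user: str, path: str, local_version: str) -> bool:
        owner = self.held.get(path)
        if owner is not None and owner != user:
            return False  # someone is already editing it, whatever branch they are on
        if self.latest.get(path, local_version) != local_version:
            return False  # stale copy: update to the newest version before locking
        self.held[path] = user
        return True


asset = "art/hero_texture.psd"

per_branch = PerBranchLocks()
print(per_branch.acquire("alice", "feature-a", asset))  # True
print(per_branch.acquire("bob", "feature-b", asset))    # True: conflicting edits go unnoticed

cross = CrossBranchLocks(latest={asset: "v7"})
print(cross.acquire("carol", asset, "v6"))  # False: her copy is out of date
print(cross.acquire("alice", asset, "v7"))  # True
print(cross.acquire("bob", asset, "v7"))    # False: alice holds the lock
```

With per-branch locks, two artists on different branches can both “own” the same unmergeable file; the cross-branch model catches both that conflict and the subtler case of someone locking an outdated copy.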
It is also useful to think about file locking as a coordination problem. If you’re going to start working on a shared asset that can’t be merged, how do you broadcast that knowledge to all interested parties? Again, here’s where the advent of modern workflows using pull requests and real-time team collaboration really shines. You can quickly communicate your intentions using HipChat and check to see if there’s any outstanding work in progress on a particular file.
It’s also interesting to consider how the problem of handling large files will evolve in the era of Big Data. To test a Big Data analytics job, you may need a data set that’s several terabytes in size. Forget about any SCM system: such a project is tested and run on a Big Data-compatible file system. What’s needed here is a CI/CD system that can orchestrate a more complex pipeline with artifacts living on HDFS or S3. That leads to our next topic.
Game development is a classic example of a software project with multiple modules or components: the game engine, the UI, static art, video renderings, and so on. As a monolithic centralized repository, Perforce can host all of these modules on a single server and let users choose which parts to pull into their own workspaces.
However, this advantage is largely moot now. Modern Git systems like Bitbucket make Git’s multi-module tools, such as submodules and subtrees, easier to manage. More importantly, large projects like Android have shown how to manage a complex project using higher-level composition tools; Android uses its repo tool to coordinate many separate Git repositories. Many of these lessons have been carried into modern CI/CD tools like Bamboo and Bitbucket Pipelines, which can orchestrate complex continuous integration workflows, model the dependencies between projects, and manage artifacts between projects.
This trend largely follows the Git (and *nix) philosophy of building a tool that does a single job very well. Continuous integration and continuous delivery (CI/CD) is a practice of its own, with tools that are dedicated to understanding build and release workflow. It also aligns with modern software development best practices, which aim to use small self-contained microservices rather than monolithic projects.
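To illustrate what modeling the dependencies between projects means in practice, here’s a small Python sketch using the standard library’s graphlib module (Python 3.9+). It is not how Bamboo or Bitbucket Pipelines work internally; the module names and their dependencies are hypothetical, borrowed from the game example above. The idea is to group modules into stages so that each stage starts only after everything it depends on is built, while modules within a stage can build in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical module graph for a game: each module maps to the modules it depends on.
dependencies = {
    "engine": set(),
    "ui": {"engine"},
    "static_art": set(),
    "video_renderings": {"static_art"},
    "game": {"engine", "ui", "static_art", "video_renderings"},
}


def build_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into stages; everything in one stage can build in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # modules whose dependencies are all built
        stages.append(ready)
        sorter.done(*ready)  # mark them built, unlocking their dependents
    return stages


for number, stage in enumerate(build_stages(dependencies), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

For the graph above, the engine and static art build first, then the UI and video renderings, and finally the game itself. Keeping this orchestration in a dedicated CI/CD layer, rather than in the SCM, is the single-job philosophy described above.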
There’s clearly some momentum in the “Perforce to Git” camp, and Git and modern CI/CD tools are now poised to handle the largest and most complex development efforts. Indeed, Perforce even made a tool called Git Fusion that lets you extract part of a central Perforce repository as a Git repo.
Unfortunately, while Git Fusion was a noble effort, layering Git onto a centralized SCM system isn’t easy. If you mix your usage models, you can quite easily corrupt one system’s view of the data; if you don’t mix them, it’s hard to see the value of putting a commercial centralized backend behind Git. As we’ve seen, the trend actually runs in the other direction: how do you bring the last few useful pieces of centralized SCM into Git?
If you’re using Perforce for any software or game development, you’re probably wondering (nervously) about how to migrate to Git. How do you even do that? And is it worth the switching cost? That’s exactly what we’ll cover in the next article.