Migrating from Perforce to Git
As we discussed in the previous article, Git is now the de facto choice for SCM for just about any type of digital development. But if you have years of valuable history stored in Perforce, you are probably weighing the cost of switching. In this article we’ll tackle those concerns head-on and show you how to migrate your data to Git. We’ve broken the Perforce-to-Git migration process down into eight steps:
- Moving Perforce data
- Mapping users and permissions to a new Git repo
- Large binary files
- Complex dependencies
- Structuring your team during the migration
- Mirroring data
- ALM Tools
- How to define success after a Perforce to Git migration
Step 1: Moving Perforce data
There are two general approaches for moving the data over from Perforce to Git. Before we dive into that area, we need to consider a fundamental difference between how Perforce and Git handle software projects.
A Perforce server can hold tens or hundreds of distinct software projects, each with its own branching model. A developer defines a “view” that tells the Perforce server which files to put into a working copy. A Git repository, on the other hand, normally holds a single software project with its branches and tags (although large monolithic Git repos do exist). You typically clone the repo and, perhaps, check out submodules or subtrees.
The question of moving the data, then, has two parts: how to extract data from Perforce, and how to translate that into an equivalent set of Git repositories.
Moving Perforce Data Option 1: Using Git Fusion
If you want to preserve the entire history of your data in Perforce, you can use Perforce’s own Git Fusion tool to extract a section of a Perforce server (a single project) into a Git repo. Essentially, you:
- Install Git Fusion
- Set up the correct views of your data, including the branching structure
- Use any Git client to clone from Git Fusion
- Push your repo into Bitbucket
**Hands-on example**

*To work through this example, you’ll need a Perforce server with Git Fusion already operational.*

Let’s say that you have a Perforce project living in the repository path //depot/acme/... (in Perforce depot view syntax). It has three branches:

- //depot/acme/main/...
- //depot/acme/r1.0/...
- //depot/acme/r1.1/...

Keep in mind that with Perforce you see branches as additional directories in the tree. Your first step is to configure Git Fusion so that it understands the branching relationship in Perforce. To do this, you create a repo configuration file:

    [@repo]
    description = Acme project
    charset = utf8

    [master]
    git-branch-name = master
    view = //depot/acme/main/... ...

    [r1.0]
    git-branch-name = r1.0
    view = //depot/acme/r1.0/... ...

    [r1.1]
    git-branch-name = r1.1
    view = //depot/acme/r1.1/... ...

Submit this file to Perforce under the path //.git-fusion/repos/acme/p4gf_config

Now create an empty project called acme in Bitbucket using the normal Bitbucket administration tools. You can configure the access control and team members per your usual standards. Next, clone from Git Fusion and push to Bitbucket:

    git clone https://<git fusion server url>/acme
    cd acme
    git remote add bitbucket <bitbucket project URL>
    git push -u --all bitbucket
    git push --tags bitbucket

That’s it! You should now see the imported history in Bitbucket.
Now, this may not always give you a 100% faithful copy of your Perforce data. There are some Perforce operations, like partial merges, that just have no equivalent in Git. But all in all, this method will get most of your history without too much effort.
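If you want a quick feel for how close the import came, one rough check is to compare the number of submitted changelists under your depot path with the number of commits in the new repo. The sketch below is only that, a sketch: the helper names are made up for illustration, it assumes you are logged in to Perforce and sitting inside the clone from Git Fusion, and the two numbers will not always match (a changelist that touches several branches, for example, may become more than one commit).

```shell
#!/bin/sh
# Rough sanity check after an import. All function names are hypothetical.

count_p4_changes() {
    # $1 = depot path, e.g. //depot/acme/...
    # `p4 changes` prints one line per changelist.
    p4 changes -s submitted "$1" | wc -l | tr -d ' '
}

count_git_commits() {
    # Counts commits reachable from any branch or tag in the current repo.
    git rev-list --all --count
}

report_counts() {
    # $1 = changelist count, $2 = commit count
    if [ "$1" -eq "$2" ]; then
        echo "match: $1 changelists, $2 commits"
    else
        echo "differ: $1 changelists, $2 commits"
    fi
}

# Example (run inside the clone):
#   report_counts "$(count_p4_changes //depot/acme/...)" "$(count_git_commits)"
```

A mismatch is a prompt to look closer, not proof that something went wrong.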
Keep in mind that preserving the last 10 years of branching history from a legacy SCM doesn’t mean that you have to keep using the same workflow. Notably, you should consider adopting feature branch workflows like Git Flow as a practical first step.
Pros and cons
- Requires the most setup work and runtime
- Preserves the most history (letting you shut down legacy Perforce server)
- Maintains legacy branching model in history
Moving Perforce Data Option 2: Start over
The other option is to start over. Forget all that crufty history: just extract the head (tip) of each branch in Perforce that corresponds to your project, and check that stuff into a new empty Git repo. (This implies that you have Perforce workspaces defined with a correct ‘view’ of the data you want.)
This is the simplest and fastest technique. No matter how complicated your Perforce history was, your new Git repo is lean and mean. You get the chance to start a new Git-based workflow without any accumulated baggage.
The main drawback is that you probably want to keep the old Perforce server around in a read-only mode in case anyone needs to dig into historical code for any reason. This won’t cost you anything in license fees but it does imply that you’re keeping that old server alive for a while.
**Hands-on example**

Go into your Perforce workspace (the directory where the master branch of your project data is checked out) and run:

    p4 sync

This fetches the latest revision of your files. Now create an empty project called acme in Bitbucket using the normal Bitbucket administration tools. You can configure the access control and team members per your usual standards. Next, create a new Git repo in your workspace, commit the files, and push to Bitbucket:

    git init .
    git add .
    git commit -m "Initial import from Perforce"
    git remote add origin <bitbucket project URL>
    git push -u --all origin
    git push --tags origin

You should now see the latest snapshot of your code as the first commit in your new Bitbucket project.
Pros and cons
- Fast and simple
- Redesign branching model and workflow
- Legacy Perforce server used for read-only access
Step 2: Users and permissions
After the data is moved over, the next task is usually to start mapping your users and permissions into new Bitbucket projects. If you use LDAP for a user directory you’ll save some time here. Otherwise, you can extract the list of user accounts from Perforce with the p4 users command and then enter them into Bitbucket a project at a time.
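If you do go the manual route, a small filter can turn the Perforce user list into something easier to work through. This sketch assumes each line of `p4 users` output looks like `jdoe <jdoe@example.com> (John Doe) accessed 2016/03/01`; the function name and the output file are hypothetical, and lines that don't match that shape are silently skipped.

```shell
#!/bin/sh
# Turn `p4 users` output into "username,email,full name" lines.
# Assumed input shape: user <email> (Full Name) accessed YYYY/MM/DD

p4_users_to_csv() {
    # Reads user lines on stdin, writes CSV on stdout.
    sed -n 's/^\([^ ]*\) <\([^>]*\)> (\(.*\)) accessed .*$/\1,\2,\3/p'
}

# Example:
#   p4 users | p4_users_to_csv > users.csv
```

Full names that contain commas would need quoting before you load the file into a spreadsheet.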
Translating Perforce permissions into the equivalent Bitbucket permissions can be difficult because Perforce permissions are granular and complex, with the possibility of excluding access to individual files. This complicated permission scheme is one reason why a Perforce server can bog down – every attempt at access may cause the server to perform an expensive computation on a complicated data structure.
In most cases it’s faster just to ask project leads to define a simpler set of permissions in Bitbucket using the normal project, repo, and branch level permissions. Indeed, you’ll want to revisit your permission setup anyway, as Git offers up so many new workflow options. For example, in Perforce you may have restricted branch creation, while in Bitbucket you may only need to restrict push access to the master branch.
Step 3: Binary files
If you stored large binary blobs in Perforce, think carefully about how you want to manage those in Git. You could try out Git LFS, or you could simply use a regular artifact management system instead. In any case you don’t want to blindly push large blobs into a Git repo.
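Before your first commit, it helps to know where the big files are. The sketch below lists files above a size threshold and, if you decide on Git LFS, tracks the patterns you pass in. The function names, the 50 MB threshold, and the `*.psd` pattern are illustrative assumptions, and the second helper needs the Git LFS extension installed.

```shell
#!/bin/sh
# Find large files in a workspace, then optionally track them with Git LFS.
# Function names are hypothetical.

list_large_files() {
    # $1 = directory to scan, $2 = size threshold in bytes
    # Skips anything under a .git directory.
    find "$1" -type f -size +"$2"c ! -path '*/.git/*'
}

track_with_lfs() {
    # $@ = one or more file patterns, e.g. "*.psd"
    git lfs install &&
    git lfs track "$@" &&
    git add .gitattributes
}

# Example:
#   list_large_files . 52428800    # files larger than 50 MB
#   track_with_lfs "*.psd"
```

Set up tracking before you add the files themselves, so they are stored as LFS pointers from the very first commit.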
Step 4: Complex dependencies
A Perforce working copy may actually map in read-only copies of data from several modules. In Git, this is done with submodules or subtrees, or by leveraging CI/CD or artifact management systems. There’s no easy answer here, but some data import tools can model a submodule relationship between Git repos. For a more in-depth look at how to use submodules or subtrees, you can read about each here: https://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/.
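To make the two Git options concrete, here is a minimal sketch of pulling a shared module into a project either way. The helper names, the repository URL, and the libs/shared path are hypothetical, and `git subtree` is assumed to be available in your Git installation.

```shell
#!/bin/sh
# Two ways to bring a shared module into a repo. Helper names are hypothetical.
# $1 = URL of the shared module's Git repo, $2 = path inside this repo.

add_as_submodule() {
    # Records a pointer to one specific commit of the shared repo.
    git submodule add "$1" "$2" &&
    git commit -m "Add $2 as a submodule"
}

add_as_subtree() {
    # Copies the shared repo's files into this repo as a single squashed commit.
    # $3 = branch of the shared repo to import.
    git subtree add --prefix="$2" --squash "$1" "$3"
}

# Example:
#   add_as_submodule <shared module URL> libs/shared
#   add_as_subtree   <shared module URL> libs/shared master
```

The submodule keeps the module's history in its own repo and needs an extra update step after cloning; the subtree makes the files part of your repo, so a plain clone is enough.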
Step 5: How to structure your team during the migration
So, your Perforce server has 100 projects from 10 teams. You’ve got a migration strategy and tool set laid out. Schedule the maintenance window and go!
Remember that switching SCM tools is as much about developers as it is data. You’ve got people, process, and schedule to consider – don’t try to boil the ocean in a single day. It’s too risky.
You need to consider a project plan during the actual migration phase. (It might be a good time to try out a new Jira workflow…) Here are some options you can look at.
- Migrate team-by-team and project-by-project. Aim to start a project and team at the beginning of a sprint or program increment, when you have some time to adapt.
- Migrate incrementally. Import all of your data in a weekend, but then let teams slowly complete the switch to Git over time. Periodically pick up the deltas by re-running your import tools. Although more complex, this strategy isn’t bad if you have dependencies between teams and the early adopters need at least a recent snapshot in Git to feed their CI/CD pipeline.
- Use both systems at the same time for a period of time. While not for the faint of heart, it’s technically feasible to use Git Fusion to do a two-way data exchange as long as you are not doing complex operations that will confuse the data translator.
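For the incremental option, picking up the deltas can be as simple as refreshing a mirror clone and pushing it on. This is a sketch under a few assumptions: you made a one-time `git clone --mirror` from Git Fusion, added a remote named bitbucket to that clone, and nobody is committing to the same branches in Bitbucket yet (a diverged branch makes the push fail rather than overwrite anything). The function name and the mirror path are hypothetical.

```shell
#!/bin/sh
# Refresh a mirror clone from Git Fusion and push the deltas to Bitbucket.
# One-time setup (assumed):
#   git clone --mirror https://<git fusion server url>/acme acme-mirror.git
#   git -C acme-mirror.git remote add bitbucket <bitbucket project URL>

refresh_mirror() {
    # $1 = path to the mirror clone
    git -C "$1" fetch --prune origin &&
    git -C "$1" push --all bitbucket &&
    git -C "$1" push --tags bitbucket
}

# Example (run from cron or a CI job):
#   refresh_mirror /srv/migration/acme-mirror.git
```

Using `--all` and `--tags` rather than `--mirror` on the push means branches that exist only in Bitbucket are left alone.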
Lastly, invest in communicating the changes to the team – the motivation, the why, and a series of steps for how to do it. Pick an “early adopter” team with engineers experienced in the entire software development lifecycle, and have that team be a model for the others. Find Git champions to assist people when they run into difficulty. Making small, understandable, iterative changes will help this process be successful.
Step 6: Mirrors and Clusters
Perforce has a simple but effective system for mirroring data to remote sites to reduce the effect of latency. It has a more complex system for running a set of local mirrors for read-only clustering. Although latency is simply not as much of a concern for Git, if you are running a worldwide operation you should look at Bitbucket Data Center for both clustering and mirroring, which will greatly speed up your clone times for a global team.
Step 7: ALM Tools
And now for some good news – you’ve got a lot of choices for your ALM tool stack when you move from Perforce to Git. Pretty much every developer and ALM tool out there works with Git, and of course Bitbucket gives you great integration with Jira and Bamboo. As you transition to Git, you can explore Bamboo features such as Plan Branches that take advantage of a feature branch workflow.
Step 8: Defining success
So how exactly do you measure success during a migration from Perforce to Git? In many migration projects we tend to focus too much on the fidelity of data transfer. But that is not a useful metric for many reasons. It’s likely that you can never get a bit-for-bit history in Git that is exactly the equivalent of what happened in a centralized SCM system like Perforce.
A more practical approach is to use CI/CD for verification. Once you switch your CI/CD pipeline from Perforce to Git, do all your tests still pass? And can you still deploy your software? If all of your important older builds can still pass through your CI/CD pipeline, then it’s time to declare victory!
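If you want to try this before wiring up the full pipeline, a small loop that checks out each important branch or tag and runs your build gives a first signal. In the sketch below the function name is hypothetical, `make test` stands in for whatever your build and test command really is, and the branch names are the ones from the Acme example earlier.

```shell
#!/bin/sh
# Check out each given branch or tag and run a build/test command against it.
# Run inside a clean clone of the migrated repo. The function name is hypothetical.

verify_refs() {
    # $1 = build/test command, remaining args = branches or tags to verify
    cmd=$1
    shift
    failed=0
    for ref in "$@"; do
        if git checkout --quiet "$ref" && sh -c "$cmd"; then
            echo "PASS $ref"
        else
            echo "FAIL $ref"
            failed=1
        fi
    done
    # Non-zero if any ref failed, so a CI job can act on it.
    return "$failed"
}

# Example:
#   verify_refs "make test" master r1.0 r1.1
```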
That’s a wrap
So now you’ve seen why there’s movement from Perforce to Git, and how to actually get there. The next step is to choose a Git solution. If you are switching from Perforce for game development, see why game developers love Bitbucket.