Mick Philippon
10 August 2017

Migrating from TFVC to Git (for large repositories)

For a very long time, your team has used TFVC (Team Foundation Version Control) to store its source code. You were perfectly happy with it. However, some things were a burden, namely branches. And you remember the time when someone did a baseless merge and it took four days to set things right.

Plus, you’ve heard of feature branches and of the Git Flow branch model, and you want to try them.

So, you decided to switch from TFVC to Git. But how can it be done, knowing that you want to keep the whole history, that you actually have more than 25 000 changesets in your TFVC history, and that you can’t do it while people are working?

 

Step 1: tooling

There are actually three tools I’m aware of that can help you migrate from TFVC to Git: Git-Tf, Git-Tfs and the VSTS TFVC-to-Git built-in migration. So, let’s do a quick review:

Git-Tf

Git-Tf is an old tool, hosted on CodePlex. However, CodePlex is scheduled for shutdown (or already down, depending on when you read this article) and the front page of Git-Tf states that it is deprecated and that we should use Git-Tfs instead. Wow, that was a quick one to exclude!

Git-Tfs

Git-Tfs is hosted on GitHub and claims that you can migrate your entire TFVC repository live, while keeping history. It even has documentation on how to migrate.

VSTS TFVC-to-Git migration tool

Available when you create a new git repository on VSTS (or TFS), this tool gives you a flawless migration from TFVC to Git, with one caveat: it only keeps the last 180 days of history.

So, in the end, this is a pretty simple choice: do you want to keep more than 6 months of history? If not, use the VSTS TFVC-To-Git migration tool, launch the process on Friday evening and, by Monday morning, you’ll be done. Thanks for reading!

However, if you want to keep more than 6 months, then Git-Tfs is the way to go. In this case, this article aims to guide you through all the little gaps in the documentation so that your migration goes perfectly.

Step 2: The initial clone

To perform the initial clone, you’ll need a computer with these prerequisites:

  • Git-Tfs, to perform the migration
  • Git, to be able to push the result to your git repository
  • Team Explorer or Visual Studio or Team Foundation Server, to be able to fetch the changesets from VSTS/TFS
  • Lots of time!

Why the fourth? Well, on a TFVC repository with 25 000 changesets, the initial clone takes 2 days when working with an on-premises TFS. And if you’re using VSTS, you’ll quickly reach the request threshold limit, which will cause all the requests made by Git-Tfs to be delayed a little. This is a security feature designed to prevent abuse and DoS attacks, so you don’t really have a choice but to live with it.

Conclusion? You can’t simply launch this on Friday evening, because, come Monday, it won’t be finished. The good news is that, once the initial clone is done, you can do a delta migration, which will be really quick, so you can do the initial clone whenever you want, while developers are working.

Now, let’s follow the migration guide and execute the clone command:

git tfs clone https://tfs.contoso.com:443/tfs/Collection $/project/trunk . --branches=all

Some tips and tricks around this command:

  • Don’t forget the /trunk (or /main, or whatever your main branch is)! If you leave it out, you’ll end up migrating your entire TFVC repository as it appears under the source control explorer, with branches being subfolders and only one branch in the Git repository. Plus, it takes way longer… So, if you want a good way to lose 3 or 4 days, go ahead, use $/project instead of $/project/trunk
  • Make sure you have enough space on the disk. That sounds silly, but having to restart after migrating 70% of your repo because the disk is full is no fun.
  • Start the migration at the root level of your disk. Suppose you have a developer who likes to be really precise when naming things, and you have a folder_where_I_will_put_sql_stored_procedures_for_user_story_215/take_lines_from_bill_table_and_do_a_monthly_summary_for_the_sales_report.sql item. Well, there’s a limit of 259 characters on the path length. So, if your migration takes place in D:\migration\DevOpsTeam\TfvcToGitMigration, you’ll hit this limit and your migration will fail. There is less chance of this happening if you start at D:\. If you can’t, use a directory junction or a symlink to shorten the path.
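The path-length tip above can be turned into a quick pre-flight check. What follows is only a sketch: the helper names path_budget and fits_in_budget are invented for illustration and are not part of Git-Tfs, and the 259-character figure is simply the limit quoted above. The estimate is deliberately conservative, since it always reserves one character for a separator.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Git-Tfs): estimate whether a
# repository-relative path still fits under the path-length limit
# once it is placed below the migration directory.

MAX_PATH_LEN=259  # limit quoted in the article

# path_budget BASE_DIR
# Prints how many characters are left for repository-relative paths.
path_budget() {
    base=$1
    # Reserve one extra character for the separator after BASE_DIR.
    echo $(( MAX_PATH_LEN - ${#base} - 1 ))
}

# fits_in_budget BASE_DIR RELATIVE_PATH
# Exit status 0 if RELATIVE_PATH fits below BASE_DIR, non-zero otherwise.
fits_in_budget() {
    budget=$(path_budget "$1")
    rel=$2
    [ "${#rel}" -le "$budget" ]
}
```

For example, path_budget 'D:' leaves far more room than the same call with a deeply nested migration folder, which is the whole point of starting at the root of the disk.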

Step 3: Complete the clone

What? Complete the clone? But I told it to clone all branches! Well, sort of. Some branches may have failed due to complex history and merges. For example, Git-Tfs doesn’t cope well with a TFS branch that was destroyed and then recreated later with the same name. There are other cases where all the commits are there, but the branch simply doesn’t exist. In order to get all branches, simply run this command from inside your git repository:

git tfs branch --init --all

It will proceed and create all missing branches, fetching their last changesets in the process.

Congratulations! You now have a perfect clone of your TFVC repository as a git repository! Well, unless someone checks some code into the TFVC repository, which makes it a perfect clone, but from the past.

Step 4: Plan the migration and update your git repository

At this point, you have a local git repository on the computer used to perform the migration, which is not up-to-date with your TFVC repository. The first thing to do is to find a suitable period for the migration. For most organizations, it will be during the weekend, but you can do it at lunch time if you want.

And once this time arrives, just follow these steps:

  1. For each branch, run these commands:
    1. git checkout branch

      This puts the git repository in the correct branch

    2. git log -1

      This allows you to get the last changeset synchronized for this branch

    3. git tfs pull -c=changesetNumber

      This updates the local git branch to the last changeset

  2. Create the remote git repository (on VSTS, GitHub, GitLab, …). I advise you to open it only to a very small number of people, and to give read-only access to everyone but the people responsible for the migration.
  3. Set up the git repository upstream:
    git remote add origin https://contoso.visualstudio.com/_git/repo
  4. Push it!
    git push -u origin --all
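Step 1 of the list above lends itself to a small script. This is a sketch rather than an official Git-Tfs recipe: it assumes that every commit created by Git-Tfs carries a git-tfs-id line ending in ;C followed by the changeset number, and the function names last_changeset and sync_all_branches are made up for this example.

```shell
#!/bin/sh
# Sketch only. Assumes each migrated commit message contains a line like
#   git-tfs-id: [https://tfs.contoso.com/tfs/Collection]$/project/trunk;C12345

# last_changeset: reads a commit message on stdin and prints the changeset
# number from its last git-tfs-id line (prints nothing if there is none).
last_changeset() {
    sed -n 's/^[[:space:]]*git-tfs-id:.*;C\([0-9][0-9]*\)[[:space:]]*$/\1/p' |
        tail -n 1
}

# sync_all_branches: hypothetical wrapper around step 1.
# Run it from inside the migrated git repository.
sync_all_branches() {
    for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
        # Put the git repository on the branch to update.
        git checkout "$branch" || return 1
        # Find the last changeset already synchronized for this branch.
        cs=$(git log -1 --format=%B | last_changeset)
        if [ -z "$cs" ]; then
            echo "no git-tfs-id found on $branch, skipping" >&2
            continue
        fi
        # Same command as in the list above.
        git tfs pull -c="$cs" || return 1
    done
}
```

Nothing runs until sync_all_branches is called, so the file can be sourced first and the function invoked once the working tree is clean.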

And… you’re done. Easy, isn’t it? Just be aware that you can run step 1 as many times as you want. In fact, it is better if you run it every day from the initial clone to the migration day. This way, the work on migration day will complete much more quickly.

But, why do we need to do this on a branch-by-branch basis? Why not simply run

git tfs pull --all

? Well, this is due to the difference in nature between Git and TFVC. Git represents commits as a graph, each commit having a main parent and possibly several other ones. In order to get the history of a branch, simply start from the tip and follow the child-parent relationship. In TFVC, however, check-ins are represented linearly. Each check-in is linked to a branch, but there is no such thing as a parent-child relationship or a graph of check-ins.

So, imagine you have the following branch diagram:

[Figure: branch diagram of TFVC check-ins, including C5, C6 and C7]

Here, C5 and C7 are not synchronized. So, if we run

git tfs pull --all

in this situation, what will happen? C7 will be synchronized, but not C5. Why? Because on the TFVC side, C6 is already synchronized, so everything before C6 is simply not considered when pulling the history from TFVC, hence C5 is not synchronized. So, we could do a

git tfs pull --all -c=5

and it will synchronize everything. In this simple case, it will work. But it will review C6 and check that it is already synchronized. So, while it is a viable option, on a complex repository with many branches, it is simpler and quicker to do it branch by branch than to review all branches for the oldest non-synchronized check-in and then wait for Git-Tfs to check all the already synchronized check-ins.

Step 5: orphan branches

In TFVC, you can add a folder, fill it with some items, check it in and transform it into a branch. The result is a branch that has no common ancestor with the main branch or any other branch in your code. Git can also hold branches with totally different and unrelated contents (go see how GitHub handles the wiki on each page, for example), and can even create orphan branches with no shared history, but in most repositories every branch descends from one common ancestor (the initial commit).

When you translate your TFVC repository to Git, you’ll find that these unrelated (to main) branches are simply absent from Git. And there’s nothing you can do about it, unfortunately.

There is a workaround, which is to have several git repositories: one for the trunk and everything related, and one for each unrelated branch (and its derived branches). So, basically, if you are in this case, you’ll have to create another Git repository and migrate the root of your unrelated branch tree into it, following the exact same procedure that you used for the trunk. Functionally, that split is pretty logical anyway.

Step 6: locking the TFVC repository

Until now, your team has been working on the TFVC repository. Let me rephrase that: until now, it is very important that your team works exclusively on the TFVC repository. After some tries, it was discovered that, while simply committing directly on the Git repository didn’t hamper the ability of Git-Tfs to synchronize the TFVC repo, once you start playing with branches (creating, merging, removing them, rebasing some commits, …), you enter merge hell. Which is not a happy place to be in.

Now that the git repository is prepared and fully synchronized, and you have verified that your quality tools and continuous integration process run fine using the Git repository instead of the TFVC one, you can make the switch. One important step is to close the TFVC repository, because once the team begins to work on Git, synchronization will become really hard.

To do that, go to the TFVC repository security settings, and remove all rights but ‘read’ from each and every group listed. Yep, even administrators. If a change is needed on this repo, you can always reopen it temporarily, but at least this way, there won’t be any ‘Oops! I worked on the wrong repo, can you port my changes?’ request.

Likewise, if you locked down the Git repo during the migration to avoid unwanted tests, open it to your team.
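Before opening the repository to the team, it can be reassuring to confirm that every local branch tip really made it to the remote. The sketch below is one possible way to do that, not something taken from the Git-Tfs documentation; the function name unsynced_refs and the file names in the usage comment are arbitrary.

```shell
#!/bin/sh
# unsynced_refs LOCAL_FILE REMOTE_FILE
# Each file holds "<sha> <refname>" lines (spaces or tabs as separator).
# Prints every refname from LOCAL_FILE whose sha is missing from,
# or different in, REMOTE_FILE. No output means everything matches.
unsynced_refs() {
    awk 'FILENAME == ARGV[1] { remote[$2] = $1; next }
         # Appending "" forces a string comparison of the two hashes.
         (remote[$2] "") != ($1 "") { print $2 }' "$2" "$1"
}

# Usage, from inside the migrated repository:
#   git for-each-ref --format='%(objectname) %(refname)' refs/heads > local.txt
#   git ls-remote --heads origin > remote.txt
#   unsynced_refs local.txt remote.txt
```

Any branch listed by unsynced_refs is worth pushing again before the announcement email goes out.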

Step 7: relax, sit back, have a beer (or any beverage you fancy) and enjoy!

Oh, and don’t forget to send an email to all the users to tell them the switch has been done and the URL of the git repository. You may want to include the exact git clone command to run to facilitate the adoption.
