In recent months, Newt Global Consulting migrated multiple source code repositories to GitHub. The repositories ranged from 10MB to a few gigabytes of source code. This blog shares our experience in migrating source code from SVN and AccuRev to GitHub. Most of the material below should be valid for any Git repository.
We used 100 percent open-source tools that are freely available, and we developed a process that fits our needs. Thank you very much to the authors of these tools; without them, we couldn’t have done it. Long live the open-source community, and may the day never come when machines code!
This article focuses on executing the migration; best practices and branching strategies for GitHub are out of scope.
Before diving into migration, here are some of the motivators we observed that are steering organizations away from traditional source code repositories:
- Managers and architects like the distributed and offline features of GitHub.
- It enforces discipline on the developer’s part (fewer excuses about dependencies: you have a local copy of everything, intelligent merging, etc.).
- Technically savvy groups are already using GitHub (Git bridges).
- Developers hate the change in the beginning, but they are quick learners and like to keep up with the rest of the industry.
- Integration with other DevOps toolkits is a motivator (e.g., webhooks).
- Naturally fits with open-source software development (easily extend and contribute).
- Enterprise-level groups are pushing the change (playing catch-up with individual development groups and the rest of the industry).
- GitHub’s file size limit forces the use of binary repositories, and security teams like that because they can now monitor the large open-source libraries spread across the organization.
SVN to GitHub Migration
There is plenty of material on the web about the differences between SVN and GitHub. I suggest that you go through it (if you have to). Here are a few links that we found interesting:
- Branching differences between SVN and GitHub: http://stackoverflow.com/questions/2471606/how-and-or-why-is-merging-in-git-better-than-in-svn
- If you are a fan of SVN, check out the following for the greatness of SVN: https://svnvsgit.com/
Prerequisites
- Get read-only user access to the SVN repository.
- Make sure the “git svn” utility is available in the local environment.
- Download svn-migration-scripts.jar from https://bitbucket.org/atlassian/svn-migration-scripts/downloads
- Refer to the following URL for more documentation on the above script: https://www.atlassian.com/git/tutorials/migrating-prepare/
- Decide on the GitHub organization name, the team name and the members who will be part of the team.
- Set up the organization.
- Set up the team and assign team members.
- Create a repository to push the code from your work area.
Migration
- Extract the user information from SVN:
  java -jar svn-migration-scripts.jar authors <SVN Repo URL> > authors.txt
- Clone the SVN repository:
  git svn clone --stdlayout --authors-file=authors.txt <SVN Repo URL> <GitHub Repo Name>
- Connect the local repository to the remote repository (run this and the following commands from inside the cloned folder):
  git remote add origin <GitHub Repo URL>
- Push the code from the local repo to the remote repo:
  git push -u origin master
- Convert the remote branches to local branches:
  java -Dfile.encoding=utf-8 -jar svn-migration-scripts.jar clean-git --force
- Push all the changes to the remote repo:
  git push --all
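The steps above can be collected into a small driver script. What follows is only a sketch under stated assumptions, not the process we ran: the function names, the placeholder URLs and the folder name are hypothetical, and only the commands themselves come from the list above. By default it prints the commands without executing them.

```python
import os
import shlex
import subprocess


def migration_steps(svn_url, github_url, workdir, jar="svn-migration-scripts.jar"):
    """Return the migration commands from the list above, in order.

    Each step is a dict: argv (the command), cwd (where to run it; None
    means the current directory) and stdout (file to redirect to, or None).
    """
    jar = os.path.abspath(jar)  # later steps run inside workdir
    return [
        # Extract SVN user names into authors.txt.
        {"argv": ["java", "-jar", jar, "authors", svn_url],
         "cwd": None, "stdout": "authors.txt"},
        # Clone the SVN repository with its history.
        {"argv": ["git", "svn", "clone", "--stdlayout",
                  "--authors-file=authors.txt", svn_url, workdir],
         "cwd": None, "stdout": None},
        # Point the local repository at GitHub and push master.
        {"argv": ["git", "remote", "add", "origin", github_url],
         "cwd": workdir, "stdout": None},
        {"argv": ["git", "push", "-u", "origin", "master"],
         "cwd": workdir, "stdout": None},
        # Convert remote branches to local ones, then push everything.
        {"argv": ["java", "-Dfile.encoding=utf-8", "-jar", jar,
                  "clean-git", "--force"],
         "cwd": workdir, "stdout": None},
        {"argv": ["git", "push", "--all"],
         "cwd": workdir, "stdout": None},
    ]


def run(steps, dry_run=True):
    """Print each command; also execute it unless dry_run is set."""
    for step in steps:
        print(shlex.join(step["argv"]))
        if dry_run:
            continue
        if step["stdout"]:
            with open(step["stdout"], "w") as out:
                subprocess.run(step["argv"], cwd=step["cwd"],
                               stdout=out, check=True)
        else:
            subprocess.run(step["argv"], cwd=step["cwd"], check=True)


if __name__ == "__main__":
    # Placeholder URLs and folder name; replace with your own.
    run(migration_steps("https://svn.example.com/repo",
                        "https://github.com/example-org/example-repo.git",
                        "example-repo"))
```

Passing dry_run=False executes the commands; check=True stops the run at the first failing step, which matters when a clone takes hours.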
Noteworthy
- We ran the scripts on Linux (case-sensitive file system); OS X has additional steps to follow (refer: https://www.atlassian.com/git/tutorials/migrating-prepare/)
- If the SVN instance has local user IDs defined (not SSO), the generated authors file is of little use as is. You still need this file to run the scripts successfully.
- Incremental migration is possible, but we scheduled weekly jobs for full migration during the transition phase.
- Full migration takes time; try to avoid network delays. For us, a 300MB repo took around four to six hours.
- In some cases, we had to clone the SVN repo to our local file system due to access issues.
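When the generated authors file only contains local SVN IDs, it can be corrected before cloning. The sketch below assumes the file uses the git svn authors format, one "svnuser = Full Name <email>" entry per line. The helper names and the sample names and addresses are hypothetical.

```python
import re

# One entry per line: svnuser = Full Name <email>
AUTHOR_LINE = re.compile(
    r"^(?P<svn>[^=]+?)\s*=\s*(?P<name>.+?)\s*<(?P<email>[^<>]*)>\s*$"
)


def parse_authors(text):
    """Parse authors-file text into {svn_user: (name, email)}."""
    authors = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # skip blank lines
        match = AUTHOR_LINE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected 'svnuser = Name <email>', got {line!r}"
            )
        authors[match["svn"].strip()] = (match["name"], match["email"])
    return authors


def apply_overrides(authors, overrides):
    """Return a copy of authors with hand-maintained entries replacing defaults."""
    merged = dict(authors)
    merged.update(overrides)
    return merged


def format_authors(authors):
    """Serialize the mapping back into authors-file text, sorted by SVN user."""
    return "".join(
        f"{svn} = {name} <{email}>\n"
        for svn, (name, email) in sorted(authors.items())
    )


if __name__ == "__main__":
    generated = "jdoe = jdoe <jdoe@mycompany.com>\n"
    fixed = apply_overrides(
        parse_authors(generated),
        {"jdoe": ("Jane Doe", "jane.doe@example.com")},  # hypothetical user
    )
    print(format_authors(fixed), end="")
```

Keeping the overrides in a separate mapping means the authors file can be regenerated for each weekly run without losing the manual corrections.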
AccuRev to GitHub Migration
AccuRev is a centralized version control system originally developed by AccuRev, Inc., which was later acquired by Micro Focus (Borland). It uses a client-server architecture and proprietary technology, so it’s a bit tricky to extract code for a full migration. We used the ac2git utility developed by Navico. The tool does the heavy lifting, but it took a few iterations and a bit of tweaking to fit our needs. Most of the difficulty was in understanding how AccuRev works and how its concepts map to GitHub.
Prerequisites
- Install Python 3.4, Git Bash version 2.7.4 and AccuRev.
- Make sure the paths to the AccuRev and git executables are correct for your machine, and that the default git configuration has been set.
- Clone the ac2git repo from https://github.com/NavicoOS/ac2git
- Run python ac2git.py --help to see all the options (we strongly recommend you do this).
- Run python ac2git.py --example-config to get an example configuration.
- Strictly follow the steps outlined in the “How to Use” section; otherwise, you will be forced to after a few failed attempts.
Migration
- Make an example config file:
- python ac2git.py --example-config
- Modify the generated file, ac2git.config.example.xml (there are plenty of notes in the file, and now is the time to run the --help option if you have not done so as advised in the prerequisites)
- Rename the ac2git.config.example.xml file to ac2git.config.xml
- Modify the configuration file and add the following information:
- Set the AccuRev username and password.
- Name the depot. Map each depot to a single Git repository, and run the script for each depot separately.
- Running the script for multiple depots against a single folder puts all depot streams into the same folder, where they overwrite one another, and the script fails when given multiple folders. Always run the script for one depot at a time.
- Create an empty folder and provide its complete path in the config XML file. The folder must exist and should preferably be empty.
- The folder name does not need to match the stream name. There just needs to be an empty folder where all the contents of the stream will be stored.
- Set the start and end transactions, which correspond to what you would enter in the accurev hist command as the <time-spec> (for example, the keyword highest or the keyword now).
- If the start-transaction and the end-transaction are timestamps, the script fetches the data and history only within that period. For example: start-transaction=“2013-02-07 13:41:17” and end-transaction=“2014-02-07 13:41:17”.
- Use the “highest” keyword instead of “now” in end-transaction.
- “now” fetches data up to the current date; if some history was deleted from the workspace or not promoted from other streams, the “now” keyword will not work.
- The “highest” keyword always looks for the latest and highest commit history (preferred option).
- Map the users from AccuRev to GitHub. Hint: Run accurev show -fi users to see a list of all the users.
- Choose the preferred method for converting the streams.
- We recommend the “deep-hist” method for sparse streams (where the transactions that changed the stream contents are far apart).
- We recommend the “diff” method for regular streams (when in doubt, just use “deep-hist”).
- Run the script
- python ac2git.py
- If you encounter any trouble, run the script with the --help flag for more options.
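Since a failed run can cost hours, a few of the rules above can be checked before starting the script. The following is a hypothetical pre-flight helper, not part of ac2git. It assumes the timestamp format shown in the example above, and it additionally treats a plain transaction number as a valid time-spec, which is an assumption about AccuRev rather than something stated here.

```python
import os
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # format taken from the example above


def check_repo_folder(path):
    """Return a list of problems with the target folder (empty list = OK)."""
    if not os.path.isdir(path):
        return [f"{path}: folder does not exist"]
    if os.listdir(path):
        return [f"{path}: folder is not empty"]
    return []


def classify_time_spec(value):
    """Classify a time-spec as 'keyword', 'transaction' or 'timestamp'."""
    value = value.strip()
    if value in ("highest", "now"):
        return "keyword"
    if re.fullmatch(r"\d+", value):
        return "transaction"  # assumed: plain transaction numbers are valid
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
    except ValueError:
        raise ValueError(f"unrecognized time-spec: {value!r}") from None
    return "timestamp"


def check_transactions(start, end):
    """Return a list of problems with the start/end transaction settings."""
    problems = []
    kinds = {}
    for label, value in (("start-transaction", start), ("end-transaction", end)):
        try:
            kinds[label] = classify_time_spec(value)
        except ValueError as exc:
            problems.append(f"{label}: {exc}")
    if end.strip() == "now":
        problems.append("end-transaction: prefer 'highest' over 'now'")
    if len(kinds) == 2 and set(kinds.values()) == {"timestamp"}:
        first = datetime.strptime(start.strip(), TIMESTAMP_FORMAT)
        last = datetime.strptime(end.strip(), TIMESTAMP_FORMAT)
        if first >= last:
            problems.append("start-transaction must be earlier than end-transaction")
    return problems


if __name__ == "__main__":
    # Hypothetical folder name; use the path from your own config file.
    for problem in check_repo_folder("accurev-depot-repo") + check_transactions("1", "highest"):
        print(problem)
```

The checks return lists of messages instead of raising, so all problems can be reported in one pass before the long-running migration is launched.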
Noteworthy
- The stream must always have some history. The script did not work for us when no history was available.
- If there are any duplicate or missing usernames, make sure to use the corresponding parameter. We ended up changing the script to handle duplicate users (this is not needed if you use the right parameters).
- In the GetMissingUsers(config) method of ac2git.py, comment out the last two lines:
  # if not found:
  #     missingList.append(user)
- Migration takes time; try to avoid network delays. For us, a 300MB repo took around six to eight hours.
- We have not tried incremental migration—we scheduled a weekly full migration and cut over all the users over a long weekend.
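Duplicate and missing usernames can also be caught before the script runs, without touching ac2git itself. The sketch below is a standalone, hypothetical helper. It assumes you have one list of usernames taken from accurev show -fi users and another list of the usernames entered in your user mapping; how you extract those lists is left open.

```python
from collections import Counter


def audit_user_map(accurev_users, mapped_users):
    """Compare AccuRev usernames with the usernames in the user mapping.

    Returns (missing, duplicates): users with no mapping entry, and
    usernames that appear more than once in the mapping.
    """
    counts = Counter(mapped_users)
    duplicates = sorted(user for user, n in counts.items() if n > 1)
    missing = sorted(set(accurev_users) - set(mapped_users))
    return missing, duplicates


if __name__ == "__main__":
    # Hypothetical usernames for illustration only.
    missing, duplicates = audit_user_map(
        ["alice", "bob", "carol"],
        ["alice", "bob", "bob"],
    )
    print("missing:", missing)
    print("duplicates:", duplicates)
```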
Conclusion
Overall, the experience was very rewarding, not only in assembling the right toolkit but process-wise as well. Every organization has its own methodology when it comes to source code management and how it promotes code (branching strategy). We found that small-to-medium-size enterprises have much more mature processes than some of the large enterprises, though the latter are catching up fast.
- Don’t expect large enterprises to have centrally managed source code repos (think SVN hosted on a shared desktop under a manager’s desk).
- It is difficult to get developers into a distributed mindset (why should I clone the entire repo?).
- Working closely with Enterprise GitHub helped in reassuring clients.
- Most developers prefer IDE plugins to the command line or web client.
- Prepare to answer backup, HA and DR-related questions.
- Enterprise GitHub comes as a virtual appliance, not an application; prepare yourself to work with infrastructure groups on deploying VMs into production.
- Some application groups preferred lift-and-shift to a full migration; don’t be shocked when they say they “don’t care about history” (typically these are small dev groups).
- We integrated Slack with GitHub; this was a hit not only with developers but also with leadership.
- We extended the Hygieia dashboard to provide team-level analytics for leadership.
About the Author / Sridhar Peddinti
Sridhar Peddinti is vice president, DevOps and Cloud practice at Newt Global Consulting. He is a technology and business leader with 20 years of global experience in executing large-scale projects across multiple industry verticals. He champions the consulting space, helping customers with complex problems and providing them with DevOps and Cloud strategy. He has extensive hands-on experience in process improvement using a combination of toolkits and in migrating a number of legacy systems to Cloud and related technologies. Connect with him on LinkedIn and Twitter.