GitHub is the largest host of source code in the world, with over 190 million repositories. So much of the code we rely on every day is hosted there. You are, in all likelihood, using it. But what would you do if, one day, all of the project code GitHub stored for you disappeared?
Like other cloud services, GitHub follows a cloud computing principle called “shared responsibility” for data protection and security. It means that the users of a cloud platform and the provider of that platform split the responsibility for safeguarding data. In other words, users are just as responsible for mitigating risks on their side of the equation as GitHub is on its side. It’s all laid out in GitHub’s terms and conditions.
Taking this into account, there are a handful of things that can affect your ability to access your data in GitHub. Here are a few:
Account Compromises
In July of 2019, Ubuntu Security reported that the credentials for a company-owned GitHub account were compromised. These compromised credentials were used to create repositories, issues and more.
In this case, critical infrastructure was decoupled from GitHub, and the breach wasn’t allowed to spread. However, Ubuntu had to restore various repositories and issue trackers to their previous state. When rolling back from an account compromise like this, backups are invaluable because they give you a known-good state to compare against the current one.
Additionally, attackers often leave backdoors in compromised codebases. These can allow them to gain deeper access once the initial discovery and remediation process is completed. If you only have your infected codebase, it can be challenging to uncover all the infected files or identify possible vectors for future attacks. Restoring to a version of your code from before the compromise removes that threat.
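To make the idea of comparing a known-good backup with the current state concrete, here is a minimal Python sketch. It assumes `git` is installed and that the backup is a local clone. The function names (`parse_refs`, `list_refs`, `diff_refs`) are hypothetical and are not part of any GitHub tooling. It compares only branch and tag tips, so treat it as a starting point for an investigation rather than a full audit.

```python
import subprocess


def parse_refs(ls_remote_output: str) -> dict[str, str]:
    """Parse `git ls-remote` output ("<sha><TAB><ref>" per line) into {ref: sha}."""
    refs = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(maxsplit=1)
        refs[ref.strip()] = sha
    return refs


def list_refs(repo: str) -> dict[str, str]:
    """Read branch and tag tips from a local path or a remote URL."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", "--tags", repo],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_refs(result.stdout)


def diff_refs(backup: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report refs that were added, removed, or moved since the backup was taken."""
    shared = backup.keys() & current.keys()
    return {
        "added": sorted(current.keys() - backup.keys()),
        "removed": sorted(backup.keys() - current.keys()),
        "changed": sorted(ref for ref in shared if backup[ref] != current[ref]),
    }
```

A call such as `diff_refs(list_refs("/backups/example-repo.git"), list_refs("https://github.com/example-org/example-repo.git"))` (both locations are placeholders) would list the refs worth reviewing. A changed branch is not suspicious in itself, since ordinary commits also move branch tips. The value lies in narrowing down where to look.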
Ransomware Attacks
A ransomware attack takes control of data and encrypts it so that only the attacker can unlock it. The attacker will usually demand a ransom to unlock the files and may threaten to delete the data if it isn’t paid.
In May 2019, ZDNet reported on a ransomware attack in which a hacker held various repositories hostage for a fee. The hacker modified the git histories to the point where the repositories were unusable and demanded payment within 10 days to reverse the changes.
These attacks can create a severe disruption in operations. Developers will be unable to commit code, creating a complete stoppage in new feature development. Bug fixes and even support tickets (if managed through GitHub) could also be affected. However, restoring from a backup would allow a business to continue working.
Service Downtime
Depending on a third party always carries some risk, and GitHub is no exception. No service can deliver 100 percent uptime, but when your entire business (or codebase) depends on GitHub’s availability, you might want to mitigate that risk by having your own backups.
For example, in June 2020, GitHub experienced a major, hours-long outage before stability returned. So, if all your work was stored in GitHub, you would have had to wait for access to be restored. This downtime can be devastating if it occurs during a crucial launch window.
These are just three examples, and they happen more often than you think. A recent report by Oracle and KPMG found that 49 percent of IT professionals could attribute recent data loss to their failure to safeguard data. That’s essentially the same odds as a coin flip. Having continual access to all your work shouldn’t be left to chance. That’s why you need a backup strategy.
Strategies for Backing Up Data in GitHub
- Managing Your Own Backups: This means you are responsible for all the infrastructure, business processes and ongoing repair costs involved in creating the backups. You might think this will be the more cost-effective option, but the ongoing labor and maintenance expenses tend to add up quickly. It also means your team may spend cycles on something that is not part of the core business. What you gain in control, you lose in time and resources.
- Using a Third Party to Manage Your Backups: Sometimes referred to as Backup-as-a-Service (BaaS), this involves outsourcing backup management to a separate company. It removes the responsibility from your business, but it might seem more expensive upfront. In most cases, there is nothing to do after choosing a provider. They manage the entire process, from cradle to grave. This includes any API updates (which can happen often), implementation and ongoing maintenance. The drawback is that you lose control. Terms of service can change, or the data that’s backed up could change. And not all BaaS solutions are transparent about what they do and how they access data.
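For a sense of what the self-managed option involves at its simplest, here is a minimal Python sketch that keeps a local mirror of one repository. It assumes `git` is installed, that the repository is reachable over HTTPS, and that credentials are already configured. The helper names (`mirror_dir`, `backup_command`, `backup_repo`) and the directory layout are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def mirror_dir(backup_root: Path, repo_url: str) -> Path:
    """Map an HTTPS repository URL to a local directory such as <root>/owner/repo.git."""
    path = urlparse(repo_url).path.strip("/")
    if not path.endswith(".git"):
        path += ".git"
    return backup_root / path


def backup_command(backup_root: Path, repo_url: str) -> list[str]:
    """Build the git command: a mirror clone on the first run, an update afterwards."""
    target = mirror_dir(backup_root, repo_url)
    if target.exists():
        return ["git", "-C", str(target), "remote", "update"]
    return ["git", "clone", "--mirror", repo_url, str(target)]


def backup_repo(backup_root: Path, repo_url: str) -> None:
    """Create or refresh the local mirror of a single repository."""
    mirror_dir(backup_root, repo_url).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(backup_command(backup_root, repo_url), check=True)
```

Even this small script hints at the maintenance burden described above. A mirror follows whatever the remote contains, so rewritten history would overwrite the mirror's branches on the next update unless you also keep dated snapshots. Scheduling, monitoring, storage and restore testing are all left to you, and a git clone does not capture issues or pull request discussions.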
So, What’s the “Right” Choice?
For most businesses, the sheer amount of work required to build and maintain backup software is a non-starter. Development cycles are too valuable to tie up with work not directly supporting the business’s product roadmap. You may get to a size where you think adding this competence makes sense; however, even some massive multinational corporations use some form of BaaS.
Just make sure you do your research. Read reviews, speak to the provider’s sales or development team and confirm that they know what they are talking about. You need to determine whether they have built a credible product. For many PaaS and SaaS tools, third-party backup applications are built by faceless companies with no track record. Considering the level of access a provider will have to your data, you want to ensure it is reputable and has a proven history of building backup software.
Regardless of the method you choose, having a backup strategy in place is vital. Due to the nature of how GitHub works and how important your code repositories are to your business, the sooner you get something in place, the better.