Everyone knows Git. It’s a favorite developer tool, and for good reason. Developers love its flexibility, its speed and its ability to branch locally and in place. It also appeals to developers who are drawn to community development.
But as developers adopt Git within enterprises, IT managers face the challenge of reconciling the needs of developers with the needs of the larger organization, especially when it comes to DevOps. In this article, we will examine some of these challenges and suggest best practices that can alleviate them.
Performance
IT organizations often rely on performance and low latency as key indicators of the adoptability of any solution. Speed tests run against Git’s network operations generate impressive results, since the Git protocol for transferring data is highly optimized. Over time, however, an internal implementation reveals limitations that can impact the speed and efficiency of the delivery pipeline.
First, Git works very well for small projects. When projects grow larger, however, many teams will feel the pinch of its all-or-nothing architecture. A developer who needs to work on a project must retrieve (or “clone”) every file associated with it. For enterprise software, this is not always feasible or desirable. Large projects can require a lot of disk space and time to clone, even though much of the repository content may not be relevant to that developer. The all-or-nothing approach of a Git clone also compromises security: users have the same access to the whole repository, whether they should have it or not.
Second, Git is optimized for fairly small assets, such as code and configuration files. But many organizations need to version large binary files such as images, videos and CAD designs, and the build system often produces large binary files as well. Such content can quickly bloat the size of a repository and, by extension, the file systems of contributing developers. A less obvious issue with large files is that some Git functions require the calculation of hash values over the contents of the repository. As a result, large binary assets can reduce some operations to a crawl, seemingly without explanation, while others execute almost instantly. To address these issues, you should employ an external store for your large binary assets. There are no formal standards for external asset stores, but there are options available, including git-annex and Git LFS.
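To make the hashing cost concrete, the following minimal sketch computes the object ID that Git assigns to a file's contents (a "blob") under its default SHA-1 object format. The hash covers a short header plus every byte of the file, so the work grows with file size. The function name git_blob_id is illustrative and not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to this file content.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw file bytes, so every byte of a large binary must be read and hashed.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same value as running: echo hello | git hash-object --stdin
    print(git_blob_id(b"hello\n"))
```

A multi-gigabyte video passes through the same function in full whenever it is added or checked for changes, which is one reason an external store that keeps only a small pointer in the repository can help.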
For large projects, the solution that Git offers is to separate components into dedicated repositories or to use Git submodules. Both approaches lead to what is known as Git sprawl, a proliferation of repositories that increases complexity and risk. Git sprawl can be avoided with narrow cloning, which enables developers to retrieve just the subset they need from the contents of a code repository. Although the developer community has long requested this feature, at the time of writing Git itself provides no built-in support for narrow cloning. For these reasons, you should standardize on a Git management system that offers narrow cloning, while using shallow cloning as much as possible to limit the number of file revisions retrieved.
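As a rough illustration of shallow cloning, the sketch below builds the argument list for a clone that truncates history with Git's --depth option. The helper name shallow_clone_command and the example URL are hypothetical. Note that a shallow clone limits how many revisions are fetched, not which files: every file at the tip of the branch is still retrieved.

```python
from typing import List, Optional


def shallow_clone_command(
    url: str, depth: int = 1, branch: Optional[str] = None
) -> List[str]:
    """Build the argument list for a shallow clone of `url`.

    `--depth N` asks Git to fetch only the most recent N commits;
    `--branch NAME` selects the branch to clone instead of the default.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    args = ["git", "clone", "--depth", str(depth)]
    if branch is not None:
        args += ["--branch", branch]
    args.append(url)
    return args


if __name__ == "__main__":
    # Hypothetical repository URL, printed rather than executed.
    print(" ".join(shallow_clone_command("https://example.com/big-project.git")))
```

The resulting list can be passed to subprocess.run(..., check=True) on a machine where Git is installed.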
As long as repositories are logically divided along component lines, developers might not experience any immediate problems with Git sprawl. Build and test teams, by contrast, are likely to feel the pain immediately. Trying to find all the correct and related revisions of files from many different repositories is an expensive and risky process. Builds will frequently fail if the input files are not correct. The ideal agile workflow supports build and test on every check-in, especially on the mainline. If builds frequently fail, the build and test system will not be able to keep up with the development pace and will slow down the entire DevOps pipeline.
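One way a build team might reduce this risk, offered here as an assumption rather than something the approaches above prescribe, is to record the exact commit of every component repository in a manifest and refuse to build when any entry is a moving target such as a branch name. The component names and manifest format below are hypothetical.

```python
import re
from typing import Dict, List

# A full SHA-1 commit ID is 40 lowercase hexadecimal characters.
FULL_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def unpinned_components(manifest: Dict[str, str]) -> List[str]:
    """Return, in sorted order, the components not pinned to a full commit ID."""
    return sorted(
        name for name, revision in manifest.items()
        if not FULL_SHA1.match(revision)
    )


if __name__ == "__main__":
    # Hypothetical manifest: component name -> revision to build from.
    manifest = {
        "ui": "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",
        "billing": "master",  # a branch name can change between builds
    }
    print(unpinned_components(manifest))
```

A check like this does not remove the cost of coordinating many repositories, but it makes a build's inputs reproducible.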
Another bad practice that can result from Git sprawl is having inconsistent or conflicting branching models for different teams or repositories. Dealing with the idiosyncrasies of individual developers or teams is needlessly frustrating and inefficient for the build and test teams. But Git provides no support for defining or enforcing branching strategies across multiple development teams. For this reason, it is critical to define an overall branching strategy, socialize it throughout the organization and review practices on an ongoing basis.
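Because Git will not enforce a branching strategy, teams typically have to check for conformance themselves. The sketch below flags branch names that fall outside an agreed convention. The convention shown (master, release/<major>.<minor>, feature/<name>, hotfix/<name>) is purely a hypothetical example, not a recommendation.

```python
import re
from typing import Iterable, List

# Hypothetical convention: master, release/<major>.<minor>,
# feature/<name> or hotfix/<name> with lowercase names.
ALLOWED_BRANCH = re.compile(
    r"^(master|release/\d+\.\d+|(feature|hotfix)/[a-z0-9][a-z0-9-]*)$"
)


def nonconforming_branches(branches: Iterable[str]) -> List[str]:
    """Return the branch names that do not match the agreed convention."""
    return [name for name in branches if not ALLOWED_BRANCH.match(name)]


if __name__ == "__main__":
    names = ["master", "feature/login", "release/2.1", "bobs-stuff", "Feature/X"]
    print(nonconforming_branches(names))
```

Run periodically against every repository, a report like this supports the ongoing review of practices described above.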
Hosting
Hosting options for cloud deployments of Git are plentiful. Organizations embracing Git need to evaluate these options carefully because the choice will impact not only information security but also DevOps. Developers generally seek options that make it easy to create new projects, clone or fork and push their work, preferably with great review tools and a simple method to deliver units of work to the master branch.
The developer’s desire for ease of use is often at odds with DevOps’ need for scalability. Available Git management systems vary significantly in their capacity to support enterprise IT, with capabilities ranging from single-server topology all the way up to clustering and, in some cases, high availability. Note that none of the solutions supports disaster recovery. Therefore, organizations need to identify tools for backup and recovery, while also defining internal processes to ensure the system can be restored after a catastrophic event.
Security
Developers’ need for speed and customizability also often runs counter to the enterprise’s need to protect the intellectual property locked away in Git. The key issue is that Git lacks a robust and manageable mechanism for fine-grained access control. Git is limited to providing security in the form of authentication; users must present credentials (username/password or SSH key) to access private repositories. Once the user has accessed the repository, Git has no way of controlling what that user does. While this works well for democratized, community-based open source development, it creates security holes and compliance problems for enterprises.
For these reasons, it is critical to think through the necessary roles and permissions that will enable developers and DevOps to do their jobs, while minimizing the risk incurred by hosted Git solutions. Personnel should always be provided with “least privilege” access to assets; that is, they should be given only the permissions necessary to complete their tasks. For compliance purposes, it is often necessary to provide logs showing who did what and when. This requirement applies especially to enterprises in regulated industries such as health care and finance. Git, however, gives users the ability to destroy virtually every form of historical evidence. Again, this is useful for developers, but detrimental to information security and governance.
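A least-privilege policy of the kind described above can be pictured as a table of roles, path patterns and permitted actions, with everything else denied by default. The sketch below is a hypothetical model of such a table; the roles, paths and action names are invented for illustration. Git itself offers nothing like this, so a real deployment would rely on whatever access controls the chosen Git management system provides.

```python
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical policy: role -> list of (path pattern, permitted actions).
POLICY: Dict[str, List[Tuple[str, FrozenSet[str]]]] = {
    "developer": [
        ("src/*", frozenset({"read", "write"})),
        ("docs/*", frozenset({"read"})),
    ],
    "build": [
        ("*", frozenset({"read"})),
    ],
}


def is_allowed(role: str, path: str, action: str) -> bool:
    """Return True only if a rule for `role` explicitly grants `action` on `path`.

    Unknown roles and unmatched paths are denied by default.
    """
    for pattern, actions in POLICY.get(role, []):
        if fnmatchcase(path, pattern) and action in actions:
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("developer", "src/app/main.py", "write"))
    print(is_allowed("developer", "docs/readme.txt", "write"))
```

Denying by default is the design choice that matters here: a new role or directory gets no access until someone grants it deliberately.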
In summary, DevOps is penetrating enterprise IT departments at the same time that Git management solutions are proliferating. The differences among competing Git management solutions become most apparent in the context of DevOps. Remember, DevOps has to manage everything across the application lifecycle. Evaluating Git solely from the perspective of development teams, without considering the requirements of DevOps, can be a recipe for the failure of your DevOps initiative.
About the Author/Mark Warren
Mark Warren is Product Marketing Director at Perforce, based in the United Kingdom. Mark has more than 25 years of experience working with development and product management teams in software development tools and configuration management.