DevOps.com


How Bazel and GitHub Can Fix the Dependency Availability Problem

By: Jay Conrod on March 16, 2023

Recently, GitHub upgraded the internal version of Git they use to produce repository archives. You’ve probably used these archives before if you’ve downloaded a .zip or .tar.gz file from a repository at a particular version. GitHub produces those archives on demand using Git archive and caches them for a short time.

Upgrading Git regularly is generally a good idea, but this change regrettably broke a huge number of Bazel projects. Bazel is a widely used, free, open source tool for building and testing software, so this breaking change is a big deal. What happened? Most Bazel projects fetch at least some of their dependencies using rules in their WORKSPACE files like this:


load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "com_github_bazelbuild_buildtools",
    sha256 = "05eff86c1d444dde18d55ac890f766bce5e4db56c180ee86b5aacd6704a5feb9",
    strip_prefix = "buildtools-6.0.0",
    urls = ["https://github.com/bazelbuild/buildtools/archive/refs/tags/6.0.0.tar.gz"],
)

See that /archive/refs/tags/ part of the path? That’s the endpoint I’m talking about.

This is bare-bones dependency management: Bazel attempts to download an archive from the first URL in the list; it tries the next URL if the first is not available and so on. Bazel then checks the file’s SHA-256 sum against the known value and, if it’s correct, extracts the archive and proceeds with the build.
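
To make that sequence concrete, here is a simplified Python model of the download step. It is a sketch, not Bazel’s actual implementation. The `fetch` callable and the `ChecksumMismatch` exception are hypothetical. The sketch also assumes that a checksum mismatch is fatal rather than a reason to try the next URL.

```python
import hashlib
from typing import Callable, Sequence


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the expected SHA-256."""


def fetch_archive(urls: Sequence[str], sha256: str, fetch: Callable[[str], bytes]) -> bytes:
    """Simplified model of http_archive's download step (not Bazel's code).

    Tries each URL in order and falls back to the next one only when the
    fetch itself fails. The SHA-256 of the raw bytes is checked before the
    data is returned, so nothing downstream ever parses unverified input.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)  # hypothetical fetcher; raises OSError when unavailable
        except OSError as err:
            errors.append(f"{url}: {err}")
            continue  # this URL is unavailable, try the next one
        actual = hashlib.sha256(data).hexdigest()
        if actual != sha256:
            # Assumption of this sketch: a mismatch is fatal, not a fallback.
            raise ChecksumMismatch(f"{url}: got {actual}, want {sha256}")
        return data  # verified; safe to hand to an extractor
    raise OSError("all URLs failed: " + "; ".join(errors))
```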

The Git upgrade changed the archives’ SHA-256 sums. I believe the cause was a small change in gzip compression, but it doesn’t really matter: any variation in file ordering, alignment or compression changes an archive’s SHA-256 sum even though the extracted contents are the same.
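
A short Python sketch shows why byte-level checksums are fragile. It compresses the same payload twice and varies only the timestamp in the gzip header. The timestamp was chosen because it is deterministic. A different compressor implementation or compression level affects the checksum in the same way.

```python
import gzip
import hashlib

# The same uncompressed payload, standing in for the tar stream inside a .tar.gz.
payload = b"buildtools-6.0.0/README.md\n" * 100

# Two gzip encodings that differ only in header metadata (the mtime field).
a = gzip.compress(payload, mtime=0)
b = gzip.compress(payload, mtime=1)

print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
print(gzip.decompress(a) == gzip.decompress(b))  # True
```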

This is at least the third time I can remember Bazel builds breaking this way, and the problem has been discussed extensively before. I’m writing this in the hope that we can make our systems more resilient and avoid these kinds of problems in the future.

Since GitHub made the change that triggered this, they naturally get the immediate blame from the community, though I think it’s mostly undeserved. Upgrading dependencies (especially Git and especially if you’re GitHub) is a reasonable thing to do. To my knowledge, GitHub has not documented a guarantee that files returned by the archive endpoints have stable SHA-256 sums. It’s a mistake for users to rely on a guarantee that was never made. It’s tempting, of course, because it’s easy, but it’s a mistake nonetheless.

This is a classic example of Hyrum’s Law:

“With a sufficient number of users of an API, it does not matter what you promise in the contract: All observable behaviors of your system will be depended on by somebody.”

Since these updates have broken Bazel (and presumably others) a few times now, I’d really like to see GitHub clarify in documentation whether users should or should not depend on stable archive SHA-256 sums. A GitHub engineer commented that this is not stable, but product managers and support engineers have commented at other times that it is stable. I don’t think discussion comments count, since they’re not discoverable. Only official documentation is authoritative.

I haven’t actually found any documentation for these release archive URLs, so I’m not sure where this clarification should go. They’re not part of the REST API; the closest existing page is probably GitHub’s “Linking to releases” documentation.

If archive SHA-256 sums are guaranteed to be stable (now or in the future), I think documenting and testing that would let us all sleep easier at night.

If archive SHA-256 sums are not guaranteed to be stable, it wouldn’t be a terrible idea to inject a little chaos to prevent people from depending on them. For example, in Go, the iteration order of elements in a map is undefined. To prevent developers from depending on iteration order (and tests from breaking when the hashing algorithm is tweaked), the Go runtime adds a random factor to the hashing algorithm, so the iteration order is different every time a program runs. Something similar could be done here with archive file order or alignment. I wouldn’t suggest gratuitously breaking this API, but if it needs to change anyway for some reason in the future, it would be a good idea to add something like this.
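
As a rough illustration of that idea, here is a hypothetical archive builder in Python with a "chaos" mode that shuffles entry order from a seed. Nothing here reflects how GitHub actually generates archives. `build_tar` and `extract` are names invented for this sketch.

```python
import io
import random
import tarfile


def build_tar(files: dict, seed=None) -> bytes:
    """Build an uncompressed tar archive of `files` (name -> bytes).

    Hypothetical "chaos" mode: when `seed` is given, entries are written in a
    shuffled order, so the archive bytes (and SHA-256) vary while the extracted
    contents stay identical. With seed=None, entries are written sorted by name.
    """
    names = sorted(files)
    if seed is not None:
        random.Random(seed).shuffle(names)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so only entry order can vary
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract(archive: bytes) -> dict:
    """Read every regular file back out of an uncompressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```

A client that pins the archive’s SHA-256 breaks on the first run under this scheme. A client that only cares about the extracted files never notices, which is the behavior the randomization is meant to encourage.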

What Could the Bazel Community Do Better?

Bazel developers should not rely on stable archive SHA-256 sums unless that stability is guaranteed and documented by GitHub. More importantly, developers should not rely on dependency artifacts being available on GitHub at all: A library author could delete their project at any time.

I’ll point to Go modules as a model of a great dependency management system designed to solve this exact problem. The Go team operates proxy.golang.org, a mirror for all publicly available Go modules. Internally, the proxy stores actual files for each module and does not need to regenerate them. The proxy protocol is open and easy to implement as an HTTP file server, so you can run your own proxy service for better availability. I’d love to see something like this happen for Bazel, especially if it’s operated by Google. It is not technically difficult to build a service like this, but there are a lot of thorny issues around handling abuse and legally distributing software with unrecognized licenses, and Google has already figured out those issues for Go.
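
Part of what makes the Go proxy protocol easy to serve from plain files is that every request maps to a fixed path. The sketch below builds those paths according to the Go Modules Reference. Module paths and versions are case-encoded, so each uppercase letter becomes `!` plus its lowercase form. A version’s files live under `$base/$module/@v/`. The helper names are hypothetical, and the optional `@latest` endpoint is omitted.

```python
def case_encode(s: str) -> str:
    """Replace each uppercase ASCII letter with '!' followed by its lowercase form."""
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in s)


def proxy_urls(base: str, module: str, version: str) -> dict:
    """Return the GOPROXY URLs a client would request for one module version."""
    m = case_encode(module)
    v = case_encode(version)
    prefix = f"{base.rstrip('/')}/{m}/@v"
    return {
        "list": f"{prefix}/list",      # known versions, one per line
        "info": f"{prefix}/{v}.info",  # JSON metadata for the version
        "mod": f"{prefix}/{v}.mod",    # the version's go.mod file
        "zip": f"{prefix}/{v}.zip",    # the module content itself
    }
```

Because each of these paths names an immutable file, a bucket or any static HTTP server can act as a proxy.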

Until such a service exists, developers can protect themselves by copying their dependencies to their own mirror. A GCS or S3 bucket works fine.
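
A minimal sketch of that workflow in Python follows. It assumes a local directory stands in for the bucket; uploading to GCS or S3 is left out. It uses the same layout as mirror.bazel.build, where the original URL’s host and path become the object path. It also refuses to store anything whose SHA-256 doesn’t match. `mirror_path` and `store_in_mirror` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit


def mirror_path(url: str) -> str:
    """Map 'https://host/path' to 'host/path', the layout mirror.bazel.build uses."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


def store_in_mirror(root: Path, url: str, data: bytes, sha256: str) -> Path:
    """Copy a verified artifact into a local directory laid out like a mirror.

    `root` stands in for a GCS or S3 bucket; syncing it to a real bucket is not shown.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256:
        raise ValueError(f"refusing to mirror {url}: got {actual}, want {sha256}")
    dest = root / mirror_path(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

Once the file is in the bucket, its URL goes first in the `urls` list of the `http_archive` rule, with the original URL kept as a fallback.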

Library authors can and should protect their users by providing static release artifacts (not dynamically generated archives), and mirroring those. For example, check out the http_archive boilerplate for rules_go:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    sha256 = "dd926a88a564a9246713a9c00b35315f54cbd46b31a26d5d8fb264c07045f05d",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_go/releases/download/v0.38.1/rules_go-v0.38.1.zip",
        "https://github.com/bazelbuild/rules_go/releases/download/v0.38.1/rules_go-v0.38.1.zip",
    ],
)

The file rules_go-v0.38.1.zip is created by the rule authors and attached to the release; it’s not dynamically generated.

It’s also copied to mirror.bazel.build, which is a thin frontend on a GCS bucket shared by many rule authors in the bazelbuild organization.

One other tip: If you’re feeling adventurous enough to use an experimental, undocumented feature (to make your build more stable! Really!), you can configure Bazel’s downloader to rewrite those GitHub URLs to point to your own mirror.
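
The feature is a downloader config file, passed in recent Bazel releases with `--experimental_downloader_config`. Because it is experimental, the flag name and syntax may differ in your version. Its `rewrite` lines pair a regular expression with a replacement, along the lines of `rewrite github.com/(.*) mirror.example.com/github.com/$1`, where mirror.example.com is a placeholder. The Python sketch below only approximates those semantics under three assumptions: the pattern matches everything after the scheme, `$N` refers to capture group N, and the scheme is kept. Check Bazel’s documentation for the exact behavior. `rewrite_url` is a hypothetical helper.

```python
import re


def rewrite_url(url: str, rules: list) -> list:
    """Apply (pattern, replacement) rules in the spirit of a downloader 'rewrite' line.

    Assumptions of this sketch: each pattern must match the whole URL after the
    scheme, '$N' in the replacement refers to capture group N, and the original
    scheme is preserved. Returns every rewritten URL, or [url] if nothing matches.
    """
    scheme, _, rest = url.partition("://")
    out = []
    for pattern, replacement in rules:
        m = re.fullmatch(pattern, rest)
        if m is None:
            continue
        # Substitute $1, $2, ... with the corresponding capture groups.
        rewritten = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))) or "", replacement)
        out.append(f"{scheme}://{rewritten}")
    return out or [url]
```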

Aside: SHA-256 of archives or contents?

It’s unfortunate that a change to the Git archive that does not affect extracted contents of an archive can still change its SHA-256 sum. Bazel absolutely does the right thing by checking the sum of the downloaded file before extracting its contents.

This is the (delightfully named) Cryptographic Doom Principle. If Bazel only authenticated the contents of an archive, it might be possible for an attacker to exploit a vulnerability in Bazel’s zip parser before the archive is authenticated. Since Bazel authenticates the archive before extracting it, the pre-authentication attack surface is very small.
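
To see the trade-off, compare the two approaches in a Python sketch. `archive_digest` hashes the raw bytes, as Bazel does. `content_digest` is a hypothetical alternative that hashes the extracted files. The content digest survives re-compression. However, it can only be computed by running the zip parser on bytes that haven’t been authenticated yet, which is exactly what the Cryptographic Doom Principle warns against. `make_zip` is a helper for building test archives.

```python
import hashlib
import io
import zipfile


def archive_digest(data: bytes) -> str:
    """Hash the raw downloaded bytes. No parsing happens before the check."""
    return hashlib.sha256(data).hexdigest()


def content_digest(data: bytes) -> str:
    """Hypothetical alternative: hash the extracted files instead.

    Stable across re-compression, but zipfile must parse the untrusted
    archive before anything has been authenticated.
    """
    h = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:  # parser runs on unverified input
        for name in sorted(zf.namelist()):
            body = zf.read(name)
            h.update(name.encode() + b"\0")
            h.update(len(body).to_bytes(8, "big"))
            h.update(body)
    return h.hexdigest()


def make_zip(files: dict, compression: int) -> bytes:
    """Build a zip archive of `files` (name -> bytes) with the given compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, body in files.items():
            zf.writestr(name, body)
    return buf.getvalue()
```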

Closing Thoughts

When you’re designing software, think carefully about how it’s going to be used. If there’s a right way and a wrong way to do something, make sure the right way is easier and more obvious. Better yet, make the right way the only way.

I think this is a case where Bazel’s dependency management is too limited: To use http_archive safely, you need to set up an HTTP mirror with copies of your dependencies. That’s too much work for users, especially new users who aren’t aware of the hazards. A more complete dependency management system should include an artifact registry or a read-through caching system with at least one public implementation. I was hoping Bazel modules and the Bazel central registry would provide that, but the central registry only includes module metadata: Module content is separate, specified in URLs that still frequently refer to the unstable GitHub endpoint.
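
For a sense of how small the core of a read-through cache can be, here is a Python sketch. It stores artifacts under their SHA-256 and contacts upstream only on a miss. It is a hypothetical design for illustration, not part of Bazel or the Bazel central registry. A real service would also need the abuse handling and licensing work mentioned earlier.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Sketch of a content-addressed read-through cache for dependency archives.

    Entries are stored under their SHA-256, so a cached file can still be served
    after the upstream URL disappears or starts returning different bytes.
    """

    def __init__(self, root: Path, fetch: Callable[[str], bytes]):
        self.root = root
        self.fetch = fetch  # hypothetical upstream fetcher, e.g. an HTTP GET

    def get(self, url: str, sha256: str) -> bytes:
        path = self.root / sha256[:2] / sha256
        if path.exists():
            return path.read_bytes()  # hit: upstream is never contacted
        data = self.fetch(url)
        actual = hashlib.sha256(data).hexdigest()
        if actual != sha256:
            raise ValueError(f"{url}: got {actual}, want {sha256}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)  # only verified bytes are ever cached
        return data
```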
