SBOMs 101: What You Need to Know

Recent security incidents have the industry buzzing about the lack of knowledge about code dependencies, attacks on the software supply chain, software bills of materials (SBOMs), digital signatures, provenance, attestation and the like. The fact is, every time a new vulnerability appears, a lot of time and effort is required not just to detect when, where and how the vulnerability occurred, but also to measure the real impact on the applications and services that are running in our environments.

What if we had a way to enumerate all the software components we used and produced that could be distributed and consumed easily? This is the problem SBOMs are trying to solve.

What is a Software Bill of Materials (SBOM)?

An SBOM is simply an artifact containing a comprehensive list of package dependencies, files, licenses and other assets that, together, make up a piece of software. Think of an ingredients list, but for software.

The NTIA defines an SBOM as a formal record that contains the details and supply chain relationships of various components used in building software. These components, including libraries and modules, can be open source or proprietary, free or paid and the data can be widely available or access-restricted.

This concept is nothing new, and bills of materials (BOMs) are a common, existing part of industrial processes. They are very similar to a list of ingredients, although a BOM usually includes the concept of hierarchy, as well. Every component is broken down into a list of subcomponents, each of which also includes its own BOM.

In SBOMs, the pieces are commonly abstract libraries, modules, binaries, compilers, files, etc., and they usually include licensing information (Apache 2.0, GNU, BSD, etc.) and additional metadata.

SBOM History

SBOMs might seem new, but the open source community realized the need for such a thing and created SBOMs more than 10 years ago. The Software Package Data Exchange (SPDX) standard was created in 2010 to communicate SBOM information including components, licenses, copyrights and security references.

In recent years, the issue has come to the fore due to the increase of attacks related to the supply chain. Here’s a short timeline.

October 2015 – SWID Tags standard, from NIST, published as ISO/IEC 19770-2:2015.
May 2017 – Initial drafts of CycloneDX, an OWASP SBOM standard.
December 2020 – The ISO International Standard for open source license compliance (ISO/IEC 5230:2020 – Information technology — OpenChain Specification) is published, requiring a process for managing a bill of materials for supplied software.
2020 – 2021 – NTIA publishes its latest work as part of the ongoing Software Component Transparency effort around Software Bill of Materials (SBOM).
February 2021 – Executive Order 14017 on America’s Supply Chain.
May 2021 – Executive Order 14028 on Improving the Nation’s Cybersecurity.
July 2021 – NIST releases the Recommended Minimum Standards for Vendor or Developer Verification (Testing) of Software Under Executive Order (EO) 14028.
August 2021 – SPDX published as ISO/IEC 5962:2021 standard.
September 2021 – First draft of SLSA (Supply-Chain Levels for Software Artifacts) framework.
February 2022 – DoD plan on Securing Defense-Critical Supply Chains which includes Software Supply Chain.

And this is just the beginning. 2022 is becoming a turning point in how the industry is approaching SBOM and software supply chain security challenges.

Where Does the SBOM Apply?

An SBOM applies to any software component, either external or internal, open source or proprietary files, packages, modules or shared libraries used in the construction of software products. This includes firmware and embedded software, too. Hardware might take part in the distribution or execution of software (network equipment, cryptographic devices, chips, etc.) but is not considered part of an SBOM, although standards like CycloneDX support devices as a type of component.

In an ideal world, every software company would attach an SBOM to each deliverable, and everyone would have full visibility into the components used in the software. That would enable knowledge of exactly which vulnerabilities are impacting that software.
But we are not in that ideal world.

Typical Assets With SBOMs

● Development dependencies. Every time a developer includes a third-party dependency (either open source or an internal component), they are often also adding transitive dependencies – the modules or packages that the dependency itself is using. So, a detailed SBOM would enable visibility into those transitive dependencies, as well.
● Software applications or packages. When an application is distributed, the SBOM would help the consumer quickly identify the application version, helper tools that are included in the package and all the pieces that were involved in the build process. This makes it much easier to identify vulnerabilities or troubleshoot issues that might be caused by buggy software dependencies.
● Container images. Container images are basically a filesystem composed of a base image distribution plus a set of additional packages and components added during the build process.
● Hosts. Hosts include, for example, virtual machine appliance images, an AWS AMI and the like. The SBOM would include the base operating system type, vendor, version and a comprehensive list of each package installed in the host, either from the base operating system (e.g., the Linux distribution) or manually deployed from external sources.
● Hardware devices. Examples include a firewall, an IoT device or a mobile phone that is running software.

SBOMs should capture any and all multi-level dependencies. So, for example, if package libfoobar-1.5.3-r3-u8 is part of the SBOM, it should also include each package name, version, license, etc. used to assemble libfoobar-1.5.3-r3-u8, and the components for each of these, resulting in a multi-level tree where each node is decomposed into its dependencies.
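The multi-level tree described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Component class is a hypothetical, simplified model (not an SPDX or CycloneDX structure), and libbar and libbaz, along with all license values, are invented here to give libfoobar something to depend on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """One node of the SBOM tree (hypothetical, simplified model)."""
    name: str
    version: str
    license: str = "NOASSERTION"
    dependencies: List["Component"] = field(default_factory=list)


def flatten(component, depth=0):
    """Yield (depth, component) for the node and every nested dependency."""
    yield depth, component
    for dep in component.dependencies:
        yield from flatten(dep, depth + 1)


# Illustrative data only: libbar, libbaz and the licenses are invented.
libbaz = Component("libbaz", "0.9.1", "MIT")
libbar = Component("libbar", "2.0.0", "BSD-3-Clause", [libbaz])
libfoobar = Component("libfoobar", "1.5.3-r3-u8", "Apache-2.0", [libbar])

for depth, node in flatten(libfoobar):
    print("  " * depth + f"{node.name} {node.version} ({node.license})")
```

Each level of indentation in the printed output corresponds to one level of the dependency tree, which is the hierarchy a complete SBOM is expected to preserve.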

It is also important to point out that every time one of these assets changes—every release of a product or even each build—a new SBOM should be created to match the changes for that version.

Why You Need an SBOM

You need an SBOM for the same reason you need a list of ingredients for food. You can check for the presence of allergens, the presence of animal-based substances (for vegans), chemical preservatives, etc. Certainly, you can eat food without checking the list of ingredients, but you are assuming some risks. The same applies to software: It might be fine to use a dependency for a quick test in a sandbox environment, but you definitely want to know what’s inside when deploying critical services to production or delivering software in environments with strong compliance regulations.

Without an SBOM, you don’t have that visibility. A piece of software becomes an impenetrable black box with respect to the packages and libraries that are used to assemble it.

The availability of an SBOM for third-party dependencies makes it easier to build an SBOM for your software by simply adding your own ‘ingredients’ to the existing list of dependencies. Of course, your software can also be the input or dependency of a more complex product, and consumers might demand the presence of an SBOM as part of their supplier’s requirements.

Knowing which licenses are necessary for the different software pieces is also very important. Otherwise, distributing software using third-party libraries under multiple license types might break the usage terms or force you to make source code public, which can be inconvenient and even get your company into trouble.

Finally, the SBOM is a key element of the vulnerability scanning process. Provided that you have an accurate SBOM and reliable and updated vulnerability feeds from different vendors and sources, it is pretty straightforward to find which vulnerabilities are present in the software. Without an SBOM, the vulnerability scanning software needs to compute and guess, building its own, possibly inaccurate version of an SBOM, which might be quite tricky or even impossible for some components.

A good SBOM should allow organizations to answer questions like, “Am I vulnerable to the CVE-2022-22965 (Spring4Shell) vulnerability?” Exploiting this vulnerability requires a set of conditions to happen simultaneously in the host or container running the exploitable Java package:

● Using Spring Framework versions 5.3.0 to 5.3.17, 5.2.0 to 5.2.19 or older, unsupported versions
● Using JDK 9 or higher
● Running Apache Tomcat as the Servlet container
● Having the library packaged as WAR
● Using the spring-webmvc or spring-webflux dependency

Most of these conditions can be checked for in the contents of a comprehensive SBOM, making it easier to assess the risk in your environments by focusing on fixing the exploitable applications first.
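As a rough sketch of how such a check could be automated, the Python snippet below evaluates the conditions listed above against a simplified inventory. Several things here are assumptions made for the example: the inventory is a plain name-to-version mapping rather than a real SBOM format, the package names "openjdk" and "tomcat" are placeholders, the version of spring-webmvc or spring-webflux is taken as the Spring Framework version, and older unsupported versions are not modeled.

```python
def parse_version(text):
    """Turn '5.3.17' into (5, 3, 17); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


# Affected ranges as listed above (older, unsupported versions are not modeled).
AFFECTED_RANGES = [
    (parse_version("5.3.0"), parse_version("5.3.17")),
    (parse_version("5.2.0"), parse_version("5.2.19")),
]


def spring_version_affected(version):
    """True when the version falls inside one of the affected ranges."""
    parsed = parse_version(version)
    return any(low <= parsed <= high for low, high in AFFECTED_RANGES)


def spring4shell_conditions(components, packaging):
    """Return condition name -> bool for a simplified inventory.

    `components` maps package name to version. This shape and the package
    names "openjdk" and "tomcat" are assumptions made for the example.
    """
    spring_web = [n for n in ("spring-webmvc", "spring-webflux") if n in components]
    return {
        "affected_spring_web": any(
            spring_version_affected(components[n]) for n in spring_web
        ),
        "jdk_9_or_higher": "openjdk" in components
        and parse_version(components["openjdk"])[0] >= 9,
        "tomcat": "tomcat" in components,
        "war_packaging": packaging == "war",
    }


# Illustrative inventory, not taken from a real SBOM.
inventory = {"spring-webmvc": "5.3.16", "openjdk": "11.0.15", "tomcat": "9.0.60"}
conditions = spring4shell_conditions(inventory, packaging="war")
print(conditions)
print("Potentially exploitable:", all(conditions.values()))
```

Reporting each condition separately, rather than a single yes/no answer, keeps the result useful when the SBOM can only confirm some of the conditions.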

Creating an SBOM

Currently, generating SBOMs is complex because there are multiple competing standards, distributions, etc. that make adoption slower than desired.

Many tools exist that can help you create an SBOM for a piece of software. Before you even consider producing an SBOM, it is critical that the build process is completely automated (Level 1 in the SLSA framework) and SBOM creation is integrated as part of the build pipeline.

Next, these are some example executions and outputs of open source tools and the corresponding SPDX or CycloneDX (truncated) SBOM, which are two of the most common standards.

Syft

Syft can generate an SBOM in SPDX or CycloneDX format from a filesystem or container image, and it also powers the docker sbom command available in Docker.

$ syft neo4j:latest
✔ Parsed image
✔ Cataloged packages [376 packages]
NAME VERSION TYPE
CodePointIM 11.0.15 java-archive
FastInfoset 1.2.16 java-archive
…
util-linux 2.36.1-8+deb11u1 deb
wget 1.21-1+deb11u1 deb
zlib1g 1:1.2.11.dfsg-2+deb11u1 deb
zstd-jni 1.5.0-4 java-archive
zstd-proxy 4.4.8 java-archive

When using the -o flag to set the output to spdx-json format, it will produce a document like:

$ syft -o spdx-json neo4j:latest
✔ Parsed image
✔ Cataloged packages [376 packages]
{
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "neo4j-latest",
  "spdxVersion": "SPDX-2.2",
  "creationInfo": {
    "created": "2022-06-23T10:09:26.751733Z",
    "creators": [
      "Organization: Anchore, Inc",
      "Tool: syft-0.48.1"
    ],
    "licenseListVersion": "3.17"
  },
  …
  "packages": [
    {
      "SPDXID": "SPDXRef-fd9f083cc189cf0c",
      "name": "CodePointIM",
      "licenseConcluded": "NONE",
      "checksums": [
        {
          "algorithm": "SHA1",
          "checksumValue": "50a6f2c46702b14cb129aac653d9abfcdc324363"
        }
      ],
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "SECURITY",
          "referenceLocator": "cpe:2.3:a:oracle-corporation:CodePointIM:11.0.15:*:*:*:*:*:*:*",
          "referenceType": "cpe23Type"
        },
        …
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:maven/CodePointIM/CodePointIM@11.0.15",
          "referenceType": "purl"
        }
      ],
      "filesAnalyzed": true,
      "licenseDeclared": "NONE",
      "sourceInfo": "acquired package info from installed java archive: /usr/local/openjdk-11/demo/jfc/CodePointIM/CodePointIM.jar",
      "versionInfo": "11.0.15"
    },
    {
      "SPDXID": "SPDXRef-80979ce84b1617b2",
      "name": "FastInfoset",
      "licenseConcluded": "NONE",
      …
    },
    …
  ],
  "files": [
    {
      "SPDXID": "SPDXRef-9e950849d3fbc974",
      "comment": "layerID: sha256:ad6562704f3759fb50f0d3de5f80a38f65a85e709b77fd24491253990f30b6be",
      "licenseConcluded": "NOASSERTION",
      "fileName": "/bin/bash"
    },
    {
      "SPDXID": "SPDXRef-d1fd1bc48eedeaba",
      "comment": "layerID: sha256:ad6562704f3759fb50f0d3de5f80a38f65a85e709b77fd24491253990f30b6be",
      "licenseConcluded": "NOASSERTION",
      "fileName": "/bin/cat"
    },
    …
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-a124711c55c5b5ec",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-9f73084aac22b0b3"
    },
    {
      "spdxElementId": "SPDXRef-a124711c55c5b5ec",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-23989aa2a193ea3d"
    },
    …
  ]
}

This not only includes the packages but also the files in the image, relationships between the elements, licensing information and more.
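To give an idea of how a consumer might read such a document, here is a minimal Python sketch that lists the packages of an SPDX JSON file together with their package URLs. The embedded document is a trimmed, hand-written fragment modeled on the output above, and list_packages is a hypothetical helper name, not part of Syft or of any SPDX library.

```python
import json

# Trimmed, hand-written SPDX 2.2 fragment modeled on the Syft output above.
SPDX_TEXT = """
{
  "spdxVersion": "SPDX-2.2",
  "name": "neo4j-latest",
  "packages": [
    {
      "SPDXID": "SPDXRef-fd9f083cc189cf0c",
      "name": "CodePointIM",
      "versionInfo": "11.0.15",
      "licenseDeclared": "NONE",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:maven/CodePointIM/CodePointIM@11.0.15",
          "referenceType": "purl"
        }
      ]
    }
  ]
}
"""


def list_packages(document):
    """Return (name, version, purl) for every package in an SPDX JSON document."""
    rows = []
    for package in document.get("packages", []):
        # A package may carry several external references; keep only purls.
        purls = [
            ref["referenceLocator"]
            for ref in package.get("externalRefs", [])
            if ref.get("referenceType") == "purl"
        ]
        rows.append(
            (package["name"], package.get("versionInfo", ""), purls[0] if purls else None)
        )
    return rows


packages = list_packages(json.loads(SPDX_TEXT))
for name, version, purl in packages:
    print(name, version, purl)
```

The package URL (purl) is a convenient key for later steps, since it encodes ecosystem, name and version in a single string.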

@cyclonedx/bom

The Node.js package @cyclonedx/bom allows you to generate an SBOM from a Node project in CycloneDX format. An example output when generating the SBOM from github.com/fastify/fastify looks like:

$ cyclonedx-bom
$ cat bom.xml
<?xml version="1.0" encoding="utf-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.3" serialNumber="urn:uuid:be53de33-6897-49ca-855d-926383866c21" version="1">
  <metadata>
    <timestamp>2022-06-23T10:03:17.018Z</timestamp>
    <tools>
      <tool>
        <vendor>CycloneDX</vendor>
        <name>Node.js module</name>
        <version>3.10.1</version>
      </tool>
    </tools>
    <component type="library" bom-ref="pkg:npm/fastify@4.1.0">
      <author>Matteo Collina</author>
      <name>fastify</name>
      <version>4.1.0</version>
      <description>
        <![CDATA[Fast and low overhead web framework, for Node.js]]>
      </description>
      …
    </component>
  </metadata>
  <components>
    <component type="library" bom-ref="pkg:npm/%40fastify/ajv-compiler@3.1.0">
      <author>Manuel Spigolon</author>
      <group>@fastify</group>
      <name>ajv-compiler</name>
      <version>3.1.0</version>
      <description>
        <![CDATA[Build and manage the AJV instances for the fastify framework]]>
      </description>
      <licenses>
        <license>
          <id>MIT</id>
        </license>
      </licenses>
      <purl>pkg:npm/%40fastify/ajv-compiler@3.1.0</purl>
      <externalReferences>
        <reference type="website">
          <url>https://github.com/fastify/ajv-compiler#readme</url>
        </reference>
        <reference type="issue-tracker">
          <url>https://github.com/fastify/ajv-compiler/issues</url>
        </reference>
        <reference type="vcs">
          <url>git+https://github.com/fastify/ajv-compiler.git</url>
        </reference>
      </externalReferences>
    </component>
    …
  </components>
  <dependencies>
    <dependency ref="pkg:npm/fast-deep-equal@3.1.3"/>
    <dependency ref="pkg:npm/json-schema-traverse@1.0.0"/>
    <dependency ref="pkg:npm/require-from-string@2.0.2"/>
    <dependency ref="pkg:npm/punycode@2.1.1"/>
    <dependency ref="pkg:npm/uri-js@4.4.1">
      <dependency ref="pkg:npm/punycode@2.1.1"/>
    </dependency>
    <dependency ref="pkg:npm/ajv@8.11.0">
      <dependency ref="pkg:npm/fast-deep-equal@3.1.3"/>
      <dependency ref="pkg:npm/json-schema-traverse@1.0.0"/>
      <dependency ref="pkg:npm/require-from-string@2.0.2"/>
      <dependency ref="pkg:npm/uri-js@4.4.1"/>
    </dependency>
    …
  </dependencies>
</bom>
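A CycloneDX XML document like this one can be read with Python's standard library. The sketch below is a minimal example, assuming the 1.3 namespace shown above; the embedded XML is a shortened, hand-written fragment of that output, and list_components and dependency_graph are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Namespace of the CycloneDX 1.3 schema, as declared in the document above.
NS = {"bom": "http://cyclonedx.org/schema/bom/1.3"}

# Shortened, hand-written fragment modeled on the bom.xml output above.
BOM_XML = """<bom xmlns="http://cyclonedx.org/schema/bom/1.3" version="1">
  <components>
    <component type="library" bom-ref="pkg:npm/%40fastify/ajv-compiler@3.1.0">
      <group>@fastify</group>
      <name>ajv-compiler</name>
      <version>3.1.0</version>
      <licenses><license><id>MIT</id></license></licenses>
    </component>
  </components>
  <dependencies>
    <dependency ref="pkg:npm/punycode@2.1.1"/>
    <dependency ref="pkg:npm/uri-js@4.4.1">
      <dependency ref="pkg:npm/punycode@2.1.1"/>
    </dependency>
  </dependencies>
</bom>"""


def list_components(root):
    """Return (name, version, [license ids]) for each listed component."""
    rows = []
    for comp in root.findall("bom:components/bom:component", NS):
        licenses = [
            el.text for el in comp.findall("bom:licenses/bom:license/bom:id", NS)
        ]
        rows.append(
            (
                comp.findtext("bom:name", namespaces=NS),
                comp.findtext("bom:version", namespaces=NS),
                licenses,
            )
        )
    return rows


def dependency_graph(root):
    """Map each top-level dependency ref to the refs it depends on directly."""
    graph = {}
    for dep in root.findall("bom:dependencies/bom:dependency", NS):
        graph[dep.get("ref")] = [
            child.get("ref") for child in dep.findall("bom:dependency", NS)
        ]
    return graph


root = ET.fromstring(BOM_XML)
print(list_components(root))
print(dependency_graph(root))
```

The dependencies section is what carries the hierarchy: an entry with no children is a leaf, while nested entries list the direct dependencies of the parent reference.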

snyk2spdx

Snyk’s snyk2spdx tool leverages Snyk’s open source API to create an SBOM from your code repositories. Unfortunately, at the time of writing, this repository is outdated and unmaintained.

Other Tools

There are also a number of online tools that allow importing different formats or manually adding components to the SBOM definition and then downloading it. The NTIA also published the “How-To Guide for SBOM Generation”, a collection of simple instructions and guidance on how to generate an SBOM. It is interesting that the guide includes the concept of “completeness assertion” for cases where the dependencies of some components are missing.

Vendor-Provided or Guessed-BOM?

Ideally, the vendor of a product should tell us every component and provide it in a digitally signed document to prevent tampering or modifications. But we are still far from this holy grail, and there aren’t many vendors producing and providing this information. It’s a complex process involving multiple tools and pieces, and there are multiple standards for SBOM distribution.
In an ideal world, every vendor would provide a 100% accurate, comprehensive, digitally-signed bill of materials in a common standard. But in the real world, we usually need scanning tools that can produce a ‘guessed’ bill of materials. This is harder, as many components are opaque and it is difficult to discover the dependencies or libraries used during the build.

Still, scanning is necessary, as the SBOM from the vendor might be wrong. The vendor’s build process might be compromised, so some components might be intentionally omitted from their SBOM, which brings us to the following question: Can the SBOM be wrong or inaccurate?

Yes. The quality of an SBOM depends on the quality and automation of the process that builds the SBOM.

It is easy to produce the root level (like the version and details of the software you are directly building) and the first level of dependencies (packages and third-party libraries). It becomes more difficult to do so for transitive dependencies and even harder as you navigate deeper in the tree. Many components might not provide their own SBOM and detecting dependencies can be complex or just plain impossible, like in statically linked binaries with stripped information.

Even with a perfect toolchain and perfect SBOM information during the build phase, an attacker could tamper with the contents of the SBOM (i.e., modify the companion file or artifact at rest) to hide the fact that it contains vulnerable or malicious components. A consumer would then retrieve the modified version of the SBOM and miss these dangerous components.
A common, recommended practice is adding a digital signature to the SBOM artifact to make sure the consumer can verify its authenticity and integrity.
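The idea of verifying an SBOM before trusting it can be illustrated with Python's standard library. Note the swap: a real digital signature relies on asymmetric keys and dedicated signing tools, which the standard library does not provide, so this sketch uses an HMAC (a keyed hash with a shared secret) purely as a stand-in to show how tampering is detected. The key, the SBOM content and the function names are all invented for the example.

```python
import hashlib
import hmac


def sign_sbom(sbom_bytes, key):
    """Compute an authentication tag over the SBOM (HMAC-SHA256 stand-in)."""
    return hmac.new(key, sbom_bytes, hashlib.sha256).hexdigest()


def verify_sbom(sbom_bytes, key, expected_tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_sbom(sbom_bytes, key), expected_tag)


# Illustrative values only; a real setup would use asymmetric signatures.
key = b"demo-key-not-for-production"
original = b'{"name": "neo4j-latest", "packages": ["CodePointIM", "FastInfoset"]}'
tag = sign_sbom(original, key)

# An attacker removes a component from the SBOM at rest.
tampered = original.replace(b', "FastInfoset"', b"")

print(verify_sbom(original, key, tag))   # True
print(verify_sbom(tampered, key, tag))   # False
```

The consumer-side logic stays the same with a real signature scheme: recompute or verify over the exact bytes received, and reject the SBOM when verification fails.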

Even worse, it is possible that an attacker could compromise the build pipeline itself and modify the process of creating the SBOM, which would result in a digitally signed but altered list of components.

From a scanning tool perspective, a software component is usually a black box, or the amount of information that can be obtained from analysis might be quite limited, as most of it (like pom.xml or go.mod files) is available during a build but removed in the final deliverable.

An appropriate analogy would be analyzing food in a laboratory versus having the list of ingredients from the producer. Analysis or scanning produces quantitative data (what can be observed in the box), whereas a vendor-provided SBOM can also contain qualitative data that is lost in the final deliverable or invisible to an analysis.

To minimize the risks of poor quality SBOMs or attacks, it is recommended that organizations use scanning solutions even in the presence of vendor-provided SBOMs.

SBOMs and Vulnerabilities

A vulnerability is a weakness or flaw that an attacker can exploit to bypass security boundaries, get access to a system and more. Vulnerabilities are a typical way to attack or compromise the software supply chain.

To find vulnerabilities in a piece of software (or in a running host, a container image, etc.), you need something that matches “known” vulnerabilities with the set of components in your software. This is called vulnerability scanning. And it’s where the SBOM comes into play, as it contains a comprehensive list of packages and versions composing your software.

Then, another big question arises: Where do the known vulnerabilities come from? (Note that vulnerabilities need to be known in advance; you cannot detect an unknown flaw.) Researchers and hackers discover them and they end up in vulnerability databases that can be consumed by humans or computers. There are two main sources for vulnerabilities: vendors and independent providers.

Vendors can provide feeds for vulnerabilities in their products, like major Linux distributions or package repositories like Go, npm, etc. Vendors have good context around how the vulnerability impacts the product. However, they might also be biased in regard to severity.

Independent providers like NIST, MITRE, the Open Source Vulnerabilities (OSV) database and commercial offerings like Snyk and VulnDB collect, analyze and provide information for vulnerabilities. The drawback is that their scores are generic, assigned without the specific context of how the vulnerability might apply to different products. In some cases, vulnerabilities might not even impact a vendor-specific product version because it is forked or the patch is backported.

Consuming vulnerability feeds can be challenging because different formats and standards exist for vulnerability information exchange, like:

● Red Hat OVAL – An XML format used for Red Hat Enterprise Linux, OpenShift and other Red Hat products, and also available for Ubuntu.
● Different JSON feeds like the Debian Security Tracker.
● APIs like OSV allow querying for specific open source package versions.
● Security advisories meant for humans but not for automated processing, like Gentoo security.
● NVD CVE JSON v5.0, an attempt to create a standard CVE format.
● Common Security Advisory Framework (CSAF) – Standardized automated disclosure of cybersecurity vulnerability issues.
● Vulnerability Exploitability eXchange (VEX), which has been implemented as a profile of CSAF.
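As an example of the API-style feeds in the list above, the following Python sketch builds a query for the OSV API, which accepts a package name, ecosystem and version at its public v1/query endpoint. The helper names and the timeout parameter are choices made for this example, and the network call is left inside a function so that nothing is sent unless you invoke it yourself.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, ecosystem, version):
    """Build the JSON body for an OSV query on one package version."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def query_osv(name, ecosystem, version, timeout=10):
    """Send the query and return the list of matching vulnerability records.

    Requires network access; the response carries a "vulns" list when
    matches exist and omits it otherwise.
    """
    body = json.dumps(build_osv_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("vulns", [])


# Only the payload is printed here; call query_osv(...) to contact the service.
payload = build_osv_query("ajv", "npm", "8.11.0")
print(json.dumps(payload, indent=2))
```

Feeding each package and version from an SBOM through a query like this is, in essence, what a vulnerability matcher does for ecosystems covered by OSV.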

VEX is interesting, as it allows vendors to provide a kind of ‘negative’ security advisory that says, for example, that a certain vulnerability does not apply to a component because the submodule in the package is not even used in the product.

Another source that can help prioritize vulnerabilities is the CISA’s Known Exploited Vulnerabilities catalog (KEV), an updated list of vulnerabilities with assigned CVE ID, reliable evidence of being actively exploited in the wild and clear remediation (such as a vendor update).

The following diagram describes the full vulnerability management flow:

A typical flow consists of a scanning tool that is capable of creating an SBOM by analyzing a container image, host or workload, or directly by consuming a pre-computed SBOM (or both!). The scanning tool then matches known vulnerabilities from different sources (usually the vendor-provided sources for the corresponding Linux distribution, plus generic sources like NVD) to report the list of vulnerabilities impacting the software.

You can see an example of matching the Kubernetes SBOM (which is publicly available for each version) with known vulnerabilities from OSV using the spdx-to-osv tool.

The list of detected vulnerabilities can be curated and prioritized with additional information, like VEX (non-exploitable vulnerabilities) from the vendor, the KEV list or risk spotlight information from Sysdig, which can detect the packages effectively loaded during the execution of the workload. This filters out the packages that are inside a container image but never executed, so they are not exploitable.
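The curation step can be sketched as a small filter-and-sort in Python. Everything in the data below is a placeholder: the CVE identifiers, package names and severity scores are invented, and the dictionaries are a simplified stand-in for real VEX documents and the KEV catalog. Only the status value "not_affected" mirrors a status that VEX actually defines.

```python
def prioritize(findings, vex_statuses, kev_ids):
    """Drop findings marked not_affected by the vendor, then rank the rest.

    Findings present in the known-exploited list come first; within each
    group, higher severity comes first.
    """
    remaining = [
        f
        for f in findings
        if vex_statuses.get((f["package"], f["cve"])) != "not_affected"
    ]
    return sorted(remaining, key=lambda f: (f["cve"] not in kev_ids, -f["severity"]))


# Placeholder data: identifiers, packages and scores are invented.
findings = [
    {"cve": "CVE-EXAMPLE-0001", "package": "libalpha", "severity": 9.8},
    {"cve": "CVE-EXAMPLE-0002", "package": "libbeta", "severity": 7.5},
    {"cve": "CVE-EXAMPLE-0003", "package": "libgamma", "severity": 8.1},
]
vex_statuses = {("libalpha", "CVE-EXAMPLE-0001"): "not_affected"}
kev_ids = {"CVE-EXAMPLE-0002"}

for finding in prioritize(findings, vex_statuses, kev_ids):
    print(finding["cve"], finding["package"], finding["severity"])
```

In this example the highest-scored finding disappears because the vendor declared it not applicable, and the actively exploited one moves to the top despite its lower score.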

Pitfalls

Even though there is a lot of buzz around supply chain security and an ever-growing set of tools and products, many of these tools generate SBOMs by analyzing the components and guessing the dependencies.

The optimal approach to generating an SBOM on every upstream component still requires tailor-made solutions for most cases. This translates into incomplete or inaccurate SBOMs, not to mention the different existing formats (CycloneDX, SPDX, SWID) and lack of a standardized distribution mechanism, making wider consumption of SBOMs pretty difficult.

Another problem to consider is that the limitations in the SBOM propagate to vulnerability scanners. For example, a missing package can result in a false negative (an existing vulnerability not being reported) and applying a patch on a custom package version can result in a false positive. In general, any package customization which doesn’t result in a version reflected in vulnerability databases might cause a false positive/false negative issue. And there is no single provider for vulnerability information, nor a single exchange standard. Ideally, every vendor would provide their own source of security advisories, and VEX to allow flawless identification of the existing weaknesses.

Conclusion

An SBOM is a key element in securing the software supply chain and fundamental for vulnerability matching and management. It is becoming more important as software consumers and governments are raising the collective bar on security requirements and software quality for their providers.

At the time of writing, there are still different competing standards, a plethora of tools and a lot of uncertainty; most of the actors are still struggling to get there. But the general consensus is that we need to secure the supply chain, converge on common standards, and make the SBOM an essential part of the build process.

An interesting initiative to follow when starting to secure the supply chain is the SLSA framework, which introduces different levels of maturity in the software supply chain, so you can start from nothing and progressively implement different mechanisms to be as resilient as possible, at any link in the chain.