The CycloneDX SBOM Format
[00:00:00] DJ Schleen:
I’m DJ Schleen and welcome to daBOM.
I’m on a journey to demystify Software Bill of Materials and on this podcast I’ll be investigating technical, regulatory, and practitioner stories in and around the SBOM and -BOM movement.
Along the way you’ll meet the people and teams responsible for creating and maintaining the various Software Bill of Materials formats, and we’ll also dig deep into all types of Bills of Materials, including SBOMs, SaaSBOMs, IBOMs, and any other type of -BOM that you may have heard about.
If you’re interested in software security, the software supply chain, and want to know what’s in your software, you’re in the right place.
On today’s episode, I’ll be talking to Steve Springett from the CycloneDX project about the CycloneDX SBOM specification.
Welcome to the podcast. I’m here with Steve Springett from CycloneDX. Steve, how are you doing and what’s going on?
[00:00:57] Steven Springett:
I’m doing great DJ. Lots of stuff going on in OWASP and CycloneDX land, that’s for sure.
[00:01:03] DJ Schleen:
So tell me a bit about CycloneDX and the CycloneDX format.
[00:01:07] Steven Springett:
CycloneDX is a full-stack Bill of Materials standard, designed primarily for cybersecurity use cases, but it handles license and intellectual property use cases and all kinds of other things as well. We handle software for SBOM, but we also handle things like hardware devices and services, and even vulnerabilities if you want to inventory those as well.
It really is full stack, and we’re working on enhancing that stack to support platforms, ML models, and all kinds of other things as well. The more of the stack we can represent as inventory, the more transparent we can become.
[00:01:46] DJ Schleen:
When it comes to software, hardware, in general, what’s the problem that CycloneDX is trying to solve around inventory?
[00:01:53] Steven Springett:
Lots of organizations simply do not know what they have. This is true for pretty much any organization that doesn’t have software composition analysis (SCA) tools today, which, according to some stats, is a fraction of the companies that actually produce software.
Even if you do have SCA, though, are you really tracking the full stack? Most SCA vendors really specialize in open source libraries. And yes, the open source ecosystem is large, but it’s not the center of the universe. There’s lots of commercial stuff out there. There are things like images and fonts that I can’t analyze with SCA, but I still need to be able to inventory them.
And then of course there’s this thing called IoT. I heard it’s really popular. We need to be able to inventory these devices and their firmware, and all the services that your IoT toaster, sitting right behind you, DJ, is calling out to.
[00:02:47] DJ Schleen:
Let’s travel back a little bit to 2017 when the original prototype for CycloneDX came out. What were the drivers to get that in place?
[00:02:57] Steven Springett:
Yeah, even before that, the actual model was being… I don’t want to say solidified… but it was starting to come together between 2013 and 2017. The original application for this was OWASP Dependency-Track, which was a full-stack tool that could analyze your inventory of stuff, whether it was software, hardware, firmware, whatever, and identify vulnerabilities in those things. Even before 2017, we had already proven that this works. We knew from a model perspective what we wanted to do and why.
When 2017 came around, we were looking around to see, “Hey, what’s available?” Because reinventing the wheel isn’t fun; insert your favorite XKCD comic strip right here. We did look around. We saw SWID and we were like, oh, this is pretty cool for software identity. We saw SPDX and we were like, oh, we can use the licenses and things that they’ve done. Their file format was interesting, but it really didn’t meet our requirements.
We kept on looking around and we just really didn’t see anything. We were using OWASP Dependency-Check at the time. It’s definitely not a Bill of Materials format, but it could be used as one.
We were using that to start out with because there was a chicken-and-egg problem in 2017. There were really no Bill of Materials formats, and there was no real reason to generate them at the time. Something had to exist to support something else. We recognized that SPDX was doing a lot of great stuff with licenses and intellectual property, license IDs, and license expressions, but we just didn’t see anything that met our requirements.
One of the minimum requirements back in 2017 was this new thing called Package URL (purl), which is important for anything hosted in a package repository. If you cannot represent a purl, you cannot do basic vulnerability management for open source software, or at least for packages.
So that was the hard requirement for us. And then, of course, CPEs for anything that’s not a package, whether it’s hardware or commercial software, that sort of thing. In 2017, CycloneDX was very simple, but it did that one thing and it did that one thing very well.
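To make the two identifiers concrete, here is a minimal sketch, built as a plain Python dictionary, of how a CycloneDX JSON document can carry a purl for a package and a CPE for something that isn’t a package. The component names, versions, and the CPE vendor `example` are illustrative assumptions; the field names follow the CycloneDX 1.4 JSON format.

```python
import json

def make_bom(components):
    """Wrap components in a minimal CycloneDX 1.4 JSON envelope."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": components,
    }

# An open source package, identified by a Package URL (purl).
library = {
    "type": "library",
    "group": "org.apache.commons",
    "name": "commons-lang3",
    "version": "3.12.0",
    "purl": "pkg:maven/org.apache.commons/commons-lang3@3.12.0",
}

# Something that is not a package (hypothetical device firmware), identified by a CPE 2.3 string.
firmware = {
    "type": "firmware",
    "name": "router_firmware",
    "version": "1.0",
    "cpe": "cpe:2.3:o:example:router_firmware:1.0:*:*:*:*:*:*:*",
}

bom = make_bom([library, firmware])
print(json.dumps(bom, indent=2))
```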
Since then, we’ve had yearly updates to the spec. Being in the security industry, there’s never a shortage of work to do. Teams need to be able to tackle the challenge of the day or the week and move on to the next thing, because they keep on coming at you.
It was important for us to create a standardization process that provided immediate value to the community, one that we could iterate on very quickly to provide even more value at a future date that wasn’t 2, 3, or 4 years down the road, which is what most standards actually take.
We’ve been releasing updated versions of our spec, pretty much in Q1 every single year since 2017.
[00:05:58] DJ Schleen: Are you going to be doing a new one this year?
[00:05:59] Steven Springett: Yes, we are. It’s going to be epic too. I’m really looking forward to this one.
[00:06:04] DJ Schleen: Nice. I can’t wait to see it.
I’ve used CycloneDX since version 1.0. Every version that came out after that added new functionality and new features and new problems that it was trying to solve. Adding VEX into the CycloneDX format… that was massive.
[00:06:20] Steven Springett:
That was 3 years in the making. We didn’t originally have the full analysis part, but we’ve had vulnerability support since 2019; it was originally contributed by Sonatype. So we had 3 years of people using vulnerability data with CycloneDX and providing feedback to us.
We had continuous feedback from the Dependency-Track community, which was great. So by the time 1.4 was being developed, we knew exactly what we wanted to do.
The VEX work that NTIA and CISA started has been multiple years in the making. It’s still not done yet, but we already knew what we wanted to do and why. Our VEX support is basically a superset of what CISA has, so you can think of CycloneDX VEX support as VEX plus.
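As a rough illustration of what VEX data looks like inside a BOM, here is a sketch of a CycloneDX 1.4 `vulnerabilities` array in which each entry carries an `analysis` with a state such as `not_affected` or `exploitable`, plus a small consumer-side filter. The vulnerability IDs and the `bom-ref` are hypothetical placeholders, and `actionable` is an illustrative helper, not part of any CycloneDX library.

```python
# A BOM fragment whose vulnerabilities carry VEX-style analysis (CycloneDX 1.4 JSON shape).
# "CVE-0000-0001", "CVE-0000-0002", and "lib-a" are hypothetical placeholders.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "components": [
        {"bom-ref": "lib-a", "type": "library", "name": "example-lib", "version": "2.0.0"}
    ],
    "vulnerabilities": [
        {
            "id": "CVE-0000-0001",
            "affects": [{"ref": "lib-a"}],
            "analysis": {
                "state": "not_affected",
                "justification": "code_not_reachable",
                "detail": "The vulnerable function is never called.",
            },
        },
        {
            "id": "CVE-0000-0002",
            "affects": [{"ref": "lib-a"}],
            "analysis": {"state": "exploitable", "response": ["update"]},
        },
    ],
}

def actionable(bom):
    """IDs of vulnerabilities not ruled out by their analysis.

    Entries with no analysis, or still in triage, are treated as actionable.
    """
    ruled_out = {"not_affected", "false_positive", "resolved", "resolved_with_pedigree"}
    return [
        v["id"]
        for v in bom.get("vulnerabilities", [])
        if v.get("analysis", {}).get("state") not in ruled_out
    ]

print(actionable(bom))  # ['CVE-0000-0002']
```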
[00:07:11] DJ Schleen:
That’s pretty awesome.
Circling back to CycloneDX and its roots in Dependency-Check: when you were solving the problem behind that, was it a storage issue? Were you asking, okay, how do we extend the file formats we’re trying to save inside Dependency-Track into this Bill of Materials space?
[00:07:54] Steven Springett:
It was hard. For those watching or listening who don’t know what Dependency-Check is, it was one of the very first security-focused SCA tools, created by Jeremy Long. He was a researcher at Wells Fargo at the time. It was released, I think, in 2012, when he originally announced the tool at Black Hat.
The premise is that it scans files on the file system and extracts evidence from those files, and that evidence has varying degrees of confidence. What we realized with Dependency-Track is that SCA tools don’t necessarily produce a Bill of Materials. But if you have something with high enough confidence, you can actually say, okay, I think it’s this, I think it’s jQuery version X, because you have a high enough level of confidence to make that claim.
It was a really interesting transition, going from SCA, which uses evidence of what it is trying to identify, to the assertion model, which is what a Bill of Materials uses. With a Bill of Materials, you are asserting that you have this thing, which is the opposite of SCA. That was an interesting transition for both Dependency-Track and the overall security community back in 2017 and 2018, because they didn’t really know what to make of this thing called a Bill of Materials back then.
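The evidence-versus-assertion distinction can be sketched in a few lines: SCA-style evidence carries a confidence level, and only guesses that clear a threshold get promoted to components the BOM asserts. The four-level scale, the `Evidence` type, and the `to_assertions` helper are assumptions for illustration, not Dependency-Check’s or Dependency-Track’s actual code.

```python
from dataclasses import dataclass

# Ordered confidence levels; a higher index means stronger evidence (assumed scale).
LEVELS = ["low", "medium", "high", "highest"]

@dataclass
class Evidence:
    file: str         # where the evidence was found
    guess: str        # what the scanner thinks it is, e.g. "jquery@3.6.0"
    confidence: str   # one of LEVELS

def to_assertions(evidence, minimum="high"):
    """Promote guesses at or above `minimum` confidence to asserted components (deduplicated)."""
    floor = LEVELS.index(minimum)
    asserted = []
    for ev in evidence:
        if LEVELS.index(ev.confidence) >= floor and ev.guess not in asserted:
            asserted.append(ev.guess)
    return asserted

scan = [
    Evidence("static/js/jquery.min.js", "jquery@3.6.0", "highest"),
    Evidence("static/js/vendor.js", "jquery@3.6.0", "medium"),
    Evidence("lib/util.js", "lodash@4.17.21", "low"),
]
print(to_assertions(scan))  # ['jquery@3.6.0']
```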
Nowadays, most people understand it. But back then this was a very new concept in the security community, and we really had to overcome that.
[00:09:10] DJ Schleen:
This is actually pretty cool because it looks like CycloneDX was born with security in mind. From the genesis of this and that’s probably the association with OWASP as well, right?
[00:09:23] Steven Springett:
The genesis was absolutely within the OWASP community. It was spearheaded by myself and a few others in the OWASP community but we made the decision to develop the spec outside of OWASP initially because we didn’t want to be seen as an OWASP standard for an OWASP tool. We wanted everybody to adopt this.
Once lots of organizations and vendors started doing that, then it made sense to roll this back into the OWASP foundation and that’s exactly what happened.
So today, CycloneDX is a flagship OWASP project, which means it’s the highest level of maturity of an OWASP project. And the flagship status indicates that it has strategic importance to the OWASP foundation, which is pretty cool.
[00:10:11] DJ Schleen:
That’s an amazing accomplishment.
What was the first format of CycloneDX? Was it JSON format, was it XML format? Why did you decide to go with those two, versus, I don’t know, could be a CSV… there’s different crazy formats out there. What prompted you to go with those kind of structures?
[00:10:30] Steven Springett:
We obviously did not want to create our own file format because then you have to create your own parsers and whenever you create your own parsers you’re definitely going to have security vulnerabilities. That’s just the way it goes. But there’s a lot of parsers for XML, there’s a lot of parsers for JSON. We made the decision to first support XML, because it’s very easy to harden XML and simultaneously it’s also easy to extend XML.
Version 1.0 of CycloneDX was bare bones; it didn’t do much. 1.1 had a lot of improvements, especially around pedigree and provenance and that sort of thing. At that point we really made it extensible. That’s where some of the vulnerability extensions came in, and we had a bunch of other extensions as well.
I think that was a good choice to standardize on XML, especially during those early days, because JSON by definition is not extensible. There are ways to extend JSON. You can use various properties and that sort of thing. However, when you do that, you cannot use a hardened JSON schema. You have to relax your JSON schema to the point where pretty much anything can go. That right there is going to eliminate a large percentage of the population who actually want your Bill of Materials in the first place, namely some high assurance use cases in the US federal government.
That XML-first approach really served us well until we got to where we are today, where the format supports a lot of stuff. There’s less and less need to make it extensible in a way that would require an external schema. So JSON came later. But we have this other way to “extend” (I’m putting that in air quotes for the folks who are listening) CycloneDX in a format-agnostic way, using CycloneDX properties, which are basically name-value pairs. We have a formal taxonomy of those as well, and you can register your own namespace, that sort of thing. It works really well.
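Here is a minimal sketch of properties on a CycloneDX component: a list of name-value pairs whose names carry a colon-delimited namespace prefix. The `acme:` namespace and the property names are hypothetical, and `properties_in` is an illustrative helper for reading one namespace back out.

```python
# CycloneDX properties: a list of {"name": ..., "value": ...} pairs on a component.
# "acme:" is a hypothetical organization namespace used only for illustration.
component = {
    "type": "library",
    "name": "example-lib",
    "version": "1.0.0",
    "properties": [
        {"name": "acme:build:pipeline", "value": "release-42"},
        {"name": "acme:owner:team", "value": "payments"},
        {"name": "other:flag", "value": "true"},
    ],
}

def properties_in(component, namespace):
    """Map property names (with the namespace prefix removed) to values for one namespace."""
    prefix = namespace.rstrip(":") + ":"
    return {
        p["name"][len(prefix):]: p["value"]
        for p in component.get("properties", [])
        if p["name"].startswith(prefix)
    }

print(properties_in(component, "acme"))
# {'build:pipeline': 'release-42', 'owner:team': 'payments'}
```

Because the extension data lives in ordinary fields, a strict schema can still validate the document, which is the trade-off Steve contrasts with relaxing a JSON schema.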
[00:12:36] DJ Schleen:
When you talk about the Federal Government, we can’t forget about President Biden’s Executive Order 14028, right? What did that do for CycloneDX when Biden signed the executive order?
[00:12:47] Steven Springett:
[00:12:48] DJ Schleen:
That could be a whole episode on its own, right?
[00:12:51] Steven Springett:
Yeah, exactly. Before then, Bills of Materials were this little side project that security nerds were working on. Since the EO, it’s become a more mainstream thing that people are talking about, not only in the security space but even in boardrooms. It’s been interesting. The rate of adoption has gone through the roof since the executive order was signed.
Right now, OWASP Dependency-Track, which is 10 years old at this point, is analyzing upwards of, let me get my numbers right, 200 million components represented in CycloneDX every single month. That’s a massive spike in adoption.
But we’ve also had to start worrying about other things. We are one of essentially two specs, the other one being SPDX. That is both a privilege and an honor but we have to be very cautious as maintainers. Because there’s always the possibility that you might get a malicious maintainer who has this long tail mentality, they’re not going to push something right away that might be malicious. But over the course of time you can chain some of their code together to make something malicious. We’re having to keep a watchful eye on things like that, which is an interesting position to be in.
[00:14:06] DJ Schleen:
With Dependency-Track, we talk about SCA, and you mentioned before that CycloneDX is more than SCA. One of the questions I always get is, “How does a Software Bill of Materials differ from SCA? I get this information from vendors. I have an SCA tool running. Why do I need a Software Bill of Materials?”
[00:14:29] Steven Springett:
That’s a very common question. There’s multiple answers to that. A lot of SCA tools can actually generate a Software Bill of Material, which is great. All your major SCA vendors can do that. Many of your open source SCA tools can do that. That gives you a baseline to where to start.
SCA tools primarily work on open source software. Most, not all. Most of them don’t have data scientists working on all the commercial things that exist, because you have to have access to those commercial things in the first place. Open source is readily available, so most SCA vendors have pretty good databases on open source projects: their versions, when they were published, whether they have vulnerabilities, that sort of thing. That’s easy.
It’s the commercial side where things get a little bit more complicated. SCA also doesn’t handle non-software digital assets, things like fonts and images, which most SCA tools don’t support today.
SCA is really one method by which a Bill of Materials can be created. It typically uses evidence, from binary analysis, manifest analysis, or a combination of both, to produce a list of ingredients. By contrast, if you produce your Bill of Materials in your build pipeline, you have a little bit more control over what that Bill of Materials contains.
Say, for example, I’m using Spring Security version 5.whatever, and for whatever reason, tight coupling or something else, I can’t upgrade that component. So instead, as a responsible vendor, I backport the security fixes to the version that I’m using. SCA tools will always identify that this thing has vulnerabilities in it, when in reality it doesn’t, because I backported those fixes.
With a Bill of Material you can actually describe, “This is what I have, this is what I derived from. And these are my changes that I’ve made and why.” You can have a trust but verify model with CycloneDX, which I think is really important.
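Steve’s backport scenario maps onto CycloneDX’s `pedigree` object. This sketch records the upstream version as an ancestor and a `backport` patch that claims to resolve a security issue, then shows how a consumer might subtract those claims from SCA findings. The version strings and issue IDs are hypothetical placeholders, and `fixed_by_pedigree` is an illustrative helper.

```python
# A component whose pedigree records a backported security fix (CycloneDX JSON shape).
# "5.0.0-acme.1", "CVE-0000-0003", and "CVE-0000-0004" are hypothetical placeholders.
component = {
    "type": "library",
    "group": "org.springframework.security",
    "name": "spring-security-core",
    "version": "5.0.0-acme.1",
    "pedigree": {
        "ancestors": [
            {
                "type": "library",
                "group": "org.springframework.security",
                "name": "spring-security-core",
                "version": "5.0.0",
                "purl": "pkg:maven/org.springframework.security/spring-security-core@5.0.0",
            }
        ],
        "patches": [
            {"type": "backport", "resolves": [{"type": "security", "id": "CVE-0000-0003"}]}
        ],
        "notes": "Upgrade blocked by tight coupling; fix backported instead.",
    },
}

def fixed_by_pedigree(component):
    """Issue IDs that the component's own patches claim to resolve."""
    patches = component.get("pedigree", {}).get("patches", [])
    return {r["id"] for p in patches for r in p.get("resolves", []) if "id" in r}

# Findings an SCA tool might report by matching against the ancestor version.
sca_findings = {"CVE-0000-0003", "CVE-0000-0004"}
print(sorted(sca_findings - fixed_by_pedigree(component)))  # ['CVE-0000-0004']
```

In a trust-but-verify model, a consumer would still want to check such claims, for example against the patch diff, rather than suppress findings automatically.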
[00:16:31] DJ Schleen:
What do I do with these SBOMs once I get them? So government organizations or government entities… I’m asking for a Software Bill of Materials. What do I do with these? Where do I store it? Do I store it in a file share somewhere?
There’s the delivery of Software Bills of Materials, and then there’s the ingestion. CycloneDX is the format that enables both of those to happen. What have you seen from people actually ingesting these and doing something with them? What can you do with a CycloneDX Bill of Materials once you receive one?
[00:17:03] Steven Springett:
There’s certainly not a shortage of ways to generate Bills of Materials. There are lots and lots of tools and different methods by which Bills of Materials can be created. What there isn’t is an abundance of tools to actually consume them. Yes, tools like Dependency-Track have been around for a while, but that’s one tool and it’s not for everybody. Some folks might want really lightweight tools to put in their CI/CD pipeline. Other companies might want something more enterprise-class because they have millions of assets in their environment that they need to track. Dependency-Track is neither of those things. There’s definitely a shortage of tools on the consumption side.
Long-term storage is also an issue, but that ties into sharing, and unfortunately that’s one of the things we don’t have any good solutions for today. There are some proprietary methods, and some vendors actually have pretty decent products for SBOM and other types of supply chain sharing. But there’s really no standard protocol to do this.
Earlier in 2022, the OWASP CycloneDX project unveiled something called Project Koala, which is the BOM Exchange API. Think point to point: it’s basically a protocol that two parties can use to exchange a Bill of Materials, regardless of Bill of Materials format. So it supports SPDX, it supports CycloneDX, and it supports some other future format. It’s really just a way to get people to agree on how we’re actually going to trade these things.
It supports authentication and authorization. It will eventually support things like realtime VEX, which I’m sure we’ll talk about in a future episode. But, it does not prescribe any kind of storage mechanism. So if you want to make these things dynamic or if you want to store them on the blockchain, whatever you wanna do, with it that’s fine. We’re just talking about sharing between two or more parties.
So we’re working on that, and we’ve got some feedback already. The goal with Project Koala is to submit it to the IETF later this year.
We’re also approaching vendors that make commercial repository servers, because we want them to support this as well. It handles not only SBOM use cases but VDR and VEX as well, because those are just as important to operationalizing SBOM as the SBOM itself.
There’s a lot of opportunities for both commercial and open source projects to step up on the consumption side and start creating some tools. Look at the space. Look, see what’s out there. Figure out where the gaps are and start building stuff because there’s not a shortage of places that need help.
[00:19:45] DJ Schleen:
You’re giving away your crown jewels with a Software Bill of Materials, to an extent. Part of that is security: you can scan a Software Bill of Materials and find the vulnerabilities in it. But when you start looking at VEX and having that vulnerability information travel with the Bill of Materials, how do you handle that from a consumer privacy or an organizational privacy perspective?
[00:20:06] Steven Springett:
So… excellent question.
Not a lot of tools are available to do that yet. There’s yet another project I’m involved with called the OWASP Software Component Verification Standard, or SCVS. SCVS is basically a way for an organization to measure and improve its software supply chain assurance, and it is referenced in its entirety in NIST SSDF 1.1. The SCVS project is also working on something called a BOM Maturity Model.
This is a format-agnostic taxonomy of pretty much everything you could ever put in a Bill of Materials, with descriptions of what all these various things are. It also includes the concept of a profile. A profile basically says, “Hey, I want to use these parts of the taxonomy. This is how difficult each one is to achieve, and this is how important each one is to me for my consumption use case.”
What we’re envisioning are these new breeds of tools called SBOM profilers to exist in the near future, once the maturity model has been done. We’re expecting the maturity model to be available sometime in early Q2, would be my guess, even if it’s a pre-release at that point.
Once that’s available and people start building tools, then you’ll actually be able to reference a formal taxonomy. If you wanted to only give parts of your Bill of Material to say… these organizations and other parts to other organizations, you’ll be able to use profiles from the formal taxonomy to dynamically generate those if you so choose.
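The BOM Maturity Model was still in progress at the time of recording, so here is only a hypothetical sketch of the idea: a profile names the fields a given recipient should receive, and a tool generates a filtered copy of the BOM from it. The profile names and field sets are illustrative assumptions, and real profiles would also capture difficulty and importance.

```python
# Hypothetical profiles: which component fields each recipient receives.
# These names and field sets are illustrative, not taken from the BOM Maturity Model.
PROFILES = {
    "legal": {"type", "name", "version", "licenses"},
    "security": {"type", "name", "version", "purl", "cpe"},
}

def apply_profile(bom, profile):
    """Return a new BOM whose components keep only the fields the profile allows."""
    allowed = PROFILES[profile]
    filtered = [{k: v for k, v in c.items() if k in allowed} for c in bom.get("components", [])]
    header = {k: v for k, v in bom.items() if k != "components"}
    return {**header, "components": filtered}

bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "example-lib",
            "version": "1.0.0",
            "purl": "pkg:npm/example-lib@1.0.0",
            "licenses": [{"license": {"id": "MIT"}}],
            "properties": [{"name": "acme:owner:team", "value": "payments"}],
        }
    ],
}

print(apply_profile(bom, "legal")["components"])
# [{'type': 'library', 'name': 'example-lib', 'version': '1.0.0', 'licenses': [{'license': {'id': 'MIT'}}]}]
```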
[00:21:50] DJ Schleen:
That’s great because then you’re opting into the things that you share.
[00:21:53] Steven Springett:
If I’m a legal team or if I’m an AppSec team or an infrastructure team, I can create my own profiles for the different types of analysis that I want to perform.
Like we said earlier, there’s no shortage of tools that generate Bills of Materials. The problem is on the consumption side. You might get a Bill of Materials, perform a vulnerability analysis on it, and it comes up empty. You’re like, “Oh, great… Yay, no vulnerabilities!”, when in reality it may not have had the CPEs or purls for you to perform the type of analysis you were trying to perform.
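A first step toward this kind of consumption-side check can be sketched as an identifier coverage report: before trusting an empty vulnerability result, count how many components carry a purl or CPE at all. Treating those two as the only usable identifiers is a simplifying assumption, and the helper names are hypothetical.

```python
def analyzable(component):
    """Assume a component can be matched against vulnerability data only via a purl or CPE."""
    return bool(component.get("purl") or component.get("cpe"))

def coverage_report(bom):
    """Fraction of components with an identifier, plus the names of those without one."""
    comps = bom.get("components", [])
    missing = [c.get("name", "<unnamed>") for c in comps if not analyzable(c)]
    ratio = (len(comps) - len(missing)) / len(comps) if comps else 0.0
    return ratio, missing

bom = {
    "components": [
        {"type": "library", "name": "a", "purl": "pkg:npm/a@1.0.0"},
        {"type": "library", "name": "b", "version": "2.0.0"},
        {"type": "firmware", "name": "c", "cpe": "cpe:2.3:o:example:c:1.0:*:*:*:*:*:*:*"},
        {"type": "library", "name": "d"},
    ]
}
ratio, missing = coverage_report(bom)
print(f"{ratio:.0%} analyzable; missing identifiers: {missing}")
# 50% analyzable; missing identifiers: ['b', 'd']
```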
That’s why it’s really important to have a new breed of tools for SBOM profiling, so that on the consumption side you can set organizational policy based on the types of analysis you want to perform, whether for security teams, legal teams, procurement, M&A, that sort of thing.
You can also compare formats against each other, whether it’s CycloneDX, some other format, or even some future format, and just say, “Hey, in my organization, we care about these things. What format is best suited for our purpose?”
[00:22:58] DJ Schleen:
Oh, that’s cool.
And when you say purl, you mentioned it before… purl is a first-class citizen in CycloneDX, which I really love. It’s not embedded in an external reference or somewhere that makes it hard to get to. It’s right up there at the top. That’s what I use in Bomber to actually go out and find vulnerability information. It’s the standard way of finding these things out.
It’s interesting that you mentioned Sonatype being involved in this earlier, because Sonatype relied heavily on purl for a lot of the APIs they published back in the day.
[00:23:30] Steven Springett:
I think Sonatype started back in 2017. We caught on a little bit after Sonatype did, and we immediately saw the value of it. Multiple projects in OWASP adopted purl in 2018, because, yeah, this is the future. This is how you’re going to identify all kinds of risks, not just known vulnerabilities for open source packages.
We’re hoping that the US Federal government eventually will adopt package URL as well. We’ve made some recommendations to them, so we’ll see if that advice is taken or not. And if so, how long it takes for them to do that.
[00:24:05] DJ Schleen:
Splitting a purl into ecosystem, component name, and version holds all the secret sauce, I guess you could say, for locating what this thing is, like a unique identifier.
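The structure DJ describes can be seen by splitting a purl into its parts. This is a simplified sketch of the `pkg:type/namespace/name@version?qualifiers#subpath` layout from the Package URL specification; it skips the spec’s per-type normalization rules, so a maintained purl library would be the better choice in practice.

```python
from urllib.parse import unquote

def parse_purl(purl):
    """Split a Package URL into its parts (simplified; not a full spec implementation)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a purl: {purl!r}")
    rest, _, subpath = rest.partition("#")   # optional #subpath
    rest, _, query = rest.partition("?")     # optional ?key=value&... qualifiers
    qualifiers = dict(pair.split("=", 1) for pair in query.split("&") if "=" in pair)
    rest = rest.strip("/")
    path, at, version = rest.rpartition("@")  # version follows the last unencoded '@'
    if not at:
        path, version = rest, None
    ptype, _, remainder = path.partition("/")     # ecosystem comes first
    namespace, _, name = remainder.rpartition("/")  # name is the last segment
    return {
        "type": ptype.lower(),
        "namespace": unquote(namespace) or None,
        "name": unquote(name),
        "version": unquote(version) if version else None,
        "qualifiers": {k: unquote(v) for k, v in qualifiers.items()},
        "subpath": subpath or None,
    }

print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0"))
# {'type': 'maven', 'namespace': 'org.apache.commons', 'name': 'commons-lang3', 'version': '3.12.0', 'qualifiers': {}, 'subpath': None}
```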
Talking to folks that are out in the community that want to participate in the CycloneDX working groups and the projects around it, how do you get involved?
[00:24:27] Steven Springett:
Go to cyclonedx.org/participate. It has a list of where we’re at. It’s got our mailing list, which really isn’t used all that much. It’s mostly for our Federal Government folks who do not have access to Slack. But we have a pretty active Slack workspace. For historical purposes it is independent of the OWASP Slack workspace, so we have our own. But, there’s an invite on the participate page, so pretty active community there.
We’re always on GitHub, in GitHub issues and GitHub discussions, and we’re active there as well. If anyone wants to contribute to the CycloneDX specification or any of the other tools, OWASP projects are free for anyone to participate in. You do not have to be an OWASP member. You don’t even have to be associated with OWASP in any way.
OWASP takes vendor neutrality to the extreme, which is actually a really good thing. It means that large vendors and individuals, each with their own vested interests, all have an equal seat at the table, which I think is great.
Because of that, we get a lot of diversity in the folks that want to contribute to the spec, which is great. But yeah, get involved with the GitHub issues. We’re currently working on version 1.5 of the spec right now. If you want to be part of our industry working group, send me an email at Steve.Springett@OWASP.org or hit me up on Slack. And if you want to be a maintainer, your pull requests and everything else are part of that. Contribute to whatever it is that interests you. And if you want to take something on more permanent, then let us know. We’re always looking for more maintainers, so there’s never a shortage of things to do.
[00:26:04] DJ Schleen:
There’s definitely opportunities for everybody from developers to the people who are thinking about strategic vision of where these things go.
[00:26:12] Steven Springett:
It’s not just code contributions. Code contributions are great, but quite honestly, we have a lot of those already. It’s documentation writers. It’s people doing white papers and case studies. Fortunately, we actually have one UX contributor. She’s phenomenal. She’s great and she’s really helping us. But front end developers, things that necessarily aren’t directly part of the CycloneDX spec implementations, but will help the project overall. We’re looking for all kinds of contributors.
If you’re interested in these types of things and especially for something like a Bill of Material that will make a difference to a lot of people’s lives, let us know.
[00:26:51] DJ Schleen:
Where do you see CycloneDX in five years?
[00:26:55] Steven Springett:
[00:26:57] DJ Schleen:
Version 1.10. But…
[00:27:00] Steven Springett:
Hopefully we won’t have to do a 2.0 because of breaking changes. That would be the hope. Now, if I were to put on my utopian, everything-is-possible hat, I would tell you that CycloneDX wouldn’t be here, because Bills of Materials would be obsolete: all your build tools and all your infrastructure would just be messaging each other asynchronously, and everything would just know.
By the time you actually build your software, you’d already have all the information necessary for a Bill of Materials and then some. But we’re human beings, and human beings like to reinvent the wheel.
I don’t think we’re gonna get there. I don’t think that Utopian vision will ever really see the light of day.
Bill of Materials are definitely here to stay. We’re already seeing a lot of machine to machine usage, not necessarily with XML or JSON. We’re seeing a really big uptick in protobuf support, and I’m envisioning that support within 5 years is gonna pretty much skyrocket.
I also see MLBOMs taking center stage within the next 5 years. As we all saw over the Christmas break with ChatGPT, those models can be a lot of fun. But there are also a lot of ethical and security concerns with them that require transparency. So I think in 5 years’ time we’ll be talking about MLBOMs in a very similar way to how we talk about SBOMs today.
[00:28:21] DJ Schleen:
That whole license node, inside of CycloneDX will be perfect for that, licensing of that piece of artwork or piece of literature, something like that. That’s going to be a challenge, but just looking at CycloneDX format right now, I think it’s definitely something that can encapsulate that kind of information in the future.
[00:28:38] Steven Springett:
That’s actually coming with 1.5 this year.
[00:28:42] DJ Schleen:
Okay. I… you know… You’ve been talking about 1.5.
Gotta ask you a question.
[00:28:47] Steven Springett:
We can do a whole episode on that…
[00:28:48] DJ Schleen:
Let’s keep that for a future episode because I want to get a little bit of research done and then come back at you and ask you a little bit about those features and where they came from and and where they’re going to go, and what kind of value they’re going to bring to the community.
But this has been great, Steven. Thank you so much for your time.
[00:29:06] Steven Springett:
Thank you for having me, DJ. I feel honored to be your very first guest and congratulations on the launch of the podcast. That’s amazing. I look forward to watching it and to talking to you some more.
[00:29:17] DJ Schleen:
This episode of daBOM was created by me, DJ Schleen, with help from sound engineer Pokie Huang and Executive Producer Mark Miller. This show is recorded in Golden, Colorado, and is part of Sourced Network Productions. We use Captivate.fm as our distribution platform and Descript for spoken text editing.
You can subscribe to daBOM on your favorite podcast platform. We’re going to be releasing a new episode every Tuesday at 9:00 AM. I’ll see you next week as we continue to defuse daBOM.
Resources From This Episode:
Steve educates teams on the strategy and specifics of developing secure software.
He practices security at every stage of the development lifecycle by leading sessions on threat modeling, secure architecture and design, static/dynamic/component analysis, offensive research, and defensive programming techniques.
Steve’s passionate about helping organizations identify and reduce risk from the use of third-party and open source components. He is an open source advocate and leads the OWASP Dependency-Track project, OWASP Software Component Verification Standard (SCVS), and is the Chair of the OWASP CycloneDX Core Working Group.