Behind the scenes with an engineer
As we continue the journey to unravel the world of Software Bills of Materials, I wanted to talk to a technologist who had been there from the start and could shed some light on the background of the movement.
The search for such a person led me to the southern German state of Bavaria, where I found Max Huber.
Max has been a contributor to the SPDX project for upwards of 8 years, and he helped build some of the first tools to create and process the format.
SPDX, or Software Package Data Exchange, from the Linux Foundation has become one of the leading formats for describing Software Bills of Materials since its inception in 2010. The primary goal of the format is to simplify and standardize the exchange of information among software developers, suppliers, and users.
On today’s show we go behind the scenes with an engineer, learn a bit more about the technical side of SPDX, and gain insight into some of the upcoming features of SPDX 3.0.
Welcome back to daBOM.
Welcome back to the podcast. I’m here with Max Huber from TNG. He’s a contributor to the SPDX Project. How are you doing today, Max?
Oh, I’m fine. I’m perfect. I’m happy.
I have been talking to Kate Stewart from the Linux Foundation and SPDX, and she mentioned that I should talk to you, especially about some of the contributions you’ve made to the tooling we use on a day-to-day basis. A lot of us use this to interact with SPDX. I wanted to get your take on all of this and ask you a little bit about your involvement in the project.
How did you get involved in contributing?
My first contribution was seven to eight years ago, where I implemented SPDX input and output, so serialization and deserialization. That was for SPDX 2.0, which was new at the time.
2.0. Wow. So that was just before it went out to be an ISO standard, is that correct?
Yeah, I think even three to four years before the ISO transition. It was rather the state after the first big refactoring, the first big rework of the standard.
What attracted you to SPDX? Was it inventory management? Supply chain? What brought you into the fold?
Consulting. I was a consultant, contracted to work with PHP. But I enjoyed the topic and stayed with it. Open source lets you continue to contribute and to work with the project and the community.
Was it inventory that they had to provide? What was the use case around it?
You can invest a lot of work to scan a project or a dependency, and you want to share these results. You could try to share a dump from a database, but SPDX is the natural format for that. You do the work and export SPDX, and the next person can then import the file, validate it, and even compare it to another version of the same project.
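To make that exchange concrete, here is a minimal sketch, written by hand without any real SPDX library, of round-tripping scan results through an SPDX-style tag-value serialization. The field names follow SPDX 2.x tag-value conventions, but the package data and helper functions are invented for illustration.

```python
# Hypothetical sketch: export scan results as SPDX-style tag-value text,
# then re-import them, so two parties can exchange and compare results.

def to_tag_value(packages):
    """Serialize a list of package dicts to SPDX-style tag-value lines."""
    lines = []
    for pkg in packages:
        lines.append(f"PackageName: {pkg['name']}")
        lines.append(f"PackageVersion: {pkg['version']}")
        lines.append(f"PackageLicenseConcluded: {pkg['license']}")
        lines.append("")  # blank line separates packages
    return "\n".join(lines)

def from_tag_value(text):
    """Parse the tag-value text back into package dicts."""
    key_map = {"PackageName": "name",
               "PackageVersion": "version",
               "PackageLicenseConcluded": "license"}
    packages, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # blank line ends the current package
            if current:
                packages.append(current)
                current = {}
            continue
        tag, _, value = line.partition(": ")
        current[key_map[tag]] = value
    if current:
        packages.append(current)
    return packages

scan = [{"name": "zlib", "version": "1.2.13", "license": "Zlib"}]
# The round trip preserves the scan results exactly.
assert from_tag_value(to_tag_value(scan)) == scan
```

In practice a real library (such as the SPDX Python tools) would handle the serialization, but the round trip above is the essence of what gets exchanged.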
You were using this quite a long time ago, so there can’t have been a lot of tools. Is that why you got involved with developing some of the Python tooling? Which language or implementation came first for dealing with these things?
The first implementation was without a library, building it by hand. That was the not-so-nice experience.
After that, I think a few years ago, I created a Haskell library for SPDX. I think no one except me is using it, but I’m maturing it a lot. The Java tools are very prominent; I use them in SW360, which is a software catalog application. They are the natural connection point to the core SPDX system.
The Python tools came later; Python is a very easy language to script in, and it’s also used in a lot of compliance tools.
That’s one of the things I was talking to you about a little bit: developing a library in Rust.
It’s nice to see SPDX in so many ecosystems. But yeah, a full-fledged SPDX document library is still missing there.
What’s your biggest challenge in dealing with Software Bills of Materials and communicating or transferring them back and forth?
The biggest hurdle I see is that formats like SPDX and CDX provide a framework in which you can express the dependency tree, but you still have a lot of freedom in how to express things. SPDX especially is very flexible in the generic tree it allows you to build. So you need to make many assumptions: how do you represent a Docker image, how do you represent the layers in the Docker image, and how do you link the packages and files contained in them?
All these decisions are made differently by different tool implementers and different users. Even if you get an SPDX document describing a Docker image, it’s not easy to just consume it automatically. You still need to treat it as an SPDX document generated by tool “x” with assumptions “y”. That makes at least the automatic usage harder. That’s one of the biggest complexities still around: the formats define a language but don’t yet specify how to express specific use cases.
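The point about tool-specific assumptions can be illustrated with a small sketch. Below, two invented documents describe the same Docker image with SPDX-style CONTAINS relationships: one keeps the layer structure, the other flattens it away. Both are valid trees, but a consumer has to walk them knowing which convention the producer used. All identifiers here are made up for illustration.

```python
# Tool "x": the image CONTAINS each layer, and each layer CONTAINS packages.
doc_x = {
    "relationships": [
        ("SPDXRef-Image", "CONTAINS", "SPDXRef-Layer1"),
        ("SPDXRef-Layer1", "CONTAINS", "SPDXRef-Pkg-openssl"),
    ]
}

# Tool "y": flattens the layers away; the image CONTAINS packages directly.
doc_y = {
    "relationships": [
        ("SPDXRef-Image", "CONTAINS", "SPDXRef-Pkg-openssl"),
    ]
}

def contained_in(doc, root):
    """Collect everything transitively contained in `root`."""
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        for src, rel, dst in doc["relationships"]:
            if src == node and rel == "CONTAINS":
                found.add(dst)
                stack.append(dst)
    return found

# Both documents ultimately contain the same package, but the consumer only
# finds it by traversing transitively, i.e. by not assuming a fixed depth.
assert "SPDXRef-Pkg-openssl" in contained_in(doc_x, "SPDXRef-Image")
assert "SPDXRef-Pkg-openssl" in contained_in(doc_y, "SPDXRef-Image")
```

A consumer that assumed, say, "packages are always direct children of the image" would silently miss the package in the first document, which is exactly the interoperability gap described above.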
SPDX 3.0 is coming out. Can you give us any insight into what it’s going to do and how things are going to change? It’s pretty functionally different. For one, it’s a major version bump, which indicates potentially breaking changes.
I just complained about flexibility, about it being too flexible and making documents hard to ingest. Still, being very agnostic to ecosystems and being extensible is one of the main features.
The next main feature is that it reworks the core model in a very consistent and foundational way that allows the data to be very easy to consume and to link across the whole ecosystem, so you can reference aspects and nodes from other documents and build a large knowledge base.
In the big picture, all the files are one big graph. 3.0 emphasizes that and improves on it by being more streamlined, more modular, and more consistent.
Looking at the lineage of SPDX, there are a lot of different use cases for it, and hardware is one of the big ones. The automotive industry is using it as well to interchange data. So it’s not just software. That’s one of the misconceptions people have about SPDX, or even CycloneDX: there’s more to it than just the supply chain.
What would you say to somebody who says, oh, SPDX and Software Bills of Materials are just software composition analysis in a standardized format?
Software composition analysis is postmortem analysis that can be part of an audit. A good way to represent its output is a Software Bill of Materials. So yes, there’s that link, but it can be so much more and can be way more integrated.
If you link source SBOMs to build SBOMs, you can track the whole supply chain, and you can automatically hand that down to the next user who receives your artifact. They then already have all the information, hopefully making postmortem analysis or scanning for hashes unnecessary, because all the data is already there.
It might still be valid to do some sanity checking, checking for snippets, and checking that there’s nothing unexpected. But having consistent and automatically generated Software Bill of Materials documents just solves the issue.
Yeah, there are multiple places where these things get created. Software composition analysis, vulnerability scanning, and the things people usually think about with SBOMs are one specific use case. It’s not the only use case.
There’s been conversation about building software. We have a Software Bill of Materials that gets generated describing the contents of what’s going to be built. Then we have the vulnerabilities generated, and that could be VEX, or it could be put into whatever structure you wanted. Then there are the build tools that run things, the infrastructure it runs on in production, and potentially the API endpoints and the definitions of what those are. It can be used anywhere, I guess you could say.
But when you talk to hardware manufacturers, they’re like, “That doesn’t work with firmware. And that doesn’t work with the hardware inventory.”
SPDX can take care of a lot of this by defining the structure. But given that flexibility, do you see another kind of document, structure, or set of instructions that has to accompany the document to tell people how to ingest and use it?
I saw a document by Kate Stewart recently where she, together with others, discussed a list of types of BOMs.
So there’s a build BOM, there’s a software BOM, there’s a runtime BOM, describing these use cases and the requirements and assumptions related to them. Maybe the SBOM, the document that is sent around, can say, “I’m a BOM of that kind, and I have these assumptions in mind.”
Furthermore, it might be helpful, and even necessary, to say, “I’m a BOM document describing the software composition of a Docker image, and you can find the structure in the following node.” Maybe such standards would be helpful to develop.
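A self-describing BOM envelope of the kind suggested here could look something like the following sketch. The field names and type taxonomy are entirely invented, not part of any published SPDX or CycloneDX specification; the point is only that a consumer could check up front whether it understands the document’s kind and assumptions.

```python
# Hypothetical self-describing BOM envelope: the document declares what kind
# of BOM it is and which assumptions its producer made.
bom = {
    "bomType": "build",            # e.g. build / source / runtime (invented taxonomy)
    "describes": "docker-image",   # what kind of artifact is described
    "assumptions": ["layers-flattened", "transitive-deps-included"],
    "content": {},                 # the actual SPDX/CDX payload would go here
}

def can_ingest(doc, supported_types):
    """A consumer checks up front whether it understands this kind of BOM."""
    return doc.get("bomType") in supported_types

# A consumer that handles build and runtime BOMs accepts this document...
assert can_ingest(bom, {"build", "runtime"})
# ...while a hardware-only consumer can reject it before mis-parsing it.
assert not can_ingest(bom, {"hardware"})
```

Rejecting early, rather than guessing at the producer’s assumptions, is the practical benefit of such a declaration.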
That makes sense. It’s almost like saying SPDX is a document format. You have the Microsoft Word format, which is different from the Google Docs format. You can translate them back and forth, but it’s almost an abstraction: how you structure the file format itself versus what the intent of the file format is.
You’ve been involved with Software Bills of Materials for eight years now. You’ve seen the tooling develop from the ground up, from nothing. What’s with all the hype today? Why are people just starting to pay attention? What’s happening in Europe to fuel so much of this Software Bill of Materials conversation?
Here in Europe we have a great community of open source officers from companies who just started talking to each other a few years ago. Some of them became the OpenChain reference tooling group. There’s the TODO Group, which is obviously very active in that realm, and other efforts as well.
Open source just motivates you to use open source and be open source if you need to handle open source. And that’s what caused this growth in the community.
Do you think some of the supply chain vulnerabilities and breaches, and of course we got the infamous Log4j and Heartbleed and you name it, had anything to do with people getting a bit of a spark to start looking into these things?
I think that caused the push, but I was always more on the compliance side than the security side. I’m the compliance guy who tells you, hey, it’s good to understand what you contain, and all of that. I think there was a push on the security side, but I wasn’t on the bleeding edge of that push. I was seeing, hey, others now like the data I generate, and they’re happy that we can all exchange it. But at least in the small sub-community I see, it’s not the predominant push, except within the companies where they all work. They might get that push from somewhere.
Security is prominent. It’s visible. It’s obviously a reason. But it’s not talked about that often; it’s more of a pleasant push from somewhere.
So it’s something people are just doing, almost from a quality perspective.
The environment in America around it is very much, “Hey, we all have to do this because the executive order says so.” And all the executive order really says is that departments can request a Software Bill of Materials from you before they agree to purchase your software. Just because it’s requested doesn’t mean you have to give it to them, and all this ambiguity comes out of it.
So there’s this buzz around it. Is it security that’s pushing these things? You’re talking about compliance, which could be part of security. Is it engineering? And as an organization, why would I use this thing? Why would I develop Software Bills of Materials or produce them today if there were no executive order?
For the companies, security is the big thing, the big motivating factor. There are big numbers next to potential security issues. That’s how, I think, from the top down they generate a push: “Hey, we need to understand where we are vulnerable, and if there is a potential issue, we need to understand where we might be affected.”
That’s probably like witchcraft to somebody, right? You’re telling them about this, and a lot of people are going to glaze over and say, okay, I need it for the government to actually make my sale, to have this customer. It’s interesting.
When you’re using these things and reporting vulnerabilities, there’s going to be a way to attach a VEX, the VEX having the contextual information about whether something’s vulnerable or not. I’ve got so many questions about VEX, because it’s going to be manual. Machines can’t automate an interpretation of the compensating controls that are out there.
Let’s say you have a list of vulnerabilities in the software components you’re using. You provide this to a customer, and the customer scans it for vulnerabilities and says, “Hey, you’ve got five critical vulnerabilities. You need to fix those tomorrow before I buy your software.” How does that go? This is one of those complete edge cases of the practicality of using a Software Bill of Materials, especially when you have vulnerability information attached to it or someone generates vulnerability information.
What happens when you’re providing this document and someone gets it and says, “Hey, this is a bad piece of software”? Let’s say it’s Microsoft Word. What do you do? What do you see happening there?
I see a lot of chaos. I see a lot of stressed developers trying to act fast. Developers get this information, get the Bill of Materials, and start trying to understand: hey, do we really have this transitive dependency, and what is it used for? Then they start trying to dig through the dependency tree. Maybe this is still encoded in the dependency tree. Maybe the Bill of Materials was generated by a different service in the same company; the developers were just hacking away while that other service generated the Bill of Materials, and now they are confronted with this problem.
I see a lot of complexity, a lot of work: trying to find the source of it, then trying to upgrade to the latest version, hoping that it’s green, and trying to push that out.
Is it technical debt? Is it even reachable, from a traceability perspective? There’s hygiene that has to be thought about. It comes back to the best practices in the software world that we’ve always talked about with DevOps for the past 14 years.
I think SAP at some point developed a research tool that is a vulnerability database for vulnerabilities in Java specifically. They linked the patch that fixed a vulnerability to the vulnerability itself. Then they did static code analysis to see if any path touched code that was changed in the patch. So if you are on a version before the patch, can you reach code that is touched by the patch? If not, you are not affected.
So they had some static and dynamic code analysis to find out automatically if someone is affected. But I’m not sure what the state of that project is.
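The core of the reachability idea described here can be sketched in a few lines: given a call graph and the set of functions touched by a security patch, check whether any application entry point can reach the patched code. The call graph and function names below are invented for illustration; a real tool would extract the graph from bytecode analysis.

```python
from collections import deque

def is_affected(call_graph, entry_points, patched_functions):
    """BFS from the entry points; affected if we can reach patched code."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn in patched_functions:
            return True
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

# Invented application call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_config", "serve"],
    "serve": ["handle_request"],
    "handle_request": ["render"],
}

# The patch touched only `decompress`, which nothing here calls: not affected.
assert not is_affected(call_graph, ["main"], {"decompress"})
# If the patch had touched `render`, the application would be affected.
assert is_affected(call_graph, ["main"], {"render"})
```

This static over-approximation is conservative: unreachable patched code proves you are unaffected, while reachability only flags a potential issue for closer inspection.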
You’ve been around these things on a technical basis, as part of projects that use them and implement processes to interchange information. Where is this going to be in five years?
I hope that by then the SBOM formats get a sufficient level of support and maturity to be natively integrated with most of the build tools. Also tools that compose things: if you build a Docker image, there should be a way for the Docker image build to understand, “I just packaged a jar file into that Docker image, and that jar file had… material.” All this ugly stuff that is hard right now, all these disconnects between build systems and between all the machine-understandable ecosystems, somehow gets fixed, and there’s glue between them.
And the composition of a bigger SPDX or CDX SBOM file can be better, can be automated, can be improved. That’s the technical side where I hope for improvements and for the ecosystem to catch up.
On the other side, there could be a huge database of all the information, the whole universe of materials, maybe a big database or something running locally in companies, just to track all the assets: how they were built, what they were built from, all this information. Have that consistent, available, and discoverable.
Right now, Software Bills of Materials feel like… I need to provide them, so I provided them. Some company is requesting them since they need to have them and pass them on, but I’m not sure if they are really leveraging the SBOMs, whether they are just storing them and handing them to the next customer, or whether they are actually gaining knowledge from them and linking them together.
There’s a lot that can happen, a lot that can improve and make the whole ecosystem better.
This episode of daBOM was created by me, DJ Schleen, with help from sound engineer Pokie Huang and Executive Producer Mark Miller. This show is recorded in Golden, Colorado, and is part of the Sourced Podcast Network. We use Captivate.fm as our distribution platform and Descript for spoken text editing.
You can subscribe to daBOM on your favorite podcast platform. We’ll be releasing a new episode every Tuesday at 9:00 AM. I’ll see you next week as we continue to diffuse daBOM.
Resources From This Episode:
He is part of the Linux Foundation project FOSSology, as a committer and on the Steering Committee. He is also involved in SW360, which is currently an Eclipse incubator project. He previously gave FOSSology-related talks at the Linux Foundation Collaboration Summit 2016 and at FOSDEM 2018. As a senior consultant at TNG Technology Consulting GmbH, he develops open source tooling and is the contact person for FLOSS-related topics.
Wearing the hats of both a technologist and a policy maker, Allan has over 15 years of experience in international cybersecurity and technology policy. His experience and research focus on economic and market analyses of information security. On the practical side, he has designed, convened, and facilitated national and international multistakeholder processes that have produced real results, helping diverse organizations find common ground on contentious, cutting-edge issues.
Allan is known for applying technical and policy expertise to help audiences understand the pathways to change in an engaging fashion, and is frequently invited to speak or keynote to industry, academic, and public audiences. He has significant experience with the press, and has been featured in global media including CNN, NPR, and major American and international papers.
Follow him on Twitter @allanfriedman