Why Quality Matters
I often can’t get over how small the world actually is.
Earlier this year, I attended the Second Annual SBOM meetup after the first day of the RSA conference. The venue was a little bar on Minna Street, tucked away underneath the skyscrapers of San Francisco.
The bar was filled with quite a few familiar faces, and after I grabbed a cold beer, a hand reached out through the crowd to shake mine. Standing in front of me was Ritesh Noronha.
I’d never met Ritesh before – or so I thought for a brief moment. He asked me if I had coded “bomber” – an open source project that scans for security vulnerabilities. He then explained that he had been following the project for a long time, and had commented on some of the issues in the project. It turns out we had met before – on GitHub.
The odds against meeting each other at an event in San Francisco seemed almost infinite, but here we were discussing SBOMs and Open Source.
It turns out that Ritesh and his business partner, Surendra Pathak, had also been building incredible open source tools to work with SBOMs, and during our discussion we all started to talk about Quality.
SBOM formats are notorious for being so flexible that any tool can create one that is just a collection of “NOASSERTION” values, which renders it semi-useless. Ritesh and Surendra, however, have been busy creating open source tools that provide an SBOM quality score.
Need to see if an SBOM conforms to the minimum requirements as specified by NTIA? Then you really understand that quality matters.
Welcome back to daBOM.
Hey, welcome back to daBOM. I’m DJ Schleen. I’m here with Ritesh Noronha. Ritesh, I’m really excited to talk to you. We met in San Francisco during the RSA Conference when we were at the second annual SBOM gathering.
You came up to me and you talked to me about this really awesome tool that you found on GitHub. I think it scanned SBOMs for vulnerabilities. Maybe it’s called bomber. How did you find that? Tell me a bit about that and the awesome people who wrote it. (For those listeners who are wondering, the reason why I’m laughing is because I wrote bomber.)
We were in the process of building a quality tool. We were asking, who might need this? We were clearly using it. We were looking at SBOM scanners. One of them which popped up front was bomber. And more importantly, it had the name Kungfu in front of it. I was like, that’s interesting. Let’s see what’s in this.
We looked through the issues. I found somebody had an SBOM which was not giving them any results. I said, okay, that’s something that I would like to check out with the tool. I ran the tool, I found the issue, and I filed an issue with bomber. That’s when I was like, oh, this is great.
There’s a tool called bomber, which consumes SBOMs and there is a quality scoring tool which helps bomber do a better job of it. That’s how I came to know about bomber.
You graduated from the University of Mumbai with a degree in mathematics and computer science. How did you get there? What brought you down that path?
I was born and brought up in Bombay, in India. I was very interested in science. I did a bachelor’s in mathematics and science, actually. I was very interested in computers at the time. I got my first computer when I was like 15 years old. It was very interesting playing with that.
That got me interested in programming and mathematics, essentially. I studied at St. Xavier’s College, which is a big college in Bombay. That’s how I got into the whole space.
When you were programming on that, were you programming in C or what kind of languages were you programming in?
At that point, Borland C++ was my go-to compiler. It was C and C++, but very basic programming, nothing advanced. The biggest challenge at that time was getting Linux installed on that thing. And getting X Windows running. That was fun.
DJ Schleen: You went and you got your degree. Then what happened?
I have about 20 years of experience in engineering. I’ve been in software engineering throughout my career.
I’ve worked in startups and large corporations. I’ve also worked in embedded and scalable cloud systems. I have experience across the board right now.
I really enjoy working on problems which push me to think outside the box. Software risk transparency is what I’m focused on right now, and that’s consumed my mind completely because of the experiences we had at my previous company. Right now I, along with my co-founder, Surendra, have founded this company called Interlynk. Our mission is generally to help reduce the friction in coordinating risk between companies, between a person who’s selling software and a person who’s buying software.
That starts getting into this whole SBOM conversation and the quality and the trust. How do you gain trust and compliance in your software supply chain? What does trust mean?
I believe trust means transparency. Being transparent is very important. Because when I’m buying software from you, I need to know what you’re selling me. That’s very critical and that builds trust. I can be assured about your practices, like if you’re updating components correctly.
Because I’m running it in my stack and I’m managing data for, let’s say, partners who have sensitive data, I need confidence that whatever I buy is what it says it is.
So hence we are going down the path of SBOMs first. Get SBOMs and make sure that at least what we’re getting in is kosher, that it is something that aligns with my company standards. That’s why we are down this path.
DJ Schleen: That’s where you get into these high quality SBOMs and the minimum requirements for a software bill of materials that came out of Executive Order 14028. Then NTIA came out with the minimum requirements. Is that all we need to think about with the quality of an SBOM, or is there more to that?
What we found is that when we are building consumption tooling, the question is: can we make sense of it? That’s the journey we started in July last year when we founded the company. We quickly realized not all SBOMs are created equal, even though they follow industry standards like SPDX and CycloneDX.
They’re not standard. SBOM generator A does things a little bit differently than SBOM generator B. That throws off the consumption tooling. As we were integrating more of these generators into our system, we found that it’s a problem. Consistency is a problem. Does it even align with some industry standards?
We picked the NTIA minimum elements, which is a great standard just to say, is it consumable or not? Then there is SCVS by the CycloneDX community, which is going to be coming out very soon.
We wanted a tool to capture all this knowledge that we had about issues we faced with SBOM and surface it back to the community and to the generators, with the hope that they harden it.
And I’ve got to say, we filed a lot of tickets across a lot of SBOM generators and they have been more than willing to address the issues. Creating a tool and surfacing these issues was basically all it needed. That’s where we’re seeing a lot of traction right now, with people using the SBOM-QS tool to validate a lot of things.
One of the things that we found out very recently was that when GitHub released the GH-SBOM tool, they used SBOM-QS to validate its quality, which was great for us.
This is great. The community can consume this and use this right now.
If you think about the grand scheme of things, this is a part of a bigger system. That’s how we use it internally. It’s a part of our bigger system. But to open source it, we needed a tool.
Yes, we will release a library very soon, which is open source so that people can consume it. Now obviously once you release the library, you need to build it for Rust. You need to build it for other things to consume it. But we’ll start off with Go for sure because that’s what I’m familiar with. Maybe down the line we’ll go on with other languages.
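As a rough illustration of the NTIA minimum elements check discussed above, here is a minimal Python sketch. It assumes a CycloneDX-style JSON document that has already been loaded into a dict. The function name `ntia_gaps`, the sample data, and the choice to count the generating tool as the SBOM author are all assumptions made for illustration; this is not how the SBOM-QS tool actually scores an SBOM.

```python
# Hypothetical sketch: report which NTIA minimum elements are missing
# from a CycloneDX-style SBOM that has already been parsed into a dict.

def ntia_gaps(sbom: dict) -> list[str]:
    """Return human-readable descriptions of missing minimum elements."""
    gaps = []
    metadata = sbom.get("metadata", {})

    # Document-level elements: author of the SBOM data and timestamp.
    # Assumption: a listed generating tool is accepted as the author.
    if not (metadata.get("authors") or metadata.get("tools")):
        gaps.append("document: no author of SBOM data")
    if not metadata.get("timestamp"):
        gaps.append("document: no timestamp")

    # Dependency relationships are recorded at the top level.
    if not sbom.get("dependencies"):
        gaps.append("document: no dependency relationships")

    # Component-level elements: name, version, supplier, unique identifier.
    for index, comp in enumerate(sbom.get("components", [])):
        label = comp.get("name") or f"component #{index}"
        if not comp.get("name"):
            gaps.append(f"{label}: no component name")
        if not comp.get("version"):
            gaps.append(f"{label}: no version")
        if not (comp.get("supplier") or {}).get("name"):
            gaps.append(f"{label}: no supplier name")
        if not any(comp.get(key) for key in ("purl", "cpe", "swid")):
            gaps.append(f"{label}: no unique identifier")
    return gaps


# Made-up sample data for illustration only.
example = {
    "metadata": {"timestamp": "2023-06-01T00:00:00Z"},
    "components": [
        {"name": "libfoo", "version": "1.2.3",
         "purl": "pkg:generic/libfoo@1.2.3"},
        {"name": "libbar"},
    ],
}

for gap in ntia_gaps(example):
    print(gap)
```

A real quality score would weigh many more signals than presence or absence of a field, but even a check this small makes the differences between generators visible.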
That’s a very important point to bring up. There’s a misconception in the industry that I use tool A and tool B, like you’re saying, and these SBOMs are exactly the same. But some of them, even if they produce a CycloneDX or SPDX or SWID format, are missing fields, and there’s no attestation in places. You convert that to CycloneDX to put it in GUAC, let’s say, as a centralized open source storage, and you’re going to lose that layer.
You have to think about how do I convert that and put it into another field. And then all of a sudden you’re mutating that non-standard SBOM into another non-standard SBOM.
I know you have an open source project that merges SBOMs together as well. How do you deal with that one? There are just these compatibility issues.
The way we think the world works is that people will generate SBOMs for individual applications, but when they ship it out as a product, they ship it out as one big product, which is composed of libraries, applications, and what have you.
At that point, they would need to take these individual SBOMs and merge them into a gigantic SBOM, which would be easier to consume and manage for the purchaser of the software. When building this, what we found is, okay, let’s merge these individual CycloneDX SBOMs. It’s pretty doable. There are algorithms to do this and all that.
But what we found is that the way tool A generates a dependency tree is very different from the way tool B generates a dependency tree. Now, when you merge these two, what happens? The dependency tree breaks. There is no easy way to do this.
I can give a very easy example. Because the spec doesn’t specify exactly what the dependency tree should be, it gives generators freedom to do it their own way. We have this one tool in which the top level dependency is actually the manifest file used to generate the dependencies, like a go.mod file. Then it puts all the dependencies at the same level. When a tool is looking at it, it thinks, oh, it has transitive dependencies, it’s great. But no, it’s not transitive. It is just one level deep.
From a vulnerability use case standpoint, what people really want to know is: is my vulnerability in my direct dependency or not? Because that is something potentially addressable by the end customer. It becomes very difficult to find that out if the dependency tree is incorrect or cannot be consumed. If I cannot figure that out, that’s a problem.
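The direct-versus-transitive question raised here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SBOM is a CycloneDX-style dict whose top-level component carries a `bom-ref` and whose `dependencies` list uses `ref` and `dependsOn`; the function names and sample data are hypothetical.

```python
# Hypothetical sketch: classify components as direct or transitive
# dependencies of the root, using a CycloneDX-style "dependencies" list.
from collections import deque


def dependency_depths(sbom: dict) -> dict[str, int]:
    """Map each reachable bom-ref to its shortest distance from the root."""
    root = sbom["metadata"]["component"]["bom-ref"]
    edges = {
        dep["ref"]: dep.get("dependsOn", [])
        for dep in sbom.get("dependencies", [])
    }

    depths = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search yields the shortest depth
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in depths:
                depths[child] = depths[current] + 1
                queue.append(child)
    return depths


def classify(sbom: dict, ref: str) -> str:
    """Label a bom-ref as root, direct, transitive, or unknown."""
    depth = dependency_depths(sbom).get(ref)
    if depth is None:
        return "unknown"  # not reachable from the root in this graph
    if depth == 0:
        return "root"
    return "direct" if depth == 1 else "transitive"


# Made-up sample data for illustration only.
example = {
    "metadata": {"component": {"bom-ref": "app"}},
    "dependencies": [
        {"ref": "app", "dependsOn": ["lib-a", "lib-b"]},
        {"ref": "lib-a", "dependsOn": ["lib-c"]},
    ],
}

print(classify(example, "lib-a"))  # direct
print(classify(example, "lib-c"))  # transitive
```

If a generator hangs every dependency off the manifest file at a single level, a walk like this reports them all at the same depth, which is exactly why an incorrect tree makes the question hard to answer.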
If you have two top level components that share the same dependency and the same version, when you’re merging SBOMs together, those are two different dependencies in two different SBOMs. When you merge them together, do they point at the same transitive dependency?
Merging has two algorithms. There’s a hierarchical merge and there is a flat merge.
In a flat merge, everything is flat: there’s a bunch of components and there are a bunch of dependencies. You lose all relationships. Some people love that. Some people don’t like that, for obvious reasons. Okay, I got a vulnerability, now what is impacted? I have no idea.
But a hierarchical merge is different. It retains the linkage to the parent component. That’s why we chose that as the default, because we believe that when you merge, you would want to know, if there’s a vulnerability, who is it linked to? That supports that use case.
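To make the contrast between the two merge styles concrete, here is a minimal Python sketch. It assumes CycloneDX-style dicts in which each SBOM names its top-level component under `metadata.component` and nests child components in a `components` list. The function names and sample services are hypothetical, and this is not the algorithm used by the merge tool discussed in the interview.

```python
# Hypothetical sketch of the two merge styles for CycloneDX-style SBOM dicts.
import copy


def flat_merge(sboms: list[dict]) -> dict:
    """Pool every component into one list, deduplicated by purl or name@version."""
    seen = {}
    for sbom in sboms:
        for comp in sbom.get("components", []):
            key = comp.get("purl") or f'{comp.get("name")}@{comp.get("version")}'
            seen.setdefault(key, copy.deepcopy(comp))
    # The link back to the originating product is lost here.
    return {"components": list(seen.values())}


def hierarchical_merge(sboms: list[dict]) -> dict:
    """Nest each SBOM's components under its own top-level (parent) component."""
    parents = []
    for sbom in sboms:
        parent = copy.deepcopy(sbom["metadata"]["component"])
        parent["components"] = copy.deepcopy(sbom.get("components", []))
        parents.append(parent)
    return {"components": parents}


# Made-up sample data for illustration only.
service_a = {
    "metadata": {"component": {"name": "service-a", "version": "1.0"}},
    "components": [{"name": "log4j-core", "version": "2.14.1"}],
}
service_b = {
    "metadata": {"component": {"name": "service-b", "version": "2.0"}},
    "components": [{"name": "log4j-core", "version": "2.14.1"}],
}

flat = flat_merge([service_a, service_b])
nested = hierarchical_merge([service_a, service_b])

print(len(flat["components"]))                    # 1
print([p["name"] for p in nested["components"]])  # ['service-a', 'service-b']
```

In the flat result the shared component appears once with no indication of which service ships it; in the hierarchical result it appears under each parent, so a vulnerability can be traced back to the affected product.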
There’s just so much underneath. Yeah, and it’s licensing too. It’s not just vulnerabilities. Those transitive dependencies could have an AGPL license, LGPL license, those kinds.
Oh, for sure. For sure. But the good thing is, as the generators get better and better, it’s changed vastly over the year that we’ve been in there. It’s just a sea change.
When you merge SBOMs together, these things are huge as it is and they’re going to get even larger.
They’re not human readable, and it’s JSON, not that you would want to read an SBOM with human eyes anyway. What is the biggest issue that you have, or that you can see, with the volume of SBOM data, the size of these files?
The size of the files that we’ve been dealing with has not been insurmountable or gnarly as such for now.
But we are part of the CISA working group, and there we met one guy who said that he has an SBOM which is 90 MB in size, one SBOM. And that becomes crazy. Now, we didn’t get access to that SBOM, so we couldn’t play with it, but I would really like to know what’s in that 90 MB SBOM.
That sounds like technical debt.
It might be a requirements.txt. Somebody didn’t run go mod tidy or something like that on their mod file.
It is quite possible, when you have massive product merges, for it to get large, but I don’t think it’s a burden because eventually these tools will be managed by Internet scale systems.
If I were to believe what CISA says, that these are generated at machine speed, they have to be readable at machine speed. For me scaling is not an issue, because advertising is a large scale business. It’s very easy to do all these things because these are proven things to do, managing and processing large amounts of data.
I’m not very concerned about that part of it. I’m concerned about how we have to be machine speed for this to be viable. It cannot be event driven one-offs because then it doesn’t make it viable.
When you’re merging those SBOMs together, you can also merge VEX and VDR in there, more VDR, because if you’re dealing with CycloneDX, that’s the vulnerability node in there. VDR is a subset of that. For the listeners, we have an episode that aired before this one, talking with Steve Springett about VEX and VDR as well.
Do you put those VDR in that merged SBOM? What do you recommend, I guess I could ask, because you’re only as good as your last vulnerability scan. How do you deal with that?
We’ve been internally talking about this one for a while. Do we believe vulnerabilities should be tied to an SBOM or not? Essentially, that’s the question: should they be tied together?
I believe vulnerabilities are a living organism, right? They never die. While an SBOM is a blueprint of the software that is shipped out.
The day you generate the SBOM, the vulnerabilities still could change. I believe they should be completely separate. They should be living in parallel worlds and should be linked together. Linking is fine, but inclusion into the SBOM itself, I’m not a big fan.
I love the analogy of blueprints. It totally resonates with me.
This year’s been sort of crazy. SBOM has become this word that’s being thrown out everywhere. At RSA though, the big topic was zero trust, and I was looking for SBOM, and I’m wondering if next year we’ll see more SBOM in there as we start dealing with some of these government recommendations. Are those minimum SBOM requirements from CISA enough? What’s missing there other than license? License isn’t in there at all?
The minimum elements has requirements, and then there are should-haves, like nice-to-haves.
The nice-to-haves are the missing kind of thing, because that’s where the transitive dependencies come in. That’s when the depth comes in, and that’s when the known unknowns come in. Actually that’s the gray area of SBOMs.
DJ Schleen: I’m gonna put you on the spot here and ask a question. Which SBOM format do you think is the most developed and extensible now. If I’m a person who’s asking for a vendor’s SBOM, what format would be the best to get me the most information?
Actually, that’s a great point, but it’s a two part answer.
The tools that we build are mostly format agnostic. We work with both formats right now, SPDX and CycloneDX. It depends on exactly what the customer wants, right? If the customer just wants components, I would say both formats are great right now. Both formats have exactly what you want for components.
But if you go beyond components, CycloneDX is hands down the format you want to go to. If you want to list your services, if you want to list your compositions, if you want to list the external references, all of these things are things CycloneDX has great support for, hands down. I’ve also heard that they’re working on AI/ML now, and that’s the upcoming field to be documented, because with all the trust and safety things, you have to know what’s in these models. I definitely think they have an edge there on the format itself.
They’re constantly developing it. They’ve got 1.5, even 1.6, they have some feature roadmaps for those.
And SPDX version three is coming out. I’ll be interested to see what’s in it. They just launched RC1. That’s my weekend reading!
One of the things that I personally like about CycloneDX is just the structure of it and that PURL is one of the first class citizens. Especially when you’re looking for vulnerabilities, you can use that to find things out and pull down the appropriate information.
When we look at SBOMs, you touched on a point where there’s the components, but there’s much, much more. There’s licenses and then there’s all the files that go in there. Yeah. And the hashes for the files. You can go really deep.
What have you seen these tools generate? Do they go that deep or do they have options to go that deep when they’re generating SBOMs?
I’ve not seen a single CycloneDX generator generate files. SPDX definitely does that. You have a file section where you have all the files and their relevant hashes. CycloneDX, I have not found a generator. Maybe there are, and I’ve not come across it.
As you come to services, as you come to the other parts of CycloneDX, I don’t see any generators for those either. Is there an automated way to do this? Maybe there’s not, and maybe it is a manual way.
Assume there are monitoring tools you use which know exactly the API endpoints you have and can export services to CycloneDX. Maybe that’s the use case. Maybe it’s not using a repo.
Where do you see SBOMs going? You’re seeing these every day. You’re working towards the quality, the supply chain, trust. Where do you see this going in this semi-new ecosystem and industry?
I’m transporting myself five years ahead. Clearly it’s a new technology and adoption has its ups and downs, right?
But the way I think of SBOMs is, hey, right now, what is the scenario? What is the scenario that we faced in my previous company? Everybody generates a view of the application that is tied to a tool and is tied to a vendor. You can use any tool in the market. You’ll get very similar results to what an SBOM gives you. But it’ll be tied to a tool for which you’re vendor locked in.
SBOM breaks that. SBOM says, hey, the data is consistent. Vendors can add value added things on top of this data. I see that as powerful. I see that as, hey, now everybody knows what the components are. Let the vendors, or let anybody who can read this data make sense of it and create value on top of this.
Eventually I see that once engineers, like people like me, who would generate this data and see the value, like when I’m including a component, I can know exactly what I’m including, and make decisions based on that. It automatically promotes better hygiene across the board.
If I don’t want to include a component because there are probably no maintainers, or there are a lot of vulnerabilities which have not been addressed for years, that sends a signal now to that repository or that vendor: I don’t want to include it. And that has repercussions.
I think people would like that. I would like that as an engineering leader in the organization. I would like to know what people are including, because I can tell you for sure, engineers don’t have the tooling right now to make those informed choices in a format agnostic way. We are not tied to a particular vendor, and the community can provide that information for them.
I believe it’s going to be transformational. It’s going to be very important as adoption increases. I’m seeing adoption increase every day.
I’m hopeful that it keeps on this path.
Is this going to get us to the point where finally we have that inventory, that enterprise inventory, that we know what components we’re using. We know that we’re using 15 different versions of log4j for example, and two teams are still using the one that’s vulnerable. Are we going to get to that point where, that utopia, where organizations finally know what they have?
This is the step towards that. Dependency Track just announced that they’ve been downloaded 300 million times. That says everything to me. People may not be transparently sharing SBOMs right now, but they’re using it to fact check what they’re doing themselves.
If I’m running an engineering team, I want to know what is there in my stack. It’s as simple as that. Yeah. Add on benefits after that. Vulnerabilities. Compliance. That is definitely all great after that. But just knowing, cataloging stuff, is very important.
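The cataloging step described here can be pictured with a small Python sketch. It assumes one CycloneDX-style dict per team; the team names, component versions, and the function name `inventory` are made up for illustration and do not come from any tool mentioned in the interview.

```python
# Hypothetical sketch: build a cross-team inventory of component versions
# from several CycloneDX-style SBOM dicts.
from collections import defaultdict


def inventory(sboms_by_team: dict[str, dict]) -> dict[str, dict[str, set[str]]]:
    """Map component name -> version -> set of teams shipping that version."""
    catalog = defaultdict(lambda: defaultdict(set))
    for team, sbom in sboms_by_team.items():
        for comp in sbom.get("components", []):
            name = comp.get("name")
            version = comp.get("version", "unknown")
            if name:
                catalog[name][version].add(team)
    return catalog


# Made-up sample data for illustration only.
sboms = {
    "payments": {"components": [{"name": "log4j-core", "version": "2.14.1"}]},
    "search": {"components": [{"name": "log4j-core", "version": "2.17.1"}]},
    "billing": {"components": [{"name": "log4j-core", "version": "2.14.1"}]},
}

catalog = inventory(sboms)
for version, teams in sorted(catalog["log4j-core"].items()):
    print(version, sorted(teams))
```

Once the catalog exists, questions such as which teams still ship a given version become simple lookups, and vulnerability or compliance checks can be layered on top.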
If you’re going to get started with SBOMs today, is that the first step?
It has to be because if you don’t do that, it’s pointless. I believe it’s going to be like cataloging, value addition and then transformation.
Where do vulnerabilities come into that? Is that something that we detect further left, with the developers generating that, or is that a security concern?
No, I’m thinking right from a developer workflow. I’m including a component. I’m time bound in releasing this feature. I see five vulnerabilities. What do I do? And this is the only tool that I know which works. Do I stop it, not stop it? What do I do? I at least have that information, right? And that can then be discussed in a meeting. I’m going in knowingly, and we can probably address it with the help of the security team in a later release, right? Because there are a lot of forces at play. The information needs to be surfaced at multiple locations during the lifecycle. As far left as possible is great, but it has to be throughout the lifecycle.
That’s a great place to start. And then, asking for these from our vendors. Then we start getting into that inventory issue.
Yeah. Asking from your vendors is complicated. There are a lot of open issues right now to be addressed.
What’s the number one issue with that?
There is a fear that I’m revealing my secret sauce. I don’t know how well-founded that is. There are a lot of things you need to build out there to give them the assurance that this is not true, while at the same time revealing that there is a component that is redacted and these are the vulnerabilities associated with it.
There is all that thing that needs to be built up to give them confidence, like, hey, I can confidently share this without having any repercussions. That doesn’t exist today.
This episode of daBOM was created by me, DJ Schleen, with help from sound engineer Pokie Huang and Executive Producer Mark Miller. This show is recorded in Golden, Colorado, and is part of Sourced Network Productions. We use Captivate.fm as our distribution platform and Descript for spoken text editing.
You can subscribe to daBOM on your favorite podcast platform. We’re going to be releasing a new episode every Tuesday at 9:00 AM. I’ll see you next week as we continue to defuse daBOM.
Ritesh Noronha has 20 years of experience in software development, ranging from startups to large corporations. He has worked in embedded systems and scalable cloud environments, while enjoying the problems that push him to think outside the box. Ritesh thrives in fast paced and challenging environments. He is currently working on product solutions to help facilitate software risk transparency of software supply chains through a focus on SBOMs, VEX, and SBOM Quality.