Brian Fox and the Creation of Open Source Repos
As the video connects I see Brian Fox, sitting in front of a collection of model spacecraft which adorn the shelves behind him.
It’s a fitting backdrop for a conversation about the genesis of the software supply chain problem, and how exploration and discovery have led us to where we are as an industry today.
Think about this: it all started when we began to assemble our software from components that we didn’t write ourselves.
And Brian was right there.
He has been there since the beginning of the open source supply chain universe, a pioneer of sorts. He is a contributor to the Maven ecosystem, and today he’s at the technical helm of a successful company that enables the promise of making safer software sooner.
I had the pleasure to work with Brian in the past, but I never had the opportunity to hear his story until now.
Welcome back… to daBOM.
Hey everyone. Welcome back to daBOM. I’m here with Brian Fox from Sonatype. Thanks for joining us. I’m excited to talk to you about Software Bill of Materials. I definitely wanted to pick your brain on all things software bill of materials related, since you’re an expert on supply chain.
I’m going to kick this off. I want to know how you started coding.
How I started coding. Wow. Yeah.
When was the first time you opened up a computer and you’re like, I got to write some code? Because you’ve written a lot of code in your life.
Yeah. The earliest version of this, believe it or not, was some little, I don’t even know what it was, it was a keypad. A keyboard with a small computer-sized LCD screen that we got at a timeshare presentation, and it ran BASIC. I started programming BASIC on this stupid little thing that had a 10-character, one-line-at-a-time display and no real way to save and restore any of the programs.
I fascinated my cousin with the ability to hack games on the Commodore 64 after that, which were also written in BASIC. So that’s the earliest, if that counts as programming.
That’s pretty cool. So, a Commodore 64, using a tape drive to load that stuff in.
Yeah, it wasn’t even mine. I just figured out how to read the list of all the Jeopardy questions and change some of them, and really spooked my cousins on that one. It was magic for them.
Did you think that, fast forward many years, you’d be involved in the security industry, creating products and dealing with supply chain issues?
No, not at all. I actually did not intend to study computer science and do programming. I wanted to be an aerospace engineer or a pilot and was on that path for most of my career. Through high school I was in Navy Junior ROTC on the way to Air Force. At the time the Air Force was guaranteeing scholarships a hundred percent for three years if you had a computer science major, even if you were going to be a pilot. Which never made much sense to me why they would pay you to learn one thing and then have you actually do something else.
But they assured me that was the thing. I had built my own computer years before that, I think in sixth grade, built my first PC clone. So I was like, that sounds cool they’re going to pay for my college, let me go down that path. And, that was a path I was on for a while.
But then, Clinton and Gingrich had a showdown. They shut down the government and canceled all the scholarships and left me hanging.
At that point I said, I guess I’ll lean into this computer science degree. I seemed to be pretty good at it, and it’s actually pretty interesting.
The thing is, I never imagined myself sitting around coding in a basement, as the stereotype is. I did that a bit in my early career, but I’ve done less of it later, which in retrospect is less surprising. For me, the coding was always a means to an end to try to solve the problems.
But you did write a lot of code and there’s this one project that pretty much everybody who’s listening to this podcast is going to know about. And it’s called Maven. So tell me about how Maven started. How did you get into that?
At that time I was not even really programming full-time. During the day, I was a development manager at a small company, doing project management, product management, and engineering management.
We had an effort to modernize our infrastructure, to move from Ant and CVS to Subversion and something better. The people working for me recommended some minor changes, and I had started poking around with Maven in the process. This was right at the time of the move from Maven 1 to Maven 2, which was basically in early alpha. I got involved in that and said, no, this is the way, this makes way more sense than what we were trying to do with incremental changes to Ant.
What was interesting is that at that time my daughter, I don’t know, she might have been six months old. Once my wife and my daughter went to bed, I started hacking on Maven and Maven plugins at night.
Basically what happened is the people working for me would run into issues during the day. At night I would go home and fix them for them. I ended up in that process creating two very popular plugins that pretty much everybody uses, the Maven Dependency plugin and the Maven Enforcer plugin.
Those initially were not part of the Apache project. They were part of the old Codehaus project, which only old-timers will remember. But they were so popular even back then that the project basically voted to pull those in, and I became a committer in the early days alongside that. I subsequently became a PMC member, which is an official voting manager of the project.
Then, several years after that, I actually was the PMC chair of the Maven project for a number of years as well.
So Maven Central, where did that come from?
If you’re not familiar with Apache Maven, Maven is built around the concept of convention over configuration. So there’s a lot of standards. Instead of having to write the scripts, if you do the right thing, the tool will do a lot of magic for you.
Part of that is dependency management. In order to build the entire dependency tree, for each module you use it has to declare what its dependencies are, all the way down. It can fetch all of those and resolve them.
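To make the idea of resolving a full dependency tree concrete, here is a minimal Python sketch. It is not how Maven itself is implemented; the module names and the DECLARED table are hypothetical, and real resolution also has to deal with versions, scopes, and conflicts, which this leaves out.

```python
from collections import deque

# Hypothetical declarations: each module lists only its direct dependencies.
DECLARED = {
    "my-app": ["web-framework", "logging-api"],
    "web-framework": ["http-client", "logging-api"],
    "http-client": ["codec"],
    "logging-api": [],
    "codec": [],
}


def resolve_transitive(root, declared):
    """Walk breadth-first from root and return every module it pulls in."""
    seen = []
    queue = deque(declared[root])
    while queue:
        module = queue.popleft()
        if module in seen:
            continue  # already resolved through another path
        seen.append(module)
        # Each module declares its own dependencies, all the way down.
        queue.extend(declared.get(module, []))
    return seen


resolved = resolve_transitive("my-app", DECLARED)
print(resolved)
```

The application only declares two dependencies, but the walk also surfaces the http-client and codec modules that arrive transitively.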
When we were basically bootstrapping, we had this problem. All the software that everybody wanted to use was not already built with Maven. They were just jars kicking around.
The old-school way of managing this was an Ant file that said, include all *.jar from the libs folder. Everybody just threw random stuff in there. It was a nightmare to manage.
When Maven started coming along, we basically created Central as a way to share hand-built POM files. This will be an interesting thread to pull when we get into BOM files. I think we’re in a very similar state… For all of the preexisting stuff, the Maven file, it’s called POM with a “P”, Project Object Model, did not exist. Central literally was a Subversion folder that had POM files that people created and shared with each other. Eventually they started sharing the JAR files in the convention alongside those, and then that moved outside of Subversion into an actual file share system, a website, basically.
Through a fluke of history that repository was never officially part of the Apache Maven project.
At the time, there was some concern around the Apache Software Foundation hosting binaries that were, for example, GPL licensed. It was always managed as this external thing that first was hosted at ibiblio, which is out of, I think, the University of North Carolina.
When we founded Sonatype, we basically just kept paying for it and have continued to manage it to this day. So that’s how Central came around, but it was quite literally a sharing place for people to share with each other hand-built Maven files that listed all the dependencies.
This is an interesting story and background here, because now we’re at the point where we’re almost going back to that POM file with a BOM file. We see things like requirements.txt. We see pom.xml. We see all these different kinds of definitions for dependencies. Tell me a bit about how we go from POM to SBOM.
At some level the Maven POM file is an SBOM, because it lists your dependencies and it’s recursive. But it’s a very Maven-specific format. I would say all of the package managers and repos that have come since, .NET and PyPI and npm and RubyGems and Conan, and even Go to some extent, have followed a lot of the similar models that Maven first put down.
It made it really easy to use open source.
I remember prior to that it was a nightmare to do anything more than cut and paste like a method. Trying to get somebody else’s software to build inside of yours, it was often easier just to write the code yourself.
Maven made that really easy and everything followed along from that. But a bill of materials, an SBOM as a generic state needs to be able to express more than just Maven components and NPM components. A given application can have a bunch of those things in there.
That’s why there was this push to create yet another standard. But it is fairly easy to create a bill of materials if you have the Maven dependency information in front of you because it’s already spelled out and it can be calculated for you.
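As a rough illustration of why an inventory falls out easily once dependency information is spelled out, here is a small Python sketch that turns resolved dependencies from two ecosystems into one flat component list. The component names are made up, and the output dictionary is a simplified stand-in rather than any real SBOM schema. The identifiers follow the general shape of Package URLs (pkg:type/namespace/name@version).

```python
import json

# Hypothetical resolved dependencies from two ecosystems in one application.
resolved = [
    {"type": "maven", "namespace": "org.example", "name": "widget-core", "version": "1.4.2"},
    {"type": "npm", "namespace": None, "name": "left-widget", "version": "2.0.1"},
]


def to_purl(dep):
    """Format one dependency as a Package URL style identifier."""
    namespace = f"{dep['namespace']}/" if dep["namespace"] else ""
    return f"pkg:{dep['type']}/{namespace}{dep['name']}@{dep['version']}"


def build_inventory(app_name, deps):
    """Build a simplified, ecosystem-neutral inventory (not a real SBOM format)."""
    return {"application": app_name, "components": sorted(to_purl(d) for d in deps)}


inventory = build_inventory("my-app", resolved)
print(json.dumps(inventory, indent=2))
```

Using one identifier style for every ecosystem is what lets a single document describe an application that mixes Maven and npm components.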
Let’s take this back, the security around the software components.
My journey was trying to solve this problem before everybody was using manifests.
Around 2012 or so, we started to notice a pattern within Central involving some of the most popular versions of important components like, say, Bouncy Castle, which is a clean-room crypto library implementation for Java.
The most popular version was a version that had a level nine security vulnerability that had been fixed for over two years. And we started scratching our heads and saying, why is that? We’re pretty sure it wasn’t intentional. People didn’t understand what was inside their software, and they didn’t understand that the thing they were using was actually vulnerable.
Circa 2012, even though Maven had been very popular for probably almost a decade at that point, many enterprises still had their legacy builds on Ant and other things. They didn’t have these POM files that we’ve been talking about that could list out the dependencies.
The aha moment that I had was it doesn’t matter how these applications are built. At the end of the day, certainly for a Java application, you’re going to deploy it in an application server and it has to conform to a standard or it just simply won’t run.
There’s a well-defined standard for what an EAR looks like and a WAR and a JAR. We realized that we could work backwards from there. We started figuring out that we can fingerprint these components based on the hash of the outer component. We can go further down and even use ASM-like technologies to recognize what’s inside the JARs and whether they’ve been slightly modified or compiled.
The reason for that is we were effectively building what the world now calls an SBOM. We were doing it because we needed to help the customers manage their supply chain.
Fast forward, what, almost a decade from that point, everybody started talking about producing SBOMs and I was like, “This is not new,” because we had been doing it, but we weren’t talking about it as an SBOM. We were talking about it as supply chain management.
And of course, you can’t manage your supply chain if you don’t first know what’s in the supply. That’s why we first built that inventory in the system, but then very quickly moved beyond that to helping people manage the components from an architecture, a license, and a vulnerability perspective.
From that you can also look at it and say, “Hey, based on our research around security vulnerabilities, we know that this specific fingerprint has these specific issues,” and that’s how we start getting vulnerability information out.
That’s right. That history of us having run Maven Central allowed us to get to that, “Aha moment”, 10 years before the rest of the industry really started paying attention to it.
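The fingerprinting idea in this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the archive is built in memory with placeholder bytes, SHA-1 is an arbitrary choice of hash, and KNOWN_ISSUES and its identifier are hypothetical. It covers only the simplest case of hashing each JAR as a whole, not the deeper analysis of modified or recompiled contents.

```python
import hashlib
import io
import zipfile


def build_sample_war():
    """Create a tiny in-memory archive laid out like a WAR (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as war:
        war.writestr("WEB-INF/lib/crypto-lib.jar", b"pretend jar bytes v1")
        war.writestr("WEB-INF/lib/util-lib.jar", b"pretend jar bytes v2")
        war.writestr("WEB-INF/web.xml", b"<web-app/>")
    return buffer.getvalue()


def fingerprint_jars(war_bytes):
    """Hash every bundled JAR, working backwards from the deployable archive."""
    fingerprints = {}
    with zipfile.ZipFile(io.BytesIO(war_bytes)) as war:
        for name in war.namelist():
            if name.endswith(".jar"):
                fingerprints[name] = hashlib.sha1(war.read(name)).hexdigest()
    return fingerprints


# Hypothetical catalogue mapping a known fingerprint to known issues.
KNOWN_ISSUES = {hashlib.sha1(b"pretend jar bytes v1").hexdigest(): ["EXAMPLE-0001"]}

found = fingerprint_jars(build_sample_war())
flagged = {name: KNOWN_ISSUES[fp] for name, fp in found.items() if fp in KNOWN_ISSUES}
print(flagged)
```

Nothing here depends on how the application was built, which is the point of working backwards from the deployed artifact.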
In the early days, it was very evangelistic. We had to convince people that they were in fact using open source. And famously, our CEO was talking to the Chief Information Security Officer at a large healthcare company that happened to be one of the largest consumers of open source from Central at the time, and he didn’t know they were using open source.
There was a major disconnect between what the developers were doing and what the leaders recognized as risk that they should be concerned about, right? But we were able to see that and start evangelizing that there’s an important thing to pay attention to here.
That happened a very long time ago for us.
But that’s interesting. So when we start thinking about open source, a lot of companies don’t know this, and you and I have been part of many surveys, community surveys in the past where we’ve seen anywhere from 85 to 97% of all software being open source components. How does that drive the importance of software bill of materials when we start looking at exchanging this data between organizations?
Organizations, unfortunately after SolarWinds and Apache log4j, the Log4Shell vulnerability, have recognized this. This is now accepted fact. So I very rarely have to evangelize that fact anymore, which is nice. It means we’re moving forward.
But we’ve also seen that 96% of the time when something from Maven Central is consumed that is already vulnerable, there’s already a fix available.
There’s been a lot of talk about open source needs to do a better job and we need to fund them and all these things. At best, that’s solving 4% of today’s problem. 96% is literally organizations keep fetching known vulnerable versions of components that are already fixed. They don’t have to make that choice.
As of last week, 30% of the downloads of Apache log4j from Maven Central are still of those known vulnerable versions. Eighteen months after the most high-profile thing we’ve had in our industry, roughly one third of the time, people are still choosing these bad components.
That’s a huge part of the problem. That comes back to organizations don’t really have an organizational understanding of those things. I hesitate to call that an SBOM because an SBOM to me means a little bit more like, the piece of paper around what’s in a given application, but it’s more like the organizational BOM. The visibility of all of the SBOMs would start to help solve that problem.
Because again, I can’t believe they’re choosing that on purpose. I think the fact that continues to happen is that it’s somewhere in a transitive dependency, in an application that nobody knows is there.
We talk about known vulnerable components. This is where SBOMs are starting to open the hood of the car, so to speak, and allow people to see those components.
All of a sudden we’re having greater visibility across the industry to maybe help address that problem of developers or engineers using these older components or these vulnerable components.
Attackers have more time to understand the previous versions and potentially exploit them. What would be your take? How would we potentially fix this as an industry?
For what it’s worth, using the latest tag in place of a version, which is the pattern for every other ecosystem, is considered an anti-pattern in Maven land. Let me just say that again to make sure it’s clear for the audience. Most of the other ecosystems, like PyPI and npm, unless you tell them not to, are going to fetch the latest version from the repo.
Maven works the other way. It’s going to fetch the version you’ve asked for, and it’s not going to try to grab the latest version of it.
There is a direct line between why we are seeing so many supply chain attacks in ecosystems like NPM and Python as opposed to Maven, that’s part of the reason.
If you imagine yourself as an attacker and you can social engineer your way into either somebody’s credentials or the project, and you can put a new malicious version of something in the repo, in an ecosystem where everybody prefers the default, you instantly have users. You instantly get to deliver your payload.
This is like the dependency confusion attacks that we saw a couple years ago where somebody puts a component that has the same name as an internal component, but a very high version number, those tools will grab that thing and deliver the payload.
Whereas on Maven if you tried to do the same thing, you put a new thing in the repo not many people are going to use it right away. It’s onesie, twosie. You have to convince them to do an upgrade, which buys more time for the community to figure out, wait, this isn’t on the release notes. This wasn’t official.
This default of ease of upgrade has actually created new attack vectors that are harder to deal with.
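The difference between the two defaults can be shown with a toy resolver in Python. This is a simplified sketch, not the behavior of any real package manager: the version list is invented, versions are assumed to be plain dotted integers, and resolve_latest and resolve_pinned are hypothetical helper names.

```python
def parse(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_latest(available):
    """Default-to-latest: take the highest version the repo offers."""
    return max(available, key=parse)


def resolve_pinned(available, requested):
    """Pinned: take exactly the version that was asked for, or fail."""
    if requested not in available:
        raise LookupError(f"version {requested} not found")
    return requested


available = ["1.2.0", "1.3.0"]
before = (resolve_latest(available), resolve_pinned(available, "1.3.0"))

# An attacker publishes a release with a very high version number.
available.append("99.0.0")
after = (resolve_latest(available), resolve_pinned(available, "1.3.0"))

print(before, after)
```

After the malicious release appears, the latest-by-default resolver switches to it immediately, while the pinned resolver keeps returning the version that was requested.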
Interesting perspective. That’s a good way of looking at it. I guess it really depends on not just the use case of the package manager or the language that you’re dealing with. There’s different advantages to both approaches. When we come back to it, you’re only as good as your last vulnerability scan.
I’m going to try this one on for size for you. I’ve never used this, but it just popped into my head.
Last week, CISA released their Secure by Design, Secure by Default guidance. I’m going to assert that the way Maven works is Secure by Design and Secure by Default, as opposed to the other way around. Because if you don’t do anything, you don’t grab that latest version. Whereas in the other ecosystems, you have to do something to prevent yourself from grabbing the latest and greatest version.
Somebody might turn that around and say, yeah, but they’re automatically upgrading. While that’s true, there’s lots of tools that can also recognize and recommend a good version to update to. That’s one of the things we’re playing with the BOM Doctor capabilities. To be able to look at all things and make an intelligent recommendation of when to do an upgrade, because it may not be the best thing. In fact, it’s almost always not the best thing to upgrade to the latest version as soon as it hits.
When it comes to vulnerability attestation that something isn’t affecting a specific piece of software, where do you see the whole VEX, VDR world going?
I find myself on both sides of this argument. I guess it’s at least good enough for me to be able to recognize that. For a long time, we have been advocating that if you can, you should upgrade, because what we saw in early days was there’s a lot of pushback from developers when security would come and ask them to do something. They would find every reason in the book not to do it. I think largely because it was a security guy asking them to do it, not because it was a bad idea.
There’s a whole religious history there that you probably could have a whole podcast on.
We were trying to say, “We’re going to try to make it easier for you to make intelligent upgrades and make it easy on development, and that way you can be safe.” As an example, this means it becomes less important to prove whether you’re vulnerable or not, if you can just upgrade and take it off the table.
So that’s the “we want you to upgrade” kind of mentality. But what we’ve seen, certainly since log4j, is almost this whipsaw of everybody driving towards what some of us have been calling vulnerability zero. Teams have over-rotated and said, “If there’s a vulnerability in the thing, stop the presses. Stop the entire world. You have to get a new version that has no vulnerabilities.”
While that’s a logical conclusion, taken to the extreme can be harmful. It can actually cause people to forget to do a scan. Because they know that’s going to stop the world. And I think VEX and VDR is an attempt to try to quantify or qualify, “Yes, we have this component. Yes, this component has this vulnerability and here’s the evidence of why it’s not applicable in our use case.”
I get this question a lot around why are components that have vulnerabilities still available in Maven Central. Why does that happen? Isn’t that like a supermarket continuing to sell tainted lettuce? My answer to that is no, actually, it’s more like a supermarket continuing to make peanuts and peanut butter available. Peanuts and peanut butter can kill a lot of people. But it’s also perfectly fine for a lot of other people.
Component vulnerabilities are more like peanuts, in that they can be an allergen to a susceptible individual, but taken out of that context, they’re perfectly fine. VEX and VDR is that attempt to be able to disclose that these things are here, like any food will say, “Manufactured in a facility with tree nuts”, so somebody can look at that and judge for themselves. I think that’s what we’re talking about.
Let the end user be able to make that interpretation on their own. But the SBOM as conceived was literally just the inventory. Rather than bloating that up, we’ve created these other schemas that can carry the other mitigating information.
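A small Python sketch can show how a separate statement about exploitability sits alongside a plain inventory. The data layout here is hypothetical and is not a real VEX or VDR schema. The "not_affected" status name is borrowed loosely from how VEX-style documents are commonly described, and the component names and identifiers are invented.

```python
# Scanner findings against the inventory: (component, vulnerability id).
findings = [
    ("crypto-lib", "EXAMPLE-0001"),
    ("xml-parser", "EXAMPLE-0002"),
]

# Hypothetical supplier statements, kept separate from the inventory itself.
statements = {
    ("crypto-lib", "EXAMPLE-0001"): {
        "status": "not_affected",
        "justification": "vulnerable code path is never called",
    },
}


def triage(findings, statements):
    """Split findings into those still needing action and those explained away."""
    actionable, explained = [], []
    for finding in findings:
        statement = statements.get(finding)
        if statement and statement["status"] == "not_affected":
            explained.append((finding, statement["justification"]))
        else:
            actionable.append(finding)
    return actionable, explained


actionable, explained = triage(findings, statements)
print(actionable)
print(explained)
```

The inventory still lists the component, so the end user can see the peanut on the label and judge the supplier’s justification for themselves.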
As a software developer and being part of a company that develops software, all of these practices, creating SBOMs, dealing with inventory, updating to the latest components. You’re probably practicing what you preach here a bit. Can you dig into that a little bit?
Obviously everything we build gets scanned, so we understand the entire bill of materials for everything in our organization. When a new vulnerability appears, the project teams are immediately notified. We happen to use JIRA and the JIRA integration. So a high-profile ticket goes directly to the teams that are affected.
We basically use our own tooling to skip the triage phase. It immediately notifies them, and they will deal with it.
Now, in terms of proactively staying up to date, that’s where some of the algorithms that we have in BOM Doctor are really interesting, and we published that in last year’s Software Supply Chain Report (SCR).
What the industry is generally afraid of, when everybody says constantly update, is falling too far behind, to the point where it becomes really difficult to upgrade. But if you’re close enough to the modern versions and there are no new features in those modern versions or bugs that apply and there are no new vulnerabilities, updating to that every time just creates more work. It’s more entropy. Something else might break.
What we use are the algorithms to put ’em into proactive and reactive mode. You want to stay in the proactive. So you’re close enough to the front that if something bad happens, you can easily grab the patch. Even if you’re a few versions behind, sometimes projects will backport.
Figuring out where that version is, is actually pretty difficult to do as an end user, but using data that we have from Maven Central and other things, we can actually understand, on a component-by-component basis, where what we refer to as the herd is. Where is the herd? Which side of the river is the herd on? You don’t want to be at the front, but you also don’t want to be at the back. We can automatically calculate that for you and then use that to provide update recommendations.
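The conversation does not spell out how the herd is actually calculated, so the following Python sketch is purely an assumption for illustration: it treats the most-downloaded version as the herd and measures how many releases a given version sits behind or ahead of it. The download counts and function names are hypothetical.

```python
# Hypothetical download counts per version, listed oldest to newest.
downloads = {"2.0.0": 120, "2.1.0": 900, "2.2.0": 450, "2.3.0": 30}


def herd_version(downloads):
    """Naive stand-in for the herd: the version with the most downloads."""
    return max(downloads, key=downloads.get)


def releases_behind_herd(downloads, current):
    """Positive means behind the herd, negative means ahead of it."""
    order = list(downloads)  # dicts keep insertion order, so this is oldest to newest
    return order.index(herd_version(downloads)) - order.index(current)


herd = herd_version(downloads)
print(herd, releases_behind_herd(downloads, "2.0.0"), releases_behind_herd(downloads, "2.3.0"))
```

A real recommendation would also need to weigh known vulnerabilities, backports, and breaking changes, none of which this toy measure considers.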
We’re using those capabilities to manage our own software.
So this is actually good. It’s like knowledge is enabling. And when we talk about SBOMs, as a vendor, when you provide this to a government agency, you’re putting a bit of power in their hands. Are you giving up some of your own secret sauce by that?
That’s a common thing that we hear about. There was a talk about that at RSA from Josh Corman. I don’t put much stock into that. If you assume that the attackers don’t know what’s in your software because you don’t, or you don’t want to tell anybody, I think you’re wrong.
It’s a sad state when the attackers know what’s inside your software better than you do or better than your customers do. That’s where we are right now.
I don’t think there’s any secret sauce in disclosing to your competitors what frameworks you’re using, whether you use log4j or not. Some of these things are so popular, you can stipulate that it’s almost guaranteed to be on the class path.
Things like Commons Collections and Apache log4j and Spring. You could guess. You don’t even have to analyze it. And worse, if you’re a web property, all of those things are shipped down to the browser. So anybody who knows how to view source can figure out exactly what your infrastructure is built on.
The SBOM doesn’t really, in my opinion, provide new information to the attackers. But it does provide relevant information to the defenders. Some of the assertions are that, given this information, end users can’t do anything about it. To some extent that’s true if you’re talking about can they patch it themselves.
But knowing what vulnerabilities apply to which applications and which ones are being actively exploited certainly is useful from a network defense perspective. From a perimeter defense, you know what to be looking for, you know what rules to put in place. Having that information downstream is really helpful.
I do have a challenge though a bit with where the industry has been focused. This is largely due to industry’s desire to do the least possible, in response to the regulation. People who are saying, I only need to produce an SBOM because the executive order said so, I only need to emit this piece of paper and ship it to the government with the software because they said so, are missing the point.
I have a little bit of agita over that focus on producing and consuming SBOMs for that reason. For me, like I said before, the point was about managing it. You have to know what you have in order to manage it, but just knowing what you have and shipping that is no good.
The analogy, and I used this when we were together in North Carolina, is this: imagine if we told the auto industry, you don’t have to do recalls anymore, but instead, every time you sell a car, you must print out a list of parts and put it in the glove box. That’s ridiculous. But that’s where some parts of the industry are focused right now.
That puts all the onus on the end user. So if you bought that car, you’d have to constantly be checking if any of the parts in your car were recalled and figure out what to do about it. That’s a ridiculous situation.
And yet I’m not advocating that we hide it because having that information is in fact useful because then you can hold your suppliers accountable and you can do some perimeter defense, or at least know what to look for in these other cases.
Do you think that the industry’s going to shake out a little bit and we’ll end up figuring out how to really use these things? There’s the format and there’s the generation, but there’s also the consumption and the delivery.
The consumption and management of it is the thing we’ve been doing for basically forever. And yet it’s the thing the industry is largely ignoring at the moment because everybody’s so freaked out about producing SBOMs because the government said so.
I think those things are necessary, but not sufficient at the end of the day.
I do think, though, that the focus in the national cyber strategy that was released, what, back in March, which talks about shifting liability and focusing on the outcomes, is the right way to get us there. It focuses more on that outcome versus the mechanics of how you did it. Putting a bill of materials together that has a bunch of broken parts may be sufficient for meeting the letter of the law of the executive order, but if those things lead to bad outcomes, it’s not going to shield you from liability.
A lot of the talk in the regulation and policy circles is around defining the best practices, because we understand that no software can be perfect. We’re not at that level in our industry yet where we can guarantee perfect functionality. But we should be able to understand that if you’re not following the best practices of the industry, you might be negligent. You’re failing to meet that level of due care.
That is ultimately what it’s going to take to get people to do the right thing here and not focus on just literally, I have to check this box.
What are your final recommendations about software bill of materials for someone who’s looking at what they do next, or how they even start?
I like to ask this tabletop exercise whenever I give a talk. If I told you about a vulnerability right now that you hadn’t heard of before, how long would it take you to answer the question, “Are we anywhere in our organization even using this thing?” If you can answer that, can you answer which versions and in which applications?
This is basically what the world did on what, December 10th, 2021, when they found out about log4j. Everybody was like, are we using this? Where are we using it? Some companies still don’t know. If you can’t immediately answer those questions, you need to get to work and start figuring out how to do that. That’s going to be the same motions around producing the SBOM. You need to first understand it because you can’t produce it if you don’t know what’s inside it.
Tools exist. There’s lots of tools out there that can help you with this, but that is the minimum standard. Once you get beyond that, then you have the ability to start making really interesting choices around the dependencies. You can start to intentionally manage your supply chain and ask questions like, do we need to be using, this is a real example, 84 versions of Spring? We had a customer that was doing that. They’ve basically subjected themselves to all the possible vulnerabilities at that point. At the same time, do we need to have 15 different XML parsers in our organization?
These next order kind of questions can start to be visualized and solved once you have those collective bill of materials. That can lead to massive actual innovation and efficiencies within development. But if you don’t even know what you have to be able to play defense, you’re miles away from being able to get on offense and be a much more effective engineering shop.
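The tabletop questions above translate fairly directly into queries over a collection of per-application inventories. Here is a minimal Python sketch of that, assuming each application’s bill of materials has already been reduced to (component, version) pairs. The application names, component names, and versions are all made up.

```python
from collections import defaultdict

# Hypothetical per-application inventories: app -> [(component, version), ...]
sboms = {
    "billing": [("logging-lib", "2.14.1"), ("web-framework", "5.3.0")],
    "portal": [("logging-lib", "2.17.1"), ("web-framework", "5.2.9")],
    "reports": [("web-framework", "5.3.0")],
}


def where_used(sboms, component):
    """Answer: are we using this, in which versions, and in which applications?"""
    usage = defaultdict(list)
    for app, components in sboms.items():
        for name, version in components:
            if name == component:
                usage[version].append(app)
    return dict(usage)


def version_sprawl(sboms):
    """Count how many distinct versions of each component the organization runs."""
    versions = defaultdict(set)
    for components in sboms.values():
        for name, version in components:
            versions[name].add(version)
    return {name: len(found) for name, found in versions.items()}


usage = where_used(sboms, "logging-lib")
sprawl = version_sprawl(sboms)
print(usage)
print(sprawl)
```

The first query is the defensive minimum; the second is the kind of next-order question, such as how many versions of one framework are in use, that only becomes visible once the inventories are collected in one place.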
This episode of daBOM was created by me, DJ Schleen, with help from sound engineer Pokie Huang and Executive Producer Mark Miller. This show is recorded in Golden, Colorado, and is part of Sourced Network Productions. We use Captivate.fm as our distribution platform and Descript for spoken text editing.
You can subscribe to daBOM on your favorite podcast platform. We’re going to be releasing a new episode every Tuesday at 9:00 AM. I’ll see you next week as we continue to defuse daBOM.
Brian is Chief Technology Officer at Sonatype. He has extensive open source experience as a member of the Apache Software Foundation and former Chair of the Apache Maven project. Brian was a direct contributor to the Maven ecosystem, including the maven-dependency-plugin and maven-enforcer-plugin. He has over 20 years of experience driving the vision behind, as well as developing and leading the development of software for organizations ranging from startups to large enterprises. Brian is a frequent speaker at national and regional events including Java User Groups and other development related conferences.