The Connection between Generative AI, LLMs, and SBOMs
I’ll never forget the day I met Tracy, although I really think we were actually separated at birth. We were scheduled to be on a podcast together, and after introducing ourselves to each other in the call lobby, we began a discussion that most likely would’ve gone on forever had the host not interrupted us to get the show started.
It turns out we both have similar passions in the DevOps, DevSecOps, and SRE spaces, and not just philosophical ideas and hoopla high fives. We’ve actually done it. Practical implementation of ideas that have injected security into the software we all develop.
An architect, a programmer, a dreamer, and a visionary, she’s also a strong advocate for diversity and inclusion in the technology industry, and has often shared her experiences about being a woman in technology.
Two topics that are very close to my heart as well…
Earlier this year, Tracy and I were brought together by Mark Miller for “It’s 5:05”, a podcast produced by The Sourced Network that brings snack sized news about open source and security topics to the masses on a daily basis.
From the seeds of “It’s 5:05” came the opportunity for me to create this podcast. And also for Tracy to create a podcast called “Real Technologists”. And if you haven’t heard it, you need to. It’s a brilliantly done production about the people “behind the technology”.
And speaking of real technologists, Tracy is one of them.
Welcome back to daBOM.
Hey, welcome back to daBOM. I’m here with Tracy Bannon, one of my fellow Sourced Network Podcasters. Tracy, how are you doing today?
Oh, I am doing wonderful, highly caffeinated on this Saturday morning. Ready to jam with you, man.
I just got some caffeine into me as well. Tell me a bit about yourself, Tracy.
Oh, gosh, where to start? From a technology perspective, my background, I’m a software architect. I’m a software engineer. I’ve been doing this for a couple of decades. Right now I’m focused on the federal government and really on the DoD.
Got into DevSecOps, not by choice, but because I needed to get stuff done. I needed to get my software into production, and applying Continuous Integration and Continuous Delivery was just part of the mix. I just had to do it.
In my spare time I like to ride my Peloton. I ride a road bike. I, let’s see, I garden, I read. I’m a nerd. So all those types of things, a little bit of both.
And you’re a programmer as well. What’s your favorite language?
I’ve been playing with React. It’s not my favorite, but I’ve been playing with it to make sure that I understand what the front end folks are really going through and how they’re working with NoSQL backends.
You also said DevSecOps. How are you seeing SBOMs being used in that whole DevSecOps process and community?
Oh, you’re going to ask me to open the kimono on what’s going on with DevSecOps and the SBOM? When it comes to software, we’ve been doing dependency management, I’m going to use that term for right now. We’ve been doing dependency management for decades, and when somebody fails to do decent dependency management, when there’s a problem, they get into a predicament.
SBOM happened to be an industry-wide embodiment of what we’re doing with dependency management, especially since we had the explosion of open source.
When I think about doing development today versus doing development even 10 or 15 years ago, the sheer number of packages that are being brought down, the sheer number of external things that are being deposited into my ecosystem is mind boggling.
When it comes to applying SBOMs in DevSecOps pipelines, we’re still figuring out where’s the right place and the right trigger, because some of these SBOMs are non-standard in what’s contained in them. Where am I getting them from? How am I securing them? Am I generating my own?
So there’s both the consumption of these things as well as the scanning and evaluation and applying it. People want to use them as a trigger. That’s great. Is that necessarily indigenous to all DevSecOps pipelines right now? Nope. Nope. Because we’re all trying to figure it out together at this point.
It’s messy to say the least.
You know we’re trying to retrofit this whole concept of, hey, we’re going to collect this information. We’re going to just store it somewhere, anybody can see it and maybe we might publish it out.
That’s why people are getting scared. Industry got scared, especially in the government space, when the executive order originally came out. Wait, you’re going to see my trade secrets. I don’t want you to know what I’m really doing under the covers. I don’t want you to know that I took DJ’s open source and I put this really cool thin layer on top of it and made it sexy and awesome and sold it to you.
But they also got nervous. Can you secure my SBOM? If I give you my SBOM, if I give you my secret sauce, are you able to secure it?
And a third piece that they were really fearful of was: you don’t know how to use it yet. You’re asking for it. I know it’s going into this repo over here, or this S3 bucket, or wherever you’re throwing it. You don’t know how to use it yet.
People were actually thinking about, oh, I’m going to have somebody open it up and look at it. Seriously? A lot of real pushback has been happening in the government space because vendors are fearful of those things. There’s merit to what they’re worried about.
You mentioned the government. I know you’re involved with the military, and you probably can’t divulge a lot of the information that’s going on, but what kind of problems are you seeing with the adoption of software bills of materials in the government, and especially the military space?
I’m going to take it a step back. There’s a precursor problem to that, which is getting to DevSecOps, being able to apply those principles. Take it a step back from that, being able to apply agility, lowercase agile.
It’s a lot of Brownfield. The software is not all net new. We talk about Greenfield as all the new things that we’re doing. There’s a lot of Brownfield, and a lot of times we’re knitting those things together. I call it olive, right? We’ve got a lot of olive development.
With decades of software with hundreds of pipelines, not all of them having all of those automations across it, we have a tremendous mountain to climb. We have to pick and choose which things get automated when they get automated. What’s my risk that I have to address on this? Is it a quality risk?
If I’m dealing with a war system, with a weapon system, that’s different. The military also has a lot of business systems. They have ERPs. How I have to treat those things is very different.
Part of it is the complexity and the speed at which things are now changing. So now we are adding green and we’re bolting it on. How do we get to SBOM if I can’t get decomposed enough to understand all of the modules that are within my software itself? We do have a lot of monolithic software out there that was created. It runs effectively. It does what it needs to do. But try and put that steel girder through a wood chipper. I dare you.
So how are you implementing SBOMs, or going into that evangelism, with the military?
Part of it right now actually is helping them with contract language. Now that’s not me directly writing contract language, but being the connector of connectors.
For example, I work with MITRE and they’ve got experts on government contracting language. So the first step is getting it into the contracts.
Remember that software in the government spaces, they call it being acquired. And under acquisition, it’s everything from building it to buying it directly, to working together with industry to create it. Any software is acquired.
If it’s acquired, it has to have a contract, it has to have appropriations against it. In order to do these things, you gotta backtrack and make sure that the budget is there and make sure that the contract is in place. Because if the contract doesn’t say you have to do it, you don’t have to do it.
So that’s another part of it. It’s going to take years for the uptake because it’s not in the contract.
That becomes a piece of renegotiating contracts. This ties back to DevSecOps and agility. If a contract is a three or four year project, old school like we used to do, with a beginning and an end, and I deliver something at the end of it, I need to renegotiate that to say: can you deliver a minimum viable product and then start to provide additional capabilities as we go, adding in DevSecOps? Making sure they’re applying an agile mindset at a minimum, getting towards minimum CD.
There’s just so many complex parts with this. It’s an entire mind map. It really is. If the architecture doesn’t support it, then we have a problem. Do we need to refactor? Maybe. Do we need to, more than refactor, start to deprecate? Do we need to strangle that older piece out by building new pieces around the edges and eventually completely replacing it?
The strangler pattern in architecture isn’t actually about grabbing onto somebody’s neck and strangling them. It’s named for the strangler fig, which grows around the outside of a tree and eventually kills what’s on the inside.
So what do we do with the existing software? What if it is mission critical? Can we actually do that? Can we incrementally kill it from the outside by putting new pieces on it? It’s complex. It’s not as easy as we want it to be.
Sometimes the broader commercial industry doesn’t take a step back to consider those pieces. I read an article this morning by Ford’s CEO, talking about the fact that they’re 20 years behind Tesla. Not because they haven’t been passionate about it, but because they have over 150 different components, each written by different manufacturers with different languages and that’s how they’ve been doing it for 20 years. Where Tesla snapped a chalk line and said, we’re going forward together. This is the singular language, this is the tool sets that we’re using.
That same problem is what we’re seeing with government. We’ve done this for a long time. We’ve cobbled things together or we’ve brought things together and integrated them somehow and knitted them together, but it hasn’t been clean and it hasn’t been with those divided lines that we need.
There’s just a lot to this. I don’t mean to sound like the world is ending and this is a terrible thing. It’s a big, awesome, complex problem to solve.
What’s the motivation for these public organizations to adopt SBOMs? Is it what’s coming out of CISA right now? Because CISA says, hey, you know what, this is guidance. We can’t enforce anything.
The executive order that came out, the NIST standards that come out, the CISA guidance that’s there, those are all supporting, but those are not really what the motivation is.
The government, believe it or not, really wants what’s best for us. They want software that is secure and they want it delivered rapidly. They realize that software is never done and that’s different. Moving from project to product is a very different economy. It’s a different way of doing things.
The government sees that, and you’ve got layers and layers, I call ’em scabs. All these scabs that have been built on top of each other, and scars from where something went wrong and a checklist was created and a policy was built on top of it because something went wrong. We’ve gotten so massively risk avoidant, and that also makes it difficult.
At the end of the day, people want to do the right thing. They want to make sure that we are ahead of our near peer adversaries. That the software that we’re building is using secure components. That we understand that maintainer or that contributor is a vetted individual that hasn’t made nefarious updates to other software packages.
People really do want what’s best and right and secure for the US.
That actually brings up something that’s missing in a Software Bill of Materials, which is that origin, the national origin of a component. We know the package is, say, Log4j, but we don’t know that the underlying contributions to that open source product are 95% from an embargoed country.
They’re starting to track that. That is becoming part of the enhancements that they’re doing. I think there’s another formal word for it, when they’re taking the SBOM and they’re enriching it and adding additional data to it. So that is part of the mix of what’s happening with that.
So we’ve just made our SBOMs 10 times bigger, unfortunately.
DJ, if you can gimme a skinny SBOM that I can then enhance, if we can make sure that we’re passing small messages, if I can pass a small message and then I can enhance it to be whatever I need it to be, I’m sure that there are multiple ways for us to get after this.
I keep thinking about how much people are frustrated with SaaS right now. They have a love hate with SaaS, right? We all hate the fact that when we want to use software, we’re now lumped into forced subscriptions. I want to buy it, I want to spend my hundred bucks, and then I want to use it for the next five years and not have to worry about it.
People in the government have, in some places, embraced it because you get fast access to quality products that are ready to go. But when we start to knit all this together, when I think about what could be for the SBOM, in my mind are a series of services that can be offered, that could provide the enriching.
For example, what if the SBOMs are being transmitted to whoever they need to be transmitted to, whether it’s coming to me on my direct repo, and then I am subscribing to enrich it. There’s just an entire matrix that I can see in my mind’s eye about how we could connect these things.
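The enrichment-service idea can be sketched in a few lines. This is a hypothetical illustration, assuming a CycloneDX-style SBOM where extra data rides along as the spec’s free-form `properties` name/value pairs; the `enrichment:` prefix and the provenance fields themselves are made up for the example.

```python
import json

def enrich_component(component, provenance):
    """Attach provenance info (e.g. contributor origin) to a CycloneDX-style
    component as properties, which are free-form name/value pairs."""
    props = component.setdefault("properties", [])
    for key, value in provenance.items():
        # The "enrichment:" namespace here is an illustrative convention.
        props.append({"name": f"enrichment:{key}", "value": value})
    return component

# A minimal, fabricated SBOM fragment.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "log4j-core", "version": "2.17.1"}
    ],
}

enriched = enrich_component(
    sbom["components"][0],
    {"origin-country": "unknown", "maintainer-vetted": "false"},
)
print(json.dumps(enriched, indent=2))
```

The point of the sketch is the shape of the workflow: a small SBOM comes in, a subscribed service annotates it, and the enriched document flows onward, rather than every producer shipping a maximal SBOM up front.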
There need to be more people focused on how we process the SBOM. Right now, we’re just all hell bent, we’re going to create them. Everybody’s creating them.
That’s great. And how am I transmitting it? How am I securing what I’m transmitting? How am I using, how am I truly leveraging it? How am I tying it into, whether it’s into my pipeline or whether it’s into additional audit and attestations? How am I doing that?
I think those are the pieces where we need to get after those SaaS offerings, and I think that the vendors need to be thinking about that.
Targeting those specific enrichments I think is going to be something that we’re going to see as that whole marketplace opens up for SBOM related tools.
I have a question for you, DJ.
Do you see people, when they are getting risk scores, reacting to all of them, or do you see people being pragmatic and ranking them and saying, these are the ones that are really important? I’m seeing the “sky is falling” Chicken Little reaction, that there can be no flaws, no defects, no security vulnerabilities at all.
Quite frankly, I believe we’re going to have to start to prioritize them a little bit more. But are you seeing people do that when you’re interacting with them?
I see the organization I’m working for doing that.
Think about this. We know these CVEs, we know severity. We know things like EPSS score and exploitability variables. What we have to get to is distilling those. Hey, we got a hundred thousand vulnerabilities. Oh, great. Yes. They have to all be fixed. Let’s talk reality. And you’re right there with the whole architectural realities conversation.
The idea is if we understand the vulnerabilities on a mass scale and we can prioritize them, that might be a fluid prioritization based on, hey, today, now there’s an active exploit. It’s not a moment in time, but it’s information that we can use to say, is this important enough where I have to do an all hands on deck, or is there an SLA that I can let the developers deal with before I even ticket them and distract them with something else.
There’s a whole bunch of conceptual ideas that come out of having this prioritization. But the first thing is knowing what you have. That’s where the SBOMs come into play from our vendors and collecting ’em internally, aggregating them, using something like GUAC to query that, to understand what we have.
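That prioritization flow, active-exploit status first, then exploit likelihood, then raw severity, can be roughed out like this. The findings and scores below are fabricated for illustration; in practice the inputs would come from feeds such as NVD (CVSS), FIRST’s EPSS, and CISA’s Known Exploited Vulnerabilities catalog.

```python
# Fabricated findings: kev = appears in an active-exploitation catalog,
# epss = estimated exploitation probability, cvss = severity score.
findings = [
    {"cve": "CVE-2021-44228", "cvss": 10.0, "epss": 0.97, "kev": True},
    {"cve": "CVE-2023-0001", "cvss": 9.8, "epss": 0.02, "kev": False},
    {"cve": "CVE-2022-0002", "cvss": 5.3, "epss": 0.45, "kev": False},
]

def priority_key(finding):
    # Active exploitation trumps everything; then likelihood (EPSS),
    # then severity (CVSS). Values are negated for a descending sort.
    return (not finding["kev"], -finding["epss"], -finding["cvss"])

for f in sorted(findings, key=priority_key):
    # "All hands" for actively exploited issues, normal SLA otherwise,
    # echoing the fluid, re-rankable prioritization described above.
    print(f["cve"], "all-hands" if f["kev"] else "SLA-track")
```

Because the key function is just data-driven, re-running the sort when an exploit lands is exactly the “fluid prioritization” being described: today’s ranking is a moment-in-time view, not a fixed list.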
That’s a good question.
I love to ask the questions. You and I like to banter back and forth and ask questions.
Something that is near and dear to my heart as an architect and an engineer is that the SBOM is evidence of what’s been built. I like to go back to being Secure by Design, Secure by Default, Secure in Depth. It needs to be an indigenous part of everything that we do.
Security’s not bolted on. Everybody says that, but perhaps not everybody knows what that means. It seems as though it’s become trendy. We’ve now suddenly started saying Dev-SecOps because we have to highlight that somehow now we’re thinking about security. I’ve been working in the government space a long time. It started out working with integrated eligibility and child welfare, and let me tell you, you learn about security and being concerned about it there. That was two decades ago.
I build it in, and I want others to be building it in and getting after the SSDF, the Secure Software Development Framework. That’s actually an excellent publication. I don’t push publications too often, but it’s NIST SP 800-218. It gives you ideas. It gives you the what and the why, not the how.
If you can bring your organization together, your architects and engineers, your developers, your testers, your business or mission owners, and really think about what your Secure Software Development Framework looks like. What does your SSDF look like? That is going to help us to build a better SBOM later. It’s going to help us to build a better product. It’s going to help us to be modular, to be decomposed, loosely coupled, all of those things that we need it to be. Because an SBOM against a bucket of mud that has a lot of occluded things is not going to be as helpful as when I can actually take fast action.
The architecture needs to support being able to take fast action. If I have that big ball of mud or bucket of mud, I can’t take fast action to get after dealing with that security vulnerability.
We walked around RSA looking for Secure by Design, and I think we could have a whole podcast on that, because we didn’t see it anywhere. I didn’t even see any SBOM stuff anywhere.
I saw one Secure by Design booth, and it was CISA’s. I took a picture by their booth to prove that somebody, somewhere, is doing it. I did in fact take a Secure by Design sticker from them.
You are very passionate about AI and ML. We had some discussions about this at RSA this year. One of the things that CycloneDX 1.5 has come out with was ML-BOMs.
I’ve had conversations with folks and they’re like, that’s just ridiculous. We’re going too far with this. But if you’ve read anything about poisoning a model and the provenance of all the training data and the tweaking that you can do and the tuning of these models, it starts to make sense.
Have you seen any conversation in the things that you’ve read or looked at about ML-BOMs?
Not as much about ML-BOMs as I’ve been involved with conversations about a Data-BOM. Honest to goodness, a Data-BOM. It’s essentially what we used to call a data card. It’s the metadata about the data showing the lineage of it. When we’re looking at moving towards data mesh, ubiquitous access to data from a producer, consumer perspective, the Data-BOMs come into play.
I haven’t really seen much around the ML-BOM concept. What concerns me about generative AI overall is that we’re losing lineage, completely losing the lineage. If we think about these tools, I don’t just mean ChatGPT. You can sit that in the corner for now and talk about the other models that are being created.
Microsoft right now is training another model with many, many fewer parameters than what they did with OpenAI and ChatGPT, focused on language. I believe it’s focused on Python. But they’re training it using open source libraries.
If I wanted to be nefarious, if I wanted to be a bad guy, the first thing I’m going to do is I’m going to make sure that my open source has a whole lot of bad stuff in it, and I want to make sure that I am poisoning your model so that you’re getting bad code generated.
That provenance is not necessarily there. If you want an example of being able to see the history of where a model gets its information, you could look at Perplexity. I think it’s perplexity.ai or .io. Put a question into ChatGPT and put it into Perplexity; at least Perplexity will tell me with citations where it got that bit of information.
That is amazing! Because I check it and I will look at it and say, that is a really bad source. Regenerate, I want something that doesn’t have that crap source in it.
I am concerned that the more that we are trying, and I’m going to emphasize trying, to generate more code, the more of a pickle I believe we’re going to be in. It is not ready for prime time. As of the end of April, I think something like 29% of the generated code was actually being accepted by the developers who were looking at it.
We’re going to see more and more people accepting generated code. That gets me nervous because the junior developers and new in career folks don’t have the experience to say what’s right or what’s wrong yet. So when they’re given five options and they can click through the five options and pick one, they probably go with the first one that came up, or the one that has the niftiest flow through the software and not necessarily pick something that is the most secure.
We have a long way to go when it comes to applying generative AI to the SDLC. I know that’s not what you asked, but that’s where my mind directly goes because that’s going to influence the security of everything that we’re doing. It’s going to influence it top to bottom, left to right, all parts of it. It’s going to infuse itself.
One of the things I’ve found AI tools useful for is to say, hey, can you simplify this and not use this component, but use the standard library? And the cool thing about that is now you’re getting rid of some of the supply chain, and you’re simplifying things with not even a known component, but no component.
It might make your code more complicated, but then you can just say, refactor this to reduce the cyclomatic complexity, and away you go.
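As a toy illustration of that “no component at all” refactor: instead of pulling in an imagined padding utility package, lean on what the standard library already provides. The `some_pad_lib` name in the comment is hypothetical; the point is the shape of the change, swapping a dependency for a builtin while keeping call sites unchanged.

```python
# Before (imagined): from some_pad_lib import left_pad
# After: standard library only, one less entry in the SBOM.

def left_pad(s: str, width: int, fill: str = " ") -> str:
    # str.rjust already does this; a thin wrapper keeps existing
    # call sites working while the dependency disappears.
    return s.rjust(width, fill)

print(left_pad("42", 5, "0"))  # → 00042
```

Every dependency removed this way is one less component to track, scan, and attest to downstream.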
You are experienced enough to do that.
That’s where you have to be careful about using these things, especially with the downstream effects.
We have compensating controls to catch vulnerabilities. You had mentioned the supply chain, and Python, and training on open source Python. A lot of POCs for vulnerabilities are written in, guess what? Python. Those are sitting there open in GitHub, and there are things in the README that say, hey, don’t use this.
What if you’re training the model on these POCs and now all of a sudden you’re saying, hey, you control an AI with prompts. But it could be dangerous. Again, I think we’re at the infancy. We’re at like iPhone 3 maybe, or iOS 1 with that.
I jokingly say this, but I truly believe that in years to come, we’re going to turn around and say, like our parents did with “where were you when they shot JFK?”: what were you doing, where were you in your career, when generative AI really exploded on the scene?
I think we’re actually going to take pause and look back at that and realize we are in its infancy, but it’s moving fast. It is going to cause great disruption. It’s causing great disruption. We need to get ahead of it and harness it.
I’m not anti, in any stretch of imagination. I’m always pragmatic about this. We can do great things with it, and we have to be aware that the bad guy can do great things with it too.
We used to talk about script kiddies. You want a script kiddie? Before, at least they had to learn something. Now they just toss it into the local generative AI tool and away they go.
So this is where a Software Bill of Materials for vendors is going to come into play.
For example, I just had a scenario where we were looking at a product that does some great work, and we took a look at the back end and went, oh! You’re sending data to OpenAI’s ChatGPT, and we have policies that say we can’t do that with certain data of different classifications.
Now, all of a sudden, this is a fourth party issue, so an SBOM that says, hey, you’re using this library, turns the lights on. Especially for organizations that are doing a third party security review and maybe don’t understand the technical impact of some of these things.
That’s going to be a really important part of it. I think SBOM is going to play a very important role in that situation, especially.
Understanding our open source libraries is one thing, but being able to call out directly that there are third party sources being referenced, that there are APIs being invoked, is absolutely a part of this, and people are unaware, quite frankly. There are multiple corporations that have banned their employees from using some of these generative tools, and for good reasons. There have been security leaks from people pasting proprietary code into these tools and asking, like you have, can you simplify this?
“Can you explain it to me?” is another thing that happens. I don’t understand the code, can you explain it to me? I’m finding there’s a really interesting use on the horizon, which is modernization. How do I take that old code that might not have been documented correctly and figure out what it does?
There’s some really interesting scenarios that are coming around about that.
Where do you see things going in this SBOM space? What are the challenges being an architect dealing with this, and any advice for security architects out there?
Across the board, it does go back to the modularity and the decoupling. If we’re talking about net new architectures or the refactoring of existing architectures, we have to be able to identify pieces that we can fix, independent pieces that we can fix, and test and monitor and replace quickly.
I do believe that this all tracks back to getting after decoupled architectures, making sure that everything is Secure by Design, that you’re applying your tests throughout and that your security folks are sitting beside you and that they understand the trades and that you’re working together.
Security is something I have to be concerned about. But I also need to know that my security pro is collaborating with me because we’re not all going to be generalists. We can’t dump all of the responsibility for security onto development teams because they don’t have all the training and they don’t have all the time to be the security experts at the same time.
So I do believe in that second part of it: Secure by Design, and making sure that you’re collaborating, that we’re sitting side by side.
This episode of daBOM was created by me, DJ Schleen, with help from sound engineer Pokie Huang and Executive Producer Mark Miller. This show is recorded in Golden, Colorado, and is part of Sourced Network Productions. We use Captivate.fm as our distribution platform and Descript for spoken text editing.
You can subscribe to daBOM on your favorite podcast platform. We’re going to be releasing a new episode every Tuesday at 9:00 AM. I’ll see you next week as we continue to diffuse daBOM.
Passionate Architect!!! Tracy (Trac) Bannon is a Senior Principal in MITRE Corporation’s Advanced Software Innovation Center. She is an accomplished software architect, engineer, and DevSecOps advisor, having worked across commercial and government clients. She thrives on understanding complex problems and working to deliver mission/business value at speed.
Trac walks the walk and talks the talk. She’s passionate about mentoring and training and enjoys community and knowledge building with teams, clients and the next generation. Helping others to contribute content is another focus including building out “guidance as code” as a contributor and maintainer for MinimumCD.org.
Trac guest lectures at universities, leads working groups, and shares experience stories. She is a long-time advocate for diversity in technology, helping to narrow the gaps as a mentor, sponsor, volunteer, and friend. Ms. Bannon is an accomplished blogger, podcaster, author, panel moderator, and featured industry speaker.
She holds certifications from Microsoft, AWS, DevOps Institute, PMI, Scrum Alliance, and the Software Engineering Institute (SEI).