#52
February 14, 2022

EP52 Securing AI with DeepMind CISO

Guest:

  • Vijay Bolina, Chief Information Security Officer at DeepMind

27:27

Topics covered:

  • We spend a lot of time on Artificial Intelligence (AI) safety, but what about security?
  • What are some of the useful frameworks for thinking about AI security?
  • What is different about securing AI vs securing another data-intensive, complex, enterprise application?
  • What do we know about threat modeling for AI applications?
  • What attacks against AI systems do we expect to see first in real life?
  • What issues with AI security should we expect to face in 3-5 years?

Do you have something cool to share? Some questions? Let us know:

Transcript

>> Tim: Hi there. Welcome to Cloud Security Podcast by Google. Thanks for joining us today. Your hosts here are myself, Tim Peacock, the Product Manager for Threat Detection here at Google Cloud, and Anton Chuvakin, a reformed analyst and esteemed member of the cloud security team here at Google. You can find and subscribe to this podcast wherever you get your podcasts, as well as at our website, cloud.withgoogle.com/cloudsecurity/podcast. If you like our content and want it delivered to you piping hot every Monday, just in time for taking your dog for a walk, please do hit the subscribe button. You can follow the show and argue with your hosts on Twitter as well: twitter.com/cloudsecpodcast. Anton, we are talking about artificial intelligence today.

>> Anton: And, you know, today we are talking about AI without making a single snide remark. And you know why?

>> Tim: That's because we were both afraid to.

>> Anton: No, no, no, no, no. It's because--

>> Tim: No.

>> Anton: It's because we are talking about that place that almost nobody questions has AI, DeepMind.

>> Tim: They definitely have AI.

>> Anton: They definitely have AI. That's the thing. It's like when you read the media stories about the achievements of DeepMind Technologies, you sort of realize you're not in snide-remark territory. You are in your-mind-is-blown territory.

>> Tim: Yeah. And I think today's conversation was really interesting. Although we didn't get deep into AI itself, we had a really deep conversation about the risks around AI, specifically the security risks, leaving aside the social privacy risks.

>> Anton: That's correct. That was a very conscious decision. And we also wanted to explore how DeepMind approaches and thinks about securing AI. So to me, this was one of the topics I felt like I left my previous job as an analyst without truly answering. And I know it's been bugging me ever since. How do you secure big, complicated artificial intelligence applications? What are the risks? What are the threat models? What are the approaches? This is really tricky, and it turns out DeepMind has a lot of fun stuff to share on this.

>> Tim: And I think those answers are relevant, not just for understanding DeepMind, but for anybody thinking about AI and ML products and the risks associated. So with that, let's turn it over to today's guest. Listeners, I am delighted to welcome today's guest. We're joined today by Vijay Bolina, the Chief Information Security Officer here at DeepMind. Vijay, super stoked to have you on the show. We've been looking forward to this one quite a bit. I want to kick off by asking a relatively straightforward question. You know, we spend a lot of time thinking about AI safety, but what about security? Is there a useful framework for thinking about AI security as a discipline?

>> Vijay: Yeah. So great question. As you can imagine, there's a lot of discussion going on right now in this space in particular. There are a few things that I can point to and rattle off the top of my head. There's an ISO working group on AI standards, with an advisory board, that both Google and DeepMind are actually a part of, and they're working on standards that will effectively define guidance for everything from risk management to assessing the robustness of machine learning systems. So that's quite interesting, and I believe it's still in draft right now. MITRE is partnering with quite a few organizations, and they released the first iteration of something they're calling ATLAS, the Adversarial Threat Landscape for Artificial Intelligence Systems, which is an ATT&CK-style framework so that security practitioners can orient themselves to the emerging threats we're seeing against machine learning systems in a familiar ATT&CK-style way, which is kind of cool. The Berryville Institute of Machine Learning also released a taxonomy of model manipulation and extraction attacks, which is pretty interesting as well. In addition to that, they released something they're calling a risk assessment framework, which helps you as a practitioner build security into machine learning systems from a traditional security engineering perspective, which is also really cool. Last but not least is the Microsoft Trustworthy Machine Learning team's ML risk assessment framework, which came out within the past year or so and is quite cool as well. I think those are all really good frameworks that have been put out there and that allow us all to start thinking about what AI security actually means. And it's all pretty exciting.

>> Tim: That's awesome to see that there's not one, not two, but a whole list you've been able to rattle off of where people are working on this problem at a high level. That's awesome.

>> Anton: But to me, this is where we diverge, right? Because back in my Gartner days, we looked at this problem a little bit, and we realized that even the language doesn't match. So people can't even have an intelligent conversation about this, because I say AI security and somebody hears privacy, or they hear risks from malicious AI applications rather than attacks. So to me, this is fascinating, and I think we do need multiple frameworks before we have a coherent language to present it.

>> Vijay: Yeah. Yeah, absolutely. I think there's a lot of discussion around overlapping terminology being used in this space. And I think there is some overlap, right? Whether you say trustworthiness, whether you say robustness, whether you say safety, whether you say security or whether you say privacy, right? They mean different things to different people, but there is some overlap between all of it.

>> Anton: And just to go further on this, one other theme that emerged back in my analyst days is that some people looked at AI as a complex, data-intensive application pipeline, and they didn't really have the sci-fi vision of AI. They just thought, well, AI today is a big, complex, data-intensive application, probably running in the cloud. So they applied advice built for such applications, like CRM or ERP or something, and said, oh, this is our AI security advice. It's the same. So what's your thinking here? What's different about securing AI versus just securing a complex, data-intensive enterprise app?

>> Vijay: Yeah. This is a great question, and you'll probably get varying answers depending on who you ask. I think there are similarities and there are differences, but at a high level, the attacks that you see against a machine learning system can be vastly different from the ones you see against the traditional enterprise applications that you and I are used to seeing. Well, that everyone is really used to seeing. So I think what it boils down to is that you have to think about what it is that you're protecting, or effectively threat model, quite differently. A good example of one difference is that you typically don't have to worry about the business logic of a web application being stolen in a traditional enterprise application, but in a deployed machine learning system, you may have to worry about model stealing or extraction attacks, as an example. And similarly, in an online model, you may have to worry about adversarial examples that can sway its decisioning, versus a traditional enterprise application with static business rules of some sort, where you wouldn't have to. That said, attacks can be similar too. For example, if you had a WAF protecting your web application from Log4j exploits using static signatures, that can be attacked by iterating through an endless set of permutations to bypass the WAF. And if you had a similar system, but it was a deployed offline classifier trained on a static data set, an attacker can also craft interesting inputs to bypass that model that was deployed to identify certain types of content. So I think the gist, or the TL;DR, really is that there are some similarities, but there are also some differences, and it highlights the importance of having safety and security practitioners be part of the design and architecture of a machine learning system very early, from the start, to cover things like the threat modeling and algorithmic defenses that you may want to consider when you build these things.
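The adversarial-example and classifier-bypass points above can be made concrete with a small sketch. The following is a minimal, hypothetical FGSM-style example in PyTorch (toy model, made-up dimensions and epsilon, not anything discussed on the show) showing how a gradient-guided perturbation can sway a model's decision without touching the surrounding infrastructure:

```python
# Minimal sketch of a gradient-guided (FGSM-style) adversarial perturbation.
# The model, input, and epsilon are hypothetical placeholders; the point is
# only to show how a small change to the input can sway a model's decision.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # toy classifier
model.eval()

x = torch.randn(1, 20, requires_grad=True)  # a benign input
true_label = torch.tensor([0])

# Compute the loss against the correct label and backpropagate to the input.
loss = nn.CrossEntropyLoss()(model(x), true_label)
loss.backward()

# FGSM step: nudge the input in the direction that increases the loss.
epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).detach()

print("original prediction :", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```

The WAF and content-classifier bypasses Vijay mentions follow the same intuition: the attacker keeps permuting the input until the decision flips, which is why static signatures and static training sets age so poorly.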

>> Tim: That's super interesting. And you mentioned a couple of threats in there, model stealing, adversarial inputs. What do we know about really threat modeling for AI? 'Cause this is an area I haven't thought a lot about, and it's kinda shameful that I haven't, but I'm curious, like, what's the state of the art there.

>> Vijay: There's a lot of discussion in this space, and a lot of good guidance has come out. But I think first and foremost, we have to remember that you can't just forget about traditional threat modeling, because that's still extremely important. At the end of the day, these things are deployed on traditional infrastructure in a lot of instances, right? And so you have to address the standard class of attacks that we have grown some maturity around over the past few years. If you don't think about those types of things, you're by default going to enable anything that can then be focused on your machine learning system. The other thing to keep in mind from the threat modeling standpoint is that the skill sets you have in data scientists, researchers, and security engineers only partially overlap, and I alluded to this a little earlier. So you need to ensure you're asking the right questions when you're thinking about the security of AI systems. For example, trust boundaries will be different. You have to assume that there will be poisoning of the data that you train from. There are going to be considerations around, well, maybe the data provider itself can be compromised. And--

>> Anton: So it's like an AI supply chain attack, almost an attack against the data supply chain. That's kind of cool, actually.

>> Vijay: Data is massive right now when it comes to machine learning systems, so it's important to think about what can happen to, if you want to call it that, the supply chain. That's one way to look at it, but you're going to want to be able to detect when anomalous and malicious data entries do occur, to distinguish between what is bad and what is good, and to be able to recover from them. Things like tracking provenance and lineage of that data are going to be super important as well when it comes to your system and the underlying pipeline, and to avoiding a garbage-in, garbage-out situation from a training-cycle standpoint. Machine learning systems are making more decisions, and they're making them a lot quicker. So it's extremely important that you identify and assess the actions your models, or your products and services, could take that could cause customer harm, whether online or on the physical side of things. The other thing to consider when threat modeling an AI system is that threat vectors, or attacks, against AI and machine learning systems are different, and they often require a different set of mitigations too. And a layered approach is extremely important when it comes to defenses. Think of adversarial examples and those types of attacks, where an attacker stealthily modifies an input to get a desired output or decision from a deployed model, or data poisoning.
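As a sketch of the data-supply-chain hygiene described above, the snippet below (plain Python; the field names, source label, and z-score threshold are purely hypothetical) records provenance for an incoming training batch and flags grossly anomalous values before they reach the training set:

```python
# Sketch of two controls from the discussion above: (1) record provenance for
# every training batch so poisoned data can be traced and rolled back, and
# (2) screen incoming records for gross statistical anomalies before training.
# Field names, the source label, and the threshold are hypothetical.
import hashlib
import json
import statistics
import time

def record_provenance(batch, source, ledger):
    """Append a provenance entry: who supplied the batch, its hash, and when."""
    digest = hashlib.sha256(json.dumps(batch, sort_keys=True).encode()).hexdigest()
    ledger.append({"source": source, "sha256": digest,
                   "ingested_at": time.time(), "num_records": len(batch)})
    return digest

def screen_batch(values, history, z_threshold=4.0):
    """Return values that are gross outliers relative to previously accepted data."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history) or 1.0
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

ledger = []
history = [10.1, 9.8, 10.3, 10.0, 9.9]             # previously accepted feature values
incoming = [{"feature": 10.2}, {"feature": 95.0}]  # the second record looks poisoned

record_provenance(incoming, source="vendor-feed-A", ledger=ledger)
flagged = screen_batch([r["feature"] for r in incoming], history)
print("provenance ledger:", ledger)
print("flagged values   :", flagged)  # -> [95.0]
```

A production pipeline would use sturdier anomaly detection and an append-only store, but the shape of the control is the same: know where every batch came from, and be able to spot and recover from bad data.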

>> Anton: So if I try to summarize some of the threat modeling discussion, I kind of sense that maybe two-thirds of threat modeling would be quite alien to somebody who is used to doing it for a big, complex app, and one-third would roughly match the traditional threats against complex, data-intensive enterprise applications. Is this a random guess, or is this close to the reality? Like, two-thirds being quite alien is a lot, though.

>> Vijay: Sure. You know, really, I think it depends on the type of system, whether it's a classifier, whether it's some other type of system. Your threat model will vary, the application itself will cause you to think about different types of scenarios from an attacker standpoint, as an example, that may change that percentage a little bit. But roughly I would say sure, why not? Yeah.

>> Tim: So Vijay, I wanna ask a kind of sideways question here, and I'm curious about your thoughts. Is it easier or harder to defend AI systems compared to traditional systems? Are there more layers of defense involved or fewer, more attack surface or less, or is it just different from defending a normal system?

>> Vijay: I think in summary, I would say that they're different. I definitely don't think that they are easier. I think that depending on what the underlying machine learning system is doing, the types of attacks that you'll see against it will vary. I think it just really depends on the application of what it is that has been deployed to really understand the types of things that you may see against it. Right? So this could be the difference between a web application that sits online that's accessible by the general public versus maybe an API endpoint that is tightly protected in some way, shape or form, but could still see some exposure from an attack surface standpoint.

>> Anton: Actually, that does make sense to me. And I guess to me, I would summarize that it's different and sort of easier in some areas and probably harder in other areas. We're not gonna see a self-defending AI for a good number of years, I'm guessing. But I can see how some of the challenges, some of the lessons we learned will in fact apply to securing AI.

>> Vijay: Yeah. Yeah. I think a lot of the lessons learned that we have had over the past several years when it comes to traditional security engineering can be, and are being, applied to these emerging systems that do a significant amount of decisioning and have applications that are vastly different from what we're used to seeing.

>> Anton: Perfect. And now the scary question, attacks. So hypothetically, what attacks against AI systems do we expect to see first? Of course, we are seeing some and there are even attempts to catalog them, but give us a bit of a lay of the land here, if possible.

>> Vijay: Yeah, absolutely. I think it's safe to say at this point that real-world attacks are actually happening. As you mentioned, they are being cataloged, referenced, and discussed quite openly. The Partnership on AI actually has a database of AI incidents that you can reference to see an assortment of things that are happening in the real world. But what I think we'll definitely be seeing more of are real-world attacks or bypasses against content moderation systems that have been built to identify harmful or synthetic content online. We see a lot of interesting deployment approaches to some of this right now, and there are a lot of good referenceable incidents when it comes to some of these bypasses and some of the things you see across large platforms. I think we'll also continue to see abuse of systems that identify and attempt to prevent financial fraud in large banking, financial, and e-commerce systems. Anytime there's an opportunity for financial gain, there's going to be some type of abuse, as you can imagine. So I think those are probably going to be the prominent ones that we're going to continue to hear about.
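As a toy illustration of the content-moderation bypass pattern described above, here is a deliberately simplistic sketch: a static keyword filter standing in for a real classifier, evaded by small character-level substitutions. The blocked term and the substitution rules are placeholders, not drawn from any real system:

```python
# Toy illustration of bypassing a static content filter with small
# character-level perturbations. The blocked term and the substitutions
# are placeholders; a real moderation model faces the same class of evasion,
# just with subtler perturbations.
BLOCKLIST = {"forbiddenword"}

def naive_moderator(text: str) -> bool:
    """Return True if the text should be blocked (naive keyword matching)."""
    return any(term in text.lower() for term in BLOCKLIST)

def obfuscate(text: str) -> str:
    """Attacker-style evasion: swap characters for look-alikes."""
    return text.replace("o", "0").replace("i", "1")

original = "this post contains forbiddenword"
evasive = obfuscate(original)

print(naive_moderator(original))  # True  -> blocked
print(naive_moderator(evasive))   # False -> slips past the static filter
```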

>> Tim: That makes a ton of sense. And I love hearing about what we're seeing today, but I always love asking our guests to predict the future. So what issues do you think we'll see in the next, say three to five years if that's what we're seeing already?

>> Vijay: Yeah. That's a great question. You know, I think the general issue that I see unfolding in the next few years is that we're going to have a lot more of these AI systems coming online. The number of people building these systems vastly outnumbers the talent required to secure them.

>> Anton: Of which there are roughly five on the planet, as far as I understand. Right? Like real AI security experts.

>> Tim: Yeah. You and four of your friends, right?

>> Vijay: Yeah, it's interesting. And I think there are a lot of examples from historical events that we can point back to. This is similar to when businesses went online during the dot-com era and had to deal with an assortment of vulnerabilities due to everything being online, or when businesses flocked to the cloud and dealt with common misconfiguration issues like leaving your storage buckets world-readable or something along those lines. Generally, when we have these big pushes to a new class of technology, we tend to think about security later, and that tends to be a recurring pattern, unfortunately. And as more and more of these specific types of systems come online in the next few years, I don't think it's going to be any different. That's why it's extremely important, as I mentioned earlier, that we really try to understand and threat model what we're actually building and how we can actually defend it.

>> Anton: So that makes sense. And I think the threats and changes due to the proliferation of the actual technology make this an, unfortunately, very exciting time for security professionals, right? It's almost like the first few years of cloud, the first few years of online access, the first few years of mobile. These were really fun years if you were a defender, and I mean fun as somebody said, type three fun, right? Scary fun that you don't wanna tell your friends about. So I guess we'll be around to see that. That's good. And I think that will be fascinating. So I wanted to switch gears a little bit and shift to maybe more practical matters for a bit. At the end of the episode, we typically ask two questions. The first is about advice for securing AI: give us one or a small number of tips people can take today to secure AI if they're using AI-type or ML-type systems. And of course, the second question Tim would ask is about favorite reading on this. Let's start with the practical advice.

>> Vijay: Yeah, absolutely. There are some things that I pointed to a little earlier in the discussion, but at a high level, it'd be extremely helpful to think about two big areas that you should cover. The first is that you should think deeply about what it means to secure the pipeline that your model uses. This is really about how you get and prepare that training data, if you even use it, how your model learns from it, what types of infrastructure you have to allow that, and then how your model is actually taken from that training phase to deployment once it's trained, right? So there's a lot to think about there. The nice thing about this specific area is that a lot of what we have grown to do well from a traditional security engineering standpoint, from a core-principles standpoint, can be applied practically to securing the pipeline that your model uses. The second big area is, of course, securing the model and any potential interface that it has. So if you think about those two things, the interface and the model itself: the interface you would want to protect the traditional way you would protect any interface you're building, with traditional security controls like authentication, authorization, rate limiting if you have some type of endpoint, an appropriate amount of logging, encryption, and things like that. On the model side of things, there's a ton you can discuss, which is probably an entire episode on its own, in fact. But there's algorithmic-level protection that you can consider. If you're thinking about countermeasures against adversarial examples, there are three big areas you could probably think about. The first two are gradient masking or obfuscation and robust optimization. The last one is the ability to actually detect adversarial examples as they happen, studying the distribution of natural, benign examples so you can detect the adversarial ones and disallow their input into the classifier, if you're able to do so at that time. So a lot to think about there, but those are the two big constructs: securing the pipeline that brings the model online and gets it ready, and then securing the model itself and any potential interfaces it may have. And I think the last thing you asked is, well, where can I stay abreast of or learn about this stuff? What's exciting right now is that everybody's excited about this space, and there's a ton of information out there. If you're new to machine learning and AI research in general, there's a ton of content out there. DeepMind of course has a ton of learning resources; on our website, we have a bunch of things we've put out over the past year-plus that are really solid learning content. And if you're interested in learning about machine learning security, I think a really good community to join right now is the DEF CON AI Village; you can go to aivillage.org and join the public Discord channel. There's a lot of good content and a lot of good discussion around AI security. You can also watch old talks from the AI Village on YouTube, which is pretty awesome, a lot of good content there.
They also have a reading club where they discuss the latest papers released and published in this specific area, which is nice too if you have the time to participate. And you can always read the latest papers being published at machine learning conferences. One in particular that I like is CAMLIS, the Conference on Applied Machine Learning in Information Security; a lot of good papers get published there too. And last but not least, believe it or not, there are a lot of good discussions on Twitter as well when it comes to this...

>> Tim: I don't believe it.

>> Vijay: When it comes to this space, InfoSec Twitter is kind of an interesting place sometimes. There is a lot of good content there. So I would definitely say check that out as well.
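Pulling together the two big areas Vijay describes above, here is a minimal sketch of what securing the model's interface might look like in practice: a hypothetical serving wrapper (plain Python; the token set, rate limit, and confidence threshold are illustrative assumptions, not a real DeepMind or Google control) that enforces authentication, a crude rate limit, logging, and a simple screen for inputs whose prediction confidence falls outside what benign traffic normally produces:

```python
# Minimal sketch of the "secure the model and its interface" idea discussed
# above: traditional controls (auth, rate limiting, logging) around the model,
# plus a crude screen for inputs that look unlike benign traffic.
# Tokens, limits, and thresholds are hypothetical.
import logging
import time

logging.basicConfig(level=logging.INFO)

class GuardedModel:
    def __init__(self, model, api_tokens, max_calls_per_min=60, min_confidence=0.6):
        self.model = model                    # any callable: input -> (label, confidence)
        self.api_tokens = set(api_tokens)
        self.max_calls_per_min = max_calls_per_min
        self.min_confidence = min_confidence  # tuned on the distribution of benign inputs
        self._calls = []                      # timestamps for the naive rate limiter

    def predict(self, token, x):
        if token not in self.api_tokens:                  # authentication / authorization
            raise PermissionError("invalid API token")
        now = time.time()
        self._calls = [t for t in self._calls if now - t < 60]
        if len(self._calls) >= self.max_calls_per_min:    # rate limiting
            raise RuntimeError("rate limit exceeded")
        self._calls.append(now)

        label, confidence = self.model(x)
        logging.info("prediction label=%s confidence=%.2f", label, confidence)
        if confidence < self.min_confidence:              # crude adversarial / OOD screen
            return {"label": None, "reason": "low-confidence input rejected"}
        return {"label": label, "confidence": confidence}

# Usage with a stand-in model that returns (label, confidence).
guarded = GuardedModel(model=lambda x: ("cat", 0.92), api_tokens={"example-token"})
print(guarded.predict("example-token", x=[0.1, 0.2]))
```

Confidence thresholding is only the weakest form of the adversarial-example detection mentioned above; the algorithmic defenses (gradient masking or obfuscation, robust optimization, distribution-based detectors) live inside the model and training pipeline rather than in the serving wrapper.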

>> Tim: Well, Vijay, this is an awesome answer on where users can, or listeners can learn more. We don't have users. We have listeners, not a user listening to this, but thank you so much for joining us. Thank you for all of the thoughtful answers. I suspect there are many more conversations on this topic to be had in the future, and we would love to have you back on the show any time.

>> Vijay: Awesome. Yeah, no problem. It was great chatting with you all. Thanks for having me.

>> Anton: Perfect. Thank you. And now we are at time. Thank you very much for listening and, of course, for subscribing. You can find this podcast at Google Podcasts, Apple Podcasts, Spotify (we've noticed a lot of listeners on Spotify nowadays), and wherever you get your podcasts. Also, you can find us at our website, cloud.withgoogle.com/cloudsecurity/podcast. Please subscribe so that you don't miss episodes. You can follow us on Twitter as well, twitter.com/cloudsecpodcast. Your hosts are also on Twitter, @Anton_Chuvakin and @_TimPeacock. Tweet at us, email us, argue with us, and if you like or hate what you hear, we can invite you to the next episode. See you on the next Cloud Security Podcast episode.
