Timothy: Hi there. Welcome to the Cloud Security Podcast by Google. Thanks for joining us today. Your hosts here are, as ever, myself, Timothy Peacock, the Product Manager for Threat Detection here at Google Cloud, and Anton Chuvakin, a reformed analyst and much beloved member of the cloud security team here at Google. You can find and subscribe to this podcast wherever podcasts are available, as well as at our website, cloud.withgoogle.com/cloudsecurity/podcast. If you like our content and want it delivered to you piping hot every Monday afternoon, please do hit that subscribe button on your app of choice. You can follow the show and argue with your hosts on Twitter as well, twitter.com/CloudSecPodcast. Anton, this is I think a super fun episode for a lot of reasons. One, it's one of the more in-the-weeds technical episodes we've done in quite some time.
Anton: That's correct and also there would be a couple moments when I interrupt the guest and say, "Wait a second, this sounds dangerous."
Timothy: There is some honest to God danger in this episode, which is kind of fun. And I think this episode also features the easiest to understand explanation of how identity and permissioning work in Google Cloud.
Anton: Correct. And it also has a very fun argument about how to teach cloud to on-premise people for a particularly confusing area, identity in the cloud. So, to me, this is also very fun.
Timothy: Also, listeners, you won't see this, but both Anton and I made very confused faces during this episode. So, you'll hear us talk about that, but you won't get to see it, unfortunately. We'll have to get screenshots of that and put it on the Twitter, won't we? So perhaps with that, let's welcome today's guest. I'm delighted to introduce Dylan Ayrey, co-founder of Truffle Security Co. Dylan, I've heard rumours that security in the cloud has a lot to do with identity. Could you tell us why identity is important for cloud security? What's going on there?
Dylan: No, it's a great question. Identity, in a nutshell, is what allows access to cloud services. So basically, you have identities in the cloud, and those identities get permissions or roles that determine who can do what, who can access what data, and who can run what compute resources. So, all of that is kind of built into this idea of identity. You have both machine identities and user identities. And then on top of that, you have permissions that get layered on top. I think what's particularly interesting about Google is this idea that identity isn't user centric. People traditionally think of it as, what does a user have access to? Those types of questions actually aren't answerable in Google Cloud. Instead, identity is resource centric: who has access to this resource? And that took a little while for me to get my head around. But basically, the cloud is built in such a way that an individual user or a machine identity can be given access to lots of different resources, but the owner of that machine identity can't necessarily see those grants, while the resource owner can always see all of the different identities that can access their resource. So as a database owner, you can say, who can access my database? But as a machine identity owner, you can't answer the question, what does my machine identity have access to? Instead, all the different resource owners out there have to be able to answer the question in summation for you. And so, I think when it comes to Google Cloud in particular, that was the toughest barrier for me to mentally get my head around. But once that kind of clicked, everything else sort of fell into place from there.
Anton: Wait a second. So, our listeners aren't seeing the video of course, because it's a podcast, but I am looking at my own picture and I'm looking at Tim and we both look kind of confused. I think I know what you said, Dylan, but imagine an on-premise security person who is maybe working in a SOC, or maybe just working on security systems. How do you explain what you just said to somebody who isn't really well versed in the cloud? Is there any way to explain it to an on-premise person?
Dylan: Yeah, that's a really good question. So, in a traditional environment, you might have something like Active Directory that determines what identity is and what groups and permissions an individual identity has access to. For example, within Active Directory, you might put individuals into a group that allows them to have access to a particular computer network, and then within that computer network, you might have things like databases and things like that. So, you may only want your database admins to fall into this particular AD group. And then you would easily be able to log into Active Directory and say, oh, well, these database users have access to this database group, which allows them to access this database. In Google Cloud, it's a little bit different. Instead, you have owners of databases, so everybody can kind of own their own database. And then each one of those owners has the ability to add users and identities and groups to those databases. So rather than being able to say, "Oh, let me look up Tim and see what databases Tim can access," Google Cloud actually doesn't have that ability like you might have in sort of the Active Directory world. Instead, you can say, "Well, let me look up this database, that's the credit card number database, and see who can access this database?" And so, we can get clear answers to who can access the database, but we can't get clear answers to what are all the databases that Tim has access to. So, it's very centric around the resource and not centric around the identity.
Timothy: So, it's easy to figure out who can touch a piece of data, but it's hard to figure out, given an identity, what it could touch.
Dylan: Yeah. So, if you have a piece of data, you can figure out who can touch it, but you can't say, well, what data can Tim touch? That's a much harder question to answer.
Timothy: Surely, it's answerable though, right? Somehow?
Dylan: Actually, by design, it's not answerable. And the reason why is all of the different data owners can grant you access to their data even outside of your organization. So, you can think of a large Google Cloud customer like Snap, for example, which could arbitrarily say, I want to give Tim access to our production database, and the permission to be able to view that role binding is one that you wouldn't have by default. So, you wouldn't even be able to see that the binding happened, but you would still be able to be given that permission. So, when I say, what data does Tim have access to? I wouldn't necessarily know that Tim has this random permission to view the Snap database. And even if I have the ability to create a service account or an identity, I don't have the ability to see all the permissions that identity has been granted, because those can happen on resources that are outside of my control, outside of my organization, and I wouldn't have the permissions to even list or view those. And so that's what I mean when I say it's very resource centric. If you have a resource, you can see who can access that resource, but it's not identity centric. So, you can't say, what does this identity have access to? But you can say, who has access to this resource?
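A minimal sketch of the resource-centric model described above, assuming a toy in-memory representation rather than any real Google Cloud API. The class `Resource`, the helper functions, the organization names, and the member string are all hypothetical; the point is only that the policy lives on the resource, so "who can access this resource?" is one lookup, while "what can this identity access?" is limited to the resources whose policies you are able to read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    """A resource (for example a database) that carries its own bindings."""
    name: str
    owner_org: str
    # role -> members; the policy is stored on the resource, not on the member
    bindings: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)


def who_can_access(resource: Resource) -> Set[str]:
    """Resource-centric question: answered by reading a single policy."""
    return set().union(*resource.bindings.values())


def what_can_access(member: str, visible: List[Resource]) -> Set[str]:
    """Identity-centric question: only covers resources we are able to inspect."""
    return {r.name for r in visible if member in who_can_access(r)}


# Hypothetical data: one database in our org, one owned by another org.
ours = Resource("credit-card-db", owner_org="our-org")
theirs = Resource("partner-prod-db", owner_org="other-org")
ours.grant("roles/viewer", "user:tim@example.com")
theirs.grant("roles/viewer", "user:tim@example.com")

# We can only read policies on resources in our own organization.
visible_to_us = [r for r in (ours, theirs) if r.owner_org == "our-org"]

print(who_can_access(ours))
print(what_can_access("user:tim@example.com", visible_to_us))
```

In this toy data, the second query misses `partner-prod-db` even though the grant exists, which is the blind spot Dylan is describing.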
Timothy: Got it. Well, my friends at Snapchat, please don't grant me permissions on any production databases. I don't want that. That's more trouble than it's worth.
Anton: And I think that it makes sense, and I think a little bit of a meta conclusion on this is that it's not that the system has more complexity. It's that the system is different from what you're used to. Yes. If you are used to servers and boxes and getting a shovel and pushing the VMs from place to place, this system is just different. It's not better or worse. It's, to me, a little bit different. I think that's a summary. And let me briefly put on a cynical hat. Tim, yes, I do have one of those, and the cynic in me says...
Timothy: Only one?
Anton: ...Ah, okay. Let's avoid, I'll avoid the question. So, somebody who's mostly sceptical about cloud security once told us that in the cloud, we are one identity mistake away from a breach. And it stuck in my head as kind of a line, because of course, if you make a mistake with identity, but nobody makes a mistake with firewalls, your database isn't exposed to the outside on-premise. But we are in the cloud. And if I make an identity mistake, maybe because I'm an on-premise dude who doesn't understand cloud, what'll happen? Is it as bad as he said or not?
Dylan: Well, that's a really good question. I think one of the big differences is that all of the APIs that you talk to are on the public internet. So, say you have, for example, a key that maybe accidentally gets posted in a place where you don't want it to be, and that key has access to certain resources. It is not the same as a password for an on-premise service leaking out, because that password can't be used unless you're on the private network, but the key can still be used to talk to a public API. So, I think what you mentioned before hits the nail right on the head: it's not better or worse, it's just different. And it's different in ways that have both pros and cons, and it's really important that we understand all of those nuances. And so, while it's true that you can definitely misconfigure firewalls, it's also true that having all of the APIs on the public internet comes with its own set of risks, and we just have to learn the new language and learn to adapt to this new set of risks, which is different from the traditional on-prem access controls that were the norm 20 years ago.
Anton: That actually makes sense. And so, the next thing we need to touch on is of course the whole service accounts angle. So, for somebody who isn't deep with GCP, what's the 30-second story on a service account? How is it different from a normal IAM account? I understand there's no human behind it. That part I do get, but what about the rest? What's the service account?
Dylan: So, let me define a couple of terms in GCP. So, I mentioned the idea of a resource. Resources can be collected together into groups of resources, and we call those projects in GCP. Now to get access to resources or to projects, we grant two types of identities access to either a resource or a collection of resources. The first type of identity is a user. So that would be something like Tim. So, Tim would be our user. Tim could also fall into a collection of users, which are called groups. And so, on one side of the equation we have Tim or a group of Tims, and then we have a...
Timothy: A horrifying thought.
Dylan: ...And then we have a role, and on the other side of the equation, we have a resource or collection of resources, which would be a project. So that works all well and fine if we need to define how Tim interacts with the cloud, but a lot of times it's not a user interacting with the cloud, it's a machine or a machine identity. And that's where service accounts kind of enter the picture. If we want a laptop to be able to talk to the cloud, it could either use Tim's permissions or we could create a machine identity that has its own set of permissions separate from Tim's permissions. And that allows us to create an identity that's scoped down to the specific use case that we're working on and doesn't necessarily have access to all of Tim's Google Drive and all the extra things that Tim has access to. And so, we call that type of identity a service account, and those service accounts don't have to just run on a laptop. They could also be directly assigned to a compute resource in the cloud, like a cloud function or a VM or something like that. And the permissions for those cloud functions or VMs are contained within this thing called a service account, which is different from the overall set of permissions that a user or a group of users might have.
Timothy: Got it. I found service accounts very confusing early on because I thought they could only belong to services, but of course they could belong to laptops, they could belong to a server, they could belong to a machine somewhere. So, it's perhaps a confusing name for people that are new to cloud. And listeners, one of the other things I want to add before I ask the next question is a reminder that Dylan is not a Googler. Dylan, actually, I don't think you've ever worked at Google Cloud. You're just a user who became the founder of a security company focused on this problem. And given all of that, this is still maybe the most cogent explanation of how our identity system works that I've heard in my four years of working here. So, thank you for that.
Dylan: I might correct something that you just said there. I did found a security company, but actually it has nothing to do with this problem, it's unrelated. But I did do a lot of research, particularly into Google Cloud, as an independent security researcher with my co-presenter, Allison Donovan. And we did quite a bit of research with the Google team, came on site a number of times, presented to a number of Google PMs, and ended up doing the most viewed Black Hat talk in 2020, specifically on this subject of GCP IAM.
Timothy: Wow. So, with all of that, I expect to have another incredibly cogent answer for this and hopefully this becomes the most viewed podcast episode of 2022. What's a service account impersonation?
Dylan: I think this kind of leads itself into a lot of the content that we talked about at Black Hat. Basically, it's this idea that one service account or one user can actually, through cloud APIs, give itself the permissions of another service account. So, for example, say you have one of a set of three possible permissions. Remember, on the one side, there's an identity; in the middle there's that connector, which is like a role or permissions; and on the other side there are resources. And this is where things can get a little bit tricky to understand: a resource could be something like a database, but a resource could also be a service account. So, a service account can actually live on both sides of this equation. It could be both something that gets permissions and something that permissions are granted on. So, service account impersonation is what happens when we take an identity and we grant it a certain permission on a service account or collection of service accounts that allows that identity to take on all of the permissions of the service account that we've attached it to. And there are three specific permissions that allow us to do this. There used to be more, and we'll dig into that in a second. The first of the three that exist today is the act as permission. And what act as allows you to do is take a service account and attach it to a resource that you control. So, if there's a service account out there that has, let's say, an owner permission, and I don't have owner permission, but I do have an act as role on that service account, I can take that service account and attach it to a VM that I control. And then that allows me to take on the owner permission that that service account had. So, we can use this to elevate our permissions. We can start with something that doesn't have a lot of permissions and all of a sudden become owner if we have that.
The second is called the token creator role, and what the token creator role allows us to do is export a key, kind of like a password, an API key for a given service account, and then use that on behalf of the service account that we exported it from. So, if there's that owner service account, we can take a password or an API key for it and then use that to do everything that the owner account can do. So, we can elevate our permissions the same way with that key export.
Anton: Okay, so let me ask you this. It sounds kind of dangerous.
Timothy: There's a third piece of danger we should hear about, and then we'll talk about why it's dangerous.
Anton: Okay, fine. I'll shut up.
Timothy: What's the third piece of danger?
Dylan: The third one is called workload identity user. And this actually came about when workload identity was introduced to things like GKE. Basically, the idea is that back in the old days, when you used GKE, the nodes had service accounts attached to them and all of the workloads that you ran would use the permissions of the node, and that had a whole host of issues. So, the people at Google Cloud created a better system where each workload running in GKE could have its own permissions. But in doing so, we also got this new permission that allows us to assume the identity of those different workloads. And we need that because our node needs to be able to spin up these workloads, but it does create that other path of danger. And before we jump into your next question, there's a fourth one that I want to touch on; this one's near and dear to my heart. And this is the fact that there used to be a whole lot of permissions that allowed you to access an identity called the default identity, the default service account. And actually, there are a couple of default service accounts. They get spun up in your project when you create it. And by default, they have a lot of these dangerous permissions built into them. So, it used to be that if I wanted Tim (sorry, Tim, to keep picking on you) to be able to run a Dataproc job, I would grant Tim a Dataproc permission, and on its face, it would look like he couldn't do anything other than Dataproc things. But that permission allowed Tim to create a Dataproc compute resource. By default, that default identity got attached to it, and that default identity has both token creator and act as applied to your entire project. And so, when I just gave him Dataproc, this allowed Tim to all of a sudden spin up a new resource, have that default identity attached to it, and then get access to all of the service accounts in the project.
And when we pointed this out, leading into our DEF CON talk, Google actually changed the rules. They sent out an email and they said, "We recognize that this shouldn't be a thing. We shouldn't be able to go from Dataproc to editor, to all the rest of the service accounts in the project." And so, they sent out an email saying, "This will no longer be the case. You will no longer get the default identity attached to your Dataproc, Dataflow, and Composer workloads. Instead, you'll need to attach a different identity to it." And those rules got changed. So now we don't have four issues we need to worry about in terms of impersonation. We only have those three, which are act as, token creator, and workload identity user.
Timothy: So, there's a couple of cool things in there to pick on and drill into. First, what a great example of vulnerability disclosure like working out positively. Thank you on behalf of all our users and on behalf of Google for that. But two, I want to pick a little bit on this distinction between act as, and token creator. Because with act as the logs show that you're acting as the service account, right?
Dylan: I think you might be flipped there. And the rules may have changed, by the way, because the team is constantly updating and improving and adding extra security to the platform. One of the nice things about working in the cloud is that it auto-updates and is constantly getting better. But when we were looking into it, token creator in particular, when you created short-term or long-term tokens, those got logged pretty well. You could usually, call it in 99% of cases, see when a token got created, and you'd be able to use that to say, "Oh, well, Tim just made a token for this thing." There's a potential exploit opportunity there, in that Tim is now perhaps operating on behalf of the service account. But what was less transparent at the time was if Tim created a VM and attached a service account to that VM; at the time that didn't get logged super well. And because of that, that created an opportunity where an attacker could come in, take some identity, create a new VM, attach a service account to that new VM that just got created, and then start doing things on behalf of that service account. And they could just keep playing this game. So, we open sourced a tool that basically took that to its extreme, where you would start with a service account that had act as on a project with a bunch of service accounts. And you would basically create one cloud function for every single service account in that project. And you'd have a whole host of cloud functions. And then for all of those service accounts, you would say, "Do any of these have permissions into other projects?" And if so, spin up cloud functions in those projects with all of the service accounts in all of those projects and it...
Timothy: Oh, no.
Dylan: ...recursively spidered out like that. And the only thing that would show up in the logs is just, oh, well, Tim's just making a bunch of cloud functions. It wouldn't say that he's attaching all these extra identities and adding more and more keys to his empire. And so, the last time we looked into it act as was actually not quite as logged as we would've wanted it to be, but token creator had pretty good logging and attribution around it.
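The recursive "spidering" described above is, at its core, a graph reachability problem. The following is a minimal sketch under that assumption; it is not the open-sourced tool itself, and the `ACT_AS` mapping, the identity names, and the function `reachable_identities` are all hypothetical. Each edge means "this identity can attach that service account to compute it controls".

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data: identity -> service accounts it holds an "act as" grant on.
ACT_AS: Dict[str, Set[str]] = {
    "tim": {"sa-build@proj-a"},
    "sa-build@proj-a": {"sa-deploy@proj-a", "sa-etl@proj-b"},
    "sa-etl@proj-b": {"sa-owner@proj-b"},
    "sa-deploy@proj-a": set(),
    "sa-owner@proj-b": set(),
}


def reachable_identities(start: str, act_as: Dict[str, Set[str]]) -> List[str]:
    """Breadth-first search over "act as" edges, in discovery order."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # sorted() only makes the traversal order deterministic
        for target in sorted(act_as.get(current, set())):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(reachable_identities("tim", ACT_AS))
# ['sa-build@proj-a', 'sa-deploy@proj-a', 'sa-etl@proj-b', 'sa-owner@proj-b']
```

In this toy graph a single grant on one build account is enough to reach an owner-level account in a second project, which is why the scope of each impersonation grant matters.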
Timothy: Got it. Well, that is alarming and I don't like the idea of these spidering out service account cloud functions. That just makes me uncomfortable.
Anton: I'm sure it has some legitimate use just like, you know, memory injection into a live process on a desktop has legitimate use, right, right, right, right? Okay, so on a more serious note, how do we detect this? Like you mentioned, I get the whole thing about logs that are sort of unrelated or vaguely related being produced. So how can I see if my service accounts have been impersonated? What's the detection strategy? What are the rules? What are the approaches I can take? I know maybe it's not such a great question with a clear answer, but where would I look, Dylan, for detections?
Dylan: So, I think one of the things that we just touched on, the fact that token creation is pretty well logged, allows us to anchor on that in particular. And if you see all of a sudden an identity creating a whole bunch of tokens for a whole bunch of service accounts, that's a pretty suspicious thing that we wouldn't normally see. So that's one thing to anchor around. Another thing to anchor around is the spidering tool I mentioned that we open sourced. We intentionally wrote detection rules for that tool itself. While it's not generic to all the behaviour, at least if someone just runs that spidering cloud function thing out of the box, you can detect whether or not they're using that specific tool. We included that in the repository, and you can see what Stackdriver filter to add to detect on that. I think the number one thing that we try to impress on the team is that when it comes to logging and figuring out who's doing what, when, and when these service account impersonations are happening, that's so important. And we have seen some progress there, but I also think we're not at the end state there yet. I think there's more work to be done to improve our logs so that we can better get a handle on some of the act as permissions and things like that. And I think we can expect to see that improve over the next year or two.
Timothy: That is one of the beauties of the cloud, and the control plane of the cloud in particular: you can't not patch it, it just happens for you, which is great. So, on the same sort of line of questioning as Anton's, how do we secure our organization against this? Because despite being the threat detection guy, I actually believe that an ounce of prevention is worth a pound of cure.
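The first detection idea above, one identity creating tokens for many service accounts, can be sketched as a simple count over audit records. This is a minimal sketch, not a production rule: the record shape, the field names `principal`, `method`, and `service_account`, the threshold, and the function name are all assumptions for illustration, and real audit log entries are structured differently. The method name `GenerateAccessToken` is assumed to be how token creation is labelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical, heavily simplified audit records.
EVENTS = [
    {"principal": "tim", "method": "GenerateAccessToken", "service_account": "sa-web"},
    {"principal": "tim", "method": "GenerateAccessToken", "service_account": "sa-batch"},
    {"principal": "tim", "method": "GenerateAccessToken", "service_account": "sa-owner"},
    {"principal": "anton", "method": "GenerateAccessToken", "service_account": "sa-web"},
    {"principal": "anton", "method": "ListBuckets", "service_account": "sa-web"},
]


def flag_bulk_token_creators(events: Iterable[dict], threshold: int = 3) -> List[str]:
    """Return principals that minted tokens for at least `threshold` distinct service accounts."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event["method"] == "GenerateAccessToken":
            targets[event["principal"]].add(event["service_account"])
    return sorted(p for p, accounts in targets.items() if len(accounts) >= threshold)


print(flag_bulk_token_creators(EVENTS))
# ['tim']
```

A real rule would also need a time window and an allowlist for automation that legitimately mints many tokens; both are left out here to keep the sketch short.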
Dylan: Yeah, absolutely. And so, I think that the most important thing an individual organization maintainer could possibly do is enable an org policy that was, again, created in collaboration with us in going through this exercise. This is a new org policy that came out in the last year or two, called automatic IAM grants for default service accounts. And so, if you remember when I mentioned that there are some default service accounts that get created, those default service accounts still get attached to VMs and cloud functions and all kinds of things by default, often without you even realizing it. And by default, they have act as and token creator permissions on all of the service accounts in your project. And so that's a super, super dangerous thing, especially for a larger organization that might have a lot of service accounts. And so, the easiest thing to do is just to remove that default grant so that when you create a new VM or a new cloud function, you start at square zero in terms of IAM and you don't start with act as and token creator attached to those resources. So that's probably the number one thing: disable that default grant. And you can do that through an organization policy that was created in collaboration with our efforts.
Timothy: Tell listeners what an organization policy is though, because not everybody's familiar with that piece of cloud magic.
Dylan: Yeah, no, that's a good call out. So, an organization policy is basically a set of rules that everyone in your organization has to abide by. Whereas before, when you create a project, the default is that you could do a lot of things, like you can add people outside of your organization and you can attach firewall rules and stuff like that. If you go in and start pushing buttons and knobs inside the org policy, you can disable a lot of those things or make those things impossible for new project owners to do. The thing about the default service account grant is that it's not actually an action anybody ever took in the first place. It was just something that cloud kind of put on you. And so, by disabling it, nobody has to have that put on them. Basically, no one has to have these default grants attached to their services. It can create a little bit of friction, because before, all of your VMs and cloud functions could do a lot of stuff. They could talk to buckets and they could perform act as, and they could do all these things. So, all of a sudden taking that away can create a little bit of friction, but when it comes to securing against this particular impersonation problem, it's the easiest and highest impact thing that you can do for your organization, and kind of worth the friction that it adds. And then the other thing that I'll mention is that when it comes to these impersonation roles, they have a particular purpose. Like if you want to be able to attach a service account to a cloud function or a VM, you need act as. So, it's not like we could just get rid of the permission across the board. But oftentimes these permissions are granted at the project level. In fact, that's where the default grant is done as well. It's done at the project level, and in the UI, when you push the IAM button, that's talking about project-level IAM grants.
It's a little bit trickier, but when you dig into the individual resources, you'll find there's a separate set of permissions there. And that's actually how we do resource-level grants. And so, what I mean by that is, rather than saying this Tim identity has act as on all of the service accounts in the project, which is what I get from the IAM page, actually, I just want Tim to be able to attach this one service account to his VMs. And to get to that, we have to go to the service accounts page, dig into the service account, click the roles and permissions of that service account, and then there, we can set our act as permission. And so that's the difference between doing a project-level grant, which is to a whole collection of resources, versus an individual resource grant, which is just to the one thing that we're after. And so, when it comes to impersonation, if you need impersonation, and there are plenty of legitimate use cases for it, I highly, highly recommend defaulting those to as small a scope as possible, just the resource rather than a whole project, because when you do it to a whole project, that's when you start to get in trouble. You start to pull in extra service accounts that we don't know what permissions they have and things like that.
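The difference between a project-level grant and a resource-level grant can be sketched with a toy model. This is a minimal sketch under stated assumptions, not how IAM evaluates policies internally: the project contents, the two grant dictionaries, and the function `impersonable` are all hypothetical, and inheritance is reduced to "a project-level grant applies to every service account in the project".

```python
from typing import Dict, List, Set

# Hypothetical project contents.
PROJECT_SERVICE_ACCOUNTS: Dict[str, List[str]] = {
    "proj-a": ["sa-web", "sa-batch", "sa-owner"],
}


def impersonable(
    member: str,
    project: str,
    project_grants: Dict[str, Set[str]],
    resource_grants: Dict[str, Set[str]],
) -> List[str]:
    """Service accounts `member` could act as, given where the grant was made."""
    accounts = PROJECT_SERVICE_ACCOUNTS[project]
    if member in project_grants.get(project, set()):
        # A project-level grant fans out to every service account in the project.
        return sorted(accounts)
    # A resource-level grant applies only to the service account it was set on.
    return sorted(sa for sa in accounts if member in resource_grants.get(sa, set()))


# Grant made on the whole project: Tim can act as everything, including sa-owner.
print(impersonable("tim", "proj-a", {"proj-a": {"tim"}}, {}))
# ['sa-batch', 'sa-owner', 'sa-web']

# Grant made on one service account only: the blast radius is that one account.
print(impersonable("tim", "proj-a", {}, {"sa-web": {"tim"}}))
# ['sa-web']
```

The second call reflects the recommendation above: scope the impersonation grant to the one service account that is needed.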
Timothy: Got it, that makes a ton of sense.
Anton: Okay. That actually makes sense to me. I was about to say that I can see how you can combine it into a more secure approach to permissions, but I also can see how people would trip over it and fall face-first into the mud. I guess this kind of depends on the knowledge. By the way, in some other episodes, we did tell people, learn the cloud before you migrate. And people laughed at us for this being trivial advice. It is not trivial advice. You do need to learn cloud before you go there. So now we are almost at time, and I wanted to ask our two traditional closing questions. One would be easy. Where can the audience learn more about this? Any reading materials, and of course a link to the Black Hat presentation would be helpful. And the second is one practical tip on how to deal with a threat, with an issue.
Dylan: So, when it comes to this particular problem, which is privilege escalation and lateral movement within Google Cloud, the number of resources on it is actually pretty light, but there are some real luminaries that I can point to. The first is somebody named Spencer who used to work at Rhino Labs, and there were a bunch of great blog posts on Rhino Labs around cloud security stuff. I think Spencer is at CrowdStrike these days. The next one is actually someone who works at GitLab. His name is Chris and he's done a whole bunch of stuff on Google Cloud privilege escalation. And there are some great writeups on the GitLab blog on that. And the last person I'll call out is Kat Traxler, who used to work at Best Buy and also has done a lot of speaking in this space, and she's done a lot of blog posts. I don't think she's at Best Buy anymore. I forget which security company she's at, but that's probably the third resource. So, there's the Black Hat talk we gave in 2020, which has a whole bunch of fancy video editing because I was bored during the pandemic. So, check that out. There's Spencer's content on Rhino Labs, Chris's content on GitLab, and then Kat Traxler's content, which is kind of a little bit all over the place, but I think she was at Best Buy when she did a lot of it.
Anton: We'll look them up and put the links. Thank you very much for this. And any practical tips like your favourite tip when people deal with this?
Dylan: So, I think if there's one thing that I can impress on people, it's that default grant org policy. If you can, disable the automatic IAM grants for default service accounts. I know it's a mouthful, but that's the actual name of the org policy. If there's one button to mash, it's that one. If you don't know anything about cloud security, if you're switching over from on-prem and you're moving into Google, mash that button in your org policy and it'll improve your security in myriad ways that I can't even get into here.
Anton: Perfect. Thank you very much for this. I very much appreciate you being in the podcast. I think it was fun. Looking forward to future discussions perhaps.
Dylan: Yeah, absolutely. Thanks so much.
Anton: And now we are at time. Thank you very much for listening and, of course, for subscribing. You can find this podcast at Google Podcasts, Apple Podcasts, Spotify, or wherever else you get your podcasts. Also, you can find us at our website, cloud.withgoogle.com/cloudsecurity/podcast. Please subscribe so that you don't miss episodes. You can follow us on Twitter, twitter.com/CloudSecPodcast. Your hosts are also on Twitter, @Anton_Chuvakin and @_TimPeacock. Tweet at us, email us, argue with us. And if you like or hate what you hear, we can invite you to the next episode. See you on the next Cloud Security Podcast episode.