Google Kubernetes Engine (GKE) is easy to get started with, but securing it for production takes additional work. The documentation can be hard to navigate: many features are tied to specific Kubernetes versions or require enabling beta functionality, and some of it is out of date, which can catch people out.
There is also an outstanding bug we have raised against Kubernetes that affects GKE. It is an edge case that only occurs if you run pods directly rather than through deployments, and only when pod security policies are enabled (which we recommend doing for security reasons).
If you are heading to production with sensitive workloads, we advise implementing everything in this article.
When you deploy a default GKE cluster with no additional options provided, you’ll get some sensible security defaults:
So, why did Google enable these things by default? Essentially it means:
The things you need to make sure you enable:
Using the virtual Trusted Platform Module (vTPM) to measure and verify the kernel and OS at boot means that authenticity can be established. It also guarantees that nothing has been tampered with and that kernel modules have not been replaced with ones containing malware or rootkits.
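As a sketch, Shielded GKE Nodes with secure boot and integrity monitoring can be enabled at cluster creation time with flags like the following (the cluster name and zone are placeholders):

```shell
# Create a cluster whose nodes use Shielded VM features: secure boot
# verifies kernel and module signatures, and integrity monitoring uses
# the vTPM to detect tampering with the boot sequence.
gcloud container clusters create my-cluster \
  --zone europe-west2-a \
  --shielded-secure-boot \
  --shielded-integrity-monitoring
```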
Making sure all traffic between pods and nodes is logged and tracked will help you identify potential risks later on. This isn't necessarily something you need for development, but it is something you should do for production.
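One way to get this visibility on GKE is intra-node visibility, which routes pod-to-pod traffic through the VPC so it appears in VPC flow logs. A rough sketch (cluster name and zone are placeholders; at the time of writing this feature may require the beta command group on some GKE versions):

```shell
# Make pod-to-pod traffic on the same node visible to the VPC, so it is
# captured by VPC flow logs alongside cross-node traffic.
gcloud container clusters update my-cluster \
  --zone europe-west2-a \
  --enable-intra-node-visibility
```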
Making your nodes and the Kubernetes API private means they aren't subject to the constant network scans run by bots, hackers and script kiddies.
Putting the cluster behind an internal network that you can only access through a VPN is also good. However, this is a much more involved process with GKE and isn't as simple as a feature flag like the others; you can read about what is involved here.
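For the simpler private-cluster case, a minimal sketch looks like the following (cluster name, zone and CIDR ranges are placeholder assumptions; the authorized network should be your own office or VPN range):

```shell
# Create a private cluster: nodes get internal IPs only, and the public
# API endpoint only accepts connections from the authorized CIDR blocks.
gcloud container clusters create my-private-cluster \
  --zone europe-west2-a \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr 172.16.0.0/28 \
  --enable-master-authorized-networks \
  --master-authorized-networks 203.0.113.0/24
```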
Again, Google enables some default features to get you started. However, there are still gaps that you will need to fill.
What do you get?
What you will want to enable:
This sounds great, but what does it actually mean? Essentially, some of these roles are split in two: one part is owned by application development teams and the other by the cluster administrator.
Application Developer
Kubernetes has a bit of a learning curve. There are tools, such as Helm, that simplify dependency management by letting you deploy application dependencies with predefined deployments. But there is no real substitute for understanding the main components of Kubernetes: network policies, Ingress, certificate management, Deployments, ConfigMaps, Secrets and Service resources.
The main security components are network policies, secrets and certificate management. Network policies allow you to control the traffic to and from your applications. Secrets are only base64-encoded, so there is no real security in how they are stored; making sure the cluster administrator has enabled secret encryption (as covered further down) adds that additional layer.
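As an illustration of a network policy, a common starting point is a default-deny ingress rule per namespace, which blocks all inbound traffic to pods until more specific policies allow it (the namespace name here is an assumption):

```yaml
# Deny all ingress traffic to pods in the "my-app" namespace by default;
# traffic is then re-allowed with more specific NetworkPolicy objects.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```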
Certificate management ensures the traffic to your service is encrypted, but if you're communicating between services, you should also add TLS between your applications. Having the cluster administrator install something like cert-manager makes it easier to encrypt traffic between services. There are also products like Istio, but as Istio does far more than just certificates, it can add more complexity than necessary.
Cluster Administrator
You want to make sure that development teams can't deploy insecure applications, attempt to escalate their privileges, mount devices they shouldn't or make unnecessary kernel system calls. Pod Security Policies offer a way to restrict how users, teams or service accounts deploy applications into the cluster, enforcing a good security posture.
RBAC and Pod Security Policies go hand in hand. Once you define a good Pod Security Policy, you create a role that references it and then bind a user, group and/or service account to that role, either cluster-wide or at the namespace level.
Note: GKE uses an authorization webhook that is consulted before Kubernetes RBAC. This means that if you are an administrator in Google Cloud Identity and Access Management (IAM), you will always be a cluster admin, so you can recover from accidental lock-outs. This is abstracted away inside the control plane and managed by GKE itself.
We recommend a PSP along the lines of the one below, which makes sure that users, service accounts or groups can only deploy containers that meet its criteria.
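A restrictive baseline might look like the following sketch, based on the widely used "restricted" example from the Kubernetes documentation (the policy name and exact volume list are assumptions you should adjust to your workloads):

```yaml
# Illustrative restrictive PodSecurityPolicy: no privileged containers,
# no privilege escalation, no host namespaces, non-root users only, and
# a limited set of volume types.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  volumes:
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
```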
If you wanted to create a role to use the PSP defined above, it would look something like the example below. Note that this is a ClusterRole rather than a standard Role. To enforce it on, say, all authenticated users, you would then create a role binding that applies to the "system:authenticated" group.
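A minimal sketch of that ClusterRole and binding (the PSP name "restricted" and the object names are assumptions):

```yaml
# ClusterRole granting the "use" verb on the PSP, plus a binding that
# applies it to every authenticated user in the cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted
rules:
  - apiGroups:
      - policy
    resources:
      - podsecuritypolicies
    resourceNames:
      - restricted
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted-authenticated
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
```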
Remember that because this is cluster-wide, any applications that need more privileged permissions will stop working. Some of these will be things Google adds into Kubernetes, such as kube-proxy, which runs in the kube-system namespace.
You can read more about RBAC and PSPs on kubernetes.io.
We’ll break this down into two parts.
1. Encrypting Your Secrets in Kubernetes
When encrypting secrets with Google Cloud's KMS service, the recommendation is to segregate roles and responsibilities by creating a new Google project, separate from the project that will host Kubernetes and your applications. This makes sure the encryption keys, and more importantly the key that wraps the other keys (envelope encryption), don't reside in a project that could potentially be compromised.
For encrypting secrets you need to:
The documentation on how to do this can be found here. But the main things to remember are:
Once this is set up, you can pass the resource path of the key to the “gcloud container clusters create” command:
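A rough sketch of the whole flow, with the key ring living in a separate KMS project (all project, key and location names are placeholders):

```shell
# Create a key ring and key in the dedicated KMS project. The key's
# location must match the region of the cluster that will use it, and
# the GKE service agent needs roles/cloudkms.cryptoKeyEncrypterDecrypter
# on the key.
gcloud kms keyrings create gke-secrets \
  --project my-kms-project \
  --location europe-west2

gcloud kms keys create gke-secrets-key \
  --project my-kms-project \
  --location europe-west2 \
  --keyring gke-secrets \
  --purpose encryption

# Create the cluster with application-layer secret encryption enabled.
gcloud container clusters create my-cluster \
  --project my-gke-project \
  --zone europe-west2-a \
  --database-encryption-key \
    projects/my-kms-project/locations/europe-west2/keyRings/gke-secrets/cryptoKeys/gke-secrets-key
```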
Note: If any of the above is incorrect, you will get a 500 internal server error when you try to create the cluster. This could mean the path is incorrect, the location is wrong or the permissions are not right.
2. Consuming Cloud Service Inside of Kubernetes
There are four different ways to allow application containers to consume cloud services. All of them have limitations of some kind: being less user-friendly and less automated for developers (making them wait for the relevant access to be provisioned), or offering less visibility and more complexity in tying together auditability for Kubernetes and cloud administrators.
Note: As of today, there is no access transparency on when Google accesses your Kubernetes cluster (Google Access Transparency). This could be problematic for a lot of organizations that want assurances around the provider's data access.
When the cluster is provisioned, all audit logs and monitoring data are pushed to Stackdriver. As the control plane is managed by Google, you can't override or modify the audit format or log, or extend it to your own webhook.
It does mean that you can search your audit logs for everything happening inside Kubernetes in one place. For example, to query all events against a cluster in a specific Google project, filtered by your user ID, you can use the below:
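A sketch of such a query in the Stackdriver (Cloud Logging) filter syntax (the project ID, cluster name and email address are placeholder assumptions):

```
resource.type="k8s_cluster"
resource.labels.project_id="my-project"
resource.labels.cluster_name="my-cluster"
protoPayload.authenticationInfo.principalEmail="user@example.com"
```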
From this point on, you could take it further and add additional security rules such as setting custom metrics to alert when cluster admin changes are made or specific modifications are happening on roles (cluster-admin roles).
Security is something everyone wants, but as you can see, it can be quite an involved process to achieve. It also requires domain knowledge to gain enough context to assess the risk and what it might mean to your applications and business.
Security without automation can slow down productivity, while not enough security can put your business at risk. Security features that require beta enablement may also be unacceptable if the business only allows General Availability features, which compromises security.
As a general rule, hardening your clusters and enforcing secure ways of working with Kubernetes, containers, cloud services and your applications will get you the best outcome in the end. There may be frustrating learning curves, but as the industry matures, these will slowly be remediated.
To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon EU, in Amsterdam. The CNCF has made the decision to postpone the event (originally set for March 30 to April 2, 2020) to instead be held in July or August 2020.