andon.dev

Hardening workloads with private GKE Clusters

Network isolation is one of the most effective security measures we can take to protect our workloads from external interference. From setting your firewall to block traffic in all directions to going a step further and unplugging your computer, there is a sound methodology behind prohibitive defaults.

More often than not, though, we are up against tight deadlines, and security gets pushed to the right of the development cycle. This article explores some of the techniques we can use to harden our systems on Google Kubernetes Engine (GKE), with example code, so that we can shift left on our security mechanisms.

We’re going to be using Terraform code snippets because Terraform is the de facto standard. All of the stanzas in this article can be reproduced with gcloud and, more than likely, with any other well-adopted provisioner you use.

Private Clusters

Private clusters are a configuration option in GKE that isolates the nodes of your cluster from the public internet by giving them only internal IP addresses. The master nodes are unaffected because they are owned and managed by Google. For your nodes, the default setting is therefore effectively deny-all.

Depending on your cluster’s workload, this may be sufficient. More than likely, though, you’re going to need to access the outside world, if only to pull images from a registry.

And of course, your workloads may expose themselves to the internet (or an internal system) via a Service. Typically, you will create a load balancer, either cloud provided or via an external ingress controller such as NGINX, Envoy or Traefik, and a forwarding rule from the LB to the Service itself.

The good news is that with Private Clusters, none of that changes. Your LB will continue to deploy to your cluster as usual and it will be assigned an IP address by GCP which is immediately available.
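As a minimal sketch of exposing a workload this way from Terraform, the snippet below declares a Service of type LoadBalancer. It assumes the separate kubernetes provider is already configured against the cluster; the name web, the app label and the port numbers are hypothetical and not part of the original setup.

```hcl
# Hypothetical example: expose pods labelled app=web through a cloud load balancer.
# Assumes the Terraform "kubernetes" provider is configured for this cluster.
resource "kubernetes_service" "web" {
  metadata {
    name = "web"
  }

  spec {
    # GCP provisions the load balancer and assigns it an IP address.
    type = "LoadBalancer"

    selector = {
      app = "web"
    }

    port {
      port        = 80   # port exposed by the load balancer
      target_port = 8080 # port the container listens on (assumed)
    }
  }
}
```

Nothing in this definition is specific to private clusters, which is the point: the same Service works whether or not the nodes have public addresses.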

Things get a bit more complicated when you want to egress your traffic as the nodes have no public IP addresses by design.

Enabling network egress

In the simplest of situations, trying to run a public container image from Docker Hub will fail because hub.docker.com won’t be reachable. To resolve this, we must create a NAT gateway to send our traffic through. Because a NAT gateway requires a network to attach to, we’ll create a fresh VPC instead of relying on the default one provided by GCP for the project.

Let’s create the VPC:

resource "google_compute_network" "network" {
  name                    = "my-vpc-network"
  auto_create_subnetworks = false
}

And a subnet to attach to it:

resource "google_compute_subnetwork" "subnetwork" {
  region  = "europe-west1"
  name    = "my-vpc-subnet"

  ip_cidr_range = "10.0.16.0/20"
  network       = google_compute_network.network.name

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.16.0.0/12"
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.1.0.0/20"
  }

  private_ip_google_access = true

  depends_on = [
    google_compute_network.network,
  ]
}

Now, we create the cluster so that it’s attached to our new VPC network. For the sake of brevity, this code is deliberately shortened to show only the stanzas relevant to launching the cluster into the VPC:

resource "google_container_cluster" "primary" {

  # ...

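  # Reference the resources rather than hard-coding their names so that
  # Terraform creates the network and subnet before the cluster.
  network    = google_compute_network.network.name
  subnetwork = google_compute_subnetwork.subnetwork.name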

  ip_allocation_policy {
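    # Note: use_ip_aliases was removed in version 3.0 of the Google provider.
    # On newer versions, drop this line; the block itself enables IP aliasing.
    use_ip_aliases                = true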
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_endpoint = false
    enable_private_nodes    = true
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # ...

}

By setting enable_private_nodes to true, we have declared that the cluster’s nodes should be given only RFC 1918 (i.e. private) addresses and should communicate with the master over a private network. Because enable_private_endpoint is false, the master’s public endpoint remains available, so you can still reach the Kubernetes API from outside the VPC.

Now that we have a cluster running in a network we described, we need a public IP address and a router before we can create the NAT. The public IP address resource will likely already be familiar:
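If you keep the public endpoint enabled, one option is to restrict who can reach it. The fragment below is a sketch of the master_authorized_networks_config block, meant to be merged into the cluster resource above; the CIDR range and display name are hypothetical placeholders, not values from this setup.

```hcl
resource "google_container_cluster" "primary" {

  # ...

  # Only the listed ranges may reach the master's public endpoint.
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "203.0.113.0/24" # hypothetical: replace with your own range
      display_name = "office"         # hypothetical label
    }
  }

  # ...

}
```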

resource "google_compute_address" "address" {
  region = "europe-west1"
  count  = 1
  name   = "my-public-address"
}

The router may be unfamiliar, but it is a relatively simple object that attaches to the network in a given region. The BGP (Border Gateway Protocol) configuration can get complicated, but we only need the required value for asn:

resource "google_compute_router" "router" {
  region  = "europe-west1"
  name    = "my-network-router"
  network = google_compute_network.network.self_link

  bgp {
    asn = 64514
  }
}

Now that we have our network, cluster and router described, we can move on to the NAT. We create the resource with manual IP assignment, attach it to our router and explicitly assign it to the subnetwork:

resource "google_compute_router_nat" "nat" {
  region = "europe-west1"
  name   = "my-network-router-nat"
  # Reference the router resource so Terraform creates it before the NAT.
  router = google_compute_router.router.name

  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips                = google_compute_address.address[*].self_link

  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"
  subnetwork {
    name                    = google_compute_subnetwork.subnetwork.self_link
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }
}

That’s it! Your cluster is now private but can access the internet via the NAT Gateway so you can pull your images from remote registries.
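Because the NAT uses manually assigned addresses, all outbound traffic from the cluster leaves from a known IP. If a third party needs to allow your traffic through their own firewall, an output along these lines would surface that address; the output name is an assumption for illustration.

```hcl
# Expose the static egress address(es) used by the NAT gateway.
output "nat_egress_ips" {
  description = "Public IP addresses that cluster egress traffic originates from"
  value       = google_compute_address.address[*].address
}
```

After an apply, running terraform output nat_egress_ips prints the list.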

Enabling egress to GCR

If you’re running Kubernetes on GCP, the chances are you will have some images stored in Google Container Registry (GCR). To enable our cluster to access these images, we need to create a private managed DNS zone so that gcr.io resolves to addresses reachable through Private Google Access, which we enabled on the subnet with private_ip_google_access. To do this, you need to enable the Cloud DNS API.
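The API can be enabled from Terraform as well as from the console. A minimal sketch, assuming the provider’s default project is the one hosting the cluster:

```hcl
# Enable the Cloud DNS API in the provider's default project.
resource "google_project_service" "dns" {
  service = "dns.googleapis.com"

  # Leave the API enabled if this resource is later destroyed,
  # in case other resources in the project depend on it.
  disable_on_destroy = false
}
```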

First we need to create the zone for GCR and configure it to be used by our VPC network:

resource "google_dns_managed_zone" "gcr_zone" {
  name        = "gcr-private-zone"
  dns_name    = "gcr.io."
  description = "GCR Access for a private cluster"
  visibility  = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.network.self_link
    }
  }
}

Then we create two record sets for the zone, a wildcard CNAME and a set of A records:

resource "google_dns_record_set" "star" {
  name = "*.gcr.io."
  type = "CNAME"
  ttl  = 60

  managed_zone = google_dns_managed_zone.gcr_zone.name
  rrdatas      = ["gcr.io."]
}

resource "google_dns_record_set" "gcr" {
  name = "gcr.io."
  type = "A"
  ttl  = 60

  managed_zone = google_dns_managed_zone.gcr_zone.name
  rrdatas      = ["199.36.153.4", "199.36.153.5", "199.36.153.6", "199.36.153.7"]
}

Note that the A records are static and supplied by Google (they are the restricted.googleapis.com range, 199.36.153.4/30), so you can copy/paste the rrdatas value from this example.

Conclusion

This article has walked you through the steps required to build a network-hardened private Kubernetes cluster on Google Cloud Platform. We can access the internet from our cluster and reach GCR for our private images.

We could go a step further and explicitly block all egress traffic that isn’t on an allowlist, which would be another 💪 for our security. There are of course other security concerns that are not related to the network configuration of our clusters, so be sure to sign up to the good old fashioned RSS feed for upcoming articles.
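As a rough sketch of what that egress lockdown might look like, the firewall rules below deny all outbound traffic and then allow only the ranges used in this article. The rule names and priorities are assumptions, and this is not a complete or tested policy: it would also block image pulls from Docker Hub through the NAT, and a real cluster is likely to need further allow rules.

```hcl
# Block all egress by default. Priority 65534 sits just above the implied
# allow-all egress rule, which has the lowest priority (65535).
resource "google_compute_firewall" "deny_all_egress" {
  name      = "my-vpc-deny-all-egress" # hypothetical name
  network   = google_compute_network.network.name
  direction = "EGRESS"
  priority  = 65534

  destination_ranges = ["0.0.0.0/0"]

  deny {
    protocol = "all"
  }
}

# Allow traffic that stays inside the cluster's own ranges.
resource "google_compute_firewall" "allow_internal_egress" {
  name      = "my-vpc-allow-internal-egress" # hypothetical name
  network   = google_compute_network.network.name
  direction = "EGRESS"
  priority  = 1000

  destination_ranges = [
    "10.0.16.0/20",  # nodes (subnet primary range)
    "10.16.0.0/12",  # pods
    "172.16.0.0/28", # GKE master
  ]

  allow {
    protocol = "all"
  }
}

# Allow HTTPS to the Google API addresses that the gcr.io zone resolves to.
resource "google_compute_firewall" "allow_google_apis_egress" {
  name      = "my-vpc-allow-google-apis-egress" # hypothetical name
  network   = google_compute_network.network.name
  direction = "EGRESS"
  priority  = 1000

  destination_ranges = ["199.36.153.4/30"]

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
}
```

Lower priority numbers win, so the two allow rules at 1000 take precedence over the deny rule at 65534.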
