
AWS Kubernetes Kops Cluster

Overview

Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. Kops is currently the best tool to deploy Kubernetes clusters to Amazon Web Services. The project describes itself as kubectl for clusters.

Core Features

  • Open-source & supports AWS and GCE
  • Deploy clusters to existing virtual private clouds (VPC) or create a new VPC from scratch
  • Supports public & private topologies
  • Provisions single or multiple master clusters
  • Configurable bastion machines for SSH access to individual cluster nodes
  • Built on a state-sync model for dry-runs and automatic idempotency
  • Direct infrastructure manipulation, or works with CloudFormation and Terraform
  • Rolling cluster updates
  • Supports heterogeneous clusters by creating multiple instance groups

Figure: AWS K8s Kops architecture diagram (just as reference). (Source: Carlos Rodriguez, "How to deploy a Kubernetes cluster on AWS with Terraform & kops", Nclouds.com blog post, accessed November 18th 2020).

Kops Pre-requisites

Important consideration

K8s clusters provisioned by Kops need a number of resources to be available before the cluster is created. These are the Kops pre-requisites, and they are defined in the 1-prerequisites directory, which includes all the Terraform files used to create/modify these resources.

📒 The current code has been fully tested with the AWS VPC Network Module
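
As a quick reference, the prerequisites layer is applied with a standard Terraform workflow. The following is only a sketch assuming plain Terraform CLI usage; the repository's Makefile may wrap these same steps with the dockerized Terraform commands mentioned below.

    # Sketch only: apply the Kops prerequisites layer (the repo's Makefile may wrap these steps)
    cd 1-prerequisites
    terraform init                  # initialize providers and the remote state backend
    terraform plan -out=plan.out    # review the prerequisite resources to be created/modified
    terraform apply plan.out        # create/update them (e.g. the Kops state S3 bucket)
    terraform output                # these outputs are later consumed by the 2-kops layer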

OS pre-req packages

Ref Link: https://github.com/kubernetes/kops/blob/master/docs/install.md

  • kops >= 1.14.0
    ╰─○ kops version                                                                                      
    Version 1.15.0 (git-9992b4055)
    
  • kubectl >= 1.14.0
    ╰─○ kubectl version --client
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
    
  • terraform >= 0.12.0
    ╰─○ terraform version
    Terraform v0.12.24
    
  • NOTE1: Regarding Terraform versions, please also consider https://github.com/binbashar/bb-devops-tf-aws-kops#todo
  • NOTE2: These dependencies will be mostly covered via the Makefile with dockerized Terraform commands (https://hub.docker.com/repository/docker/binbash/terraform-awscli)

  • jq >= 1.5.0

    ╰─○ jq --version
    jq-1.5-1-a5b5cbe
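
For reference, the following is an illustrative install sketch for a Linux/amd64 workstation matching the versions shown above; always prefer the official install docs linked at the top of this section, since download URLs and paths here are assumptions.

    # Illustrative install commands (Linux/amd64); versions match the checks above.
    # kops
    curl -Lo kops https://github.com/kubernetes/kops/releases/download/1.15.0/kops-linux-amd64
    chmod +x kops && sudo mv kops /usr/local/bin/kops

    # kubectl
    curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.14.0/bin/linux/amd64/kubectl
    chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl

    # terraform
    curl -LO https://releases.hashicorp.com/terraform/0.12.24/terraform_0.12.24_linux_amd64.zip
    unzip terraform_0.12.24_linux_amd64.zip && sudo mv terraform /usr/local/bin/terraform

    # jq (Debian/Ubuntu example)
    sudo apt-get install -y jq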
    

Resulting Solutions Architecture

Figure: AWS K8s Kops architecture diagram (just as reference).

Why this workflow

The workflow follows the same approach used to manage other Terraform resources in your AWS accounts, e.g. network, identities, and so on.

So we'll use existing AWS resources to create a cluster-template.yml containing all the resource IDs that Kops needs to create a Kubernetes cluster.
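
As an illustration of how these pieces connect, the resource IDs can be exported from the prerequisites layer's Terraform outputs into a simple values file that the template consumes. This is only a sketch: the values.json name and the jq filter are assumptions, and the repository's helper scripts may do this differently.

    # Sketch: export the 1-prerequisites outputs as plain key/value pairs for templating
    cd 1-prerequisites
    terraform output -json | jq 'map_values(.value)' > ../2-kops/values.json

    # Quick sanity check of the resource IDs that will feed the cluster template
    jq '.' ../2-kops/values.json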

Why not directly use Kops CLI to create the K8s cluster as well as the VPC and its other dependencies?

  1. While this is a valid approach, we want to manage all these building blocks independently and be able to fully customize any AWS component without having to alter our Kubernetes cluster definitions and vice-versa.

  2. This is a fully declarative approach to managing your infrastructure, so being able to declare the state of our cluster in YAML files fits 100% with an infrastructure-as-code & GitOps based approach.

Figure: [Workflow diagram](https://medium.com/bench-engineering/deploying-kubernetes-clusters-with-kops-and-terraform-832b89250e8e).

Kops Cluster Management

The 2-kops directory includes helper scripts and Terraform files in order to template our Kubernetes cluster definition. The idea is to use our Terraform outputs from 1-prerequisites to construct a cluster definition.

Overview

Cluster Management via Kops is typically carried out through the kops CLI. In this case, we use a 2-kops directory that contains a Makefile, Terraform files and other helper scripts that reinforce the workflow we use to create/update/delete the cluster.

Workflow

This workflow is a little different from the typical Terraform workflows we use. The full workflow goes as follows:

Cluster: Creation & Update

  1. Modify files under 1-prerequisites
    • Main files to update are probably locals.tf and outputs.tf
    • Mostly needed before the cluster is created, but changes could also be needed afterward
  2. Modify cluster-template.yml under the 2-kops folder
    • E.g. to add or remove instance groups, upgrade the k8s version, etc.
  3. At the 2-kops/ folder context run make cluster-update, which will follow the steps below (see the sketch after this list)
    • Get Terraform outputs from 1-prerequisites
    • Generate a Kops cluster manifest -- it uses cluster-template.yml as a template and the outputs from the point above as replacement values
    • Update the Kops state -- it uses the Kops cluster manifest generated in the previous point (cluster.yml)
    • Generate the Kops Terraform file (kubernetes.tf) -- this file represents the changes that Kops needs to apply on the cloud provider
  4. Run make plan
    • To preview any infrastructure changes that Terraform will make
    • If desired, we could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to the Kubernetes cluster before executing make apply (terraform apply). This brings the huge benefit of treating changes to our Kubernetes clusters with a GitOps oriented approach, basically like we treat any other code & infrastructure change, and integrating it with the rest of our tools and practices like CI/CD, integration testing, replicating environments and so on.
  5. Run make apply
    • To apply those infrastructure changes on AWS
  6. Run make cluster-rolling-update
    • To determine if Kops needs to trigger some changes to happen right now (dry run)
    • These are usually changes to the EC2 instances that won't get reflected immediately, as they depend on the autoscaling groups
  7. Run make cluster-rolling-update-yes
    • To actually make any changes to the cluster masters/nodes happen
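
For reference, the make targets above roughly map to the following kops and terraform commands. This is a hedged sketch: the actual Makefile may differ in target names, flags, file names and how it handles the Kops state store.

    # Approximate commands behind make cluster-update / plan / apply / rolling-update
    export KOPS_STATE_STORE=s3://<your-kops-state-bucket>   # placeholder

    # Render cluster.yml from the template plus the 1-prerequisites outputs
    kops toolbox template --template cluster-template.yml --values values.json --output cluster.yml

    # Push the manifest into the Kops state store
    kops replace -f cluster.yml --force

    # Generate kubernetes.tf instead of changing AWS directly
    kops update cluster --name <cluster-name> --target terraform --out .

    # Usual Terraform cycle (make plan / make apply)
    terraform plan
    terraform apply

    # Rolling update: dry run first, then for real
    kops rolling-update cluster --name <cluster-name>
    kops rolling-update cluster --name <cluster-name> --yes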

Cluster: Deletion

To clean up any resources created for your K8s cluster, you should run the steps below (a sketch of the underlying commands follows the list):

  1. At the 2-kops folder context run make destroy
    • This will execute a terraform destroy of all the AWS resources declared in kubernetes.tf.
  2. At the 2-kops folder context run make cluster-destroy
    • This will run the Kops cluster destroy as a dry run only -- no changes will be applied.
  3. Run make cluster-destroy-yes
    • Kops will effectively destroy all the remaining cluster resources.
  4. Finally, at the 1-prerequisites folder context run make destroy
    • This will remove the Kops state S3 bucket plus any other extra resources you've provisioned for your cluster.
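
Under the hood, the destroy targets roughly correspond to the commands below; again a hedged sketch, as the actual Makefile targets and flags may differ.

    # Approximate commands behind the destroy targets
    cd 2-kops
    terraform destroy                                 # make destroy: removes the kubernetes.tf resources

    kops delete cluster --name <cluster-name>         # make cluster-destroy: dry run, lists what would go away
    kops delete cluster --name <cluster-name> --yes   # make cluster-destroy-yes: actually deletes the cluster

    cd ../1-prerequisites
    terraform destroy                                 # removes the Kops state S3 bucket and other prerequisites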

Typical Workflow

The workflow may look complicated at first, but it generally boils down to these simplified steps (see the terminal sketch below):

  1. Modify cluster-template.yml
  2. Run make cluster-update
  3. Run make apply
  4. Run make cluster-rolling-update-yes
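
In terminal form, that day-to-day loop looks roughly like this (run from the 2-kops folder context; make plan is optional but recommended):

    # Simplified day-to-day loop (sketch)
    vim cluster-template.yml          # or your editor of choice
    make cluster-update               # regenerate cluster.yml and kubernetes.tf
    make plan                         # optional: preview the Terraform changes
    make apply                        # apply the infrastructure changes on AWS
    make cluster-rolling-update-yes   # roll masters/nodes so the changes take effect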

What about persistent and stateful K8s resources?

This approach works better the more stateless your Kubernetes workloads are. Treating Kubernetes clusters as ephemeral and replaceable infrastructure requires either avoiding persistent volumes or accepting the difficulties of running stateful workloads such as databases on K8s. We feel pretty confident that we can recreate our workloads by applying each of our service definitions, charts and manifests to a given Kubernetes cluster, as long as we keep the persistent storage separately on AWS RDS, DynamoDB, EFS and so on.

In terms of etcd state persistence, Kops already provisions the etcd volumes (AWS EBS) independently from the master instances they get attached to. This helps persist the etcd state across rolling updates of your master nodes without any user intervention, and it also simplifies volume backups via EBS snapshots (consider https://github.com/binbashar/terraform-aws-backup-by-tags). We also use a very valuable backup tool named Velero (formerly Heptio Ark -- https://github.com/vmware-tanzu/velero) to back up and restore our Kubernetes cluster resources and persistent volumes.
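
As an illustration of that Velero-based safety net, a few typical Velero CLI invocations are shown below; the backup names, schedule and flags are examples, not this repository's actual configuration.

    # Illustrative Velero usage (names, schedule and TTL are examples only)
    # One-off backup of cluster resources and persistent volumes
    velero backup create pre-upgrade-backup --include-namespaces '*'

    # Daily scheduled backup at 03:00 with a 30-day retention
    velero schedule create daily-backup --schedule "0 3 * * *" --ttl 720h

    # Restore from a previous backup after recreating the cluster
    velero restore create --from-backup pre-upgrade-backup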

TODO

  1. IMPORTANT: The Kops Terraform output (kops update cluster --target terraform) is still generated for Terraform 0.11.x (https://github.com/kubernetes/kops/issues/7052); we'll take care of the migration when tf-0.12 gets fully supported.
  2. Create a binbash Leverage public Confluence Wiki entry detailing more info about etcd, Calico and K8s version compatibilities