Scaling and Load Balancing in Kubernetes

Disposability

Kubernetes is designed to facilitate many cloud native principles; disposability is a good example. Disposability means that workloads should be ephemeral where possible: easily replaced rather than migrated, fixed, or upgraded in place.

Disposability is often described as managing servers like “cattle, not pets.” That means don’t groom long-lived servers by dialing them in manually, SSH’ing in to upgrade application versions, and managing configuration by hand. Instead, treat them like cattle: replace them when they are unhealthy or when a new version is needed, and automate as much as possible.

To accomplish this, separating out state is essential; in fact, Kubernetes was initially only for stateless workloads. More recently, StatefulSets and PersistentVolumes have been added (we’ll cover these in another blog post in two weeks), but where possible, workloads are run in stateless Deployments. This allows them to be horizontally scaled, which aids in disposability, high availability, and elasticity.

In this post we’ll focus on scaling and load balancing, but Kubernetes has many other ways of delivering the advantages of disposability. For example, health checks automatically replace Pods (where the Docker containers run) when they become unhealthy, and rolling updates replace Pods of the current version with Pods of a newer version one by one.
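As a rough sketch (the names, image, and probe endpoint here are hypothetical), a Deployment manifest can declare both behaviors: a liveness probe that tells Kubernetes when to replace a Pod, and a rolling update strategy that controls how Pods are swapped out during an upgrade:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # replace Pods one at a time
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.15
        ports:
        - containerPort: 80
        livenessProbe:         # replace the Pod if this check fails
          httpGet:
            path: /
            port: 80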

Services and Load Balancing

In Kubernetes, workloads run in containers, containers run in Pods, Pods are managed by Deployments (with the help of other Kubernetes Objects), and Deployments are exposed via Services. Pods have ephemeral, internal IPs, whereas Services have stable IPs and track their Pods through Endpoints. These Endpoints, along with the Service, expose the Deployment either internally, with a Service type of ClusterIP, or externally, with type LoadBalancer. Here’s a diagram to illustrate:

Note: There are other Service types. For more on Services, check out our first blog post of this series, Deploying Workloads on Kubernetes, and the official documentation.
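For illustration, here’s a minimal Service manifest (the names and labels are hypothetical) that exposes a Deployment’s Pods externally through a load balancer; changing the type to ClusterIP, the default, would expose it internally only:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer          # ClusterIP (the default) would keep this internal
  selector:
    app: web                  # matches the labels on the Deployment's Pods
  ports:
  - port: 80                  # port the Service listens on
    targetPort: 80            # port the containers listen on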

On PKS, the load balancer is provided by NSX-T, on AWS it’s ELB (Elastic Load Balancer), and on Google it’s Cloud Load Balancers. We have an upcoming post on the PKS and NSX side of networking in a few weeks called PKS and NSX-T Design. In this post, we’ll be focusing on the Kubernetes side.

Another cloud native principle embedded in Kubernetes’ design is microservice architecture. This is a way of developing applications in which you split them into many small services, each with a narrowly defined responsibility. That’s where Services got their name, and Kubernetes expects you’ll run your application as many distinct services, each horizontally scaled across many replicas. Even if you don’t use a microservice architecture, though, you could serve a monolith in one horizontally scaled Service or put each tier of a 3-tier app in its own Service.

In any case, the Kubernetes Service Object exposes a workload (containers in Pods), which is typically managed by a Deployment. The Deployment is a newer Kubernetes Object which adds useful features to an older Object, the ReplicaSet. These days you typically define a Deployment, and Kubernetes handles creating the ReplicaSet that the Deployment manages. Therefore, to scale your workload, you change the number of replicas (Pods) in the Deployment.
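To see this chain on a running cluster, you can list each Object (this assumes a Deployment named nginx-deployment whose Pods carry the label app=nginx; substitute your own names and labels):

kubectl get deployment nginx-deployment    # the Deployment you defined
kubectl get replicasets                    # the ReplicaSet(s) Kubernetes created for it
kubectl get pods --selector=app=nginx      # the Pods managed by that ReplicaSet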

Scaling Deployments

When speaking of scaling in Kubernetes, it’s important to keep scaling the cluster (the number of Nodes) separate from scaling Deployments (the number of Pods). Scaling the cluster increases the resources available to the entire Kubernetes installation, while scaling a Deployment adds Pods, giving that Deployment more of the cluster’s available resources.

Scaling a Deployment can be as easy as updating the desired number of replicas in that Deployment. You can do that with a CLI command, for example:

kubectl scale deployment.v1.apps/nginx-deployment --replicas=10

deployment.apps/nginx-deployment scaled

Or, more commonly, you would update that Deployment’s YAML manifest with the new number of replicas, then simply use the CLI to reapply that file. For example:

kubectl apply -f my-deployment.yaml

You can also go one step further and use a Horizontal Pod Autoscaler. This allows you to define rules that automatically scale a Deployment up or down based on load. For example, you can choose the metric (CPU, RAM, requests) to scale by, and add cooldowns and delays that keep the number of replicas from fluctuating rapidly.
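As a minimal sketch (again assuming the hypothetical nginx-deployment), a single CLI command creates an autoscaler that targets 50% average CPU utilization and keeps the Deployment between 3 and 10 replicas:

kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=3 --max=10

The same rules can also be written declaratively as a HorizontalPodAutoscaler Object in YAML and applied with kubectl apply, just like a Deployment.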

Scaling Clusters

While Kubernetes can manage the scaling of Deployments within your cluster, it cannot scale itself by making its own cluster bigger. That requires something outside of Kubernetes that manages the Kubernetes installation and the VMs it runs on.

Kubernetes is a combination of several programs, some of which run on the Master and some on the Nodes. A Master with some number of Nodes connected to it is a Kubernetes cluster, so to scale that cluster you’d either add Nodes or increase the capacity of the Nodes that are already running.
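You can list the Nodes that currently make up your cluster, and inspect the capacity each one offers, with:

kubectl get nodes                     # list the Nodes in the cluster
kubectl describe node <node-name>     # show one Node's CPU and memory capacity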

In the case of PKS, vSphere (or GCP) provides the VMs that run the Master and the Nodes. Through PKS you can manually scale the size of the cluster or set up autoscaling. Similarly, GKE can autoscale the Nodes running on GCE (Google Compute Engine), and EKS (Amazon’s Elastic Container Service for Kubernetes) can scale your cluster across Nodes that run on EC2.
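For example, resizing a PKS-provisioned cluster is a single CLI command. The cluster name below is made up, and the exact flags may differ by PKS CLI version, so treat this as a sketch:

pks resize my-cluster --num-nodes 5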

Fractal Layers

It’s interesting how the same patterns adopted in virtualization are being applied to containerization, and even in some ways to bare metal. For example, Dell physical servers can be autoprovisioned into a vSphere environment, PKS can use vSphere and BOSH to autoscale the number of Kubernetes Nodes, and Kubernetes in turn autoscales the number of Pods running your containerized workload.

Disposability is seen at all levels, especially in the virtualized and containerized layers. If a Node is lost, Kubernetes simply reschedules the lost Pods to other Nodes; meanwhile, PKS begins “autohealing,” which really means replacing the Node with a new one. Similarly, if a Pod becomes unresponsive, Kubernetes does not try to fix it; rather, it simply kills and replaces it. Pods are not upgraded when new versions of the application/service are released. They are instead replaced with new Pods, built on new container images.

Load balancing with horizontal autoscaling (or even fast and easy manual scaling) is a big part of the reason cloud native principles are being adopted and tools like PKS are being leveraged. Companies like Google (the birthplace of Kubernetes) have shown the world the reliability and agility that can be achieved through these tools and methodologies. Those attributes lead to competitive advantages. Now businesses are clamoring for digital transformation and are interested in a managed Kubernetes solution like PKS to power it.