Autoscaling

Autoscaling is the process of automatically increasing or decreasing the number of pods to distribute load between them when the load on the pods rises above or falls below a defined threshold. Application autoscaling can be done using a HorizontalPodAutoscaler, which is defined by the Kubernetes API autoscaling/v1 or autoscaling/v2beta2.
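For reference, a standalone HorizontalPodAutoscaler resource (written directly, outside a Helm chart) looks roughly like the following sketch; the name my-app is a hypothetical placeholder for your Deployment:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                    # hypothetical HPA name
spec:
  scaleTargetRef:                 # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical Deployment name
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80    # target 80% of the requested CPU
```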

Prerequisites

In order to use Horizontal Pod Autoscalers, your cluster administrator must have properly configured cluster metrics. You can use the oc describe PodMetrics <any-pod-name> command to determine if metrics are configured. If metrics are configured, the output appears similar to the following, with Cpu and Memory displayed under Usage.

...
Containers:
  Name:  pod-name-xyz
  Usage:
    Cpu:     17877836n
    Memory:  503572Ki
...

HorizontalPodAutoscaler

Stakater Application Chart uses the autoscaling/v2beta2 API, which supports metrics other than CPU for autoscaling. These metrics can be CPU, memory, or custom metrics exposed by the application.

Metric              Description
CPU Utilization     Number of CPU cores used. Can be used to calculate a percentage or integer value of the pod's requested CPU.
Memory Utilization  Amount of memory used. Can be used to calculate a percentage or integer value of the pod's requested memory.

Defining Autoscaling on CPU

To define autoscaling on the basis of CPU, define the autoscaling section in your HelmRelease object.

In the following example we use averageUtilization (in percentage), calculated across all the pods.

autoscaling:
  enabled: true
  minReplicas: 3                    # Minimum running pods 
  maxReplicas: 10                   # Maximum running pods
  additionalLabels: {}
  annotations: {}
  metrics:
  - type: Resource
    resource:
      name: cpu                     # Define autoscaling on the basis of CPU
      target:
        type: Utilization           # Calculate on the basis of Utilization Percentage (in percentage of the requested CPU)
        averageUtilization: 80      # Scale when the average utilization of all pods goes above 80%

Defining Autoscaling on Memory

To define autoscaling on the basis of memory, define the autoscaling section in your HelmRelease object.

In the following example we use averageValue (an absolute integer value), calculated across all the pods.

autoscaling:
  enabled: true
  minReplicas: 3                    # Minimum running pods 
  maxReplicas: 10                   # Maximum running pods
  additionalLabels: {}
  annotations: {}
  metrics:
  - type: Resource
    resource:
      name: memory                  # Define autoscaling on the basis of memory
      target:
        type: AverageValue          # Calculate on the basis of an absolute value, not a percentage
        averageValue: 500Mi         # Scale when the average memory usage of all pods goes above 500Mi
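Both resource metrics can also be combined in one autoscaling section; the HPA then evaluates each metric independently and uses the highest resulting replica count. A sketch reusing the values from the two examples above:

```yaml
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80    # scale when average CPU utilization exceeds 80% of requests
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi       # scale when average memory usage exceeds 500Mi
```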

Autoscaling with GitOps

If you are using GitOps to manage your applications across clusters, you need to configure it to ignore differences in the replica count for autoscaling to work.

Problem: When the HPA increases the number of pods, your GitOps tool will at the same time try to maintain the original state of your application and will terminate the newly created pods.

Solution: Configure your GitOps tool to ignore differences in the replica count, so that whenever the HPA scales up the number of pods and increases the replica count, the GitOps tool does not try to sync the replica count back and does not terminate the new pods.

Example (Argo CD): Argo CD allows ignoring differences at a specific JSON path, using JSON pointers. The following sample application is configured to ignore differences in spec.replicas for all deployments:

spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas
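If automated sync is enabled, newer Argo CD versions (2.5 and later, if memory serves) may also need the RespectIgnoreDifferences sync option, so that fields ignored during diffing are not overwritten during sync:

```yaml
spec:
  syncPolicy:
    syncOptions:
    - RespectIgnoreDifferences=true   # keep ignored fields (e.g. /spec/replicas) untouched on sync
```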

How to test HPA

To test the HorizontalPodAutoscaler with your application, install the HPA for your application and then gradually increase the load (memory or CPU, depending on the HPA configuration). You can use tools like Postman, JMeter, ReadyAPI, or a manual script to increase the load on your application.

You can monitor the HorizontalPodAutoscaler from your OpenShift/Kubernetes dashboard or with the command:

kubectl describe hpa <hpa-name>

The CPU/memory usage and the events should show the application pods getting scaled up and down when the load increases or decreases.

HPA Metrics: (screenshots showing HPA metrics under high load and low load)

HPA Events: (screenshot showing HPA events)

Useful Links