One of the main features of the Banzai Cloud Pipeline platform is that it allows enterprises to run cost-effective workloads by mixing preemptible instances with regular ones, without sacrificing overall reliability.
First, let’s dig into some of the components that make spot instances so reliable, then we’ll provide an example of a Pipeline control plane installation, submit some workloads and simulate a spot instance termination.
Availability in spot-instance clusters
EKS or PKE clusters launched with Pipeline can mix spot and on-demand instances (GKE clusters can do the same with preemptible instances). Such clusters can be very volatile: instances, and therefore pods and deployments, may come and go, so running workloads or services on them is generally considered risky. Nevertheless, clusters started with Pipeline have some special fail-safes and custom features that help maintain high availability while still allowing users to benefit from the low cost of spot instances.
- Telescopes is used to recommend a diverse set of node pools. It helps decrease the chance of a large number of instances being interrupted at once by mixing instances across different spot markets.
- Cloudinfo supplies up-to-date service and price details.
- Deployments are scheduled so that a fixed, configurable percentage of replicas always runs on on-demand instances, so even during a serious spot instance outage, deployments remain available with a reduced number of replicas. This is achieved through a custom scheduler, which takes node labels and specific pod annotations into account when running its predicates against nodes.
- Spot-related metrics, such as termination notices, fulfillment times and current market prices, are collected in Prometheus via Pipeline and through different exporters.
- Spot instance terminations are handled properly through Hollowtrees, which drains interrupted nodes and replaces them with nodes in safe spot markets.
- Spot instance scheduler (and webhooks)
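To make the on-demand percentage rule more concrete, here is a minimal sketch of the kind of predicate such a scheduler could run against a node. It is not the actual Pipeline scheduler code: the label and annotation keys, the function names and the rounding rule are all assumptions made for illustration.

```python
# Hypothetical keys, chosen for this sketch only; the real scheduler
# may use different node labels and pod annotations.
ON_DEMAND_LABEL = "example.io/ondemand"
OD_PERCENTAGE_ANNOTATION = "example.io/od-percentage"


def required_on_demand(total_replicas, od_percentage):
    """Minimum number of replicas that must run on on-demand nodes.

    Rounds up with integer arithmetic (an assumption of this sketch).
    """
    return (total_replicas * od_percentage + 99) // 100


def node_passes(node_labels, pod_annotations, total_replicas, replicas_on_spot):
    """Predicate: may the next replica be placed on this node?"""
    if node_labels.get(ON_DEMAND_LABEL) == "true":
        # On-demand nodes are always acceptable.
        return True
    od_percentage = int(pod_annotations.get(OD_PERCENTAGE_ANNOTATION, 0))
    # Spot nodes only get what is left after the on-demand quota is reserved.
    spot_slots = total_replicas - required_on_demand(total_replicas, od_percentage)
    return replicas_on_spot < spot_slots
```

With 10 replicas and a 30% on-demand requirement, this sketch lets at most 7 replicas land on spot nodes, so 3 replicas survive even if every spot node disappears.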
Let’s take a look at how you can deploy your applications to a mixture of on-demand and spot instances, and how the automatic failover is triggered when spot instances are removed from the cluster. Before you start, you’ll need a Pipeline platform — and for that you have two options:
- use the free developer version of Pipeline, available at: https://try.pipeline.banzai.cloud
- install the Pipeline control plane yourself, using the Banzai CLI in your preferred environment
For the sake of the demo, let’s try it by spinning up our Pipeline platform control plane on EC2.
Go grab the Banzai CLI with your preferred package (Debian, RPM, binary tarballs for Linux and macOS) and follow the installation guide.
❯ banzai pipeline init --provider ec2 --workspace demo
❯ echo "
hollowtrees:
  enabled: true
" >> ~/.banzai/pipeline/demo/values.yaml
Add your AWS secrets to the control plane
❯ banzai secret create --magic
Create an EKS cluster with Hollowtrees enabled
Now let’s create a cluster with our spot instance watchdog, Hollowtrees.
❯ banzai cluster create
Now let’s add a deployment, and configure how its replicas are spread across spot and on-demand instances using the Pipeline UI.
Diversifying spot instances
We’ve been running thousands of K8s clusters on spot instances, and the generally accepted wisdom is that AWS removes spot instances due to (lack of) capacity, not due to price fluctuations. As highlighted at the beginning of this post, Telescopes recommends instance types based on resource needs, and can recommend similar instance types — whether on a totally different spot market or of a totally different flavour — to make sure that we meet and maintain the correct level of resources (set at cluster creation).
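The benefit of spreading node pools across spot markets can be illustrated with a small calculation. The sketch below is not Telescopes' recommendation algorithm; it only assumes that a spot market is identified by an instance type and an availability zone, and measures what share of the cluster is lost if the single largest market is interrupted. The pool data is made up.

```python
from collections import Counter


def worst_case_loss(pools):
    """Fraction of nodes lost if the single largest spot market is interrupted.

    pools is a list of (instance_type, zone, node_count) tuples; treating
    (instance_type, zone) as one spot market is an assumption of this sketch.
    """
    per_market = Counter()
    for instance_type, zone, node_count in pools:
        per_market[(instance_type, zone)] += node_count
    total = sum(per_market.values())
    if total == 0:
        return 0.0
    return max(per_market.values()) / total


# Hypothetical layouts with the same total capacity of six nodes.
single_market = [("m5.xlarge", "eu-west-1a", 6)]
diversified = [
    ("m5.xlarge", "eu-west-1a", 2),
    ("m4.xlarge", "eu-west-1a", 2),
    ("c5.2xlarge", "eu-west-1b", 2),
]
```

In the single-market layout one interruption can take out the whole pool, while in the diversified layout the same event removes only a third of the nodes.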
Test node termination
To test node termination, choose one node, find the instance termination handler pod running on that node, forward its port 8081 and send a PUT /terminate request to it:
❯ kubectl -n pipeline-system port-forward ith-instance-termination-handler-jwtqn 8081:8081
In another terminal:
❯ curl -X PUT http://localhost:8081/terminate
The test termination cordons the node, drains it, and removes it from the ASG, but the instance itself must be terminated manually. In the event of a real spot termination, AWS removes the instance for you.
Now let’s see what happens when we remove (send a termination notice to) a node. This visualization details how the node is cordoned and drained, while a new node is simultaneously being automatically provisioned. Once the new node has joined the cluster, the scheduler reschedules the pods.
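The sequence described above (cordon, drain, replacement node joins, pods rescheduled) can be modelled in a few lines. This is a toy simulation, not what Hollowtrees or the Kubernetes scheduler actually does: the cluster is a plain dictionary, and pods are simply placed on the schedulable node with the fewest pods.

```python
def handle_termination(cluster, doomed, replacement):
    """Simulate a spot termination in a toy cluster model (mutates cluster).

    cluster maps node name -> {"schedulable": bool, "pods": [pod names]}.
    """
    node = cluster[doomed]
    node["schedulable"] = False       # cordon: no new pods land here
    pending = list(node["pods"])      # drain: evict every pod
    node["pods"] = []
    cluster[replacement] = {"schedulable": True, "pods": []}  # new node joins
    del cluster[doomed]               # interrupted instance leaves the cluster
    for pod in pending:
        # Reschedule onto the least loaded schedulable node.
        candidates = [n for n in cluster.values() if n["schedulable"]]
        target = min(candidates, key=lambda n: len(n["pods"]))
        target["pods"].append(pod)
    return cluster


# Hypothetical two-node cluster with one spot node about to be interrupted.
cluster = {
    "on-demand-1": {"schedulable": True, "pods": ["web-0"]},
    "spot-1": {"schedulable": True, "pods": ["web-1", "web-2"]},
}
handle_termination(cluster, "spot-1", "spot-2")
```

After the call, the interrupted node is gone and all three pods are running again on the remaining and replacement nodes.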
To visualize nodes and deployments, we used k8s-visualizer.
Spot instances provide spare EC2 compute capacity at a discount of up to 80% compared to on-demand prices, so it definitely makes sense to use them. The fail-safes and tools we have built into Pipeline mean you can take advantage of these instances in production, not just in dev and QA.
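As a rough illustration of what the discount means for a mixed cluster, the arithmetic can be sketched as follows. The node count, the on-demand price and the assumption that every spot node gets the full 80% discount are hypothetical; real spot prices vary by market and over time.

```python
def blended_hourly_cost(node_count, on_demand_price, spot_discount, on_demand_nodes):
    """Hourly cost of a cluster that mixes on-demand and spot nodes."""
    spot_nodes = node_count - on_demand_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical numbers: 10 nodes at $0.20/hour on-demand, 3 kept on-demand,
# spot nodes assumed to get the maximum 80% discount.
mixed = blended_hourly_cost(10, 0.20, 0.80, 3)
all_on_demand = blended_hourly_cost(10, 0.20, 0.80, 10)
savings = 1 - mixed / all_on_demand
```

Under these assumptions the mixed cluster costs about $0.88 per hour instead of $2.00, a saving of roughly 56%, while 3 of the 10 nodes remain unaffected by spot interruptions.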
About Banzai Cloud Pipeline
Banzai Cloud’s Pipeline provides a platform for enterprises to develop, deploy, and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operations teams alike. Strong security measures — multiple authentication backends, fine-grained authorization, dynamic secret management, automated secure communications between components using TLS, vulnerability scans, static code analysis, CI/CD, and so on — are default features of the Pipeline platform.