Introduction to spotguides
Note: The Spotguides feature mentioned in this post is outdated and not available anymore. In case you are interested in a similar feature, contact us for details.
Last week we released the first version of Pipeline - a PaaS with end to end support for cloud native apps, from GitHub commit hooks deployed to the cloud in minutes to the use of a fully customizable CI/CD workflow.
At the core of the Pipeline PaaS are its spotguides - a collection of workflow/pipeline steps defined in a .pipeline.yml file and a few Drone plugins. In this post we'd like to demystify spotguides and describe, step by step, how they work; the next post will be a tutorial on how to write a custom spotguide and its associated plugin.
From a distance, each spotguide is just a customizable CI/CD pipeline defined in a yaml file, a plugin written in Golang, and a Docker container that can be deployed/executed.
Note: The Pipeline CI/CD module mentioned in this post is outdated and not available anymore. You can integrate Pipeline to your CI/CD solution using the Pipeline API. Contact us for details.
Pipeline is an API and execution engine that provisions Kubernetes clusters and container engines in the cloud and deploys applications. It is independent of the application/spotguide it deploys - the same way Kubernetes is. Any application that can be packaged as a Docker container and has a manifest file or Helm chart can be deployed to a supported cloud provider or on-prem Kubernetes cluster and managed by Pipeline. This is true of applications like Apache Spark, Kafka and Zeppelin, but Pipeline is not tied to big data workloads: it's a generic microservice platform that also supports applications like Java (with cgroups) and distributed and resilient databases (exposing postgres wire protocols as a service).
There are a few well defined out-of-the-box plugins that are already part of the Drone CI/CD component. Any complete list of those plugins would be quite large, but, just to highlight a few, some that we frequently use are:
- Docker - a plugin to build and publish Docker images to a container registry
- git - a plugin that clones git repositories
- s3 cache/sync - a plugin that caches build artifacts to S3 compatible storage backends like Minio or Rook, and syncs files with a bucket
- azure/google storage - a plugin for publishing files to Azure and Google blob storage
- dockerhub - a plugin to trigger a remote Docker Hub build
- slack - a plugin for Slack notifications
Most of these plugins require a credential or a keypair to access and manage remote resources. The CI/CD system supports a convenient way to pass secrets (like passwords and ssh keys) without the need to place them alongside a workflow definition and store them in GitHub. You can do this by using the API or the CLI, by passing them into the plugin at runtime as ENV variables, or, if you're running Kubernetes (like we do), through secrets or config maps.
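As a rough illustration of the runtime route, the sketch below shows how a plugin written in Go might read its settings and secrets from environment variables. Drone conventionally exposes plugin settings to the container as `PLUGIN_`-prefixed variables; the specific names used here (`PLUGIN_ENDPOINT`, `PLUGIN_TOKEN`), the `Config` type and the default endpoint are hypothetical and chosen only for this example.

```go
package main

import (
	"errors"
	"os"
	"strings"
)

// Config holds the settings a hypothetical plugin needs at runtime.
// The field and variable names are illustrative, not taken from a real plugin.
type Config struct {
	Endpoint string // where the plugin sends its requests
	Token    string // secret credential, never stored in the workflow file
}

// LoadConfig builds a Config using the supplied lookup function
// (os.LookupEnv in a real run, a stub in tests).
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	endpoint, _ := lookup("PLUGIN_ENDPOINT")
	token, ok := lookup("PLUGIN_TOKEN")
	if !ok || strings.TrimSpace(token) == "" {
		return Config{}, errors.New("missing required secret PLUGIN_TOKEN")
	}
	if endpoint == "" {
		endpoint = "http://localhost:9090" // assumed default for this sketch
	}
	return Config{Endpoint: endpoint, Token: token}, nil
}

// LoadConfigFromEnv reads the settings from the process environment.
func LoadConfigFromEnv() (Config, error) {
	return LoadConfig(os.LookupEnv)
}
```

Taking the lookup function as a parameter keeps the secret out of the code and the workflow file, and makes the loader easy to exercise without touching the real environment.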
Spotguides are application specific. The pipeline/workflow steps described in the .pipeline.yml file reflect the typical lifecycle of the application, and are usually unique. Needless to say, the CI/CD workflow/pipeline is fully customizable and supports parallel or conditional execution. Custom plugins sit at the core of any spotguide. We've written custom plugins for our default supported apps; these plugins are extremely simple to build (they usually take 1-2 days) and have well defined interfaces. Variable injection, execution as a container, security, etc. are all outside the realm of concern for a plugin's writer - these are default services you already get from the CI/CD engine.
By way of an example, take a look at our Apache Spark spotguide. This is how you get from a GitHub commit hook to a running Spark application on Kubernetes in minutes. The overall flow looks like this:
This flow translates to the following plugin flow:
The building blocks for the Spark spotguide are as follows:
|Component|Repository|
|---|---|
|Spark RSS Helm charts|https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark-rss|
|Spark Shuffle Helm charts|https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark-shuffle|
|Spark Helm charts|https://github.com/banzaicloud/banzai-charts/tree/master/stable/spark|
|K8S Proxy plugin|https://github.com/banzaicloud/drone-plugin-k8s-proxy|
|Spark K8S submit plugin|https://github.com/banzaicloud/drone-plugin-spark-submit-k8s|
|Pipeline client plugin|https://github.com/banzaicloud/drone-plugin-pipeline-client|
This combination of plugins, the .pipeline.yml file and Kubernetes deployment definitions (Helm charts in our case) composes a spotguide. As you can see, spotguides are application specific. However, the platform that deploys and governs them - Pipeline - is agnostic. This is an easy and powerful way to integrate any distributed application that can be containerized so it will run on our microservice PaaS. Pipeline creates and defines the runtime - which is Kubernetes - and deploys the application - which is described by Helm charts - through a REST API.
We use Helm charts to deploy and orchestrate the applications we deploy. In order to write a spotguide, you'll need a Helm chart (or a low level k8s deployment unit like a manifest) and, possibly, some orchestration logic. Take, for instance, one of the examples we deploy and use - a distributed database. Kubernetes does not differentiate between resources and priorities when deploying applications. Helm charts can declare dependencies, but there is no ordering among them. Because Helm 3.0 has so far not been released, we provide default init containers for a predefined number of protocols to allow ordering and higher level readiness probes. A basic example of such ordering is database startup: if you deploy a simple web app with Pipeline that requires a database, the two are deployed in parallel, and the web app will fail until the database has started, been initialized and is ready to serve requests. These request failures show up in the logs and traces, and can potentially trigger the default Prometheus alerts we deploy for the application. This is not ideal, but k8s does not currently have an out-of-the-box solution (at least not until Helm 3.0 is released). Thus, we provide protocol specific init containers that are able to enforce startup order, initialize applications and send readiness probes.
The final piece of this equation is the yaml file. The .pipeline.yml connects these components (except the upcoming UI and CLI) in a single unit: it describes the workflow steps and defines the underlying plugins and their associated Helm charts. The yaml is pretty simple to read, maintain and execute. One added benefit is that, since all the steps above are containerized (plugins, for example), they can be used with other commercial CI/CD systems like CircleCI or Travis.
About Banzai Cloud Pipeline
Banzai Cloud’s Pipeline provides a platform for enterprises to develop, deploy, and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operations teams alike. Strong security measures — multiple authentication backends, fine-grained authorization, dynamic secret management, automated secure communications between components using TLS, vulnerability scans, static code analysis, CI/CD, and so on — are default features of the Pipeline platform.