Outshift | Scaling Spark made simple on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes Scaling Spark made simple on Kubernetes The anatomy of Spark applications on Kubernetes Monitoring Apache Spark with Prometheus Spark History Server on Kubernetes Spark scheduling on Kubernetes demystified Spark Streaming Checkpointing on Kubernetes Deep dive into monitoring Spark and Zeppelin with Prometheus Apache Spark application resilience on Kubernetes

Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes Running Zeppelin Spark notebooks on Kubernetes - deep dive

Apache Kafka on Kubernetes series: Kafka on Kubernetes - using etcd

This is the second blog post in the Spark on Kubernetes series, so I hope you'll bear with me as I recap a few items of interest from our only previous one. Containerization and cluster management technologies are constantly evolving within the cluster computing community. Apache Spark currently supports Apache Hadoop YARN and Apache Mesos, in addition to offering its own standalone cluster manager. In 2014, Google announced the development of Kubernetes which has its own feature set and differentiates itself from YARN and Mesos. In a nutshell, Kubernetes is an open source container orchestration framework that can run on local machines, on-premise or in the cloud. It supports multiple container frameworks (docker, rkt and clear containers), which allows users to select whichever they prefer. Although support for standalone Spark clusters on k8s currently exists, there are still major advantages to, and significant interest in, native execution support due to the limitations of standalone mode and the advantages of Kubernetes. This is what inspired the spark-on-k8s project, which we at Banzai Cloud are also contributing to, while simultaneously building our PaaS, Pipeline. Let’s take a look at a simple example of how to upscale Spark automatically while it executes a basic SparkPI job that's running on EC2 instances.

NAME                                        STATUS    ROLES     AGE       VERSION
ip-10-0-100-40.eu-west-1.compute.internal   Ready     &lt;none&gt;    15m       v1.8.3

No pods are running except a shuffle service.

$  kubectl get po -o wide
NAME                                                              READY     STATUS    RESTARTS   AGE       IP          NODE
shuffle-lckjg                                                     1/1       Running   0          19s       10.46.0.1   ip-10-0-100-40.eu-west-1.compute.internal

Let’s submit SparkPi with dynamic allocation enabled, and pass 50000 as an input parameter in order to see what’s going on in the k8s cluster.

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://34.242.4.60:443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.driver.cores=&quot;400m&quot; \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.kubernetes.shuffle.namespace=default \
  --conf spark.kubernetes.shuffle.labels=&quot;app=spark-shuffle-service,spark-version=2.2.0&quot; \
  --conf spark.local.dir=/tmp/spark-local \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=banzaicloud/spark-driver-py:v2.2.0-k8s-1.0.179 \
  --conf spark.kubernetes.executor.docker.image=banzaicloud/spark-executor-py:v2.2.0-k8s-1.0.179 \
  --conf spark.kubernetes.authenticate.submission.caCertFile=my-k8s-aws-ca.crt \
  --conf spark.kubernetes.authenticate.submission.clientKeyFile=my-k8s-aws-client.key \
  --conf spark.kubernetes.authenticate.submission.clientCertFile=my-k8s-aws-client.crt \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar 50000

$ kubectl get po
NAME                                                              READY     STATUS    RESTARTS   AGE
spark-pi-1511535479452-driver                                     1/1       Running   0          2m
spark-pi-1511535479452-exec-1                                     0/1       Pending   0          38s

The driver is running inside the k8s cluster; so far so good. But, as we can see, only one executor is scheduled and it's not going anywhere. Let’s take a look at it:

$ kubectl describe po spark-pi-1511535479452-exec-1

...
...
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  11s (x8 over 1m)  default-scheduler  No nodes are available that match all of the predicates: Insufficient cpu (1), PodToleratesNodeTaints (1).

This warning tells us that there is no node in the k8s cluster with enough resources to run spark executors. After adding a new EC2 node and making more resources available in the k8s cluster executors start running, since now, with the addition of the new node, there are enough resources.

NAME                                                           READY    STATUS              RESTARTS   AGE
shuffle-lckjg                                                     1/1       Running             0          10m
shuffle-ncprp                                                     1/1       Running             0          47s
spark-pi-1511535479452-driver                                     1/1       Running             0          7m
spark-pi-1511535479452-exec-1                                     0/1       ContainerCreating   0          4m

NAME                                                              READY     STATUS    RESTARTS   AGE
shuffle-lckjg                                                     1/1       Running   0          11m
shuffle-ncprp                                                     1/1       Running   0          56s
spark-pi-1511535479452-driver                                     1/1       Running   0          7m
spark-pi-1511535479452-exec-1                                     1/1       Running   0          5m

All up-scaling is handled transparently, as k8s began to automatically execute pending executors as soon as the required resource became available. We didn’t have to do anything in Spark, no reconfiguration and no installation of components whatsoever. The above process still involves manual steps if you want to operate the cluster in a manual way - however, our end goal and the aim of our Apache Spark Spotguide is to automate the whole process. Pipeline uses Prometheus to extrernalize monitoring information from cluster infrastructure, the cloud and the running Spark application, itself, then wires that information back to Kubernetes to anticipate and make upscales/downscales as practical as possible while maintaining predefined SLA rules. As usual, we've open sourced all the Spark images and charts necessary to run Spark natively on Kubernetes, and made them available in our Banzai Cloud GitHub repository. In our next post in this series we'll discuss the internals of Spark on Kubernetes. For a brief preview, check out this sequence diagram that highlights the internal flow of events and illustrates how a Spark cluster works with/inside Kubernetes. Sure, it might not be completely straightforward now, but rest assured that after finishing this series, the details will have been made crystal clear.