A while ago we published benchmarks and sizing guidance based on our experience of running Apache Kafka over a service mesh with Koperator and the Istio operator, orchestrated by our automated and operationalized service mesh, Backyards (now Cisco Service Mesh Manager).
The reasons for such a setup were many, and there are more details in the Running Apache Kafka over Istio – benchmark post, but let me recap some of our initial reasons, and how we evolved from there.
While these were already good enough reasons, things have changed quite fast since we published the benchmarks. The Envoy community has merged the Kafka protocol 2.0 codec, so instead of treating Kafka traffic as plain TCP, Envoy can now understand Kafka semantics at the protocol level. While this PR was essential, some other important parts of the puzzle were still missing, like Envoy’s Kafka protocol filter.
Check out Supertubes in action on your own clusters:
Register for an evaluation version and run a simple install command!
As you might know, Cisco has recently acquired Banzai Cloud. We are currently in a transitional period and are moving our infrastructure, so evaluation downloads are temporarily suspended. Contact us so we can discuss your needs and requirements, and organize a live demo.
supertubes install -a --no-demo-cluster --kubeconfig <path-to-k8s-cluster-kubeconfig-file>
or read the documentation for details.
- Oh no! Yet another Kafka operator for Kubernetes
- Monitor and operate Kafka based on Prometheus metrics
- Kafka rack awareness on Kubernetes
- Running Apache Kafka over Istio – benchmark
- User authenticated and access controlled clusters with Koperator
- Kafka rolling upgrade and dynamic configuration on Kubernetes
- Envoy protocol filter for Kafka, meshed
- Right-sizing Kafka clusters on Kubernetes
- Kafka disaster recovery on Kubernetes with CSI
- Kafka disaster recovery on Kubernetes using MirrorMaker2
- The benefits of integrating Apache Kafka with Istio
- Kafka ACLs on Kubernetes over Istio mTLS
- Declarative deployment of Apache Kafka on Kubernetes
- Bringing Kafka ACLs to Kubernetes the declarative way
- Kafka Schema Registry on Kubernetes the declarative way
- Announcing Supertubes 1.0, with Kafka Connect and dashboard
Envoy is a next generation network proxy, built for the cloud native era. It supports a wide variety of application protocols (ZooKeeper, MongoDB, etc.) and recently added Kafka support. The benefits of a network proxy understanding higher level protocol implementations are huge. In the case of Kafka, the list of benefits includes:
Now let’s dig into some of the above.
Koperator has always provided server side metrics. But running in a Backyards (now Cisco Service Mesh Manager)-managed Istio service mesh also adds metrics from the Envoy sidecar. This opens up a totally new perspective. Without having to modify Kafka clients, we now have insights into clients and how they behave. For example, it’s easy to query which client is writing to a topic and what the byte rate per client is.
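To make the "byte rate per client" idea concrete, here is a minimal sketch of the aggregation involved. It assumes you already have per-request observations as `(client_id, topic, bytes_written)` tuples; that tuple shape and the function name are hypothetical and chosen for illustration only. In a real setup the numbers would come from the sidecar metrics through Prometheus rather than from a Python list.

```python
from collections import defaultdict


def byte_rate_per_client(samples, window_seconds):
    """Aggregate bytes written per (client, topic) pair into bytes per second.

    `samples` is a hypothetical iterable of (client_id, topic, bytes_written)
    tuples observed during a window of `window_seconds` seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    totals = defaultdict(int)
    for client_id, topic, bytes_written in samples:
        totals[(client_id, topic)] += bytes_written
    return {key: total / window_seconds for key, total in totals.items()}


if __name__ == "__main__":
    observed = [
        ("producer-a", "orders", 4096),
        ("producer-a", "orders", 2048),
        ("producer-b", "orders", 1024),
    ]
    for (client, topic), rate in byte_rate_per_client(observed, 60).items():
        print(f"{client} -> {topic}: {rate:.1f} B/s")
```

The point of the sketch is only that the grouping key includes the client identity, which is exactly the dimension that broker-side metrics alone do not easily expose.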
In Kafka, the client SDK is often responsible for too many things. The historical decision behind this was to keep the brokers as lightweight and simple as possible. Initially Kafka was written in Scala; with the later shift to Java, however, the full featured client SDKs are now the Java ones. The non-JVM clients are missing quite a few features. With the help of Envoy, this will be different in the future, because some of the client responsibilities could be shifted into the sidecar proxy. This would bring the same functionality to all clients, no matter what language they’re written in.
As Kafka is content agnostic, misbehaving clients can write nearly anything to the brokers. The Envoy proxy can now validate requests at the protocol level, and check whether they contain all the required information (or too much) before forwarding them to the brokers.
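As an illustration of what protocol-level validation means, the sketch below parses the fixed start of a Kafka request frame (a 4-byte size prefix, then the API key, API version, correlation ID and client ID, as laid out in the Kafka protocol guide) and rejects requests with an unknown API key or an out-of-range version. This is not how the Envoy filter is implemented; the function names are hypothetical and the version ranges in `SUPPORTED_VERSIONS` are illustrative assumptions, not what any particular broker advertises.

```python
import struct

# API keys from the Kafka protocol guide; the version bounds are
# illustrative assumptions for this sketch only.
SUPPORTED_VERSIONS = {
    0: (0, 9),   # Produce
    1: (0, 13),  # Fetch
    3: (0, 12),  # Metadata
}

HEADER_PREFIX = ">ihhih"  # size, api_key, api_version, correlation_id, client_id length
HEADER_PREFIX_LEN = struct.calcsize(HEADER_PREFIX)  # 14 bytes


def parse_request_header(frame: bytes) -> dict:
    """Parse the size prefix and request header of a single Kafka request frame."""
    if len(frame) < HEADER_PREFIX_LEN:
        raise ValueError("frame too short for a request header")
    size, api_key, api_version, correlation_id, client_id_len = struct.unpack_from(
        HEADER_PREFIX, frame, 0
    )
    if size != len(frame) - 4:
        raise ValueError("size prefix does not match frame length")
    if client_id_len == -1:  # nullable string: -1 encodes null
        client_id = None
    else:
        end = HEADER_PREFIX_LEN + client_id_len
        if client_id_len < 0 or end > len(frame):
            raise ValueError("invalid client_id length")
        client_id = frame[HEADER_PREFIX_LEN:end].decode("utf-8")
    return {
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
    }


def is_acceptable(header: dict) -> bool:
    """Accept only known API keys with a version inside the assumed range."""
    bounds = SUPPORTED_VERSIONS.get(header["api_key"])
    if bounds is None:
        return False
    lowest, highest = bounds
    return lowest <= header["api_version"] <= highest


def build_frame(api_key: int, api_version: int, correlation_id: int, client_id: str) -> bytes:
    """Helper for the example: build a frame containing only a request header."""
    cid = client_id.encode("utf-8")
    body = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    return struct.pack(">i", len(body)) + body


if __name__ == "__main__":
    header = parse_request_header(build_frame(0, 7, 42, "producer-1"))
    print(header, "acceptable:", is_acceptable(header))
```

A proxy that can do this kind of check sits in a position to drop or flag malformed traffic before it ever reaches a broker, which is the benefit described above.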
The Kafka client SDK is a sensitive component. We’ve seen clusters that could not be upgraded in time, because clients were using older protocol versions. The Envoy filter can unwrap messages of older versions, and translate them to the latest and greatest version at the protocol level.
This is all nice and handy, but there’s still a missing piece: the Envoy protocol filter for Kafka. As mentioned earlier, the Envoy community and Adam Kotwasinski are working hard to finish it. We took Adam’s branch, built a custom Envoy version with the Kafka filter included, and automated a Kafka cluster setup on Istio, orchestrated by Backyards (now Cisco Service Mesh Manager). Under the hood the major components are:
The first prerequisite is to have a Kubernetes cluster.
If you have a cluster, you can grab this experimental build of the Backyards CLI.
This is an experimental feature, so make sure you download the appropriate release.
Set the KUBECONFIG environment variable to point to your Kubernetes cluster, and run the following two commands. They install all the necessary components to try out the Envoy Kafka protocol filter.
backyards istio install --set spec.proxy.image=banzaicloud/proxyv2:devfilter
backyards install --with-kafka-cluster
Backyards (now Cisco Service Mesh Manager) will install and configure an Istio service mesh, and an Apache Kafka cluster using Banzai Cloud’s operators (Koperator and the Istio operator). It will also configure the Envoy Kafka protocol filter with a custom resource called EnvoyFilter.
If you are more of a visual type, the following diagram represents the architecture:
To see some metrics, you will need some load in your Kafka cluster. You can use your own tooling to do that, or you can issue the following command, which starts a small performance tool and sends some load to Kafka:
backyards kafka load
Then you can open the Grafana dashboard for the Kafka cluster:
backyards kafka dashboard
The sample dashboards show information about various Kafka protocol messages. The early version of the filter already produces some of the most important metrics, like the average latency of responses, the number of failed responses, or the number of topics.
These metrics can help you keep the cluster healthy. You can set up alerts based on them that are triggered when something starts to behave incorrectly. For example, the Produce Buffer metric can tell you if the cluster is nearing its limits and an intervention is needed.
You can also use these metrics to build custom logic that helps you manage the cluster. For example, you can leverage the Produce requests metric when setting up autoscaling of the Kafka cluster: when the average response time passes a certain threshold, that could initiate an automatic Kafka cluster upscale.
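The threshold-based upscale idea can be sketched in a few lines. The function below is a hypothetical illustration, not part of Koperator or Backyards: it takes recent Produce response times (however you collect them), compares their average to a threshold, and suggests adding one broker, capped at a maximum. The parameter names and the one-broker step are assumptions made for the example.

```python
from statistics import mean


def desired_broker_count(current_brokers, response_times_ms, threshold_ms, max_brokers):
    """Suggest a broker count based on the average Produce response time.

    Returns current_brokers + 1 when the average of `response_times_ms`
    exceeds `threshold_ms` and the cluster is below `max_brokers`;
    otherwise returns `current_brokers` unchanged.
    """
    if not response_times_ms:
        # No data: do not scale on missing measurements.
        return current_brokers
    if mean(response_times_ms) > threshold_ms and current_brokers < max_brokers:
        return current_brokers + 1
    return current_brokers


if __name__ == "__main__":
    print(desired_broker_count(3, [120, 180, 200], threshold_ms=100, max_brokers=6))
    print(desired_broker_count(3, [20, 30], threshold_ms=100, max_brokers=6))
```

A real autoscaler would also want a cool-down period and a scale-down rule so that short latency spikes do not cause the cluster to grow and shrink repeatedly; those are left out here to keep the decision itself visible.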