Envoy protocol filter for Kafka, meshed
A while ago we published benchmarks and sizing guidance based on our experience of running Apache Kafka over a service mesh with Koperator and the Istio operator, orchestrated by our automated and operationalized service mesh, Backyards (now Cisco Service Mesh Manager).
The reasons for such a setup were many, and there are more details in the Running Apache Kafka over Istio - benchmark post, but let me recap some of our initial reasons, and how we evolved from there.
- Running Kafka over Istio does not add performance overhead (quite the opposite in case of mTLS)
- Out of the box support for multiple network topologies
- Resilience to network failures
- Observability and metrics based alerts and decisions
While these were already good enough reasons, things changed quite fast since we published the benchmarks. The Envoy community has merged the Kafka protocol 2.0 codec, so instead of treating Kafka traffic as TCP, Envoy can now understand Kafka semantics at the protocol level. While this PR was essential, some other important parts of the puzzle were still missing, like Envoy's Kafka protocol filter.
- The Envoy community and adamkotwasinski have been working on the Kafka protocol filter for Envoy
- The filter is almost ready (in Adam's fork) and you can now take it for a test ride
- We built a custom Envoy version with the filter included
- We automated the Kafka setup on Istio, including the custom Envoy version
- Would you like to run Apache Kafka over Istio the easy way? Try Supertubes.
Check out Supertubes in action on your own clusters:
Register for an evaluation version and run a simple install command!
As you might know, Cisco has recently acquired Banzai Cloud. We are currently in a transitional period and are moving our infrastructure, so evaluation downloads are temporarily suspended. Contact us so we can discuss your needs and requirements, and organize a live demo.
supertubes install -a --no-demo-cluster --kubeconfig <path-to-k8s-cluster-kubeconfig-file>
or read the documentation for details.
- Oh no! Yet another Kafka operator for Kubernetes
- Monitor and operate Kafka based on Prometheus metrics
- Kafka rack awareness on Kubernetes
- Running Apache Kafka over Istio - benchmark
- User authenticated and access controlled clusters with Koperator
- Kafka rolling upgrade and dynamic configuration on Kubernetes
- Envoy protocol filter for Kafka, meshed
- Right-sizing Kafka clusters on Kubernetes
- Kafka disaster recovery on Kubernetes with CSI
- Kafka disaster recovery on Kubernetes using MirrorMaker2
- The benefits of integrating Apache Kafka with Istio
- Kafka ACLs on Kubernetes over Istio mTLS
- Declarative deployment of Apache Kafka on Kubernetes
- Bringing Kafka ACLs to Kubernetes the declarative way
- Kafka Schema Registry on Kubernetes the declarative way
- Announcing Supertubes 1.0, with Kafka Connect and dashboard
Kafka protocol support in Envoy
Envoy is a next generation network proxy, built for the cloud native era. It supports a wide variety of application protocols (ZooKeeper, MongoDB, etc.) and recently added Kafka support. The benefits of a network proxy that understands higher level protocols are huge. In the case of Kafka, the list of benefits includes:
- Out of the box tracing and monitoring within a Kafka mesh
- Consumer group metrics
- Information about apps and their version of the client libraries
- Request validation
- Protocol version translations
- Automatic topic name conversions without having to modify the clients
- Mirroring topics to other clusters (we run many hybrid Kubernetes clusters)
- Functional parity across runtimes
Now let's dig into some of the above.
Metrics and monitoring
Koperator has always provided server side metrics. Running in an Istio service mesh managed by Backyards (now Cisco Service Mesh Manager) also adds metrics from the Envoy sidecar, which opens up a totally new perspective. Without having to modify Kafka clients, we now have insights into the clients and how they behave. For example, it's easy to query which client is writing to a topic and what the byte rate per client is.
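To illustrate the kind of per-client question these metrics make answerable, here is a minimal sketch that derives a byte rate per client from counter samples. The sample tuple layout and the function name are assumptions made for this example; they are not the format the Envoy sidecar or Prometheus actually exposes.

```python
def byte_rate_per_client(samples, topic):
    """Return {client_id: bytes_per_second} for a single topic.

    `samples` is a hypothetical list of
    (client_id, topic, timestamp_seconds, total_bytes) tuples, where
    total_bytes is a monotonically increasing counter per client.
    """
    first, last = {}, {}
    # Walk the samples in time order, remembering the earliest and the
    # latest counter value seen for each client on the requested topic.
    for client_id, sample_topic, ts, total in sorted(samples, key=lambda s: s[2]):
        if sample_topic != topic:
            continue
        first.setdefault(client_id, (ts, total))
        last[client_id] = (ts, total)

    rates = {}
    for client_id, (t1, b1) in last.items():
        t0, b0 = first[client_id]
        if t1 > t0:  # a rate needs two samples taken at different times
            rates[client_id] = (b1 - b0) / (t1 - t0)
    return rates
```

In a real setup the same question would more likely be answered with a query against the metrics backend; the sketch only shows the arithmetic behind a per-client rate.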
Functional parity across runtimes
In Kafka, the client SDK is often responsible for too many things. The historical reason was to keep the brokers as lightweight and simple as possible. Kafka was initially written in Scala; with the later shift to Java, the full-featured client SDKs are now the Java ones, and the non-JVM clients are missing quite a few features. With the help of Envoy this could change, because some of the client responsibilities could be shifted into the sidecar proxy. This would bring the same functionality to all clients, no matter what language they're written in.
As Kafka is content agnostic, misbehaving clients can write nearly anything to the brokers. The Envoy proxy can now validate requests at the protocol level, and check that they contain all the required information (and nothing extra) before forwarding them to the brokers.
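As a rough illustration of what protocol-level validation involves, the sketch below parses the header of a length-prefixed Kafka request (the v1 header layout: API key, API version, correlation ID, nullable client ID) and rejects malformed frames. It is not the Envoy filter's implementation; the function name and the size limit are assumptions made for this example.

```python
import struct

# Largest frame this sketch accepts; an arbitrary limit chosen for the example.
MAX_REQUEST_BYTES = 1_048_576

# size (int32) + api_key (int16) + api_version (int16)
# + correlation_id (int32) + client_id length (int16), all big-endian
FIXED_HEADER_BYTES = 14


def parse_request_header(frame: bytes) -> dict:
    """Parse and sanity-check the header of one length-prefixed request."""
    if len(frame) < FIXED_HEADER_BYTES:
        raise ValueError("frame too short for a request header")

    size, api_key, api_version, correlation_id, client_id_len = struct.unpack_from(
        ">ihhih", frame, 0
    )

    # The size field counts the bytes that follow it.
    if size != len(frame) - 4:
        raise ValueError("size field does not match the frame length")
    if size > MAX_REQUEST_BYTES:
        raise ValueError("request exceeds the configured size limit")
    if api_key < 0 or api_version < 0:
        raise ValueError("negative API key or version")

    # client_id is a nullable string: a length of -1 means null.
    if client_id_len == -1:
        client_id = None
    elif client_id_len < 0 or FIXED_HEADER_BYTES + client_id_len > len(frame):
        raise ValueError("client_id length is out of bounds")
    else:
        end = FIXED_HEADER_BYTES + client_id_len
        client_id = frame[FIXED_HEADER_BYTES:end].decode("utf-8")

    return {
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
    }
```

A proxy that understands this header can already tell which client library sent a request and which API version it speaks, which is also where the per-client metrics and version information mentioned earlier come from.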
Rewrapping old Kafka protocols
The Kafka client SDK is a sensitive component. We've seen clusters that could not be upgraded in time, because clients were using older protocol versions. The Envoy filter can unwrap messages of older versions, and translate them to the latest and greatest version at the protocol level.
Envoy protocol filter for Kafka in action
This is all nice and handy, but there's still a missing piece: the Envoy protocol filter for Kafka. As mentioned earlier, the Envoy community and Adam Kotwasinski are working hard to finish it. We took Adam's branch, built a custom Envoy version with the Kafka filter included, and automated a Kafka cluster setup on Istio, orchestrated by Backyards (now Cisco Service Mesh Manager). Under the hood the major components are:
- A custom Envoy build, available on this Docker Hub repo
- The Banzai Cloud Istio operator
- Observability tools such as Prometheus, Jaeger and Grafana, installed by Backyards (now Cisco Service Mesh Manager)
- The Backyards CLI
Install a Kafka cluster on Istio
The first prerequisite is to have a Kubernetes cluster.
If you have a cluster, you can grab this experimental build of the Backyards CLI.
This is an experimental feature, so make sure you download the appropriate release.
Point the KUBECONFIG environment variable to your Kubernetes cluster, and run the following two commands. They will install all the necessary components to try out the Envoy Kafka protocol filter:
backyards istio install --set spec.proxy.image=banzaicloud/proxyv2:devfilter
backyards install --with-kafka-cluster
Backyards (now Cisco Service Mesh Manager) will install and configure an Istio service mesh, and an Apache Kafka cluster using Banzai Cloud's operators. It will also configure the Envoy Kafka protocol filter through a custom resource.
If you are more of a visual type, the following diagram represents the architecture:
To see some metrics, you will need some load in your Kafka cluster. You can use your own tooling to do that, or you can issue the following command, which starts a small performance tool and sends some load to Kafka:
backyards kafka load
Then you can open the Grafana dashboard for the Kafka cluster:
backyards kafka dashboard
Kafka protocol filter metrics
The sample dashboards show information about various Kafka protocol messages. The early version of the filter already produces some of the most important metrics, like the average latency of responses, the number of failed responses, or the number of topics.
These metrics can help you keep the cluster healthy. You can set up alerts based on them that are triggered when something starts to behave incorrectly. For example, the Produce Buffer metric can tell you if the cluster is nearing its limits and an intervention is needed.
You can also use these metrics to build custom logic that helps you manage the cluster. For example, you can leverage the Produce requests metric when setting up autoscaling of the Kafka cluster: when the average response time passes a certain threshold, an automatic upscale of the Kafka cluster could be initiated.
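The threshold idea can be sketched in a few lines. This is only an illustration of the decision logic, assuming the response times have already been read from the metrics backend; the function name, the parameters and the minimum sample count are hypothetical and not part of Backyards or Koperator.

```python
def should_upscale(response_times_ms, threshold_ms, min_samples=5):
    """Decide whether the average response time justifies an upscale.

    `response_times_ms` holds recent Produce response times in
    milliseconds; `min_samples` guards against reacting to one outlier.
    """
    if len(response_times_ms) < min_samples:
        return False  # not enough data to make a decision
    average = sum(response_times_ms) / len(response_times_ms)
    return average > threshold_ms
```

A real autoscaler would also want a cool-down period between scaling actions, since adding brokers and rebalancing partitions takes time to show up in the metrics.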
About Banzai Cloud
Banzai Cloud is changing how private clouds are built: simplifying the development, deployment, and scaling of complex applications, and putting the power of Kubernetes and Cloud Native technologies in the hands of developers and enterprises, everywhere.