The Supertubes approach to handling Kafka ACLs in Kubernetes makes it clearer what’s actually happening by introducing a logical separation of ACL components under the names KafkaResourceSelector, KafkaRole, and KafkaACL.
That way we get reusable parts that help maintain the system in the long term, allowing us to handle ACLs with a declarative approach, and overcoming the difficulties inherent in handling ACLs in a Kubernetes environment.
The Why
We rely on the Istio service mesh as the foundation of our Supertubes Kafka clusters because it provides seamless security checks for traffic between Kafka components and clients from outside the cluster. It accomplishes this automatically, through mutual TLS authentication with built-in certificate rotation and management, and is actually faster than Kafka’s built-in TLS implementation.
For further benefits of running Kafka inside an Istio service mesh, see our blog post, The benefits of integrating Apache Kafka with Istio.
In an environment like this, handling Kafka ACLs can be difficult.
While you could use kafka-acls.sh, the traditional solution, it becomes difficult over time to keep the ACLs up to date when the Kafka cluster runs inside a Kubernetes cluster.
When working in such an environment, there are two ways to set ACLs.
You can set them from outside of Kubernetes, meaning you will have to set up a certificate in order to maintain access to the cluster. Or you can set them from inside the cluster by executing into a pod that already contains kafka-acls.sh, which isn’t ideal either from a usability perspective.
To complicate things, ACLs are difficult to follow. We wanted something that separates
- What somebody can access
- When somebody has access to it
- and Who the somebody is that we’re giving permissions to.
Before getting into the nitty-gritty, let’s take a look at the following example ACL configurations for the Kafka Schema Registry.
If we used kafka-acls.sh, for instance, we would have to execute the following commands against our cluster.
    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --producer --consumer --topic _schemas --group schema-registry

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation DescribeConfigs --topic _schemas

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation Describe --topic _schemas

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation Read --topic _schemas

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation Write --topic _schemas

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation Describe --topic __consumer_offsets

    bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf \
      --add --allow-principal 'User:schema-registry' --allow-host '*' \
      --operation Create --cluster kafka-cluster
So what exactly are we doing here?
Quite a few things are happening here.
First, we have set both producer and consumer access to the topic _schemas and the consumer group schema-registry for the User:schema-registry service principal. So far so good.
In the second through fifth commands, we set DescribeConfigs, Describe, Read, and Write permissions on the same topic.
Lastly, we set Describe on the __consumer_offsets topic, alongside the Create operation for the kafka-cluster cluster.
As you can see, configuring ACLs involves a lot of repetition and quickly becomes a relatively labor-intensive job, which is the perfect recipe for typos and mistakes - something we absolutely do not want when talking about security.
Let’s take a look at how we would solve this problem.
We provide a way that is not just declarative - in line with the GitOps and Configuration as Code trends of today - but is also easier to maintain, since we provide many reusable shortcuts in the form of KafkaResourceSelectors and KafkaRoles.
KafkaResourceSelectors - Beginning the What
Let’s start with figuring out what we’re trying to protect with the authorization.
This is something that gets used over and over again. Just think about the example given above.
We used the topic _schemas five times over the course of seven commands, and that was just one principal.
Elevating this topic into its own CR gives us the flexibility to reuse it, not just making our lives easier but making the solution more error resilient as well.
KafkaResourceSelectors are filters for one or more Kafka resources of the same type. The type can be any of the following:
- topic, for when you would like to apply them to Kafka topics
- group, for consumer groups
- transactionalId, when you want to ensure a single writer
- cluster, when you want to impact the whole cluster
Here are some examples:
    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaResourceSelector
    metadata:
      name: schemas-topic
      namespace: kafka
    spec:
      type: topic
      name: _schemas
      pattern: literal
Here, we’re selecting the _schemas topic. By saying that the pattern is literal, we’re stating that the topic name must exactly match the name field.
You can also use the pattern prefixed to indicate that the name field acts as a prefix, creating even more versatile selectors that group multiple topics together.
This is especially handy if you have a lot of smartly named topics, and you don’t want to create a selector for every single one of them.
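As a rough sketch of a prefixed selector, the manifest below reuses the fields from the literal example above and only changes the pattern. The selector name orders-topics and the orders. prefix are hypothetical, chosen purely for illustration.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaResourceSelector
metadata:
  name: orders-topics   # hypothetical selector name
  namespace: kafka
spec:
  type: topic
  name: orders.         # hypothetical prefix, meant to cover topics such as orders.created
  pattern: prefixed
```

Assuming the prefix semantics described above, a single reference to this selector would cover every topic whose name starts with the prefix.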
    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaResourceSelector
    metadata:
      name: consumer-offsets-topic
      namespace: kafka
    spec:
      type: topic
      name: __consumer_offsets
      pattern: literal

    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaResourceSelector
    metadata:
      name: schema-registry-group
      namespace: kafka
    spec:
      type: group
      name: schema-registry
      pattern: literal
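The same shape should carry over to the other resource types. Below is a sketch of a transactionalId selector, assuming it accepts the same name and pattern fields as the topic and group selectors; the selector name and the transactional ID are hypothetical.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaResourceSelector
metadata:
  name: payments-writer-txn   # hypothetical selector name
  namespace: kafka
spec:
  type: transactionalId
  name: payments-writer       # hypothetical transactional ID used by a single producer
  pattern: literal
```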
ResourceSelectors are a tool that helps us avoid repeating ourselves every time we’d like to refer to a Kafka resource.
They also provide a centralized place to modify and track all our resources handled by ACL.
With a simple

    kubectl get KafkaResourceSelector

command we can quickly see which resources are covered by our ACLs, making it easy to spot if we’re missing something.
KafkaRoles - Providing the When
Roles have been part of access control systems for years. We don’t have to look very hard for an example, since Kubernetes RBAC works in practically the same way.
KafkaRoles build on the same principle we discussed under KafkaResourceSelectors, easing our job by encouraging reusability and helping us make the process clearer, easier to follow, and therefore highly maintainable.
KafkaRoles provide a way to easily group multiple ACL permissions into one single reusable resource.
Let’s look at our examples:
    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaRole
    metadata:
      name: consumer
    spec:
      topic: # operations on topics
        operations:
          allow:
            - read
            - describe
      group: # operations on consumer groups
        operations:
          allow:
            - read

    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaRole
    metadata:
      name: producer
    spec:
      topic:
        operations:
          allow:
            - write
            - describe
            - create
      transactionalId:
        operations:
          allow:
            - write
            - describe
Consumer and producer permissions are so frequently necessary that kafka-acls.sh has built-in flags to handle them.
We wanted to take the next step and provide a way to create your own custom groups of ACL operations.
The KafkaRoles listed above, consumer and producer, are deployed by default when you install Supertubes, meaning they’re ready to use, and you do not have to manually apply them to the cluster.
In the spec you can specify your allow and deny permissions under the same four resource types listed in the KafkaResourceSelectors section.
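For illustration, a custom role that combines allow and deny might look like the sketch below. It assumes that deny sits next to allow under operations, mirroring the allow lists in the roles above. The role name and the chosen operations are hypothetical, so check the KafkaRole definition shipped with your Supertubes version before relying on it.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaRole
metadata:
  name: read-only-auditor   # hypothetical role name
spec:
  topic:
    operations:
      allow:
        - read
        - describe
      deny:                 # assumed to mirror the structure of the allow list
        - write
  group:
    operations:
      allow:
        - read
```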
Also, it’s important to note that roles are not mandatory. They are a reusable tool to ease headaches that might arise from overuse of copy-paste design patterns, saving you from a handful of bugs in the long run but also helping you keep a tight grip on access control to your Kafka cluster.
Imagine one day wanting to add a read operation to all producers. Unlike with kafka-acls.sh, you just have to add read under the producer role, and it will automatically get propagated to every KafkaACL CR that references it.
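Sketched out, that change would be a one-line addition to the producer role shown earlier. This assumes the default role can be edited in place; the rest of the manifest is copied from the example above.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaRole
metadata:
  name: producer
spec:
  topic:
    operations:
      allow:
        - write
        - describe
        - create
        - read   # newly added operation
  transactionalId:
    operations:
      allow:
        - write
        - describe
```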
So how does all this come together?
KafkaACL - Providing the Who
The KafkaACL custom resource is at the heart of this system; it provides a binding between a subject and the two other components we’ve discussed, KafkaRoles and KafkaResourceSelectors.
Through it, you tell the system which principal should receive the permissions defined in KafkaRoles, and which Kafka resources you’d like to grant access to through KafkaResourceSelectors.
Let’s take a look.
    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaACL
    metadata:
      name: schema-registry
      namespace: kafka
    spec:
      kind: User
      name: schema-registry
      clusterRef:
        name: kafka
        namespace: kafka
      acls:
        - topic:
            operations:
              allow:
                - read
                - write
                - describe
                - describe_configs
            resourceSelectors:
              - name: schemas-topic
                namespace: kafka
        - topic:
            operations:
              allow:
                - describe
            resourceSelectors:
              - name: consumer-offsets-topic
                namespace: kafka
        - cluster:
            allow:
              - create
      roles:
        - name: consumer
          resourceSelectors:
            - name: schemas-topic
              namespace: kafka
            - name: schema-registry-group
              namespace: kafka
        - name: producer
          resourceSelectors:
            - name: schemas-topic
              namespace: kafka
That’s a lot of YAML, but it’s really quite simple, so stay with me. If we go through the spec, the first thing we have to provide is the subject of the ACL.
    kind: User
    name: schema-registry
Here, you need to give a kind, which is User or anything else that the configured Kafka authorizer (see the authorizer.class.name Kafka config for details) supports, then a name, which should be the exact name of the service account we’d like to give the permissions to.
Wait, service account?
Yes. Basically, we authenticate Kafka clients using their Kubernetes namespaces and service accounts. The power of combining Kafka, Istio, and Kubernetes really shines through here, sprinkled with some WebAssembly Envoy filters. The best part is that you don’t actually need to know the minutiae of how this works, because Supertubes handles all of it behind the scenes for you. If that piqued your curiosity and you want to know more, check out our previous blog that goes into more detail.
Back to our example. The next thing you have to do is determine which KafkaCluster you’d like to bind the ACL to.
Many of our users run multiple Kafka clusters on a single Kubernetes cluster, and this reference keeps the ACLs of each cluster separate from one another.
    clusterRef:
      name: kafka
      namespace: kafka
Then comes the lion’s share of the YAML.
    acls:
      - topic:
          operations:
            allow:
              - read
              - write
              - describe
              - describe_configs
          resourceSelectors:
            - name: schemas-topic
              namespace: kafka
      - topic:
          operations:
            allow:
              - describe
          resourceSelectors:
            - name: consumer-offsets-topic
              namespace: kafka
      - cluster:
          allow:
            - create
acls let you define inline permissions for your principal. Seem familiar? It’s the same way we define them in KafkaRoles.
If you remember, we said that Roles are optional, and acls is exactly the reason why.
KafkaRoles are reusable sections of acls, so you can use them again somewhere else later, avoiding duplication in the process.
That being said, acls are still very useful; here we’re using them for one-off operations, so that we don’t need to create Roles with only one operation in them.
Under resourceSelectors we can provide a list of selectors, telling the system What resource we are trying to bind the operations to.
Note that the cluster part is without a resourceSelector. That’s because, in this case, the clusterRef at the top of the CR is being used as an anchor for the operation.
Last but not least comes the optional part, providing a place where we can reference and use our KafkaRoles.
Roles are without a namespace, so you can reuse them across your other KafkaACLs.
    roles:
      - name: consumer
        resourceSelectors:
          - name: schemas-topic
            namespace: kafka
      - name: producer
        resourceSelectors:
          - name: schemas-topic
            namespace: kafka
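To illustrate the reuse, here is a sketch of a second KafkaACL that binds another principal to the same consumer role. The service account reporting-service and the reporting-group selector are hypothetical, and the sketch assumes the acls section can be left out when only roles are used; everything else follows the structure of the schema-registry example.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaACL
metadata:
  name: reporting-service          # hypothetical
  namespace: kafka
spec:
  kind: User
  name: reporting-service          # hypothetical service account name
  clusterRef:
    name: kafka
    namespace: kafka
  roles:
    - name: consumer               # the default role from above, reused unchanged
      resourceSelectors:
        - name: schemas-topic      # selector defined earlier
          namespace: kafka
        - name: reporting-group    # hypothetical; needs its own KafkaResourceSelector of type group
          namespace: kafka
```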
In the diagram below you can see how it all comes together.
The consumer and producer roles are on the cluster and are ready to be used when you install Supertubes. The only thing you have to do is set the authorizer in your KafkaCluster CR
    readOnlyConfig: |
      authorizer.class.name=kafka.security.authorizer.AclAuthorizer
      allow.everyone.if.no.acl.found=false
and create the KafkaACL and the KafkaResourceSelector CRs. The latter, of course, is reusable and helps you out in the long run.
One thing that you might have noticed is that this solution provides no way of setting a host field where you can specify the IP from which the principal can access resources. The reason for that is simple if you think about it.
In Kubernetes, pods do not have a permanent IP address; they move around the cluster constantly and in accordance with a variety of factors, mainly resource allocation quotas and what the Scheduler thinks is the best place in any given moment for the pod. But that’s great! One of the reasons we love Kubernetes is this kind of flexibility, and the fact that the vast majority of the time, we don’t even have to think about IPs in the cluster - making our job that much easier.
Filtering on the client IP address is also not that great an idea if the client comes from outside the cluster.
IPs can be spoofed, making such a restriction easy to circumvent.
So what can we do?
Istio provides many ways of denying a client’s access to the service mesh, in the process denying access to the Kafka cluster as well. The ultimate solution relies on certificates for authentication of the client. We already do that, but currently it only works one way. We can give certificates to trusted clients, but Istio does not support revoking those certificates as of right now. Until Istio provides support for certificate revocation lists, we can instead, as a workaround, set the expiration of a given certificate to as soon as possible and renew it only when required - a good security practice in its own right.
What’s next
The feedback we get from our customers and from the community is overwhelmingly positive. I’d like to thank all of you for reaching out to us and sharing your thoughts and helping to shape the future of Supertubes in ways we hadn’t even begun to think about. We are not done yet though, and are continuously improving and developing new features and capabilities. We always have new ideas that we’re eager to show you. Here is a little sneak peek of what’s next on our roadmap:
- Observability and management UI leveraging Istio telemetry data and the ability to drill down into the root cause of anomalies. Thanks to Istio we can provide data and telemetry about the state of the Kafka cluster that wasn’t previously possible.
- Envoy protocol filter-based audits as an extension of the Envoy Kafka protocol filter
About Supertubes
Banzai Cloud Supertubes (Supertubes) is the automation tool for setting up and operating production-ready Kafka clusters on Kubernetes, leveraging a Cloud-Native technology stack. Supertubes includes Zookeeper, the Banzai Cloud Kafka operator, Envoy, Istio and many other components that are installed, configured, and managed to operate a production-ready Kafka cluster on Kubernetes. Some of the key features are fine-grained broker configuration, scaling with rebalancing, graceful rolling upgrades, alert-based graceful scaling, monitoring, out-of-the-box mTLS with automatic certificate renewal, Kubernetes RBAC integration with Kafka ACLs, and multiple options for disaster recovery.