Two weeks ago we introduced our Kafka Spotguide for Kubernetes - the easiest way to deploy and operate Apache Kafka on Kubernetes. Since then, it’s been integrated into our application and DevOps container management platform, Pipeline, alongside other spotguides such as Spark on Kubernetes, Zeppelin, NodeJS and Golang, just to name a few.
Because we’ve already met our goal of making it easy to set up a Kafka cluster on Kubernetes with just a few clicks, and in less than ten minutes - provisioning and operating its entire infrastructure, both in Kubernetes and Kafka - we’ve shifted our focus to Kafka security.
The Pipeline platform makes enterprise-grade security easy to consume; you can read more on how we tackle security through multiple layers and components here, or read about the CIS Kubernetes benchmark we passed here.
On a default Kafka installation, any user or application can write messages to topics, as well as read data from topics. Because Kafka is usually accessed by multiple applications or teams, and/or the information flowing through it is confidential, security is a must. While there are multiple ways of tackling this problem, cloud- and Kubernetes-based environments bring an added level of complexity. This is exactly what the Banzai Cloud Pipeline platform makes simple and automates. Keep reading to learn about our method for securing Kafka on Kubernetes.
Kafka security - overview
Kafka security (or general security) can be broken down into three main areas. Documentation pertaining to Kafka security is available on the Apache Kafka site, but these are the high level topics one should go over when considering how best to secure Kafka:
- Authentication verifies the identity of consumers and producers using SASL or SSL
- Authorization attaches ACLs to authenticated identities, in order to check whether they can read from or write to a particular resource, such as a topic
- Encryption uses TLS to encrypt in-flight data between clients (consumers and producers) and brokers
The Kafka documentation uses the term SSL when it actually means TLS. For consistency’s sake, we will use the term SSL, as well. However, what we mean to say is TLS.
Kafka security on Kubernetes
This post is not intended to be an exhaustive Kafka security guideline, since there’s already a whole lot of documentation out there. In the following sections, we’ll discuss only those security options made available with the Kafka Spotguide.
Transport layer encryption
Messages routed towards, within, or out of a Kafka cluster are unencrypted by default. By enabling SSL support we can avoid man-in-the-middle attacks and securely transmit data over the network. The Banzai Cloud Pipeline Kafka spotguide allows users to choose between four strategies, and then does the rest:
- Internal - only internal broker communications are secured
- External - all connections coming from outside the cluster require SSL authentication, while internal broker communication is in PLAINTEXT
- Internal/External - both internal and external communication are secured
- None - both internal and external communication are in PLAINTEXT
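To make the four strategies more concrete, here is a minimal sketch of how each one could translate into broker listener settings. The listener names `INTERNAL` and `EXTERNAL` and the helper `listener_settings` are hypothetical and chosen only for illustration; the post does not show the configuration the Spotguide actually generates. The property keys `listener.security.protocol.map` and `inter.broker.listener.name` are standard Kafka broker settings.

```python
# Hypothetical listener names; the names the Spotguide really uses are not
# given in this post.
INTERNAL, EXTERNAL = "INTERNAL", "EXTERNAL"

# Strategy name -> (protocol for internal traffic, protocol for external traffic).
STRATEGIES = {
    "internal": ("SSL", "PLAINTEXT"),
    "external": ("PLAINTEXT", "SSL"),
    "internal/external": ("SSL", "SSL"),
    "none": ("PLAINTEXT", "PLAINTEXT"),
}


def listener_settings(strategy: str) -> dict:
    """Return broker properties that pin each listener to a security protocol."""
    internal, external = STRATEGIES[strategy.lower()]
    return {
        "listener.security.protocol.map": f"{INTERNAL}:{internal},{EXTERNAL}:{external}",
        "inter.broker.listener.name": INTERNAL,
    }


if __name__ == "__main__":
    for name in STRATEGIES:
        print(name, listener_settings(name))
```

When SASL authentication is layered on top, the protocol for an encrypted listener would likely be `SASL_SSL` rather than `SSL`; the sketch covers only the encryption choice.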
In the event someone chooses None, the widely popular (but equally insecure) gRPC and REST proxy for Kafka - Mailgun’s kafka-pixy - is installed. Unfortunately that proxy does not support encryption, thus it’s only available in this case.
The Banzai Cloud Pipeline platform generates the required certificates, but the user can still bring their own. As is usual for Pipeline, the certificates are stored in Vault and managed by our Vault operator for Kubernetes.
Kafka authentication
Kafka supports multiple auth options; our focus is currently on SASL/SCRAM support or, to be more specific, SCRAM_SSL. SASL stands for Simple Authentication and Security Layer, but it’s not simple at all. No problem, we’ve automated everything. This approach comes to us from big data’s legacy - the idea being that authentication should be separated from the Kafka protocol, and usernames and password hashes should be stored in Zookeeper.
- SASL/SCRAM - username/password authentication based on a salted challenge-response exchange; it requires TLS encryption
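To illustrate what storing password hashes rather than passwords means for SCRAM, the sketch below derives the values a SCRAM server keeps for a user, following the key derivation in RFC 5802. The function name `scram_credential`, the choice of SHA-256, and the iteration count are assumptions made for this example; the post does not say which SCRAM variant or parameters the Spotguide uses, and the SASLprep normalization of the password is skipped for brevity.

```python
import base64
import hashlib
import hmac
import os


def scram_credential(password: str, salt: bytes, iterations: int = 4096) -> dict:
    """Derive SCRAM-SHA-256 server-side values as described in RFC 5802."""
    # SaltedPassword = Hi(password, salt, i), which is PBKDF2 with HMAC-SHA-256.
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # ClientKey and ServerKey are HMACs of fixed strings keyed by SaltedPassword.
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    # Only the hash of ClientKey is stored, never the key or the password.
    stored_key = hashlib.sha256(client_key).digest()
    return {
        "salt": base64.b64encode(salt).decode("ascii"),
        "stored_key": base64.b64encode(stored_key).decode("ascii"),
        "server_key": base64.b64encode(server_key).decode("ascii"),
        "iterations": iterations,
    }


if __name__ == "__main__":
    # Placeholder password and a random salt, for demonstration only.
    print(scram_credential("example-password", os.urandom(16)))
```

Because each user gets a distinct salt, two users with the same password end up with different stored values.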
When choosing this option, the Spotguide performs all the required changes, from configuring the brokers to accept secure connections to generating a JAAS file.
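For readers who have not seen one, a JAAS file for SCRAM is short. The sketch below renders a broker-side `KafkaServer` section and the matching client properties using Kafka's `ScramLoginModule`. The helper names `broker_jaas` and `client_properties`, the credentials, and the choice of `SCRAM-SHA-256` are placeholders for illustration; the files the Spotguide generates may well differ.

```python
SCRAM_MODULE = "org.apache.kafka.common.security.scram.ScramLoginModule"


def broker_jaas(username: str, password: str) -> str:
    """Render a KafkaServer JAAS section that logs in with SCRAM credentials."""
    # Values containing double quotes would need escaping; omitted in this sketch.
    return (
        "KafkaServer {\n"
        f"    {SCRAM_MODULE} required\n"
        f'    username="{username}"\n'
        f'    password="{password}";\n'
        "};\n"
    )


def client_properties(username: str, password: str) -> dict:
    """Return client settings for SCRAM over an SSL-encrypted connection."""
    return {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-256",  # assumed variant
        "sasl.jaas.config": (
            f'{SCRAM_MODULE} required username="{username}" password="{password}";'
        ),
    }


if __name__ == "__main__":
    # Placeholder credentials, for demonstration only.
    print(broker_jaas("admin", "admin-secret"))
    for key, value in client_properties("username", "user-secret").items():
        print(f"{key}={value}")
```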
Kafka authorization
Once Kafka clients are authenticated, Kafka needs to be able to decide what they can or can’t do. Authorization is our friend in this case, controlled by Access Control Lists (ACLs). The Kafka Spotguide adds a set of ACLs when configuring the brokers. There is an admin user (which works only inside the cluster) with all the rights (super.users=User:admin) necessary to create topics and ACLs, and to read/write on all topics. Another user (username) is created to access topics (the spotguide-kafka topic) from outside of the cluster.
Note that we are using authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer; however, this can always be changed in the broker config.
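The split between the admin super user and the externally facing user can be pictured with a toy model. This is a deliberately simplified sketch, not how SimpleAclAuthorizer is implemented (the real authorizer also handles deny rules, hosts, wildcards and more). The `Acl` class, the `is_allowed` helper, and the specific Read and Write entries for the external user are assumptions made for illustration.

```python
from dataclasses import dataclass

# Mirrors super.users=User:admin from the broker configuration.
SUPER_USERS = frozenset({"User:admin"})


@dataclass(frozen=True)
class Acl:
    """One allow rule: a principal may perform an operation on a topic."""
    principal: str
    operation: str
    topic: str


# Assumed ACLs for the external user on the spotguide-kafka topic.
ACLS = frozenset({
    Acl("User:username", "Read", "spotguide-kafka"),
    Acl("User:username", "Write", "spotguide-kafka"),
})


def is_allowed(principal: str, operation: str, topic: str, acls=ACLS) -> bool:
    """Super users bypass ACLs; everyone else needs a matching allow rule."""
    if principal in SUPER_USERS:
        return True
    return Acl(principal, operation, topic) in acls


if __name__ == "__main__":
    print(is_allowed("User:admin", "Write", "any-topic"))
    print(is_allowed("User:username", "Read", "spotguide-kafka"))
    print(is_allowed("User:username", "Read", "another-topic"))
```

In this model, anything without an explicit allow rule is denied, which is why the external user cannot touch topics other than the one it was granted.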
What’s next?
Our work doesn’t stop here. Some of our Kafka Spotguide users have been asking for additional features, while at the same time, there are limitations we’d like to address. These are the high level changes coming soon:
- Currently, there is only one admin user with a password for internal broker communication, and one configurable user with a password for external communication. Multiple user support is being tested, and will be released soon.
- Kafka does not support connections via SSL to Zookeeper, but it does support SASL authentication; support for this is coming soon.
- Support for the widely popular Kafka UI has been added, though it works (by design) with full privileges. This is already available via our Spotguide, but we’re going to work on making it more restrictive.
About Banzai Cloud Pipeline
Banzai Cloud’s Pipeline provides a platform for enterprises to develop, deploy, and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operations teams alike. Strong security measures — multiple authentication backends, fine-grained authorization, dynamic secret management, automated secure communications between components using TLS, vulnerability scans, static code analysis, CI/CD, and so on — are default features of the Pipeline platform.