Managing ksqlDB with Supertubes
Ever since its inception, Supertubes has been about asking the question, "What if?" What if we built Kafka on the solid foundation of Istio and leveraged it and the Envoy proxy's powerful networking capabilities to combine the two into something even more powerful? Today, along with releasing ksqlDB support for Supertubes, we're engaging in even more speculation about what Istio and Kafka can do, together, to improve our lives.
Over the years, Kafka has grown into a whole platform with an entire ecosystem built around it, consisting of components like Kafka Connect, Kafka Schema Registry, and MirrorMaker 2. ksqlDB was one of the last pieces of that ecosystem without Supertubes support, that is, until today.
Of course, as always, we wanted to add the secret Banzai Cloud sauce to the recipe, and elevate it to the next level. In this case, that meant making it more secure with Istio, and taking some of the more tedious parts of ksqlDB management off our customers' shoulders.
So what's in the box?
In Supertubes, we made ACLs first-class citizens of Kubernetes. That should make a lot of sense, since we like to blur the line between Kafka, Istio and Kubernetes, making them into one big ecosystem that works together instead of against each other, and, more importantly, instead of against us. We tend to save a lot of frustration, time and money as part of this process.
It also helps to integrate Kafka more and more with our pre-existing, everyday Kubernetes tools like kubectl. Furthermore, with the help of our operators, we can deploy them declaratively from a CI/CD pipeline, and automate their handling in a dynamic way - which I'm sure most of you won't miss doing, in the slightest.
Assuming that you already have a Kubernetes cluster with Supertubes installed on it, and have a working ksqlDB installation, all you have to do is deploy a KsqlDB custom resource to the cluster.
Fortunately, the Supertubes CLI - which, in this case, provides some validation - makes this easy to do:
supertubes cluster ksqldb create -f <path-to-ksqldb-cr-file> -n my-namespace
or just apply the custom resource file with a plain kubectl apply.
Here's a simple example that will create a pre-configured ksqlDB cluster in interactive mode, which references pre-existing Schema Registry and Kafka Connect custom resources.
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "my-kafka-cluster"
  schemaRegistryRef:
    name: "my-schema-registry"
  kafkaConnectRef:
    name: "my-kafka-connect"
This will also create all the ACLs that are needed for both the interactive and headless modes of ksqlDB's internal operations, allowing ksqlDB to manage its record processing log topic, and to produce to the command topic as well.
To allow the ksqlDB cluster to work on our input and output topics, we will first have to provide them in the Spec part of the custom resource.
Input topics are topics you want to read data from, and output topics are topics you want to write to. Output topics can be newly created topics as well, e.g. when you create a stream. If you want to chain queries together so that one query's output topic is another query's input topic, you have to list that topic in both places.
Supertubes will then generate the required ACLs for ksqlDB to operate on them. It's easier if you know your input and output topics in advance when you operate your ksqlDB instance in headless mode - since the queries are already there when you start the instance - but we provide the same functionality in interactive mode, if you have some favorite topics you always want to read from.
---
spec:
  inputTopics:
    - kafkacat-airports
  outputTopics:
    - AIRPORTS_ALL
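Putting the two fragments together, a complete custom resource with chained topics might look like the sketch below. It assumes that inputTopics and outputTopics sit directly under spec, as the fragment above suggests. The AIRPORTS_EU topic is a hypothetical name added only to show chaining: AIRPORTS_ALL is written by one query and read by another, so it appears in both lists.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "my-kafka-cluster"
  schemaRegistryRef:
    name: "my-schema-registry"
  kafkaConnectRef:
    name: "my-kafka-connect"
  # Topics that ksqlDB queries read from
  inputTopics:
    - kafkacat-airports
    - AIRPORTS_ALL        # chained: output of the first query, input of the second
  # Topics that ksqlDB queries write to
  outputTopics:
    - AIRPORTS_ALL
    - AIRPORTS_EU         # hypothetical topic produced by the second query
```

Listing a topic in both places lets Supertubes generate read and write ACLs for it.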
Just scale it out
We take care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). The twist here is that, by default, HPAs only support scaling through basic CPU or memory usage, and, while that's generally enough for most workloads, in the case of ksqlDB, it's a much better idea to scale by consumer lag.
When ksqlDB cannot keep up with the rate of messages produced on your Kafka topics, it can fall behind in its processing of incoming data. Scaling by consumer lag will help solve this issue far better than scaling by any traditional metric. In the Supertubes ecosystem, we already track consumer lag in our Prometheus instance. You just have to enable the HPA to understand these metrics by deploying the kube-metrics-adapter Helm chart. An already deployed and configured HPA will do the rest for you.
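Supertubes creates and configures the HPA for you, so you don't need to write one. Purely to illustrate what scaling on an external metric looks like, here is a generic sketch using the standard autoscaling/v2 API. The resource names, the target workload kind, the metric name and the threshold are all assumptions made for this example, not what Supertubes actually generates. How the lag metric is exposed to the HPA depends on how the metrics adapter is configured.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ksqldb-sample-lag          # hypothetical name
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment               # assumption: the ksqlDB workload kind may differ
    name: ksqldb-sample            # assumption: workload named after the custom resource
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: ksqldb-consumer-lag   # hypothetical metric name served by the metrics adapter
        target:
          type: AverageValue
          averageValue: "1000"        # scale out when lag per replica exceeds ~1000 messages
```

With an AverageValue target, the HPA divides the reported lag by the current number of replicas. It adds replicas while that per-replica value stays above the threshold.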
Let's see another method by which Supertubes can improve our lives when it comes to ksqlDB security.
When it comes to securing ksqlDB, we have to distinguish between two modes: headless and interactive. Securing ksqlDB in interactive mode is more complex than securing it in headless mode.
Securing ksqlDB running in interactive mode
In interactive mode, ksqlDB enables REST API endpoints, which require additional configuration in order to make them secure.
The plain ksqlDB way
ksqlDB provides support for authenticating and encrypting client-server communication using HTTP Basic Authentication and TLS for RESTful and WebSocket endpoints. In order to enable encryption, you need to provide the following configuration parameters:
listeners=https://hostname:port
ssl.keystore.location=/ssl/certs/keystore.jks
ssl.keystore.password=supersecure
This configuration may not seem complex at first glance. However, importing new or renewed TLS certificates to the keystore makes it a bit cumbersome to maintain.
ksqlDB's built-in authentication uses a basic HTTP authentication mechanism. That means it can be configured to require users to authenticate using a username and password. Moreover, it provides role-based authorization by specifying which roles can access the server.
Configuring authentication requires a standard JAAS configuration file, which defines how the server authenticates users. The simplest example of this is when the JAAS configuration contains a path to a password file, which might look like this:
marty: delorean,user,admin
mcfly: drbrown,user,developer
This same file also contains the roles of each user, which are matched against a config value in the ksqlDB server. Connecting to the server from the client side requires that you provide a username and password:
bin/ksql --user marty --password delorean http://localhost:8088
We just rapidly went through how to configure ksqlDB security when interactive mode is enabled. Now, let's take a deep dive into how Supertubes handles all of this configuration.
Securing ksqlDB with Supertubes
Supertubes integrates ksqlDB in a unique way, mostly by leveraging Istio. We tried to make our customers' lives a little easier by reducing the amount of time they have to spend on configuration. With Supertubes, ksqlDB is pre-configured to use mTLS and advanced authorization, out-of-the-box. And we ditched HTTP Basic authentication altogether and replaced it with Istio's Authorization Policy.
As the Istio documentation states, authorization policies can be used to enable access control on workloads in the mesh. They support both allow and deny policies, which enables fine-grained access control. Moreover, they allow us to narrow down access control for individual REST endpoints: users can allow access to the /info endpoint while, at the same time, denying requests to other endpoints.
This table helps to clarify the differences between HTTP-Basic and Authorization Policies:
| | HTTP-Basic | Istio's Authorization Policy |
|---|---|---|
| Works without certificate | Yes | No |
| Supports role based access | Yes | Yes |
| Fine grained access control | No | Yes |
| Requires client side configuration | Yes | No (only if app is outside of the mesh) |
Example Authorization Policy:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ...
  namespace: ...
spec:
  selector:
    matchLabels:
      app: ksqldb
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/<principal-id>"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/..."]
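To authenticate the client, the policy uses the certificate's SAN URI field, which, when the certificate comes from an Istio-generated secret, takes the format of:
spiffe://cluster.local/ns/<client-app-namespace>/sa/<client-app-service-account>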
To allow such an application to connect to ksqlDB, the AuthorizationPolicy of the ksqlDB deployment would have to include:
action: ALLOW
rules:
  - from:
      - source:
          principals: ["cluster.local/ns/<client-app-namespace>/sa/<client-app-service-account>"]
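To make the allow and deny semantics more concrete, here is a hedged sketch of two complete policies for the same client. The policy names, the client-ns namespace and the client-sa service account are hypothetical. The app: ksqldb selector and the my-namespace namespace are taken from the examples above, and /ksql is ksqlDB's REST endpoint for executing statements.

```yaml
# ALLOW: the client's service account may call GET /info on the ksqlDB pods
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ksqldb-allow-info            # hypothetical name
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: ksqldb
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/client-ns/sa/client-sa"]   # hypothetical client identity
      to:
        - operation:
            methods: ["GET"]
            paths: ["/info"]
---
# DENY: the same client is explicitly refused on the statement endpoint
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ksqldb-deny-ksql             # hypothetical name
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: ksqldb
  action: DENY
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/client-ns/sa/client-sa"]
      to:
        - operation:
            paths: ["/ksql"]
```

Once any ALLOW policy selects a workload, Istio rejects requests that match no ALLOW rule, so the DENY policy is not strictly required here. It is included because DENY rules are evaluated before ALLOW rules, which keeps the restriction in place even if a broader ALLOW policy is added later.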
Client application types
We distinguish between three different client types based on where they run:
- Client applications which reside inside the same Istio mesh as the ksqlDB server
- Client applications which reside on the same Kubernetes cluster as the ksqlDB server but outside of the Istio mesh
- Client applications which reside outside of the Kubernetes cluster
We’ll look at each of these separately.
Clients running inside the mesh
When client applications that connect to ksqlDB run in the
same Istio mesh, they don’t need to send certificates. Istio
provides one for them, out-of-the-box. This certificate is
special, in that it carries information about the namespace
and the service account of the application. The
Authorization policy will collect the
SAN URI information
from it, and use that as the application's identity.
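As an illustration of where that identity comes from, the sketch below defines a client workload with its own service account. All names are hypothetical, the image is a placeholder, and the client-ns namespace is assumed to have Istio sidecar injection enabled.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: client-sa                    # hypothetical service account
  namespace: client-ns               # hypothetical namespace, assumed to be part of the mesh
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ksqldb-client                # hypothetical client application
  namespace: client-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ksqldb-client
  template:
    metadata:
      labels:
        app: ksqldb-client
    spec:
      serviceAccountName: client-sa  # this, plus the namespace, determines the identity
      containers:
        - name: client
          image: <client-app-image>  # placeholder: your client application image
```

With the default cluster.local trust domain, the ksqlDB side would see this workload as the principal cluster.local/ns/client-ns/sa/client-sa. That is the value to list in the AuthorizationPolicy.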
Clients running outside of the mesh
When a client application connects from outside the Istio
mesh, the value of the
SAN URI field (extracted from the
certificate of the client application) is used as the
application’s identity. This certificate is special, in that it must contain the required SAN URI fields. It can be generated either by a tool like cert-manager or Vault, or by a KafkaUser resource. As of now, KafkaUser custom resources generate certificates which include all the fields required to use ksqlDB with Istio's Authorization Policy.
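If you go the cert-manager route, a client certificate carrying a SAN URI could be requested roughly as follows. This is only a sketch: the resource names and the issuer are hypothetical, and the URI assumes Istio's usual SPIFFE-style identity format. The issuer would also have to chain to a CA that the mesh trusts.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ksqldb-external-client             # hypothetical name
  namespace: client-ns                     # hypothetical namespace
spec:
  secretName: ksqldb-external-client-tls   # Secret that will hold the key and certificate
  commonName: ksqldb-external-client
  usages:
    - client auth
  # SAN URI that the AuthorizationPolicy will treat as the client's identity
  uris:
    - spiffe://cluster.local/ns/client-ns/sa/client-sa
  issuerRef:
    name: mesh-ca-issuer                   # hypothetical issuer, must be trusted by the mesh
    kind: Issuer
```

The matching principal in the AuthorizationPolicy would then be cluster.local/ns/client-ns/sa/client-sa.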
Clients running outside of the Kubernetes cluster
When the client application is external to the Kubernetes cluster, the flow differs from the above in that the traffic from the application passes through a LoadBalancer and an ingress gateway.
Securing ksqlDB running in headless mode
In headless mode, ksqlDB does not initialize any REST endpoints. That means that securing the communication between components is enough. Since Supertubes uses and configures Istio behind the scenes, no additional changes are required, either on the ksqlDB side or on any service it connects to. Thus, we get mTLS out of the box when we use ksqlDB with Supertubes.
Today we are glad to announce support for ksqlDB, which means that all the major Kafka companion products now ship with Supertubes, including Schema Registry, Kafka Connect and ksqlDB. Supertubes bundles Istio, Kubernetes and Kafka in a unique way, which makes our customers' lives easier. We are not done yet though, and we plan to introduce additional improvements, features and capabilities, extending the number of use cases where Supertubes can help. Stay tuned.