Cisco Tech Blog

Running multiple Kubernetes clusters in a company or even within a team is a common practice. You may need test environments, or you may just want to separate workloads between customers. Prometheus is an awesome tool to monitor a single cluster, but if you want to query multiple clusters, you need the help of other tools. If you are a regular reader, you know that our choice for this task is Thanos. If you are not familiar with Thanos, read our Multi cluster monitoring with Thanos blog post first.

Thanos Operator vs Helm chart

Using operators in a Kubernetes environment can be a huge operational benefit. There are always debates about using simple Helm charts (or other deployment tools for templating Kubernetes YAML files) versus installing an operator and managing custom resources. Each approach has its pros and cons. In my opinion, if you are managing stateful applications, interacting with different components, and/or want to change the configuration frequently, it’s nice to have an operator abstracting away configuration complexities. The following use case uses our Thanos Operator and has already demonstrated the benefits of having a deterministic way to manage multi-component software.

Note: To simplify the process we will use the one-eye command line tool. This tool is available through Cisco’s Emerging Technologies Design Partner Program.

Multi cluster query

In this example we will set up Prometheus and Thanos to have a single-dashboard multi-query architecture. What does this mean? In short: you can grab metrics from an application no matter which cluster it is running on.

[Figure: Allocate Ingress]

Note: Although this use-case does not enable long-term storage, it should be trivial to configure that as well.

So let’s outline the steps needed to achieve this. To simplify the explanation, I’ll call the management cluster the Observer and all the other clusters Peer clusters.

  • Install Prometheus, Thanos, Grafana, and Cert-manager on the Observer cluster
  • Install Prometheus and Thanos on a Peer cluster
  • Create an Ingress as we need to access the Peer cluster on a public endpoint
  • Configure mutual TLS authentication for this endpoint
  • Deploy a client with the appropriate TLS configuration on the Observer cluster
  • Create an aggregator Thanos Query on the Observer to aggregate the metrics of every cluster
  • Add this aggregator Query to Grafana as a datasource.

Note: This is one way to achieve this functionality. Other solutions, such as using a reverse proxy for the TLS configuration on the Observer cluster, are also perfectly fine. However, we used the tooling we already had at hand via the Thanos Operator.

So let’s start!

Preparations

As we will use more than one cluster, we will use the --context switch to choose between them. Please take note of which commands we apply on which cluster.

Using contexts

This walkthrough assumes that you have a kubeconfig file (with its path defined in the KUBECONFIG environment variable) containing all the contexts required to connect to the clusters. We will refer to the context names using the ${OBSERVER_CONTEXT} and ${PEER_CONTEXT} shell variables, and assume they have been exported as in the following example.

export OBSERVER_CONTEXT="mcom-observer"
export PEER_CONTEXT="mcom-peer-1"
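Before going further, it can help to confirm that both context names actually exist in your kubeconfig. The sketch below is one possible check, not something one-eye requires: context_exists is a hypothetical helper that reads context names from standard input, in the form printed by kubectl config get-contexts -o name.

```shell
# context_exists NAME: succeeds if NAME appears as a whole line on stdin.
# Hypothetical helper for illustration; not part of kubectl or one-eye.
context_exists() {
  grep -Fxq -- "$1"
}

# Usage (requires kubectl and a populated kubeconfig):
#   for ctx in "${OBSERVER_CONTEXT}" "${PEER_CONTEXT}"; do
#     kubectl config get-contexts -o name | context_exists "${ctx}" \
#       || echo "context not found: ${ctx}" >&2
#   done
```

Matching whole lines with fixed strings avoids false positives when one context name is a prefix of another.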

First, determine the endpoint name of the peer cluster. The snippet below derives the endpoint name from the context name.

kubectx "${PEER_CONTEXT}"
export PEER_ENDPOINT=$(kubectl config current-context | cut -d '@' -f 2)

Note: Depending on your context name, the delimiter might be different, so check the $PEER_ENDPOINT value.
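If your context names vary, shell parameter expansion is an alternative to cut. This is a minimal sketch under one assumption: the part you want is everything after the last @, and a context name without @ should be used unchanged. The helper name endpoint_from_context is hypothetical.

```shell
# endpoint_from_context CONTEXT: print the part after the last "@".
# Hypothetical helper for illustration.
endpoint_from_context() {
  # "${1##*@}" removes the longest prefix ending in "@";
  # a value without "@" is printed unchanged.
  printf '%s\n' "${1##*@}"
}

# Usage (requires kubectl):
#   export PEER_ENDPOINT=$(endpoint_from_context "$(kubectl config current-context)")
```

Unlike cut -f 2, which returns the second field, this keeps only the last field when the name contains more than one @, so check the resulting value either way.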

Observer cluster prerequisites

After these preparations, deploy the components on the Observer cluster.

one-eye --context "${OBSERVER_CONTEXT}" cert-manager install -us
one-eye --context "${OBSERVER_CONTEXT}" prometheus install -us
one-eye --context "${OBSERVER_CONTEXT}" grafana install -us
one-eye --context "${OBSERVER_CONTEXT}" thanos install --operator-only -us
one-eye --context "${OBSERVER_CONTEXT}" observer reconcile

Note: To reduce the number of times we reconcile, we use the -s/--skip-reconcile and -u/--update flags to initialize the observer configuration. We will do an explicit reconcile at the end.

Certificates

If you already have a way to create client and server certificates, you can skip this part. In this section we set up a self-signed certificate using cert-manager.

Note: Self-signed certificates are for demonstration purposes only. Use a proper CA for a production setup.

[Figure: Certificates]

The first step is to set up a self-signed issuer. You can do that by applying the following yaml:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: default
spec:
  selfSigned: {}

The next step is to generate certificates. For simplicity, we use the same certificate for client and server authentication. The following YAML creates the certificate and stores it in the ${PEER_ENDPOINT}-tls secret (mcom-peer-1-tls in this example).

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mcom-peer-1-tls
  namespace: default
spec:
  commonName: peer-endpoint.cluster.notld
  dnsNames:
  - "${PEER_ENDPOINT}"
  issuerRef:
    name: selfsigned
  secretName: ${PEER_ENDPOINT}-tls
  usages:
    - server auth
    - client auth
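Keep in mind that kubectl does not expand shell variables such as ${PEER_ENDPOINT} inside a manifest. One way to handle this, sketched here, is to render the manifest from a shell here-document and pipe it to kubectl. The helper name render_certificate is hypothetical; it mirrors the fields of the Certificate above, with secretName placed directly under spec, which is where cert-manager expects it.

```shell
# render_certificate ENDPOINT: print a Certificate manifest with the
# endpoint name substituted. Hypothetical helper for illustration.
render_certificate() {
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${1}-tls
  namespace: default
spec:
  commonName: peer-endpoint.cluster.notld
  dnsNames:
  - "${1}"
  issuerRef:
    name: selfsigned
  secretName: ${1}-tls
  usages:
    - server auth
    - client auth
EOF
}

# Usage (assumes kubectl access to the Observer cluster):
#   render_certificate "${PEER_ENDPOINT}" \
#     | kubectl --context "${OBSERVER_CONTEXT}" apply -f -
#   kubectl --context "${OBSERVER_CONTEXT}" wait --for=condition=Ready \
#     "certificate/${PEER_ENDPOINT}-tls" --timeout=120s
```

Waiting for the Ready condition before copying the secret helps avoid copying a secret that cert-manager has not populated yet.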

Peer cluster

After generating the proper certificates, we can jump over to the Peer cluster to prepare the environment there.

Prepare cluster label on Prometheus

If Prometheus is already installed on the peer cluster, make sure that it properly sets the cluster label on the collected metrics: the label must be present and unique, otherwise the metrics of the different clusters get mixed up. Moreover, we need to enable the Thanos Sidecar for Prometheus.

If Multi Cloud Observability Manager (formerly called One Eye) has already been installed on the peer cluster, make sure that the spec.clusterName field of the observer custom resource is different on the Observer and the Peer clusters. Multi Cloud Observability Manager version 0.5.0 and later tries to detect it automatically from the context.
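A quick way to catch a clash is to compare the cluster names before wiring the clusters together. The sketch below only covers the comparison. The helper all_distinct is hypothetical, and the kubectl call in the comment is an assumption: it presumes the custom resource can be listed as observer and that the first item is the one you care about.

```shell
# all_distinct NAME...: succeeds only if no argument is repeated.
# Hypothetical helper for illustration.
all_distinct() {
  total=$#
  # Count the unique names; tr strips padding that some wc versions add.
  unique=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
  [ "${total}" -eq "${unique}" ]
}

# Usage (assumption: the resource is listable as "observer"):
#   observer_name=$(kubectl --context "${OBSERVER_CONTEXT}" get observer \
#     -o jsonpath='{.items[0].spec.clusterName}')
#   peer_name=$(kubectl --context "${PEER_CONTEXT}" get observer \
#     -o jsonpath='{.items[0].spec.clusterName}')
#   all_distinct "${observer_name}" "${peer_name}" \
#     || echo "cluster names are not unique" >&2
```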

Copy the certificates to the Peer cluster:

kubectl --context "${OBSERVER_CONTEXT}" get secret "${PEER_ENDPOINT}-tls" -o yaml | kubectl --context "${PEER_CONTEXT}" create -f-

Prepare the Peer cluster for monitoring.

one-eye --context "${PEER_CONTEXT}" prometheus install -us
one-eye --context "${PEER_CONTEXT}" thanos install --operator-only -us
one-eye --context "${PEER_CONTEXT}" ingress install -us

one-eye --context "${PEER_CONTEXT}" observer reconcile

For ingress, Cisco MCOM installs the official Kubernetes NGINX Ingress Controller. Since Thanos components communicate over gRPC, we need an ingress that supports HTTP/2. Because most of the HTTP/2 configuration is based on annotations, Thanos Operator currently supports only the NGINX Ingress, but this can be extended in the future.

[Figure: Allocate Ingress]

Create the ThanosEndpoint on the Peer cluster. Applying this resource triggers several tasks. First, a Thanos Query is deployed to provide an interface for the other components. After that, an NGINX ingress is deployed to create a gRPC endpoint with TLS configured. The following command first generates the YAML for the endpoint, then applies it.

one-eye --context "${PEER_CONTEXT}" thanos endpoint generate "${PEER_ENDPOINT}" --cert-secret-name "${PEER_ENDPOINT}-tls" --ca-bundle-secret-name "${PEER_ENDPOINT}-tls" | kubectl --context "${PEER_CONTEXT}" apply -f-

Note: You can also use the generate command to create YAML files that you can later use in your CI/CD environment.

Example ThanosEndpoint configuration

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ThanosEndpoint
metadata:
  name: mcom-peer-1
  namespace: default
spec:
  caBundle: mcom-peer-1-tls
  certificate: mcom-peer-1-tls
  ingressClassName: one-eye-nginx-external
  metaOverrides: {}

[Figure: Allocate Ingress]

After a successful reconcile, the status of the ThanosEndpoint resource holds the public address of the ingress. This address is required to set up the peer resource on the Observer cluster. You can inspect it with kubectl.

$ kubectl get thanosendpoint
NAME          ENDPOINT ADDRESS
mcom-peer-1   xxxxxxxxxxxxxxxxxxxxx-zzzzzzzzzzzz.eu-west-1.elb.amazonaws.com:443

We can use the following one-liner to save the address in a variable.

export ENDPOINT_ADDRESS=$(one-eye --context "${PEER_CONTEXT}" thanos endpoint address "${PEER_ENDPOINT}")
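The example output above shows the address in host:port form, so a small sanity check on the variable can catch an empty or malformed value before it ends up in the peer resource. A minimal sketch; is_host_port is a hypothetical helper and only checks the shape of the string, not whether the endpoint is reachable.

```shell
# is_host_port ADDRESS: succeeds if ADDRESS looks like "host:port"
# with a non-empty host and a numeric port.
# Hypothetical helper for illustration.
is_host_port() {
  case "$1" in
    *:*) ;;
    *) return 1 ;;
  esac
  host=${1%:*}    # everything before the last ":"
  port=${1##*:}   # everything after the last ":"
  [ -n "${host}" ] || return 1
  case "${port}" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# Usage:
#   is_host_port "${ENDPOINT_ADDRESS}" \
#     || echo "unexpected endpoint address: '${ENDPOINT_ADDRESS}'" >&2
```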

Now it’s time to create the peer resource on the Observer cluster. We need to specify the endpoint address and the secret for the certificates. The following command creates and configures a Thanos Query with TLS authentication that connects to the peer cluster. Moreover, it automatically creates a datasource resource for the Grafana operator.

one-eye --context "${OBSERVER_CONTEXT}" thanos peer generate "${PEER_ENDPOINT}" --endpoint-address "${ENDPOINT_ADDRESS}" --cert-secret-name "${PEER_ENDPOINT}-tls" --ca-bundle-secret-name "${PEER_ENDPOINT}-tls" | kubectl --context "${OBSERVER_CONTEXT}" apply -f-

Example ThanosPeer configuration

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ThanosPeer
metadata:
  name: mcom-peer-1
  namespace: default
spec:
  endpointAddress: xxxxxxxxxxxxxxxxxxxxx-zzzzzzzzzzzz.eu-west-1.elb.amazonaws.com:443
  peerEndpointAlias: mcom-peer-1
status:
  queryHTTPServiceURL: http://mcom-peer-1-peer-query.default.svc:10902

[Figure: Allocate Ingress]

The final step is to configure our Central Query instance that aggregates all of the configured peer queries. First, create an aggregator Query called central-query.

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  labels:
    app.kubernetes.io/instance: central-query
    app.kubernetes.io/managed-by: thanos-operator
    app.kubernetes.io/name: query
  name: central-query
spec:
  queryDiscovery: true
  query:
    grafanaDatasource: true
    metrics:
      serviceMonitor: true

Then create a StoreEndpoint definition with an empty selector. This will aggregate all endpoints using the Thanos Store protocol. The thanos attribute references our previously created query instance.

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: all-endpoint
spec:
  thanos: central-query
  selector: {}

Now we are all set. Just check the Grafana dashboard and query whatever you need!
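To confirm from the command line that the central Query sees every cluster, you can ask it for a per-cluster series count through the Prometheus-compatible HTTP API that Thanos Query serves. This is a sketch under assumptions: the base URL is a placeholder (for example a local port-forward to the central-query service on the HTTP port 10902), and query_endpoint is a hypothetical helper.

```shell
# query_endpoint BASE_URL: print the instant-query URL for a Query instance.
# Hypothetical helper for illustration.
query_endpoint() {
  # Strip one trailing slash, then append the instant-query path.
  printf '%s/api/v1/query\n' "${1%/}"
}

# Usage (assumes curl and a reachable Query; the URL is a placeholder):
#   curl -sG "$(query_endpoint "http://localhost:10902")" \
#     --data-urlencode 'query=count by (cluster) (up)'
```

If the setup works, the response should contain one result per cluster label value.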

[Figure: Grafana screenshot]

At first these steps may look like a lot, but installing, configuring, and then reconfiguring all components by hand to make them work together can take a lot more time. Moreover, these steps are easy to automate! You can easily build a CD pipeline with these tools. Remember, Cisco Multi Cloud Observability Manager is both an operator and a CLI tool, and they can work simultaneously. You can install the operator on every cluster and configure it via the CLI tool as a step of a delivery pipeline. Another benefit of this approach is that we only use Kubernetes-provided resources. There is no custom logic behind the service discovery and there are no hand-configured proxies. These are standard Kubernetes resources that you would use for other applications as well.
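As an illustration of the automation point, here is a minimal sketch of a pipeline step that prepares several Peer clusters with the same commands used in this post. The peer context names, the run wrapper, and the DRY_RUN variable are assumptions made for the example. With DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
# DRY_RUN=1 prints commands instead of running them (assumed convention).
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# prepare_peer CONTEXT: install the Peer components, then reconcile once.
prepare_peer() {
  run one-eye --context "$1" prometheus install -us
  run one-eye --context "$1" thanos install --operator-only -us
  run one-eye --context "$1" ingress install -us
  run one-eye --context "$1" observer reconcile
}

# Hypothetical peer contexts; replace them with your own.
for peer in mcom-peer-1 mcom-peer-2; do
  prepare_peer "${peer}"
done
```

Keeping the explicit reconcile as the last step of each cluster follows the same pattern as the manual walkthrough.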