Published on 10/25/2020
Last updated on 03/21/2024

Istio configuration validation with Backyards


If you're an active Istio user, there's a good chance that Istio's configuration reference is bookmarked in your browser, and that you've read the pages on VirtualServices and ServiceEntries over and over, yet still struggle to set up even simple configurations in your mesh. Istio's custom resource configuration is very powerful and flexible, but infamous for being overly complex. Even at its best, its YAML consists of lists of lists, cross-references, conflicting fields, and wildcards.
Istio's maintainers are aware of this complexity and, at least in the last few releases, have tried to bring user friendliness into focus, but Istio still routinely strands us in quagmires of minutiae and uncertainty. We're down to ~25 custom resources from ~50 a year ago, and useful CLI features like istioctl analyze now exist, but we feel that there's more to be done.
That's why we've added our own validation subsystem to our service mesh platform, Backyards (now Cisco Service Mesh Manager). The Backyards service mesh platform maintains total compatibility with upstream Istio, but also extends its feature set, while avoiding lock-in through a new abstraction layer. A good example of this is its validation subsystem, which goes beyond Istio's built-in validation by considering the cluster state as a whole, rather than just Istio's configuration.
Backyards (now Cisco Service Mesh Manager) is Banzai Cloud's multi- and hybrid-cloud enabled service mesh platform for constructing and observing modern infrastructure. It is an Istio distribution and an SRE toolbox in one neat package that takes you from constructing your service mesh to forming SLOs against Envoy-produced metrics.

Istio configuration validation in Backyards

Validation results can be seen on the Overview page of the UI:
[Screenshot: Istio configuration validation in Backyards]
Validation can also be checked from the CLI tool, as follows:
❯ backyards analyze
✓ 0 validation errors found
Where applicable, the UI displays the relevant parts of the configuration for each detected error:
[Screenshot: Backyards_Overview]

Validation examples

Backyards performs many validation checks on various aspects of the configuration, both syntactic and semantic. The checks are continually curated, and new ones are added with every release. The examples in this post show how helpful this feature can be.

Sidecar injection template validation

This check validates whether any pods within the environment run with an outdated sidecar proxy image or configuration. In this example, the global configuration setting of the sidecar proxy image was changed from banzaicloud/istio-proxyv2:1.7.3-bzc to banzaicloud/istio-proxyv2:1.7.3-bzc.1.
❯ backyards analyze --namespace backyards-demo
pod backyards-demo/frontpage-v1-8f9d69c97-phv4k:
    Cluster: master
    Error: sidecar injector proxy image mismatch
        Control Plane: cp-v17x.istio-system
        Error ID: pod/sidecar-check/sidecar/proxy-image-mismatch
        Context:
            podImage: banzaicloud/istio-proxyv2:1.7.3-bzc
            configImage: banzaicloud/istio-proxyv2:1.7.3-bzc.1
...
...
✗ 4 validation errors were found
This helps operators find outdated proxies within the environment.
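The comparison behind this kind of check can be sketched in a few lines of Python. This is only an illustration of the idea, not how Backyards implements it: the `ProxyImageMismatch` type and the `find_outdated_proxies` function are hypothetical names, and the input is assumed to be a plain mapping from pod name to the sidecar image the pod is currently running.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProxyImageMismatch:
    """One finding (hypothetical type, for illustration only)."""
    pod: str           # "<namespace>/<name>"
    pod_image: str     # image the sidecar is currently running
    config_image: str  # image the injector would use for new pods


def find_outdated_proxies(pod_images: Dict[str, str],
                          config_image: str) -> List[ProxyImageMismatch]:
    """Return one finding per pod whose sidecar image differs from config_image."""
    return [
        ProxyImageMismatch(pod, image, config_image)
        for pod, image in sorted(pod_images.items())
        if image != config_image
    ]


if __name__ == "__main__":
    # Values taken from the CLI output above.
    pods = {
        "backyards-demo/frontpage-v1-8f9d69c97-phv4k":
            "banzaicloud/istio-proxyv2:1.7.3-bzc",
    }
    for finding in find_outdated_proxies(
            pods, "banzaicloud/istio-proxyv2:1.7.3-bzc.1"):
        print(finding)
```

A real check would also have to compare the rest of the injected configuration, which this sketch leaves out.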

Gateway port protocol configuration conflict validation

This example demonstrates a check for the common mistake of setting conflicting port configurations in different Gateway resources. Istio's built-in validation does not reject this, but it can cause unwanted behavior at ingress. Here, port 9443 of the same ingress gateway is set to TCP in one resource and to TLS in another. The following YAMLs were applied:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
    - hosts:
        - demo1.example.com
      port:
        name: tcp
        number: 9443
        protocol: TCP
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
    - hosts:
        - demo2.example.com
      port:
        name: tls
        number: 9443
        protocol: TLS
      tls:
        serverCertificate: /certs/cert.pem
        privateKey: /certs/key.pem
        mode: SIMPLE
Check the configuration's validity by running the CLI tool's analyze command.
❯ build/backyards-cli analyze --namespace istio-system
gateway istio-system/demo-gw-port-conflict-01:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TCP

gateway istio-system/demo-gw-port-conflict-02:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TLS

✗ 2 validation errors found
This result shows the issue exactly, and provides all the information necessary for the operator to quickly pinpoint the problem in the configuration.
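To make the logic of such a check concrete, here is a minimal Python sketch. It is not Backyards' implementation: `find_port_protocol_conflicts` is a hypothetical function, the Gateways are assumed to be already parsed into dictionaries, and two Gateways are assumed to target the same ingress workload only when their selectors are identical (a real check would resolve selectors against the running pods).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def selector_key(gateway: dict) -> Tuple[Tuple[str, str], ...]:
    """Hashable form of spec.selector, used to group Gateways (simplification)."""
    return tuple(sorted(gateway["spec"].get("selector", {}).items()))


def find_port_protocol_conflicts(gateways: List[dict]) -> List[Dict[str, object]]:
    """Flag servers that share a selector and port number but differ in protocol."""
    # (selector, port number) -> [(gateway name, server index, protocol), ...]
    seen = defaultdict(list)
    for gw in gateways:
        name = f'{gw["metadata"]["namespace"]}/{gw["metadata"]["name"]}'
        for idx, server in enumerate(gw["spec"].get("servers", [])):
            port = server["port"]
            key = (selector_key(gw), port["number"])
            seen[key].append((name, idx, port["protocol"]))

    findings = []
    for (_, number), entries in seen.items():
        protocols = {protocol for _, _, protocol in entries}
        if len(protocols) > 1:  # same port, more than one protocol
            for name, idx, protocol in entries:
                findings.append({
                    "gateway": name,
                    "path": f"servers[{idx}]",
                    "port": number,
                    "protocol": protocol,
                })
    return findings
```

Fed the two Gateway resources above, this sketch reports one finding per resource, which mirrors the shape of the CLI output.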

Multiple gateways with the same TLS certificate validation

Configuring more than one gateway, using the same TLS certificate, will cause browsers that leverage HTTP/2 connection reuse (i.e., most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.
You can read more about this issue in the Istio docs.
Let's apply the following resources to demonstrate how this issue works:
apiVersion: istio.banzaicloud.io/v1beta1
kind: MeshGateway
metadata:
  labels:
    app: demo-gw
  name: demo-gw
  namespace: istio-system
spec:
  labels:
    app: demo-gw
  maxReplicas: 1
  minReplicas: 1
  ports:
    - name: http2
      port: 80
      protocol: TCP
      targetPort: 8080
    - name: https
      port: 443
      protocol: TCP
      targetPort: 8443
  replicaCount: 1
  runAsRoot: true
  serviceType: LoadBalancer
  type: ingress
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-wildcard-cert
  namespace: istio-system
spec:
  secretName: example-wildcard-cert
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: "test wildcard certificate"
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
    - server auth
  dnsNames:
    - "*.example.com"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
    - hosts:
        - demo1.example.com
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        credentialName: example-wildcard-cert
        httpsRedirect: false
        mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
    - hosts:
        - demo2.example.com
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        credentialName: example-wildcard-cert
        httpsRedirect: false
        mode: SIMPLE
The following resources were created:
  • an ingress gateway
  • an *.example.com wildcard certificate
  • two Gateway resources, both of which specify the same wildcard cert
Check the configuration's validity by running the analyze command in the CLI tool.
❯ backyards analyze --namespace istio-system
gateway istio-system/demo-gw-demo1:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

gateway istio-system/demo-gw-demo2:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

✗ 2 validation errors were found
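The idea behind this check can be sketched as follows. This is an illustrative Python approximation, not Backyards' code: `find_reused_certificates` is a hypothetical name, Gateways are assumed to be parsed into dictionaries, only `tls.credentialName` references are considered, and Gateways are grouped by identical selector as a simplification.

```python
from collections import defaultdict
from typing import List, Tuple


def find_reused_certificates(gateways: List[dict]) -> List[Tuple[str, List[str]]]:
    """Return (credentialName, [gateways]) for secrets used by more than one Gateway."""
    # (selector, credentialName) -> set of "<namespace>/<name>" Gateway names
    users = defaultdict(set)
    for gw in gateways:
        name = f'{gw["metadata"]["namespace"]}/{gw["metadata"]["name"]}'
        selector = tuple(sorted(gw["spec"].get("selector", {}).items()))
        for server in gw["spec"].get("servers", []):
            credential = server.get("tls", {}).get("credentialName")
            if credential:
                users[(selector, credential)].add(name)

    # Keep only secrets referenced by at least two distinct Gateways.
    return sorted(
        (credential, sorted(names))
        for (_, credential), names in users.items()
        if len(names) > 1
    )
```

Applied to the two Gateway resources from the YAML above, the sketch returns a single entry for example-wildcard-cert that lists both Gateways.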
The analyze command can also produce JSON output.
❯ backyards analyze --namespace istio-system -o json
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
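The JSON form is convenient for scripting. As a small sketch, the Python function below lists the failed checks in a report. It relies only on the fields visible in the sample above (`checkID`, `passed`, `errorMessage`); the function name `failed_checks` is hypothetical, and the assumption that every report follows this exact shape is ours.

```python
import json
from typing import List, Tuple


def failed_checks(report_json: str) -> List[Tuple[str, str, str]]:
    """Return (subject, checkID, errorMessage) for every result that did not pass."""
    report = json.loads(report_json)
    failures = []
    for subject, results in report.items():
        for result in results:
            # A result without a "passed" field is treated as a failure
            # (a conservative assumption, not documented behavior).
            if not result.get("passed", False):
                failures.append((
                    subject,
                    result.get("checkID", ""),
                    result.get("errorMessage", ""),
                ))
    return failures
```

For the sample output above, this yields two entries, both with the check ID gateway/reused-cert.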

Future plans

With Backyards (now Cisco Service Mesh Manager), you can already identify numerous configuration issues with today's validation, but we're planning to take this functionality a step further. It would be great to catch misconfigurations before they are applied to the cluster, not after.
Tip of the day: You can simply download the Backyards CLI tool and then run backyards analyze with KUBECONFIG set for your cluster to detect whether there are any validation issues on your cluster. Please note that only evaluation usage is free; contact us if you'd like to use Backyards in production.
From both the UI and the CLI tool, Backyards can manipulate the Istio configuration resources by setting traffic management rules, changing mutual TLS settings, or restricting outbound configurations for Envoy proxies. The plan is that when a user manipulates any of these Istio resources, the validations will run against the new hypothetical cluster state in which the manipulations would be applied, before the Istio resources are actually changed on the cluster. If any validation issue is present in this state, the user is notified and can cancel or modify the faulty change they were about to make.
Another use case is a GitOps workflow, where the Istio resources are modified via a PR. In this case, an automated job can run the backyards analyze command against the new hypothetical cluster state; if any issues are discovered, the job fails and the PR merge can even be prevented. With the implementation of these features, it will be possible to catch issues early, and Backyards users will be further protected from potential downtime and Istio misconfigurations.
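Until validation against a hypothetical state is available, a pipeline job can approximate the GitOps gate by analyzing a cluster where the PR's changes have already been applied, for example a staging cluster. The Python sketch below is one way to write such a job, under several assumptions: it uses only the backyards analyze invocation with --namespace and -o json shown earlier in this post, it decides pass or fail from the JSON `passed` fields rather than from the CLI's exit code (which this post does not describe), it treats empty output as "no findings", and `run_analyze` and `gate` are hypothetical names.

```python
import json
import subprocess
import sys
from typing import Callable


def run_analyze(namespace: str) -> str:
    """Run the CLI as shown in this post and return its stdout."""
    completed = subprocess.run(
        ["backyards", "analyze", "--namespace", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=False,  # the exit code is not relied upon in this sketch
    )
    return completed.stdout


def gate(namespace: str, analyze: Callable[[str], str] = run_analyze) -> int:
    """Return 1 if any validation result failed, otherwise 0."""
    # Assumption: empty output or an empty JSON object means no findings.
    report = json.loads(analyze(namespace) or "{}") or {}
    failed = [
        (subject, result.get("checkID", ""), result.get("errorMessage", ""))
        for subject, results in report.items()
        for result in results
        if not result.get("passed", False)
    ]
    for subject, check_id, message in failed:
        print(f"{subject}: {check_id}: {message}", file=sys.stderr)
    return 1 if failed else 0
```

A CI step could end with `sys.exit(gate("istio-system"))`, so that a non-zero exit status fails the job and blocks the merge. The `analyze` parameter exists only so the gate logic can be exercised without a cluster.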

Takeaway

This was just the tip of the iceberg. Backyards' validation subsystem provides lots of checks that result in faster root cause analysis and more stable operation of the service mesh.
Want to know more? Get in touch with us, or delve into the details of the latest release. Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we've already blogged about.
