Progressive Delivery Solution for Cisco Calisti

Nishant Patil

Wednesday, June 29th, 2022

Check out Cisco Calisti

Cisco Calisti is a managed Istio instance that brings deep observability, convenient management, tracing, and policy-based security to modern container-based applications. As a lifecycle management tool, it saves you time by automating the adoption of the Istio service mesh in your production environment, and it monitors the health of your workloads for resiliency and high availability.

The services that we deploy in Kubernetes-based cloud environments change constantly as new versions are released, and a new version of a workload may not be good enough to serve real production traffic. This is why we need a protocol, a set of steps, to verify that new workload versions hold up in the production environment.

For this reason, progressive delivery tools like Flagger and Argo Rollouts are useful: they perform pre-rollout tests and fall back to the previous version if the new one does not meet the required conformance.

This blog shows you how to integrate Flagger with Cisco Calisti in production and leverage version rollout techniques, so that your cloud environment stays protected from bugs introduced by new version rollouts.

Flagger

Flagger is a progressive delivery toolkit that helps in automating the release process on Kubernetes. It reduces the risk of new software versions on production by gradually shifting traffic to the new version while measuring traffic metrics and running rollout tests.

Flagger can run automated application testing for the following deployment strategies:

  • Canary (progressive traffic shifting)
  • A/B testing (HTTP headers and cookie traffic routing)
  • Blue/Green (traffic switching and mirroring)

Along with this, Flagger integrates with messaging services like Slack or MS Teams to alert you with its reports.
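
For example, Slack alerting can be enabled when installing Flagger through its Helm chart (as we do below) by passing a few extra values; the webhook URL, channel, and user here are placeholders to replace with your own:

# slack.url below is a placeholder webhook URL
helm upgrade -i flagger flagger/flagger \
  --namespace=smm-system \
  --set slack.url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  --set slack.channel=general \
  --set slack.user=flagger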

The following example shows how to integrate Flagger with Cisco Service Mesh Manager, create a Canary resource, and observe progressive delivery in action on the Cisco Service Mesh Manager dashboard.

The below image illustrates how canary images are rolled out by gradually shifting live traffic, without interrupting the user experience. Istio canary deployment

To demonstrate this, we will configure and deploy the podinfo application, upgrade its version, and watch the canary release on the Cisco Service Mesh Manager dashboard.

Before proceeding with this example, make sure you have installed the free or paid version of Cisco Calisti on your Kubernetes cluster.

For this example, we will use Calisti SMM version 1.9.1 on Kubernetes v1.21.0.

Setting up Flagger with Cisco Calisti

  1. Deploy Flagger into the smm-system namespace and connect it to Istio and to Prometheus at the address shown in the following commands:

    Note: Prometheus metrics service is hosted at

    http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus

    kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
    helm repo add flagger https://flagger.app
    helm upgrade -i flagger flagger/flagger \
    --namespace=smm-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
    

    This step installs the following custom resource definitions:

    1. canaries.flagger.app
    2. metrictemplates.flagger.app
    3. alertproviders.flagger.app
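
    You can quickly confirm that these CRDs are present; a simple sanity check:

    kubectl get crd | grep flagger.app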
  2. Make sure you see the following log message, which confirms a successful Flagger operator deployment in your cluster:

    kubectl -n smm-system logs deployment/flagger
    

    Expected output:

    {"level":"info","ts":"2022-01-25T19:45:02.333Z","caller":"flagger/main.go:200","msg":"Connected to metrics server http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus"}
    

At this point, Flagger is integrated with Cisco Calisti. You can now deploy your own applications for progressive delivery.

Podinfo example with Flagger

Next, let's try out an example from the Flagger docs.

  1. Create "test" namespace and enable "sidecar-proxy auto-inject on" for this namespace (use smm binary downloaded from SMM download page). Deploy the "podinfo" target image that needs to be enabled for canary deployment for load testing during automated canary promotion:

    kubectl create ns test
    smm sidecar-proxy auto-inject on test
    kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo
    
  2. Create the IstioMeshGateway service:

    kubectl apply -f - << EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioMeshGateway
    metadata:
      annotations:
        banzaicloud.io/related-to: istio-system/cp-v112x
      labels:
        app: test-imgw-app
        istio.io/rev: cp-v112x.istio-system
      name: test-imgw
      namespace: test
    spec:
      deployment:
        podMetadata:
          labels:
            app: test-imgw-app
            istio: ingressgateway
      istioControlPlane:
        name: cp-v112x
        namespace: istio-system
      service:
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
        type: LoadBalancer
      type: ingress
    EOF
    
  3. Add the port and hosts for the IstioMeshGateway using the following Gateway configuration:

    kubectl apply -f - << EOF
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: public-gateway
      namespace: test
    spec:
      selector:
        app: test-imgw-app
        gateway-name: test-imgw
        gateway-type: ingress
        istio.io/rev: cp-v112x.istio-system
      servers:
        - port:
            number: 80
            name: http
            protocol: HTTP
          hosts:
            - "*"
    EOF
    
  4. Create a Canary custom resource.

    kubectl apply -f - << EOF
    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      progressDeadlineSeconds: 60
      autoscalerRef:
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        name: podinfo
      service:
        port: 9898
        targetPort: 9898
        gateways:
        - public-gateway
        hosts:
        - "*"
        trafficPolicy:
          tls:
            mode: DISABLE
        rewrite:
          uri: /
        retries:
          attempts: 3
          perTryTimeout: 1s
          retryOn: "gateway-error,connect-failure,refused-stream"
      analysis:
        interval: 30s
        threshold: 3
        maxWeight: 80
        stepWeight: 20
        metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            thresholdRange:
              max: 500
            interval: 30s
    EOF
    

    At this step, the Canary resource automatically initializes the canary deployment by setting up the following resources for podinfo in the test namespace (you can verify them with the listing shown after this list):

    • Deployment and HorizontalPodAutoscaler for podinfo-primary.test
    • Services for podinfo-canary.test and podinfo-primary.test
    • DestinationRule for podinfo-canary.test and podinfo-primary.test
    • VirtualService for podinfo.test
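
    You can confirm these generated objects with a quick listing (exact output will vary):

    kubectl -n test get deployment,hpa,service,destinationrule,virtualservice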
  5. Wait until Flagger initializes the deployment and sets up the VirtualService for podinfo:

    kubectl -n smm-system logs deployment/flagger -f
    

    Expected output:

    {"level":"info","ts":"2022-01-25T19:54:42.528Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
    

    Get the ingress IP address from the IstioMeshGateway:

    export INGRESS_IP=$(kubectl get istiomeshgateways.servicemesh.cisco.com -n test test-imgw -o jsonpath='{.status.GatewayAddress[0]}')
    echo $INGRESS_IP
    > 34.82.47.210
    

    Verify that podinfo is reachable from the external IP address:

    curl http://$INGRESS_IP/
    {
      "hostname": "podinfo-96c5c65f6-l7ngc",
      "version": "6.0.0",
      "revision": "",
      "color": "#34577c",
      "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
      "message": "greetings from podinfo v6.0.0",
      "goos": "linux",
      "goarch": "amd64",
      "runtime": "go1.16.5",
      "num_goroutine": "8",
      "num_cpu": "4"
    }
    
  6. Send traffic. For this setup we will use the hey traffic generator, which you can install with the brew package manager:

    brew install hey
    

    Let's send traffic from any terminal that can reach the IP address. This command sends requests for 30 minutes from two concurrent workers, each at 10 requests per second:

    hey -z 30m -q 10 -c 2 http://$INGRESS_IP/
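
    If you prefer not to install hey, a plain curl loop is a rough substitute (an illustrative sketch; adjust the sleep to tune the request rate):

    # sends simple, unrated load against the same endpoint
    while true; do curl -s -o /dev/null http://$INGRESS_IP/; sleep 0.1; done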
    

    On the Cisco Calisti dashboard, select MENU > TOPOLOGY, and select the test namespace to see the generated traffic.

    Image of podinfo traffic

Upgrade Image version

The current podinfo version is v6.0.0; let's update it to the next version.

  1. Upgrade the target image to the new version and watch the canary functionality on the Cisco Calisti dashboard:

    kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.1.0
    > deployment.apps/podinfo image updated
    

    You can check the Flagger logs as the tests progress and the new version is promoted:

    {"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
    {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 80","canary":"podinfo.test"}
    {"msg":"Copying podinfo.test template spec to podinfo-primary.test","canary":"podinfo.test"}
    {"msg":"HorizontalPodAutoscaler podinfo-primary.test updated","canary":"podinfo.test"}
    {"msg":"Routing all traffic to primary","canary":"podinfo.test"}
    {"msg":"Promotion completed! Scaling down podinfo.test","canary":"podinfo.test"}
    

    Check the status of the Canary:

    kubectl get canaries -n test -o wide
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initializing   0        0              30s                 20                         80          2022-04-11T21:25:31Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initialized    0        0              30s                 20                         80          2022-04-11T21:26:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing    0        0              30s                 20                         80          2022-04-11T21:33:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Succeeded      0        0              30s                 20                         80          2022-04-11T21:35:28Z
    

    Visualize the entire progressive delivery on the Cisco Calisti dashboard.

    Traffic from "TEST-IMGW-APP" is shifted from "podinfo-primary" to "podinfo-canary", from 20% up to 80% (according to the steps we configured for the canary rollout). The below image shows the incoming traffic on the "podinfo-primary" pod. Image of primary podinfo traffic

    The below image shows the incoming traffic on the "podinfo-canary" pod. Image of canary podinfo traffic

We can see that Flagger dynamically shifts the ingress traffic to the canary deployment in steps and performs conformance tests. Once the tests pass, Flagger shifts the traffic back to the primary deployment and updates the primary to the new version.

Finally, Flagger scales down podinfo:6.0.0, shifts the traffic to podinfo:6.1.0, and makes it the primary deployment.

In the below image you can see that the canary image (v6.1.0) was promoted to the primary image (v6.1.0). Image of canary and podinfo traffic

Automated rollback

If you would like to test automated rollback when a canary fails, generate HTTP 500 responses and latency by running the following command from a terminal that can reach the ingress IP, then watch how the canary release fails.

watch "curl -s http://$INGRESS_IP/delay/1 && curl -s http://$INGRESS_IP/status/500"
kubectl get canaries -n test -o wide
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       2              30s                 20                         80          2022-04-11T22:11:03Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       3              30s                 20                         80          2022-04-11T22:11:33Z
..
NAME      STATUS   WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Failed   0        0              30s                 20                         80          2022-04-11T22:12:03Z
{"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
{"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
{"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
{"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
{"msg":"Halt podinfo.test advancement request duration 917ms > 500ms","canary":"podinfo.test"}
{"msg":"Halt podinfo.test advancement request duration 598ms > 500ms","canary":"podinfo.test"}
{"msg":"Halt podinfo.test advancement request duration 1.543s > 500ms","canary":"podinfo.test"}
{"msg":"Rolling back podinfo.test failed checks threshold reached 3","canary":"podinfo.test"}
{"msg":"Canary failed! Scaling down podinfo.test","canary":"podinfo.test"}

Visualize the canary rollout on the Cisco Calisti dashboard.

As the rollout steps from 0% -> 20% -> 40% -> 60%, we can observe the performance degradation: incoming requests take longer than 500ms, so the rollout of the new image is halted. The failure threshold was set to 3, so after three failed checks the rollout was rolled back.
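
To dig into why a canary failed, you can also inspect the events recorded on the Canary resource:

kubectl -n test describe canary/podinfo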

The below image shows the incoming traffic graph of the primary pod. Image of podinfo traffic

The below image shows the incoming traffic graph of the canary pod. Image of podinfo traffic

The below image shows the pod health status. Image of podinfo traffic

Cleaning up

To clean up your cluster, run the following commands.

  1. Remove the Canary, Gateway, and IstioMeshGateway resources, and the podinfo deployment.

    kubectl delete -n test canaries.flagger.app podinfo
    kubectl delete -n test gateways.networking.istio.io public-gateway
    kubectl delete -n test istiomeshgateways.servicemesh.cisco.com test-imgw
    kubectl delete -n test deployment podinfo
    
  2. Delete the "test" namespace.

    kubectl delete namespace test
    
  3. Uninstall the Flagger deployment and delete the Flagger CRDs.

    helm delete flagger -n smm-system
    kubectl delete -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
    

Argo Rollouts

Argo Rollouts is a standalone extension of the Argo CI/CD pipeline. It provides features similar to Flagger's and integrates closely with the Argo CI/CD pipeline: a Kubernetes controller and a set of CRDs that deliver advanced deployment capabilities, including manual promotion and automated progressive delivery.

Similar to Flagger, Argo Rollouts integrates with Istio as the ingress controller and leverages the traffic-shaping capabilities of Cisco Service Mesh to gradually shift traffic to the new version during an update and to perform conformance tests.

If you would like to integrate Cisco Calisti with Argo Rollouts, you can do so in a few simple steps.

Setting up Argo Rollouts with Cisco Calisti

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Verify that Istio is detected in the argo-rollouts pod logs:

time="2022-04-12T16:56:40Z" level=info msg="Istio detected"
time="2022-04-12T16:56:40Z" level=info msg="Istio workers (10) started"

At this point, you have integrated Cisco Calisti with Argo Rollouts, and you can deploy your applications for progressive delivery. Cisco Calisti helps with lifecycle management, including visualization of Istio traffic and monitoring of workload and service health. Follow the Metrics template and Traffic Management sections of the Argo Rollouts documentation to deploy custom features and fine-tune operations to your requirements.

Metrics Analysis

If you would like to perform automated rollouts and rollbacks, use an AnalysisTemplate that points at the Calisti Prometheus address:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
spec:
  metrics:
    - ...
      provider:
        prometheus:
          address: http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus

More info: Analysis template
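
For reference, here is a fuller sketch adapted from the success-rate example in the Argo Rollouts documentation, pointed at the Calisti Prometheus. The template name, argument, query, and thresholds are illustrative assumptions to tune for your own workload:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate          # hypothetical template name
  namespace: test
spec:
  args:
    - name: service-name      # passed in from the Rollout
  metrics:
    - name: success-rate
      interval: 30s
      # require at least 95% of requests to succeed
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
          query: |
            sum(irate(istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[1m]))
            /
            sum(irate(istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[1m]))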

Traffic Management

Argo Rollouts manages traffic using the standard Istio CRDs (VirtualService and DestinationRule) and regular Kubernetes Services, so no additional configuration is needed.
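
As an illustration, a canary Rollout that delegates traffic shifting to an existing Istio VirtualService might look like the following sketch; the Service names (podinfo-stable, podinfo-canary), the VirtualService name (podinfo-vsvc), and its route name are hypothetical placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: podinfo
  namespace: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfod
          image: stefanprodan/podinfo:6.0.0
  strategy:
    canary:
      canaryService: podinfo-canary     # regular Kubernetes Service
      stableService: podinfo-stable     # regular Kubernetes Service
      trafficRouting:
        istio:
          virtualService:
            name: podinfo-vsvc          # existing VirtualService to manipulate
            routes:
              - primary                 # route whose weights are adjusted
      steps:
        - setWeight: 20
        - pause: {duration: 30s}
        - setWeight: 60
        - pause: {duration: 30s}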

More info: Argo Rollouts - Istio

Conclusion

As you can see from this blog post, integrating progressive delivery tools like Flagger and Argo Rollouts with your service mesh allows you to improve the reliability of your services using version rollout techniques. If you'd like to try them on your own clusters, just register for the free version of Cisco Calisti.

Check out Cisco Calisti