Organizations are constantly seeking ways to leverage cloud computing to drive innovation and accelerate business growth. Cloud-native applications have become the standard approach for building scalable, flexible, and resilient software systems that adapt to dynamic demand. At the heart of this paradigm lies platform engineering, the discipline of designing, building, and managing the underlying infrastructure and services that enable the seamless deployment, scaling, and operation of modern software applications.
The following article walks through building a cloud-native platform that combines cloud-native infrastructure with observability, using popular open-source components such as MongoDB, Kafka, Elasticsearch, OpenTelemetry, MinIO, and certificate management services. The platform can be deployed in a few simple steps on any Kubernetes cluster, such as Amazon Elastic Kubernetes Service (Amazon EKS) or a kind cluster.
The cloud-native platform also implements an interface layer using the Argtor Kubernetes operator, which lets any application access the platform services, for example reaching MongoDB through a custom resource definition (CRD). We can develop and test an application on one type of Kubernetes cluster, e.g. a kind cluster on-premises, and seamlessly deploy it on another, e.g. Amazon EKS, without any code changes.
In this example, we use the Temporal workflow engine to deploy the platform in an on-premises kind cluster. Temporal is a distributed workflow manager that executes tasks in order and in a fault-tolerant manner; the tasks in our workflow set up the basic platform services. Alternatives such as Argo Workflows, Terraform, or Ansible could serve the same purpose, each with its own trade-offs. At the end, we demonstrate how a Go microservice accesses the platform services.
Figure 1 shows the deployment overview. The Jenkins pipeline calls the Temporal REST API, which initiates a series of workflow tasks. These tasks deploy platform services in the Kubernetes cluster in a fault-tolerant manner and consistently across several clusters (handling retry logic, wait times for cluster bring-up, distribution of TLS certificates, and so on). As mentioned earlier, other configuration management solutions are worth considering as well. Our goal here is to deploy services consistently and handle failures effectively.
Argtor is a Kubernetes operator that provides an interface, i.e. a custom resource, to application developers. It implements the platform operations that create the Kubernetes resources an application needs to access platform services such as MongoDB and Kafka messaging.
Argtor defines the following custom resource definition (CRD):
As shown in Figure 2 below, the platform provides the MongoDB database service.
The platform installs the MongoDB Community Operator, which deploys the MongoDB cluster across three pods.
For an application pod to access MongoDB, the developer creates a CRD instance specifying the application name, the service name, and the database name. As shown in Figure 2, a database credential and a certificate are generated for the pod to access the database.
Figure 3 shows the Kafka messaging service.
The platform installs the Cisco AKO (Kafka Operator), with the option to install another Kafka operator such as Koperator or Strimzi. The operator deploys the Kafka cluster with Kafka security enabled.
For an application pod to access the Kafka cluster, the developer creates an Argtor CRD instance with the following properties:
As shown in Figure 3, Argtor creates a CertificateRequest CRD instance on behalf of the application, and cert-manager generates the SSL certificate for the pod to access the Kafka cluster.
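Inside the pod, the generated certificate is consumed from the mounted secret (mounted at /etc/kafka in the deployment spec shown later). A minimal sketch of building a TLS configuration from it follows; the key names tls.crt, tls.key, and ca.crt are assumptions based on cert-manager's usual Certificate secret convention:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
	"path/filepath"
)

// kafkaTLSConfig builds a tls.Config from the certificate files that the
// platform mounts into the pod. The file names tls.crt, tls.key, and ca.crt
// follow cert-manager's typical secret layout and may need adjusting.
func kafkaTLSConfig(certDir string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(
		filepath.Join(certDir, "tls.crt"),
		filepath.Join(certDir, "tls.key"))
	if err != nil {
		return nil, fmt.Errorf("load client keypair: %w", err)
	}
	caPEM, err := os.ReadFile(filepath.Join(certDir, "ca.crt"))
	if err != nil {
		return nil, fmt.Errorf("read CA: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no valid CA certificates in ca.crt")
	}
	return &tls.Config{Certificates: []tls.Certificate{cert}, RootCAs: pool}, nil
}
```

The returned tls.Config can be handed to any Kafka client library that accepts one.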
Figure 4 shows the OpenTelemetry pipeline deployed for the platform. In this example, spartan, aegis, and proxy are application pods. Traces and metrics are collected and forwarded through load-balancing collectors, then processed and forwarded to the Grafana Tempo and Prometheus backends. Fluent Bit forwards the logs to Elasticsearch. Dashboards can be created using Grafana and Kibana.
By default, the platform enables the Grafana service graph. Users can create custom dashboards as needed.
The platform deploys one OpenTelemetry collector in sidecar mode. The application developer adds the sidecar.opentelemetry.io/inject annotation to the Kubernetes deployment specification, as shown below:
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: opentelemetry-operator-system/otel-agent-sidecar
The platform injects an OTEL sidecar into the pod, and the sidecar forwards metrics and traces to the load-balancing OTEL collector. The application's OpenTelemetry implementation sends metrics and traces over gRPC to localhost:4317, where the OTEL sidecar receives them.
The application pod needs to mount its log files under the /var/log directory. The platform deploys Fluent Bit to parse logs under this directory and forward them to Elasticsearch.
This section goes through the steps to deploy a Kubernetes microservice, spartan, on the platform. spartan needs to read from and write to a MongoDB database and publish messages to a Kafka topic.
An Argtor Application CRD is created to deploy spartan. The CRD, named pontos-crd, is created in the Kubernetes namespace argtor-infra. The application name is pontos, and it is deployed in the Kubernetes namespace pontos. spartan's database name is Global-spartan.
The following is the CRD definition:
apiVersion: argtor.golang.cisco.com/v1
kind: Application
metadata:
  name: pontos-crd
  namespace: argtor-infra
spec:
  name: pontos
  namespace:
    name: pontos
  services:
    - spartan
  serviceDeployments:
    - name: spartan
      kafkaTopics:
        - numPartition: 3
          replicationFactor: 1
          name: spartan.argo.cisco.com.v1.Spartan-spartan-svc-topic
          configEntries:
            retention.ms: "86400000"
      database: Global-spartan
The platform generates the following Kubernetes secrets in the Kubernetes namespace pontos, under which the application services are deployed:
The following is the part of the spartan deployment specification related to the platform services. Each Kubernetes secret is mounted into the spartan pod's file system so that spartan can read the certificates and username/password from files. spartan writes JSON-format logs into files under /var/log, which is mounted from /var/log/andro/frontend/spartan on the host. The log files are parsed by Fluent Bit, which forwards them to Elasticsearch.
volumeMounts:
  - mountPath: /etc/kafka
    name: kafka-secret
    readOnly: true
  - mountPath: /etc/mongodb/secret
    name: mongodb-secret
    readOnly: true
  - mountPath: /etc/mongodb/cert
    name: mongodb-cert
    readOnly: true
  - mountPath: /var/log/
    name: log-dir
volumes:
  - name: kafka-secret
    secret:
      secretName: spartan-kafka-cert
  - name: mongodb-cert
    secret:
      secretName: spartan-mongodb-cert
  - name: mongodb-secret
    secret:
      secretName: spartan-mongodb-password
  - name: log-dir
    hostPath:
      path: /var/log/andro/frontend/spartan
      type: ""
After deploying the application, we can view metrics and traces in the Grafana UI. Since the service graph is enabled by default, we can view the generated service graph, shown in Figure 5. It shows the spartan pod accessing the MongoDB database Global-spartan. Other pods deployed on the platform, such as aegis, also appear in the graph.
This example demonstrates how open-source tools can be used to swiftly implement platform engineering for cloud-native applications, whether deployed on-premises or in the cloud. This approach helps developers iterate on application development without having to deploy in the cloud, saving costs while keeping the flexibility to deploy the validated application in the cloud when it is ready.
As a next step, we will integrate a GitOps method of deployment using ArgoCD.
We would like to express our gratitude to GopiKrishna Saripuri, Saravanan Masilamani, and Suresh Kannan for their valuable contributions to this project. Additionally, we extend our thanks to Kalyan Ghosh for providing guidance throughout the project.