Published on 05/23/2023
Last updated on 03/21/2024

KubeClarity: Architecture Deep Dive

Lean Into Software Supply Chain Security with KubeClarity Series
https://github.com/openclarity/kubeclarity

In the previous post of this series, we installed KubeClarity on EKS and examined how it works. Hopefully, it was insightful, and you are back with a dash of curiosity to learn more about the feature set and how it is architected. This post explores KubeClarity from an architectural perspective and goes deeper into its implementation details.


Figure-1: Deep Dive into the KubeClarity Architecture

A good starting point is to examine the architectural principles of KubeClarity. So, get ready to dive in!

KubeClarity Architecture Principles

  • Plug-and-play architecture and infrastructure
  • Overloadable content analyzers and scanners
  • Modularized functionality
  • Parallelized image scanning workloads
  • Ability to merge and unify results from parallel runs
  • Centralized servers for resource efficiency
  • Configurable namespaces and scanning targets
  • User-friendly interface and automation-ready CLI
  • Common APIs

Architecture Overview

The architectural overview is depicted in Figure-2 below.

Figure-2: KubeClarity Architecture Block Diagram

REST API & Microservices-Based Architecture

An API-first architecture distributes functionality across multiple components and invokes each one via APIs. Any functionality available via the CLI or UI is also available via the APIs. A full API specification can be found here.

Modular Design

KubeClarity's functional components are assembled following the microservices architecture model. Each component is an independent, standalone Go module that can be pulled into your existing applications as a library, in part or as a whole, based on your preferences. Each module defines its own controllers to handle and process API requests. Let’s drill down into these modules and understand how they fit into the architectural scheme.
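To make the "each module defines its own controllers" idea concrete, here is a minimal sketch using only the Go standard library. The `Module` interface, the `healthModule` type, and the `/health` route are hypothetical names invented for illustration; they are not taken from the KubeClarity code base.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Module is a hypothetical interface: each functional component
// registers its own controllers (HTTP handlers) on a shared router.
type Module interface {
	Name() string
	RegisterRoutes(mux *http.ServeMux)
}

// healthModule is an illustrative module that owns a single controller.
type healthModule struct{}

func (healthModule) Name() string { return "health" }

func (healthModule) RegisterRoutes(mux *http.ServeMux) {
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
}

// NewRouter wires independent modules into one API surface.
func NewRouter(modules ...Module) *http.ServeMux {
	mux := http.NewServeMux()
	for _, m := range modules {
		m.RegisterRoutes(mux)
	}
	return mux
}

// Get issues an in-process GET request and returns the status code and body.
func Get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	body, _ := io.ReadAll(rec.Body)
	return rec.Code, string(body)
}

func main() {
	router := NewRouter(healthModule{})
	code, body := Get(router, "/health")
	fmt.Println(code, body)
}
```

Because each module only needs the router handed to it, a module in this style can be added, removed, or embedded in another application without touching its neighbors.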

UI/Dashboard

A frontend React app renders the following views on the dashboard, using backend APIs to fetch the data:

  • Fixable vulnerabilities per severity
  • Top 5 vulnerable elements (applications, resources, packages)
  • New vulnerabilities trends
  • Package count per license type
  • Package count per programming language
  • General counters

CLI

KubeClarity provides CLI functionality through a standalone utility called kubeclarity-cli. This tool operates independently of the KubeClarity backend and must be installed separately. It can initiate scans at different stages of a CI/CD pipeline, merge results from multiple stages, and upload the results to the KubeClarity backend. The tool supports the following CI stages:

  • Build app
  • Build image
  • Push image

Check out the README to learn more about how to run the CLI tool for scanning and exporting the results to the backend. Figure-3 below shows the operating model of the CLI.

Figure-3: Flexible KubeClarity CLI Scans at Various CI/CD Phases

Backend

The backend is the main module. It orchestrates all the major features of KubeClarity and exposes REST APIs to trigger them. These API calls are handled by dedicated controllers. For example, the controller that handles CIS Docker Benchmark requests differs from the controller that handles vulnerability check requests. The complete list of backend controllers and their implementation details can be seen here. Figure-4 below captures the list of controllers as a quick reference.

Figure-4: KubeClarity Backend API Controllers

Runtime Scan Orchestrator

This module is responsible for maintaining scan state and for starting and stopping scans. It also has a reporting interface to report scan results, including errors from failed scans. The scan orchestrator spawns scanning jobs based on the incoming request. These scanning jobs run content analysis (Software Bill of Materials, or SBOM, analysis) and vulnerability scans.

A scan request can target a specific application, image, or package and can be triggered via the UI, CLI, or API. Alternatively, a scan request can cover a Kubernetes namespace or an entire cluster. Based on the scan request, the orchestrator kick-starts the jobs with the appropriate inputs. The jobs are implemented using Go channels, and the orchestrator aggregates the results of these asynchronous jobs at the end.
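The fan-out and aggregate pattern described above can be sketched as follows. This is a simplified illustration, not the orchestrator's actual code: `ScanResult`, `ScanFunc`, `RunScans`, and `ErrPullFailed` are hypothetical names, and the fake scan function stands in for a real content analysis and vulnerability scan.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// ErrPullFailed is a hypothetical error a scanning job might report.
var ErrPullFailed = errors.New("image pull failed")

// ScanResult is a hypothetical per-image outcome reported by a scanning job.
type ScanResult struct {
	Image string
	Vulns int
	Err   error
}

// ScanFunc stands in for a real scanning job; it is an assumption for this sketch.
type ScanFunc func(image string) (int, error)

// RunScans starts one goroutine per image, collects every result (including
// failures) over a channel, and returns them sorted by image name.
func RunScans(images []string, scan ScanFunc) []ScanResult {
	// Buffered so that no job blocks while reporting its result.
	results := make(chan ScanResult, len(images))
	var wg sync.WaitGroup
	for _, img := range images {
		wg.Add(1)
		go func(image string) {
			defer wg.Done()
			n, err := scan(image)
			results <- ScanResult{Image: image, Vulns: n, Err: err}
		}(img)
	}
	wg.Wait()
	close(results)

	out := make([]ScanResult, 0, len(images))
	for r := range results {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Image < out[j].Image })
	return out
}

func main() {
	fake := func(image string) (int, error) {
		if image == "broken:latest" {
			return 0, ErrPullFailed
		}
		return len(image), nil // placeholder count, not a real scan
	}
	for _, r := range RunScans([]string{"nginx:1.25", "broken:latest"}, fake) {
		fmt.Printf("%s vulns=%d err=%v\n", r.Image, r.Vulns, r.Err)
	}
}
```

Carrying the error inside each result, rather than aborting on the first failure, mirrors the reporting behavior described earlier: failed scans are reported alongside successful ones.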

The architecture diagram above (Figure-2) shows an example of image scans of an application pod on the right side. The scan orchestrator starts one scanning job per image, so the number of jobs depends on the number of images in the application pod. To do its work, each scanner job needs access to the SBOM DB to generate and store SBOMs, and to a centralized scanning server to look up known vulnerabilities. We cover both in more detail below.

Scanner Jobs (Content Analysis & Vulnerability Scans)

These scanner jobs run both the content analysis and vulnerability scanning tasks. The bulk of the analyzer and scanner logic is implemented in the shared module. This module also includes miscellaneous utilities for converting SBOM output formats and for merging the outputs from multiple scanners and analyzers.
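As a rough illustration of what merging outputs from multiple scanners can look like, the sketch below deduplicates findings on a (vulnerability ID, package) key and records which scanners reported each one. The `Finding` and `Merged` types, the scanner names, and the IDs are all made up for this example; the shared module's real data model is richer.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical, simplified vulnerability record.
type Finding struct {
	ID      string // vulnerability identifier
	Package string // affected package name
}

// Merged is a finding plus the names of all scanners that reported it.
type Merged struct {
	Finding
	Scanners []string
}

// MergeFindings unifies per-scanner results, deduplicating on (ID, Package)
// and recording which scanners reported each finding.
func MergeFindings(byScanner map[string][]Finding) []Merged {
	seen := make(map[Finding]map[string]bool)
	for scanner, findings := range byScanner {
		for _, f := range findings {
			if seen[f] == nil {
				seen[f] = make(map[string]bool)
			}
			seen[f][scanner] = true
		}
	}

	out := make([]Merged, 0, len(seen))
	for f, scanners := range seen {
		names := make([]string, 0, len(scanners))
		for name := range scanners {
			names = append(names, name)
		}
		sort.Strings(names)
		out = append(out, Merged{Finding: f, Scanners: names})
	}
	// Sort for a stable, reproducible report order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].ID != out[j].ID {
			return out[i].ID < out[j].ID
		}
		return out[i].Package < out[j].Package
	})
	return out
}

func main() {
	// Scanner names and vulnerability IDs below are made up for illustration.
	merged := MergeFindings(map[string][]Finding{
		"scanner-a": {{ID: "VULN-0001", Package: "openssl"}},
		"scanner-b": {{ID: "VULN-0001", Package: "openssl"}, {ID: "VULN-0002", Package: "zlib"}},
	})
	for _, m := range merged {
		fmt.Printf("%s in %s reported by %v\n", m.ID, m.Package, m.Scanners)
	}
}
```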

To complete a scanning job, the scanner job loads and processes the configuration parameters specific to each scanner. The implementation details can be found here if you want to learn more about the scanner types shown in Figure-5 below:

Figure-5: KubeClarity Vulnerability Scanner Types

SBOM DB

SBOM DB is a Go module with its own controller, backend, and database components. It uses a SQLite database with gorm as its ORM layer, and it supports APIs for storing and retrieving SBOMs by resource hash. The API routes are handled by the controller instance defined in this module. SBOM DB is designed to act more like a cache: to avoid the overhead of recomputing an SBOM, it stores SBOM documents in a raw string format and avoids persistent storage overheads. We will cover more details about SBOM structure, integration, and caching in our next blog. Figure-6 below shows the basic definition of the SBOM object; you can check out further details here.

Figure-6: KubeClarity SBOM DB Type Definition
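The cache-like behavior of SBOM DB can be sketched with a small in-memory stand-in. This is only an illustration of the lookup-by-resource-hash idea: the real module is backed by SQLite and gorm, and the `SBOMCache` type, its `GetOrGenerate` method, and the example hash are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// SBOMCache is a hypothetical in-memory stand-in for SBOM DB. It maps a
// resource hash to the raw SBOM document, stored as a plain string.
type SBOMCache struct {
	mu    sync.Mutex
	store map[string]string
}

// NewSBOMCache returns an empty cache.
func NewSBOMCache() *SBOMCache {
	return &SBOMCache{store: make(map[string]string)}
}

// GetOrGenerate returns the SBOM cached under hash. On a miss it calls
// generate, stores the result, and returns it. The lock is held during
// generation, which keeps the sketch simple at the cost of concurrency.
func (c *SBOMCache) GetOrGenerate(hash string, generate func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if sbom, ok := c.store[hash]; ok {
		return sbom, nil
	}
	sbom, err := generate()
	if err != nil {
		return "", err
	}
	c.store[hash] = sbom
	return sbom, nil
}

func main() {
	cache := NewSBOMCache()
	calls := 0
	generate := func() (string, error) {
		calls++
		return `{"packages":[]}`, nil // placeholder document, not a real SBOM
	}
	// Two lookups for the same (made-up) resource hash: only the first generates.
	for i := 0; i < 2; i++ {
		if _, err := cache.GetOrGenerate("sha256:example", generate); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("generate calls:", calls) // prints "generate calls: 1"
}
```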

Centralized Scanning Server

As part of the scanning process, each vulnerability scanning job needs to look up known vulnerabilities. Since the scanning jobs are independent and run in parallel, this would ordinarily mean maintaining a dedicated copy of the known-vulnerability database for each job, which would eat up a lot of cluster resources and space. To make this process more efficient, KubeClarity supports configuring a centralized server that holds a single copy of the known-vulnerability database. All worker instances make an API call to this centralized instance to complete their task and report results. You can also choose between a local and a remote option for this server; the option is set in the values.yaml file.

Choose a local or remote server configuration depending on your deployment and cluster resources. You can check out the implementation details here to learn more. For reference, Figure-7 below shows the type definitions for the two modes of service, i.e., the local and remote server configurations.
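To give a feel for how a local-versus-remote switch might be modeled, here is a small sketch. It is not the type definition from the KubeClarity source: `ServerMode`, `LocalConfig`, `RemoteConfig`, `ServerConfig`, the field names, and the example address are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ServerMode selects where vulnerability lookups go (hypothetical values).
type ServerMode string

const (
	ModeLocal  ServerMode = "local"
	ModeRemote ServerMode = "remote"
)

// LocalConfig holds settings for a per-job local database (assumed field).
type LocalConfig struct {
	DBDir string
}

// RemoteConfig holds settings for a shared, centralized server (assumed field).
type RemoteConfig struct {
	Address string
}

// ServerConfig combines the mode with both sets of settings.
type ServerConfig struct {
	Mode   ServerMode
	Local  LocalConfig
	Remote RemoteConfig
}

// ErrInvalidConfig wraps every validation failure in this sketch.
var ErrInvalidConfig = errors.New("invalid server config")

// LookupTarget validates the config and describes where lookups will be sent.
func (c ServerConfig) LookupTarget() (string, error) {
	switch c.Mode {
	case ModeLocal:
		if c.Local.DBDir == "" {
			return "", fmt.Errorf("%w: local mode needs a database directory", ErrInvalidConfig)
		}
		return "local database at " + c.Local.DBDir, nil
	case ModeRemote:
		if c.Remote.Address == "" {
			return "", fmt.Errorf("%w: remote mode needs a server address", ErrInvalidConfig)
		}
		return "remote server at " + c.Remote.Address, nil
	default:
		return "", fmt.Errorf("%w: unknown mode %q", ErrInvalidConfig, c.Mode)
	}
}

func main() {
	// The address below is a made-up example, not a real default.
	cfg := ServerConfig{Mode: ModeRemote, Remote: RemoteConfig{Address: "vuln-server.example:8080"}}
	target, err := cfg.LookupTarget()
	fmt.Println(target, err)
}
```

Validating the mode up front means a misconfigured job fails fast instead of discovering the problem halfway through a scan.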

Figure-7: Type Definitions for Local and Remote Grype Server Configurations

PostgreSQL

KubeClarity uses PostgreSQL as its backend database and supports materialized views of the database tables. A materialized view caches the result of a complex, expensive query and lets you refresh that result periodically. Materialized views are useful wherever fast data access is required, which is why they are often used in data warehouses and business intelligence applications. Figure-8 below captures the tables implemented in this database to support the backend functionality.

Figure-8: KubeClarity Backend DB Table Definitions

API Call Flows

There are two major call flows: one driven by the UI and one driven by the CLI. We outline both here because they work slightly differently.

UI Call Flow

Figure-9 below shows the UI-driven, end-to-end API call flow to start a run-time scan and post the results to the user, followed by the user navigating the vulnerability graph to drill down into a specific vulnerability.

Figure-9: The KubeClarity UI Flow for Triggering a Run-Time Scan and Navigating Vulnerabilities

CLI Call Flow

The CLI is a standalone utility. It runs the analysis and scanner jobs locally by loading the shared module directly as an internal library, and it exposes a “-e” flag and an “application id” flag to export the analyzer and scanner job results to the backend. The backend then builds a vulnerability dependency graph from this data. Figure-10 below shows the CLI call flow.
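To illustrate how the export switch and the application ID relate, here is a small sketch built on Go's standard flag package. It is not the kubeclarity-cli implementation: the `ScanOptions` type, the `ParseScanArgs` function, and the long flag name `application-id` are assumptions for this example, so consult the README for the real invocation.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// ScanOptions holds the export-related options mentioned in the text.
// The flag name "application-id" is an assumption for this sketch.
type ScanOptions struct {
	Export        bool
	ApplicationID string
	Target        string
}

// Validation errors used by this sketch.
var (
	ErrMissingTarget = errors.New("exactly one scan target is required")
	ErrMissingAppID  = errors.New("export requires an application id")
)

// ParseScanArgs parses flags followed by a single scan target and checks
// that an export request carries an application ID.
func ParseScanArgs(args []string) (ScanOptions, error) {
	var opts ScanOptions
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.BoolVar(&opts.Export, "e", false, "export results to the backend")
	fs.StringVar(&opts.ApplicationID, "application-id", "", "ID of the application the results belong to")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if fs.NArg() != 1 {
		return opts, ErrMissingTarget
	}
	opts.Target = fs.Arg(0)
	if opts.Export && opts.ApplicationID == "" {
		return opts, ErrMissingAppID
	}
	return opts, nil
}

func main() {
	// The application ID and image name are made-up example values.
	opts, err := ParseScanArgs([]string{"-e", "-application-id", "app-1", "nginx:latest"})
	fmt.Printf("%+v err=%v\n", opts, err)

	_, err = ParseScanArgs([]string{"-e", "nginx:latest"})
	fmt.Println("without application id:", err)
}
```

The pairing matters because the backend needs to know which application the exported results belong to before it can attach them to its dependency graph.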

Figure-10: The KubeClarity CLI Flow for Running Analysis and Exporting Results to Backend

Conclusion

Hopefully, this took you down to the KubeClarity seabed and gave you a good look at the architecture. KubeClarity is now yours to enhance to suit your requirements. Please don't forget to contribute your changes back upstream to help others and add your voice to the project. It's open source, so anyone can join in.

Next Up

You might be interested in combining multiple SBOMs to generate a universal SBOM. After all, that's one of KubeClarity's differentiating features, so we shouldn't miss out on it. I'll be bringing it up next!



Pallavi Kalapatapu is a Principal Engineer and open-source advocate in Cisco’s Emerging Technology & Incubation organization.
