In the previous post of this series, we installed KubeClarity on EKS and examined how it works. Hopefully, it was insightful, and you are back with a dash of curiosity to learn more about the feature set and how it is architected. In this deep dive, we will explore KubeClarity from an architectural perspective and dig into its implementation details.
A good starting point is to examine the architectural principles of KubeClarity. So, get ready to dive in!
The architectural overview is depicted in Figure-2 below.
An API-first architecture distributes functionality across multiple components and invokes each functional component via APIs. Any functionality available via the CLI or UI is also available via APIs. The full API specification can be found here.
KubeClarity's functional components are glued together following the microservices architecture model. Each component is an independent, standalone Go module that can be pulled into your existing applications as a library, in part or in whole, based on your preferences. Each module defines its own controllers to handle and process API requests. Let's drill down into these modules and see how they fit into the architectural scheme.
The frontend is a React app that exposes the following controls on the dashboard, using the backend APIs to render the data:
KubeClarity provides CLI functionality through a standalone utility called kubeclarity-cli. The tool operates independently of the KubeClarity backend and must be installed separately. It adds flexibility to CI/CD pipelines: it can initiate scans at different stages, merge the results from multiple stages, and upload the results to the KubeClarity backend.
Check out the Readme to learn more about running the CLI tool for scanning and exporting the results to the backend. Figure-3 below shows the operating model of the CLI and the CI stages it supports.
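As a quick illustration, the Readme documents an analyze-then-scan flow along these lines (check the Readme for the exact flags in your version; the image name, backend host, and application ID below are placeholders):

```
# Generate an SBOM for an image and write it to a file.
kubeclarity-cli analyze nginx:latest --input-type image -o nginx.sbom

# Scan the SBOM and export the results to the KubeClarity backend (-e).
BACKEND_HOST=<backend-host:port> kubeclarity-cli scan nginx.sbom \
  --input-type sbom --application-id <APPLICATION_ID> -e
```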
The backend module is the main module. It orchestrates all the major features of KubeClarity and exposes REST APIs to trigger them. These API calls are handled by dedicated controllers. For example, the controller that handles CIS Docker Benchmark requests differs from the one that handles vulnerability check requests. The complete list of backend controllers and their implementation details can be seen here. Figure-4 below captures the list of controllers for quick reference.
This module is responsible for maintaining scan state and starting and stopping scans. It also has a reporting interface to report scan results, including errors from failed scans. The scan orchestrator spawns scanning jobs based on the incoming request. These scanning jobs run content analysis (generating a Software Bill of Materials, or SBOM) and vulnerability scans.
The scan request can be specific to an application, image, or package and can be triggered via the UI, CLI, or API. Alternatively, a scan request can target a Kubernetes namespace or an entire cluster. Based on the request, the orchestrator kick-starts the jobs and initiates them with the appropriate inputs. The jobs are implemented using Go channels, and the orchestrator aggregates the results of these asynchronous jobs at the end.
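This fan-out/fan-in pattern can be sketched with goroutines and a results channel. The types and the fake scan function here are illustrative assumptions, not KubeClarity's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// scanResult is an illustrative result reported by one scanning job.
type scanResult struct {
	Image           string
	Vulnerabilities int
}

// runScans spawns one goroutine per image and aggregates the results over a
// channel, mirroring how the orchestrator fans out scanning jobs and collects
// their asynchronous results at the end (this is a sketch, not the real code).
func runScans(images []string, scan func(string) scanResult) []scanResult {
	results := make(chan scanResult, len(images))
	for _, img := range images {
		go func(img string) {
			results <- scan(img)
		}(img)
	}
	out := make([]scanResult, 0, len(images))
	for range images {
		out = append(out, <-results)
	}
	// Sort for a deterministic order, since jobs finish asynchronously.
	sort.Slice(out, func(i, j int) bool { return out[i].Image < out[j].Image })
	return out
}

func main() {
	fake := func(img string) scanResult {
		return scanResult{Image: img, Vulnerabilities: len(img)}
	}
	for _, r := range runScans([]string{"nginx", "redis"}, fake) {
		fmt.Println(r.Image, r.Vulnerabilities)
	}
}
```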
The architecture diagram above shows an example of image scans of an application pod on the right side. The scan orchestrator starts one scanning job per image, depending on the number of images in the application pod. To carry out its work, each scanner job needs access to the SBOM DB to generate and store SBOMs, and to a centralized scanning server to look up known vulnerabilities. We will cover both in more detail below.
Scanner Jobs (Content Analysis & Vulnerability Scans)
These scanner jobs run both the content analysis and vulnerability scanning tasks. The bulk of the analyzer and scanner logic is implemented in the shared module, which also includes utilities for converting between SBOM output formats and for merging the outputs of multiple scanners and analyzers.
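As a minimal sketch of the merge step (the pkg type and the de-duplication key are assumptions for illustration, not the shared module's actual code), merging the package lists produced by multiple analyzers might look like:

```go
package main

import "fmt"

// pkg is an illustrative SBOM package entry.
type pkg struct {
	Name, Version string
}

// mergePackages merges the package lists produced by multiple analyzers,
// de-duplicating by (name, version). This is a sketch of the kind of merge
// the shared module performs, not KubeClarity's actual implementation.
func mergePackages(lists ...[]pkg) []pkg {
	seen := map[pkg]bool{}
	var merged []pkg
	for _, list := range lists {
		for _, p := range list {
			if !seen[p] {
				seen[p] = true
				merged = append(merged, p)
			}
		}
	}
	return merged
}

func main() {
	analyzerA := []pkg{{"openssl", "1.1.1"}, {"zlib", "1.2.11"}}
	analyzerB := []pkg{{"zlib", "1.2.11"}, {"bash", "5.0"}}
	fmt.Println(len(mergePackages(analyzerA, analyzerB))) // prints 3
}
```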
To complete a scanning job, the scanner job loads and processes the configuration parameters specific to each scanner. The implementation details of the scanner types, shown in Figure-5 below, can be found here:
SBOM DB is a Go module with its own controller, backend, and database components; it uses a SQLite database with gorm as its ORM layer. It supports APIs for storing and retrieving SBOMs by resource hash, and the API routes are handled by the controller instance defined in the module. SBOM DB is designed to act more like a cache: to avoid the overhead of recomputing SBOMs, it stores the SBOM documents in a raw string format and avoids persistent-storage overhead. We will cover SBOM structure, integration, and caching in more detail in our next blog post. Figure-6 below shows the basic definition of the SBOM object; you can check out further details here.
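The caching behavior can be sketched with a simple in-memory map keyed by resource hash. This is an illustrative stand-in for the SQLite/gorm-backed store (the type names, hash choice, and methods are assumptions, not KubeClarity's schema):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// sbomCache stores raw SBOM documents keyed by a resource hash, acting as a
// cache to avoid recomputing SBOMs. It is a simplified in-memory stand-in for
// the SQLite/gorm-backed SBOM DB described above.
type sbomCache struct {
	mu    sync.RWMutex
	byKey map[string]string // resource hash -> raw SBOM document
}

func newSBOMCache() *sbomCache {
	return &sbomCache{byKey: map[string]string{}}
}

// resourceHash derives a stable key for a resource (hash choice is illustrative).
func resourceHash(resource string) string {
	sum := sha256.Sum256([]byte(resource))
	return hex.EncodeToString(sum[:])
}

func (c *sbomCache) Put(hash, rawSBOM string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byKey[hash] = rawSBOM
}

func (c *sbomCache) Get(hash string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	doc, ok := c.byKey[hash]
	return doc, ok
}

func main() {
	cache := newSBOMCache()
	h := resourceHash("nginx:latest")
	cache.Put(h, `{"bomFormat":"CycloneDX"}`)
	doc, ok := cache.Get(h)
	fmt.Println(ok, doc)
}
```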
As part of the scanning process, each vulnerability scanning job needs to look up known vulnerabilities. Since the scanning jobs are independent and run in parallel, this would mean maintaining a dedicated copy of the known-vulnerability database for each job, which would eat up a lot of cluster resources and space. To make this process more efficient, KubeClarity supports configuring a centralized server that holds a single copy of the known-vulnerabilities database. All the worker instances make an API call to this centralized instance to complete their task and report results. You can further choose between a local and a remote option for this centralized server; the config option can be set in the values.yaml file.
Depending on your deployment and cluster resources, you can choose a local or a remote server configuration. You can check out the implementation details here to learn more. For reference, Figure-7 below shows the type definition of the two modes of service, i.e., the local and remote server configurations.
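A minimal sketch of what such a two-mode configuration could look like in Go (the type, constant, and field names here are assumptions for illustration, not KubeClarity's actual definitions):

```go
package main

import "fmt"

// Mode selects between running the known-vulnerability lookup locally inside
// each job or calling a centralized remote server. This is a sketch of the two
// configurations described above, not KubeClarity's actual type definition.
type Mode string

const (
	ModeLocal  Mode = "LOCAL"
	ModeRemote Mode = "REMOTE"
)

type scannerConfig struct {
	Mode          Mode
	RemoteAddress string // consulted only when Mode == ModeRemote
}

// endpointDescription reports where a job would resolve known vulnerabilities.
func (c scannerConfig) endpointDescription() string {
	if c.Mode == ModeRemote {
		return "remote server at " + c.RemoteAddress
	}
	return "local in-job database"
}

func main() {
	cfg := scannerConfig{Mode: ModeRemote, RemoteAddress: "grype-server:9991"}
	fmt.Println(cfg.endpointDescription())
}
```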
KubeClarity uses PostgreSQL as its backend database and supports materialized views of the database tables. A materialized view caches the result of a complex and expensive query and lets you refresh that result periodically. Materialized views are useful wherever fast data access is required, which is why they are often used in data warehouses and business-intelligence applications. Figure-8 below captures the tables implemented in this database to support the backend functionality.
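As a generic PostgreSQL illustration of the pattern (the table and view names below are hypothetical, not KubeClarity's actual schema), a materialized view caches an expensive aggregation and is refreshed on demand:

```sql
-- Hypothetical names; illustrates the PostgreSQL pattern only.
CREATE MATERIALIZED VIEW vulnerability_counts AS
SELECT application_id, severity, COUNT(*) AS total
FROM vulnerabilities
GROUP BY application_id, severity;

-- Re-run the underlying query periodically to refresh the cached result.
REFRESH MATERIALIZED VIEW vulnerability_counts;
```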
There are two major flows: one driven by the UI and one by the CLI. We will outline both here because their call flows differ slightly.
Figure-9 below shows the UI-driven end-to-end API call flows to start a runtime scan and present the results to the user, followed by the user navigating the vulnerability graph to drill down into a specific vulnerability.
Figure-10 below shows the CLI call flows. The CLI is a standalone utility: it runs the analyzer and scanner jobs by loading the shared module directly as an internal library, runs the scans locally, and exposes a “-e” flag and an “application id” flag to export the analyzer and scanner job results to the backend. The backend then builds a vulnerability dependency graph from this data.
Hopefully, this dive took you down to the KubeClarity seabed and gave you a good look at the architecture. KubeClarity is now yours to enhance to suit your requirements, and please don’t forget to contribute your changes back upstream. Make changes that help others and add your voice to the project. It’s open source, so anyone can join in.
You might be interested in combining multiple SBOMs to generate a universal SBOM. That’s one of KubeClarity’s differentiating features, so we shouldn’t miss out on it. I’ll cover it next!
Pallavi Kalapatapu is a Principal Engineer and open-source advocate in Cisco’s Emerging Technology & Incubation organization.