In the previous post of this series, we installed KubeClarity on EKS and examined how it works. Hopefully, it was insightful, and you are back with a dash of curiosity to learn more about the feature set and how it is architected. In this deep dive, we will explore KubeClarity from an architectural perspective and dig into its implementation details.
A good starting point is to examine the architectural principles of KubeClarity. So, get ready to dive in!
The architectural overview is depicted in Figure-2 below.
An API-first architecture distributes functionality across multiple components and invokes each functional component via APIs. Any functionality available via the CLI or UI is also available via APIs. The full API specification can be found here.
KubeClarity's functional components are glued together following the microservices architecture model. Each component is an independent, standalone Go module that can be pulled into your existing applications as a library, in part or in whole, based on your preferences. Each module defines its own controllers to handle and process API requests. Let's drill down into these modules and see how they fit into the architectural scheme.
The frontend is a React app that exposes the following controls on the dashboard, using the backend APIs to render the data:
KubeClarity provides CLI functionality through a standalone utility called kubeclarity-cli. The tool operates independently of the KubeClarity backend and must be installed separately. It adds flexibility to CI/CD pipelines: it can initiate scans at different stages, merge the results from multiple stages, and upload the results to the KubeClarity backend.
Check out the Readme to learn more about running the CLI tool for scanning and exporting the results to the backend. Figure-3 below shows the operating model of the CLI and the CI stages it supports.
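As a quick illustration, the Readme documents an analyze-then-scan flow along these lines (check the Readme for the exact flags in your version; the image name, backend host, and application ID below are placeholders):

```
# Generate an SBOM for an image and write it to a file.
kubeclarity-cli analyze nginx:latest --input-type image -o nginx.sbom

# Scan the SBOM and export the results to the KubeClarity backend (-e).
BACKEND_HOST=<backend-host:port> kubeclarity-cli scan nginx.sbom \
  --input-type sbom --application-id <APPLICATION_ID> -e
```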
The backend module is the main module. It orchestrates all the major features of KubeClarity and exposes REST APIs to trigger them. These API calls are handled by dedicated controllers. For example, the controller that handles CIS Docker Benchmark requests differs from the one that handles vulnerability check requests. The complete list of backend controllers and their implementation details can be seen here. Figure-4 below captures the list of controllers for quick reference.
This module is responsible for maintaining scan state and starting and stopping scans. It also has a reporting interface to report scan results, including errors from failed scans. The scan orchestrator spawns scanning jobs based on the incoming request. These scanning jobs run content analysis (generating a Software Bill of Materials, or SBOM) and vulnerability scans.
The scan request can be specific to an application, image, or package and can be triggered via the UI, CLI, or API. Alternatively, a scan request can target a Kubernetes namespace or an entire cluster. Based on the request, the orchestrator kick-starts the jobs and initiates them with the appropriate inputs. The jobs are implemented using Go channels, and the orchestrator aggregates the results of these asynchronous jobs at the end.
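This fan-out/fan-in pattern can be sketched with goroutines and a results channel. The types and the fake scan function here are illustrative assumptions, not KubeClarity's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// scanResult is an illustrative result reported by one scanning job.
type scanResult struct {
	Image           string
	Vulnerabilities int
}

// runScans spawns one goroutine per image and aggregates the results over a
// channel, mirroring how the orchestrator fans out scanning jobs and collects
// their asynchronous results at the end (this is a sketch, not the real code).
func runScans(images []string, scan func(string) scanResult) []scanResult {
	results := make(chan scanResult, len(images))
	for _, img := range images {
		go func(img string) {
			results <- scan(img)
		}(img)
	}
	out := make([]scanResult, 0, len(images))
	for range images {
		out = append(out, <-results)
	}
	// Sort for a deterministic order, since jobs finish asynchronously.
	sort.Slice(out, func(i, j int) bool { return out[i].Image < out[j].Image })
	return out
}

func main() {
	fake := func(img string) scanResult {
		return scanResult{Image: img, Vulnerabilities: len(img)}
	}
	for _, r := range runScans([]string{"nginx", "redis"}, fake) {
		fmt.Println(r.Image, r.Vulnerabilities)
	}
}
```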
The architecture diagram above shows an example of image scans of an application pod on the right side. The scan orchestrator starts one scanning job per image, depending on the number of images in the application pod. To carry out its work, each scanner job needs access to the SBOM DB to generate and store SBOMs, and to a centralized scanning server to look up known vulnerabilities. We will cover both in more detail below.
Scanner Jobs (Content Analysis & Vulnerability Scans)
These scanner jobs run both the content analysis and vulnerability scanning tasks. The bulk of the analyzer and scanner logic is implemented in the shared module, which also includes utilities for converting between SBOM output formats and for merging the outputs of multiple scanners and analyzers.
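As a minimal sketch of the merge step (the pkg type and the de-duplication key are assumptions for illustration, not the shared module's actual code), merging the package lists produced by multiple analyzers might look like:

```go
package main

import "fmt"

// pkg is an illustrative SBOM package entry.
type pkg struct {
	Name, Version string
}

// mergePackages merges the package lists produced by multiple analyzers,
// de-duplicating by (name, version). This is a sketch of the kind of merge
// the shared module performs, not KubeClarity's actual implementation.
func mergePackages(lists ...[]pkg) []pkg {
	seen := map[pkg]bool{}
	var merged []pkg
	for _, list := range lists {
		for _, p := range list {
			if !seen[p] {
				seen[p] = true
				merged = append(merged, p)
			}
		}
	}
	return merged
}

func main() {
	analyzerA := []pkg{{"openssl", "1.1.1"}, {"zlib", "1.2.11"}}
	analyzerB := []pkg{{"zlib", "1.2.11"}, {"bash", "5.0"}}
	fmt.Println(len(mergePackages(analyzerA, analyzerB))) // prints 3
}
```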
To complete a scanning job, the scanner job loads and processes the configuration parameters specific to each scanner. The implementation details of the scanner types, shown in Figure-5 below, can be found here:
SBOM DB is a Go module with its own controller, backend, and database components; it uses a SQLite database with gorm as its ORM layer. It supports APIs for storing and retrieving SBOMs by resource hash, and the API routes are handled by the controller instance defined in the module. SBOM DB is designed to act more like a cache: to avoid the overhead of recomputing SBOMs, it stores the SBOM documents in a raw string format and avoids persistent-storage overhead. We will cover SBOM structure, integration, and caching in more detail in our next blog post. Figure-6 below shows the basic definition of the SBOM object; you can check out further details here.
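The caching behavior can be sketched with a simple in-memory map keyed by resource hash. This is an illustrative stand-in for the SQLite/gorm-backed store (the type names, hash choice, and methods are assumptions, not KubeClarity's schema):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// sbomCache stores raw SBOM documents keyed by a resource hash, acting as a
// cache to avoid recomputing SBOMs. It is a simplified in-memory stand-in for
// the SQLite/gorm-backed SBOM DB described above.
type sbomCache struct {
	mu    sync.RWMutex
	byKey map[string]string // resource hash -> raw SBOM document
}

func newSBOMCache() *sbomCache {
	return &sbomCache{byKey: map[string]string{}}
}

// resourceHash derives a stable key for a resource (hash choice is illustrative).
func resourceHash(resource string) string {
	sum := sha256.Sum256([]byte(resource))
	return hex.EncodeToString(sum[:])
}

func (c *sbomCache) Put(hash, rawSBOM string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byKey[hash] = rawSBOM
}

func (c *sbomCache) Get(hash string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	doc, ok := c.byKey[hash]
	return doc, ok
}

func main() {
	cache := newSBOMCache()
	h := resourceHash("nginx:latest")
	cache.Put(h, `{"bomFormat":"CycloneDX"}`)
	doc, ok := cache.Get(h)
	fmt.Println(ok, doc)
}
```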
As part of the scanning process, each vulnerability scanning job needs to look up known vulnerabilities. Since the scanning jobs are independent and run in parallel, this would mean maintaining a dedicated copy of the known-vulnerability database for each job, which would eat up a lot of cluster resources and space. To make this process more efficient, KubeClarity supports configuring a centralized server that holds a single copy of the known-vulnerabilities database. All the worker instances make an API call to this centralized instance to complete their task and report results. You can further choose between a local and a remote option for this centralized server; the config option can be set in the values.yaml file.
Depending on your deployment and cluster resources, you can choose a local or a remote server configuration. You can check out the implementation details here to learn more. For reference, Figure-7 below shows the type definition of the two modes of service, i.e., the local and remote server configurations.
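A minimal sketch of what such a two-mode configuration could look like in Go (the type, constant, and field names here are assumptions for illustration, not KubeClarity's actual definitions):

```go
package main

import "fmt"

// Mode selects between running the known-vulnerability lookup locally inside
// each job or calling a centralized remote server. This is a sketch of the two
// configurations described above, not KubeClarity's actual type definition.
type Mode string

const (
	ModeLocal  Mode = "LOCAL"
	ModeRemote Mode = "REMOTE"
)

type scannerConfig struct {
	Mode          Mode
	RemoteAddress string // consulted only when Mode == ModeRemote
}

// endpointDescription reports where a job would resolve known vulnerabilities.
func (c scannerConfig) endpointDescription() string {
	if c.Mode == ModeRemote {
		return "remote server at " + c.RemoteAddress
	}
	return "local in-job database"
}

func main() {
	cfg := scannerConfig{Mode: ModeRemote, RemoteAddress: "grype-server:9991"}
	fmt.Println(cfg.endpointDescription())
}
```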
KubeClarity uses PostgreSQL as its backend database and supports materialized views of the database tables. A materialized view caches the result of a complex and expensive query and lets you refresh that result periodically. Materialized views are useful wherever fast data access is required, which is why they are often used in data warehouses and business-intelligence applications. Figure-8 below captures the tables implemented in this database to support the backend functionality.
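As a generic PostgreSQL illustration of the pattern (the table and view names below are hypothetical, not KubeClarity's actual schema), a materialized view caches an expensive aggregation and is refreshed on demand:

```sql
-- Hypothetical names; illustrates the PostgreSQL pattern only.
CREATE MATERIALIZED VIEW vulnerability_counts AS
SELECT application_id, severity, COUNT(*) AS total
FROM vulnerabilities
GROUP BY application_id, severity;

-- Re-run the underlying query periodically to refresh the cached result.
REFRESH MATERIALIZED VIEW vulnerability_counts;
```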
There are two major flows: one driven by the UI and one by the CLI. We will outline both here because their call flows differ slightly.
Figure-9 below shows the UI-driven end-to-end API call flows to start a runtime scan and present the results to the user, followed by the user navigating the vulnerability graph to drill down into a specific vulnerability.
Figure-10 below shows the CLI call flows. The CLI is a standalone utility: it runs the analyzer and scanner jobs by loading the shared module directly as an internal library, runs the scans locally, and exposes a “-e” flag and an “application id” flag to export the analyzer and scanner job results to the backend. The backend then builds a vulnerability dependency graph from this data.
Hopefully, this dive took you down to the KubeClarity seabed and gave you a good look at the architecture. KubeClarity is now yours to enhance to suit your requirements, and please don’t forget to contribute your changes back upstream. Make changes that help others and add your voice to the project. It’s open source, so anyone can join in.
You might be interested in combining multiple SBOMs to generate a universal SBOM. That’s one of KubeClarity’s differentiating features, so we shouldn’t miss out on it. I’ll cover it next!
Pallavi Kalapatapu is a Principal Engineer and open-source advocate in Cisco’s Emerging Technology & Incubation organization.