Cloud computing, artificial intelligence, and internet-connected devices are the indispensable technological pillars of contemporary digital society. However, a greater untapped potential lies latent at the convergence of these technologies, one that can usher in the next generation of digital transformation and innovation.
The combined power of AI and IoT, collectively referred to as the Artificial Intelligence of Things (AIoT), promises to unlock unrealized customer value across a broad swath of industry verticals and use cases such as edge analytics, autonomous vehicles, personalized fitness, remote healthcare, precision agriculture, smart retail, predictive maintenance, and industrial automation.
In principle, combining AI with IoT seems the obvious next step in the evolution of these technologies. In practice, though, building an AIoT solution is fraught with seemingly insurmountable architectural and engineering challenges. In this three-part series, I will discuss these challenges in detail and address them by proposing an overarching architectural framework. I hope this series will give you the architectural context and perspective needed to build an industrial-grade, scalable, and robust AIoT application. Here is the series breakdown:
Part 1: AIoT Architecture – In this section, you will get a thorough grounding in the AIoT problem space, understand the inherent challenges and investigate emergent behaviors. I will present a set of effective solution patterns that can address such challenges, along with a comprehensive reference architecture. The reference architecture will serve as a cognitive map in the hitherto uncharted territory of AIoT architectures. It will assist you in pairing AIoT problem scenarios with applicable solution patterns and viable technology stacks.
Part 2: AIoT Infrastructure – Here, using the reference architecture, you will see how to establish an edge infrastructure for an AIoT application. The infrastructure is built using various CNCF open-source projects from the Kubernetes ecosystem, such as K3s, Argo, Longhorn, and Strimzi. You will see how to configure and install these projects on a cluster of AI-acceleration-equipped single-board computers such as the NVIDIA® Jetson Nano™ and Google Coral Edge TPU™.
Part 3: AIoT Design – In the concluding part, you will see how to design and build an AIoT application that simulates an industrial predictive maintenance scenario. In this scenario, analog sensors monitor an induction motor by sensing its power utilization, vibration, sound, and temperature, and this data is then processed by an AIoT application. The application, powered by a TPU accelerator, applies a logistic regression model to predict and prevent motor breakdown. You will see how ML pipelines measure drift, re-train, and re-deploy the model. Using various design artifacts, such as event diagrams and deployment topology models, you will get an in-depth view of the system design. You will find ample code and configuration samples in C++, Go, Python, and YAML. These samples will show you how to configure, code, build (ARM64-compatible), containerize (distroless), deploy, and orchestrate AIoT modules and services as MLOps pipelines across heterogeneous infrastructure tiers. This section also includes IoT device firmware code along with circuit schematics.
Building a “hello world” AIoT application is simple – train a model on the cloud, embed it in a device, simulate some sensor data, perform inferences, blink a few LEDs, and you are done. However, this simplicity is illusory; engineering a “real world” AIoT solution is an altogether different ballgame, with an order of magnitude more complexity, requiring deep technical know-how that spans multiple domains of electrical engineering and computer science. In designing a “real world” AIoT solution, one encounters a myriad of challenges that necessitate a careful examination of various problem scenarios, emergent behaviors, conflicting requirements, and tradeoffs. Let’s discuss the architecturally significant ones in more detail.
AI and IoT based solutions often incorporate dissimilar design principles, industry standards, development methodologies, security controls, and software/firmware delivery pipelines. They run on heterogeneous computational platforms, operating systems, and network topologies. They exhibit a broad range of computing, storage, bandwidth, and energy utilization capabilities. This disparity in hardware and software of AI vs. IoT systems results in significant emergent operational complexity when combined in an AIoT solution.
Embedding a trained model and running inferences on an edge device is a relatively simple problem to solve. In the real world, however, the model often drifts after deployment. This requires drift monitoring, re-training, and re-deployment. Data quality and timeliness are essential for drift detection, necessitating continuous sensor data collection, processing, validation, and training. Updated ML models then need to be re-deployed to the IoT devices using continuous delivery pipelines. Hence, the lifecycle of an AIoT application includes both ML and IoT build, test, and deploy toolchains and processes. Therefore, one needs to account for the entire end-to-end operation of an AIoT solution, encompassing software development, delivery, security, and monitoring.
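To make the drift-monitoring loop concrete, here is a minimal, hypothetical sketch (not the reference implementation) that compares the distribution of newly collected sensor readings against the training baseline using a two-sample Kolmogorov–Smirnov test from SciPy; the threshold, window sizes, and data are illustrative assumptions.

```python
# Minimal drift-detection sketch; threshold and data sources are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumed significance threshold

def detect_drift(baseline: np.ndarray, recent: np.ndarray) -> bool:
    """Return True if the recent sensor window has drifted from the training baseline."""
    statistic, p_value = ks_2samp(baseline, recent)
    return p_value < DRIFT_P_VALUE

# Example usage with simulated vibration readings.
baseline_window = np.random.normal(loc=0.0, scale=1.0, size=5_000)  # data the model was trained on
recent_window = np.random.normal(loc=0.4, scale=1.2, size=5_000)    # newly collected sensor data

if detect_drift(baseline_window, recent_window):
    print("Drift detected: trigger the re-training and re-deployment pipeline")
else:
    print("No significant drift detected")
```

In a production pipeline, the "trigger" branch would kick off the continuous training and delivery toolchain rather than print a message.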
The computational complexity, in both space and time, of training a model differs significantly from that of running inferences. To illustrate this point, consider the complexity of logistic regression.
The training time complexity of logistic regression using Newton-Raphson optimization is polynomial, while its inference time complexity is linear. As a result, a resource-constrained device does not have the computational power to train a logistic regression model but can easily handle a logistic regression inference. Conversely, an AI-accelerated device (say, one with an onboard GPU) might be overkill, from both a cost and a computational power perspective, if used just for inferencing. This is an important consideration that needs to be accounted for architecturally.
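To see the asymmetry concretely, the sketch below (a simplified, from-scratch illustration, not the reference implementation) trains a logistic regression model with Newton-Raphson updates and then runs a single inference: each training iteration builds and solves a d×d system (roughly O(n·d² + d³) work for n samples and d features), whereas inference is a single dot product and sigmoid (O(d)).

```python
# Simplified illustration of training vs. inference cost for logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_newton_raphson(X, y, iterations=10):
    """Training: each iteration computes a gradient and a d x d Hessian and solves
    a linear system -- roughly O(n*d^2 + d^3) work per iteration."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iterations):
        p = sigmoid(X @ w)
        gradient = X.T @ (p - y)                     # O(n*d)
        weights = p * (1.0 - p)                      # per-sample curvature weights
        hessian = (X * weights[:, None]).T @ X       # O(n*d^2)
        w -= np.linalg.solve(hessian + 1e-6 * np.eye(d), gradient)  # O(d^3)
    return w

def infer(w, x):
    """Inference: a single dot product and sigmoid -- O(d)."""
    return sigmoid(w @ x)

# Tiny synthetic example.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
weights = train_newton_raphson(X, y)
print("P(failure) =", infer(weights, X[0]))
```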
The computational complexity of ML tasks quickly overwhelms resource-constrained devices that have limited energy, memory, compute, and storage capacity. Most ML frameworks are too heavyweight for embedded devices. The standard hardware-agnostic metrics used to measure performance, such as floating-point operations per second (FLOPS) and multiply–accumulate operations (MACs), lack the fidelity to predict real performance on a particular edge ML device. Optimization strategies targeted at such hardware introduce errors that erode model efficacy. Compute-intensive inferences can starve IoT devices of resources and interfere with real-time sensing and actuation subroutines.
Deriving any actionable and meaningful insight from the data collected by AIoT devices requires processing and analyzing the sensor data on the edge tier. However, such data often has to stay on the device for privacy and security reasons. Edge devices lack the physical security guarantees of a data center. A single compromised edge node can significantly widen the scope of a security breach. Low-energy, low-bandwidth IoT protocols are particularly prone to such attacks. Thus, applying appropriate security controls is essential to ensure data security and privacy. However, this creates a particularly intractable set of requirements, as computation-intensive security controls compete for power, resources, and bandwidth on devices that are inherently resource constrained.
Autonomous vehicles, robotics, and industrial automation often require instant action and low-latency “sense, decide, and act” real-time loops. Even with the ML logic embedded on the device, the context needed to make a decision requires an IoT device to frequently communicate with the edge tier. This makes closed-loop, AI-enabled decisions particularly challenging to realize in real-world scenarios.
To address such challenges in their entirety, one needs to take a holistic view of the entire problem space and uncover a set of recurring problems that span both the AI and IoT domains. My approach to expressing the solution is based extensively on the language of patterns. Various architectural and design patterns can be quite effective in managing the complexity of running the entire AIoT solution on the edge tier. Embedded ML patterns can also help address the device resource constraints. Minimizing or eliminating the dependency on the cloud tier can be achieved by running the entire ML pipeline on the edge tier, closer to the sensors. This can vastly reduce network latency and address security concerns.
Manage complexity by creating a clear separation of concerns using a tiered architecture. Partition the infrastructure into tiers to separate training from inference and data acquisition activities. This allows each tier to be scaled, secured, and power-managed independently. As you will see in the subsequent sections, separating inference from learning activities and running them on separate tiers allows the training jobs to run on AI-accelerated hardware such as GPUs or TPUs, while inference jobs run on resource-constrained hardware. This separation also minimizes the power demands on battery-powered hardware, as the energy-intensive training jobs can now run on a dedicated tier of wired, AC/DC-powered devices.
Process high-volume, high-velocity IoT data in real time with minimal latency and maximum concurrency using messages and event streams. Allow continuous flow, interpretation, and processing of events, while minimizing temporal coupling between sensor data producers and consumers. This pattern facilitates a loosely coupled structure and organization of such services on heterogeneous computational platforms. It also enables each service to scale and fail independently, creating clear isolation boundaries.
Establish a durable and reliable event-streaming mechanism for communication between the services involved in training, inferencing, and orchestration. Various command and data messages persist as streams and are ordered (within a partition). Consumers can process the streams as they occur or retrospectively, and can join the stream at any time to replay, ignore, or process past messages asynchronously.
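As a hedged illustration of these replay semantics (assuming a Kafka-compatible broker, such as a Strimzi-managed cluster, and the kafka-python client; the topic name, group id, and broker address are placeholders), a new consumer can join an existing stream and process it from the earliest retained offset:

```python
# Illustrative event-stream consumer that replays a topic from the beginning.
# Broker address, topic name, and group id are placeholder assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "motor-telemetry",                     # hypothetical topic of persisted sensor events
    bootstrap_servers="edge-kafka:9092",   # placeholder broker address
    group_id="drift-detector",
    auto_offset_reset="earliest",          # new consumers replay the retained stream
    enable_auto_commit=True,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for record in consumer:
    # Records within a partition arrive in order; process them live or retrospectively.
    print(record.partition, record.offset, record.value)
```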
Establish lightweight, bandwidth-efficient pub/sub messaging to communicate with the IoT devices. Such messages cannot be replayed or retransmitted once received. A new subscriber will not receive any past messages, and the message order is not guaranteed.
Bridge the two event-driven patterns by converting the pub/sub messages into event streams and vice versa.
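A minimal sketch of such a bridge, assuming an MQTT broker on the device-facing pub/sub side and a Kafka-compatible broker on the stream side (the paho-mqtt and kafka-python clients, topic names, ports, and addresses are all illustrative assumptions):

```python
# Illustrative MQTT -> event-stream bridge; brokers, topics, and ports are assumptions.
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="edge-kafka:9092")  # placeholder stream broker

def on_message(client, userdata, message):
    """Convert each lightweight pub/sub message into a durable stream event."""
    producer.send("motor-telemetry",
                  key=message.topic.encode("utf-8"),
                  value=message.payload)

mqtt_client = mqtt.Client()                 # paho-mqtt 1.x style constructor
mqtt_client.on_message = on_message
mqtt_client.connect("edge-mqtt", 1883)      # placeholder MQTT broker
mqtt_client.subscribe("sensors/#")          # hypothetical device topic hierarchy
mqtt_client.loop_forever()                  # the bridge runs until stopped
```

The reverse direction (stream to pub/sub) follows the same shape: a stream consumer that re-publishes selected events, such as actuation commands, to device topics.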
Use the sidecar pattern to isolate and decouple embedded inference from communication with event streams. This keeps the inference modules lean and portable with minimal dependencies, ideal for constrained-device deployments.
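One way this can look in practice, sketched below under assumptions (a localhost TCP socket as the in-pod channel, kafka-python in the sidecar; names, ports, and topics are illustrative): the inference module only knows how to write results to a local socket, while the co-located sidecar owns the event-stream client and forwarding logic.

```python
# Illustrative sidecar sketch; the in-pod channel, port, topic, and broker are assumptions.
import json
import socket

# --- inference module (no event-stream dependencies) ---
def publish_local(result: dict, host: str = "127.0.0.1", port: int = 5555) -> None:
    """The inference container only writes to a local socket owned by the sidecar."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((json.dumps(result) + "\n").encode("utf-8"))

# --- sidecar container (owns the event-stream client and its dependencies) ---
from kafka import KafkaProducer

def run_sidecar(host: str = "127.0.0.1", port: int = 5555) -> None:
    producer = KafkaProducer(bootstrap_servers="edge-kafka:9092")  # placeholder broker
    server = socket.create_server((host, port))
    while True:
        conn, _ = server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                # Forward each local inference result to the durable event stream.
                producer.send("inference-results", value=line.strip().encode("utf-8"))
```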
Various techniques that adapt the model architecture and reduce its complexity and size can be quite effective in minimizing resource utilization. Here are a few examples:
Compressing the model can significantly reduce the inference time and consequently minimize resource consumption. In the reference implementation, I will be using quantization to compress the model.
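As a hedged example of what such compression can look like (assuming a trained Keras model and the TensorFlow Lite converter; the model variable, representative data, and file name are placeholders), post-training quantization can be applied as follows:

```python
# Post-training quantization sketch using the TensorFlow Lite converter.
# `trained_model` and `representative_samples` are assumed to exist already.
import tensorflow as tf

def representative_dataset():
    # Yield a small set of real input samples so the converter can calibrate value ranges.
    for sample in representative_samples[:100]:
        yield [sample.astype("float32")[None, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]         # enable quantization
converter.representative_dataset = representative_dataset    # calibrate for integer math
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)   # compact model ready for an edge/TPU runtime
```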
Binarizing weights and activations to only two values (1, -1) can improve performance and reduce energy utilization. However, the use of this strategy needs to be carefully weighed against the loss of accuracy.
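A minimal illustration of the idea follows (a naive post-hoc binarization of a weight matrix in NumPy with a single scaling factor, not a full binarized-network training scheme):

```python
# Naive weight binarization sketch: constrain weights to {+1, -1}.
import numpy as np

def binarize(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Replace each weight with its sign, keeping one scaling factor
    (the mean absolute value) to limit the loss of accuracy."""
    scale = float(np.mean(np.abs(weights)))
    binary = np.where(weights >= 0, 1.0, -1.0)
    return binary, scale

w = np.array([[0.7, -0.2], [-1.3, 0.05]])
w_bin, alpha = binarize(w)

# A dense multiply x @ w becomes alpha * (x @ w_bin), which reduces to additions
# and subtractions on hardware -- trading accuracy for speed and energy.
x = np.array([0.5, -1.0])
print(alpha * (x @ w_bin))
```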
Using digital signal processing close to the point of data acquisition can significantly improve the signal-to-noise ratio and eliminate inconsequential data. In industrial IoT scenarios, training the model on raw sensor data tends to fit the model to the noise rather than the signal. Transforms such as the Fourier, Hilbert, and wavelet transforms can vastly improve both training and inference efficiency.
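For instance (a small sketch assuming a vibration signal sampled at a known rate; the sampling rate and band boundaries are illustrative), a Fourier transform close to the sensor can turn a noisy time-domain waveform into a handful of frequency-band features:

```python
# DSP sketch: convert a raw vibration waveform into compact frequency-domain features.
import numpy as np

SAMPLE_RATE_HZ = 8_000  # assumed sensor sampling rate

def spectral_features(signal: np.ndarray, bands=((0, 100), (100, 500), (500, 2000))):
    """Return the spectral energy in a few frequency bands instead of the raw samples."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE_HZ)
    return [float(np.sum(spectrum[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]

# Simulated noisy vibration signal with a 120 Hz component (e.g., bearing wear).
t = np.arange(0, 1.0, 1.0 / SAMPLE_RATE_HZ)
signal = np.sin(2 * np.pi * 120 * t) + 0.5 * np.random.randn(len(t))
print(spectral_features(signal))   # 3 features instead of 8,000 raw samples
```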
Perform closed-loop, low-latency inferencing for anomaly detection and intervention at the edge, close to the point of data acquisition. Use context-specific inferencing for predictive analytics at an aggregate level. In the reference implementation, these are referred to as “Level 1” and “Level 2” inferencing, respectively.
Package ML tasks such as ingest, extract, drift detection, train, etc., and related dependencies as containerized workloads. Use container orchestration to manage the workload deployments. Use container workflow pipelines to automate continuous training, evaluation, and delivery.
Use AI-accelerator-aware workload placement strategies to ensure that workloads requiring AI acceleration are placed on the appropriate computational hardware.
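A sketch of what accelerator-aware placement can look like with the Kubernetes Python client (the node label, namespace, image, and extended resource name are assumptions; an equivalent YAML manifest works just as well):

```python
# Accelerator-aware placement sketch using the official Kubernetes Python client.
# Node labels, namespace, image, and the extended resource name are assumptions.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-motor-model"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        # Only schedule onto nodes labeled as having an AI accelerator.
        node_selector={"accelerator": "nvidia-jetson-nano"},
        containers=[
            client.V1Container(
                name="trainer",
                image="registry.local/aiot/trainer:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}            # request the accelerator
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="aiot", body=pod)
```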
Bring the entire learning pipeline to the edge tier, eliminating the dependency on the cloud tier. Run and manage ML tasks such as extract, drift detection, training, validation, and model compression on the edge tier.
Express the desired state and flow of the ML tasks and their dependencies as directed acyclic graphs (DAGs). Use a container workflow engine to achieve the desired state and flow.
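To make the idea concrete, here is a minimal sketch using only the Python standard library; the task names mirror the pipeline described above, and in practice a container workflow engine such as Argo consumes an equivalent declarative DAG definition and runs each step as a containerized workload.

```python
# Minimal DAG sketch of the edge ML pipeline using the standard library.
from graphlib import TopologicalSorter

# Each task lists the tasks it depends on; the engine derives the execution order.
ml_pipeline = {
    "extract":         {"ingest"},
    "drift_detection": {"extract"},
    "train":           {"drift_detection"},
    "validate":        {"train"},
    "compress":        {"validate"},
    "deploy":          {"compress"},
}

for task in TopologicalSorter(ml_pipeline).static_order():
    # A workflow engine would launch each step as a containerized workload.
    print("run", task)
```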
Use declarative automation to deploy, manage and monitor containerized workloads across various edge infrastructure tiers.
Formalizing AIoT patterns in a reference architecture is an effective strategy to decompose the problem space, identify recurring scenarios and apply repeatable best practices and patterns to resolve them.
Using the aforementioned patterns, this reference architecture attempts to manage the complexity of developing, deploying, and monitoring an AIoT solution on a plethora of heterogeneous computational hardware and network topologies. It achieves this by proposing a distributed, event-driven architecture hosted on a multi-tier infrastructure.
The multi-tiered architecture creates clear and distinct boundaries for network, security, scalability, durability, and reliability guarantees of the solution. Each tier can be independently secured and scaled based on the nature of the tier’s workload, data privacy, and computational hardware characteristics.
The three infrastructure tiers – Things, Inference, and Platform – host various components and services, have specific roles, and establish a clear separation of concerns.
Let’s examine the characteristics of each tier in more detail and understand how a tiered event-driven architecture addresses these concerns.
The Things Tier hosts the Perception components. The sensors and actuators in this tier serve as the primary interface to the physical world. Components in this tier sense the physical environment, digitize the signal, and process and transmit it to the rest of the tiers. The Things Tier comprises constrained edge devices and is architected to meet this tier's specific requirements and operational constraints.
The Inference Tier hosts the Cognition services that analyze the data coming from the Things Tier and generate real-time actionable insights and alerts. This tier is architected to meet its own set of requirements and operational constraints.
The Platform Tier hosts two categories of services – MLOps and Platform Services. It logically partitions training-related activities from platform services, enabling computationally intensive training jobs to run on dedicated AI-accelerated devices. This tier is architected to meet its own set of requirements and operational constraints.
In this article, we explored the AIoT problem landscape, its emergent behaviors, and the architecturally significant use cases. We saw how using a tiered, event-driven architecture and employing AIoT patterns in a reference architecture can achieve a clean separation of concerns, address emergent behaviors, and manage the ensuing complexity.
In part 2 of this series, we will see how to build a concrete infrastructure implementation of this reference architecture that is capable of hosting a real-world AIoT application.