The Architect's Guide to the AIoT - Part 1

Asheesh Goja

Tuesday, March 22nd, 2022

"All you really need to know for the moment is that the AIoT is a lot more complicated than you might think, even if you start from a position of thinking it’s pretty damn complicated in the first place." - Inspired by HG2G


Cloud computing, artificial intelligence, and internet-connected devices are the ineliminable technological pillars of contemporary digital society. However, a greater untapped potential that can usher in the next generation of digital transformations and innovations lies latent at the convergence of these technologies.

"The combined power of AI and IoT, collectively referred to as the Artificial Intelligence of Things or AIoT, promises to unlock unrealized customer value in a broad swath of industry verticals such as edge analytics, autonomous vehicles, personalized fitness, remote healthcare, precision agriculture, smart retail, predictive maintenance, and industrial automation."

In principle, combining AI with IoT seems to be the obvious logical progression in the evolution of these technologies. In practice though, building an AIoT solution is fraught with seemingly insurmountable architectural and engineering challenges. In this three-part series, I will discuss such challenges in sufficient detail and address them by proposing an overarching architectural framework. I hope this series will give you the adequate architectural context and perspective needed to build an industrial-grade scalable and robust AIoT application. Here is the series breakdown:

Part 1: AIoT Architecture - In this section, you will get a thorough grounding in the AIoT problem space, understand the inherent challenges and investigate emergent behaviors. I will present a set of effective solution patterns that can address such challenges, along with a comprehensive reference architecture. The reference architecture will serve as a cognitive map in the hitherto uncharted territory of AIoT architectures. It will assist you in pairing AIoT problem scenarios with applicable solution patterns and viable technology stacks.

Part 2: AIoT Infrastructure - Here, using the reference architecture, you will see how to establish an edge infrastructure for an AIoT application. The infrastructure is built using various CNCF open-source projects from the Kubernetes ecosystem such as K3s, Argo, Longhorn and Strimzi. You will see how to configure and install these projects on a cluster of single-board computers equipped with AI acceleration, such as NVIDIA® Jetson Nano™ and Google Coral Edge TPU™.

Part 3: AIoT Design - In the concluding part, you will see how to design and build an AIoT application that simulates an industrial predictive maintenance scenario. In this scenario, analog sensors monitor an induction motor by sensing its power utilization, vibration, sound, and temperature, and this data is then processed by an AIoT application. This application powered by a TPU accelerator applies a logistic regression model to predict and prevent motor breakdown. You will see how ML pipelines measure drift, re-train and re-deploy the model. Using various design artifacts such as event diagrams and deployment topology models you will get an in-depth view of the systems design. You will find ample code and configuration samples in C++, Go, Python and YAML. These samples will show you how to configure, code, build (ARM64 compatible), containerize (distroless), deploy and orchestrate AIoT modules and services as MLOps pipelines across various heterogeneous infrastructure tiers. This section also includes IoT device firmware code along with circuit schematics.

The Problem - "Illusion of simplicity"

Building a “hello world” AIoT application is simple - train a model on the cloud, embed it in a device, simulate some sensor data, perform inferences, blink a few LEDs, and you are done. However, this simplicity is illusory, as engineering a “real world” AIoT solution is altogether a different ballgame, with an order of magnitude more complexity, and requiring deep technical know-how that spans multiple domains of electrical engineering and computer science. In designing a "real world" AIoT solution, one encounters a myriad of challenges that necessitate a careful examination of various problem scenarios, emergent behaviors, conflicting requirements, and tradeoffs. Let's discuss the architecturally significant ones in more detail.

Emergent Operational Complexity

AI and IoT based solutions often incorporate dissimilar design principles, industry standards, development methodologies, security controls, and software/firmware delivery pipelines. They run on heterogeneous computational platforms, operating systems, and network topologies. They exhibit a broad range of computing, storage, bandwidth, and energy utilization capabilities. This disparity in hardware and software of AI vs. IoT systems results in significant emergent operational complexity when combined in an AIoT solution.

Embedding a trained model and running inferences on an edge device is a relatively simple problem to solve. However, in the real world, post deployment, the model often drifts. This requires drift monitoring, re-training, and re-deployment. Data quality and timeliness are essential for drift detection, necessitating continuous sensor data collection, processing, validation, and training. Updated ML models need to be re-deployed to the IoT devices using continuous delivery pipelines. Hence, the lifecycle of an AIoT application includes both ML and IoT related build, test, and deploy toolchains and processes. Therefore, one needs to account for the entire end-to-end operation of an AIoT solution encompassing software development, delivery, security, and monitoring.
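To make the idea of drift monitoring a little more concrete, here is a minimal sketch that flags drift when the mean of a recent window of sensor readings moves too far from the mean observed at training time. The function name, the threshold of three standard errors, and the sample values are all assumptions made for illustration; a production pipeline would likely use richer statistical tests.

```python
import statistics

def mean_shift_drift(baseline, recent, threshold=3.0):
    """Return True when the recent mean deviates from the baseline mean
    by more than `threshold` standard errors (a simple z-score check)."""
    base_mean = statistics.fmean(baseline)
    base_std = statistics.stdev(baseline)
    recent_mean = statistics.fmean(recent)
    if base_std == 0:
        # A constant baseline: any change at all counts as drift.
        return recent_mean != base_mean
    std_error = base_std / len(recent) ** 0.5
    return abs(recent_mean - base_mean) / std_error > threshold

# Hypothetical temperature readings captured at training time and in production.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
stable = mean_shift_drift(baseline, [10.0, 10.1, 9.9, 10.0])
drifted = mean_shift_drift(baseline, [12.0, 12.1, 11.9, 12.2])
```

A check like this only signals that re-training may be needed; the re-training and re-deployment steps still have to be orchestrated by the delivery pipeline.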

Computational Complexity

The computational complexity, both space and time, of learning algorithms significantly differs from inferences. To illustrate this point, let’s look at the logistic regression algorithm complexity in this table

Notice the training time complexity of logistic regression using Newton-Raphson optimization vs. the inference time. The training complexity is polynomial while the inference is linear. As an example, a resource constrained device does not have the computational power to train a logistic regression model but can easily handle a logistic regression inference. Conversely, an AI accelerated device (say with an onboard GPU accelerator) might be overkill, both from a cost and computational power perspective, if used just for inferencing. This is an important consideration that needs to be accounted for architecturally.
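The gap between training and inference cost can be sketched with rough operation counts. The estimates below are textbook approximations (building the Hessian costs roughly n·d² and solving the resulting linear system roughly d³ per Newton-Raphson iteration, for n samples and d features); they are not figures taken from this article, and the function names are illustrative.

```python
import math

def predict_proba(weights, bias, features):
    """Logistic regression inference: one dot product plus a sigmoid, O(d)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def newton_step_cost(n_samples, n_features):
    """Approximate multiply-add count for ONE Newton-Raphson training iteration."""
    hessian = n_samples * n_features ** 2   # forming X^T W X
    solve = n_features ** 3                 # solving the d x d linear system
    return hessian + solve

def inference_cost(n_features):
    """Approximate multiply-add count for one prediction."""
    return n_features

# With 1,000 samples and 4 features, a single training iteration is
# thousands of times more expensive than a single inference.
ratio = newton_step_cost(1000, 4) / inference_cost(4)
```

Training also needs the whole data set in memory (or streamed repeatedly), while inference needs only the weight vector, which is one reason the two activities fit different classes of hardware.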

Resource Constraints

The computational complexity of ML tasks quickly overwhelms resource-constrained devices that have limited energy, memory, compute, and storage capacity. Most ML frameworks are too onerous for embedded devices. The standard hardware-agnostic metrics used to measure performance, such as FLOPS and multiply-accumulate operations (MACs), lack the fidelity to measure real performance for a particular edge ML device. Optimization strategies targeted for such hardware introduce errors that erode the model efficacy. Compute intensive inferences can starve IoT devices and interfere with real-time sensing and actuation subroutines.

Security and Privacy

Deriving any actionable and meaningful insight from the data collected by the AIoT devices requires processing and analyzing the sensor data on the edge tier. However, such data often has to stay on the device for privacy and security reasons. Edge devices lack the physical security guarantee of a data center. A single compromised edge node can significantly widen the scope of a security breach. Low energy and low bandwidth IoT protocols are particularly prone to such attacks. Thus the application of appropriate security controls is essential to ensure data security and privacy. However, this creates a particularly intractable set of requirements, as computation intensive security controls compete for power, resources, and bandwidth on devices that are inherently resource constrained.

Latency Constraints

Autonomous vehicles, robotics, and industrial automation often require instant action through low latency “sense, decide and act” real-time loops. Even with the ML logic embedded on the device, the context needed to make a decision requires an IoT device to frequently communicate with the edge tier. This makes closed-loop, AI-enabled decisions particularly challenging in real-world scenarios.

The Solution - “AIoT Patterns”

In order to address such challenges in their entirety, one needs to take a holistic view of the entire problem space and uncover a set of recurring problems that span both the AI and IoT domains. My approach to expressing the solution is extensively based on the language of patterns. Various architectural and design patterns can be quite effective in managing the complexity of running the entire AIoT solution on the edge tier. Embedded ML patterns can also help in addressing the device resource constraint challenges. Minimizing or eliminating the dependency on the cloud tier can be achieved by running the entire ML pipeline on the edge tier, closer to the sensors. This can vastly improve the network latency and address security concerns.

Application Architecture Patterns

Tiered Infrastructure

Manage complexity by creating a clear separation of concerns using a tiered architecture. Partition the infrastructure into tiers to separate training from inferences and data acquisition activities. This allows for independent scaling, energy management, and securing of each tier. As you will see in the subsequent sections, separating the inference from learning activities and running them on separate tiers allows for the training jobs to run on AI accelerated hardware such as GPUs or TPUs, while inference jobs can run on resource constrained hardware. This separation also minimizes the power demands on battery powered hardware as the energy intensive training jobs can now run on a dedicated tier with wired AC/DC powered devices.

Event-driven architecture

Process high volume and high velocity IoT data in real-time with minimal latency and maximum concurrency using messages and event streams. Allow continuous flow, interpretation, and processing of events, while minimizing temporal coupling between sensor data consumers and producers. This pattern facilitates a loosely coupled structure and organization of such services on heterogeneous computational platforms. It also enables each service to scale and fail independently thus creating clear isolation boundaries.

Event Streaming for ML

Establish a durable and reliable event streaming mechanism for communication between the services involved in training, inferencing, and orchestrations. Various command and data messages can persist as streams and get ordered (within a partition). Consumers can process the streams as they occur or retrospectively. Consumers can join the stream anytime, replay, ignore or process past messages asynchronously.
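The replay behavior described above can be illustrated with a toy, in-memory append-only log. This is only a sketch of the idea for a single partition: the `EventStream` class and the event fields are made up for illustration and stand in for a real streaming platform.

```python
class EventStream:
    """A toy append-only log for a single partition."""

    def __init__(self):
        self._log = []

    def append(self, event):
        """Persist the event and return its offset in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def read(self, from_offset=0):
        """Return all events at or after `from_offset`, in append order."""
        return list(self._log[from_offset:])

stream = EventStream()
stream.append({"type": "train", "model": "motor-v1"})
stream.append({"type": "deploy", "model": "motor-v1"})

# A consumer that joins late can still replay the full history...
replayed = stream.read(from_offset=0)
# ...or skip what it has already seen and continue from a saved offset.
latest = stream.read(from_offset=1)
```

Because events are retained and addressed by offset, each consumer decides for itself where to start reading, which is what makes retrospective processing possible.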

Publish and Subscribe for IoT

Establish lightweight and bandwidth efficient pub/sub based messaging to communicate with the IoT devices. Such messages cannot be replayed or retransmitted once received. A new subscriber will not be able to receive any past messages and the message order is not guaranteed.
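The fire-and-forget nature of this style of messaging can be sketched with a toy broker that delivers each message only to the subscribers present at publish time and stores nothing. The `PubSubBroker` class and the topic name are hypothetical, and the sketch deliberately ignores features that real brokers may offer, such as quality-of-service levels.

```python
from collections import defaultdict

class PubSubBroker:
    """A toy broker: messages reach current subscribers only and are not stored."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver to whoever is subscribed right now; return the delivery count."""
        for callback in self._subscribers[topic]:
            callback(payload)
        return len(self._subscribers[topic])

broker = PubSubBroker()
early, late = [], []
broker.subscribe("motor/vibration", early.append)
broker.publish("motor/vibration", 0.42)
broker.subscribe("motor/vibration", late.append)  # joins after the first message
broker.publish("motor/vibration", 0.57)
# early == [0.42, 0.57]; late == [0.57], because the first message is gone.
```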

Protocol Bridge

Bridge the two event-driven patterns by converting the pub/sub messages into event streams and vice versa.
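One direction of such a bridge (pub/sub message to stream record) might look like the sketch below. The topic layout `<site>/<device>/<measurement>`, the JSON payload, and the record fields are all assumptions chosen for illustration, not conventions from the reference implementation.

```python
import json
import time

def bridge_message(topic, payload, now=time.time):
    """Convert a pub/sub message into a stream record.

    Assumes topics look like "<site>/<device>/<measurement>" and that the
    payload is a JSON-encoded value; both conventions are hypothetical.
    """
    site, device, measurement = topic.split("/")
    return {
        "stream": f"{site}.{measurement}",  # target stream name
        "key": device,                      # lets one device's readings share a partition
        "value": json.loads(payload),
        "bridged_at": now(),                # timestamp added by the bridge
    }

# A fixed clock is injected here so the result is reproducible.
record = bridge_message("plant1/motor7/vibration", "0.42", now=lambda: 0.0)
```

Keying the record by device is one way to preserve per-device ordering once the message lands in a partitioned stream; the reverse direction would map a stream record back onto a topic and payload.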

Streaming API sidecar

Use the sidecar pattern to isolate and decouple embedded inference from communication with event streams. This keeps the inference modules lean and portable with minimal dependencies, ideal for constrained device deployments.

Embedded ML Patterns

ML techniques for constrained devices

Various techniques to adapt the model architecture and reduce its complexity and size can be quite effective in minimizing resource utilization. Here are a few examples:

  • Model partitioning
  • Caching
  • Early stopping/termination
  • Data compression/sparsification
  • Patch-based inferencing, such as MCUNetV2

Model Compression

Compressing the model can significantly reduce the inference time and consequently minimize resource consumption. In the reference implementation, I will be using quantization to compress the model.
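As a rough illustration of what quantization does, the sketch below maps floating-point weights onto signed 8-bit integers with an affine (scale and zero-point) scheme and then maps them back. It is a generic textbook-style sketch with made-up weights, not the quantization procedure used in the reference implementation.

```python
def quantize_int8(values):
    """Affine-quantize floats to int8; return (quantized, scale, zero_point)."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0 or 1.0       # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    quantized = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

# Each weight now fits in one byte instead of four (float32) or eight (float64),
# at the price of a small reconstruction error bounded by the scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```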

Binarized Neural Networks

Binarizing weights and activations to only two values (1, -1) can improve performance and reduce energy utilization. However, the use of this strategy needs to be carefully weighed against the loss of accuracy.
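The performance benefit comes from replacing multiply-accumulate operations with bitwise operations. The sketch below, using made-up values and illustrative function names, binarizes two small vectors by sign and computes their dot product with XNOR and a bit count.

```python
def binarize(values):
    """Map each real value to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack a +1/-1 vector into an integer, one bit per element (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, length):
    """Dot product of two packed +1/-1 vectors using XNOR and a popcount."""
    mask = (1 << length) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # Matching positions contribute +1, mismatches contribute -1.
    return 2 * matches - length

weights = binarize([0.7, -0.2, 0.1, -0.9])       # [1, -1, 1, -1]
activations = binarize([0.3, 0.4, -0.5, -0.1])   # [1, 1, -1, -1]
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

Note that the original magnitudes (0.7 vs. 0.1, for example) are discarded entirely, which is where the loss of accuracy comes from.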


Digital Signal Processing

Using digital signal processing, close to the point of data acquisition, can significantly improve signal-to-noise ratio and eliminate inconsequential data. In industrial IoT scenarios, training the model on the raw sensor data tends to train the model on the noise rather than the signal. Transforms such as Fourier, Hilbert, Wavelet, etc. can vastly improve both training and inference efficiency.
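As a small illustration of how a transform can condense raw samples into a compact feature, the sketch below runs a naive discrete Fourier transform over a synthetic vibration signal and extracts its dominant frequency. The signal, the sample rate, and the function names are assumptions for illustration; an embedded implementation would normally use an optimized FFT routine instead of this O(n²) version.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; magnitudes for bins 0..n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate_hz / len(samples)

# Synthetic signal: a 50 Hz tone plus a weaker 120 Hz tone, sampled at 1 kHz.
rate = 1000
signal = [
    math.sin(2 * math.pi * 50 * t / rate) + 0.3 * math.sin(2 * math.pi * 120 * t / rate)
    for t in range(200)
]
peak = dominant_frequency(signal, rate)
```

Here 200 raw samples collapse into a single number that a model can consume directly, rather than having to learn the frequency content from the raw waveform.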

Multi-stage inference

Perform closed-loop, low latency inferencing for anomaly detection and intervention at the edge, closer to the point of data acquisition. Use context specific inferencing for predictive analytics at an aggregate level. In the reference implementation, they are referred to as "Level 1" and "Level 2" inferencing respectively.

MLOps Patterns

Reproducibility Pattern - Containerize workloads, Pipeline execution

Package ML tasks such as ingest, extract, drift detection, train, etc., and related dependencies as containerized workloads. Use container orchestration to manage the workload deployments. Use container workflow pipelines to automate continuous training, evaluation, and delivery.

AI Accelerator aware orchestration strategy

Use AI accelerator aware workload placement strategies to ensure workloads that require AI acceleration are placed on appropriate computational hardware.
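The placement idea can be sketched as a simple matching problem between what a workload requires and what a node offers. The node names, workload names, and the `place_workloads` helper below are hypothetical; in a Kubernetes-based setup, this kind of constraint is commonly expressed declaratively with node labels and selectors rather than in application code.

```python
def place_workloads(workloads, nodes):
    """Assign each workload to the first node offering the accelerator it needs.

    `workloads` maps a workload name to a required accelerator (or None).
    `nodes` maps a node name to the set of accelerators it offers.
    Returns a dict of workload -> node name (None when nothing fits).
    """
    placement = {}
    for name, required in workloads.items():
        placement[name] = next(
            (node for node, offered in nodes.items()
             if required is None or required in offered),
            None,
        )
    return placement

nodes = {"gateway-1": set(), "coral-1": {"tpu"}, "jetson-1": {"gpu"}}
workloads = {"train": "gpu", "inference": "tpu", "ingest": None, "render": "fpga"}
placement = place_workloads(workloads, nodes)
```

A workload with no accelerator requirement lands on the first available node, while one whose requirement cannot be met (the made-up "fpga" case) stays unplaced instead of running on unsuitable hardware.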

Edge Learning

Bring the entire learning pipeline to the edge tier, eliminating the dependency on the cloud tier. Run and manage ML tasks such as extract, drift detection, training, validation, and model compression on the edge tier.

Directed Acyclic Graphs

Express the desired state and flow of the ML tasks and their dependencies as directed acyclic graphs (DAG). Use a container workflow engine to achieve the desired state and flow.
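A minimal way to see how a DAG determines execution order is Python's standard-library `graphlib` module (Python 3.9+). The task names below are illustrative and loosely follow the ML tasks mentioned in this article; a container workflow engine would additionally run each step as a container and handle retries and parallelism.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
pipeline = {
    "extract": {"ingest"},
    "detect_drift": {"extract"},
    "train": {"detect_drift"},
    "validate": {"train"},
    "compress": {"validate"},
    "deploy": {"compress"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
```

Because the graph must be acyclic, a circular dependency (for example, "train" depending on "deploy") is rejected with a `graphlib.CycleError` instead of producing an ambiguous schedule.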

Automated container orchestration

Use declarative automation to deploy, manage and monitor containerized workloads across various edge infrastructure tiers.

Formalizing AIoT patterns in a reference architecture is an effective strategy to decompose the problem space, identify recurring scenarios and apply repeatable best practices and patterns to resolve them.

The Reference Architecture

Using the aforementioned patterns, this reference architecture attempts to manage the complexity arising in developing, deploying, and monitoring an AIoT solution, on a plethora of heterogeneous computational hardware and network topologies. It achieves this by proposing a distributed event-driven architecture that is hosted on a multi-tier infrastructure.

The multi-tiered architecture creates clear and distinct boundaries for network, security, scalability, durability, and reliability guarantees of the solution. Each tier can be independently secured and scaled based on the nature of the tier's workload, data privacy, and computational hardware characteristics.

The three infrastructure tiers host various components and services, have specific roles, and establish a clear separation of the following concerns:

  • Control
  • Data
  • Intelligence
  • Model/Artifacts
  • Communication

Let's examine the characteristics of each tier in more detail and understand how a tiered event-driven architecture addresses these concerns.

Things Tier

The Things Tier hosts the Perception components. The sensors and actuators in this tier serve as the primary interface to the physical world. Components in this tier sense the physical environment, digitize the signal, process and transmit it to the rest of the tiers. The Things Tier is comprised of constrained edge devices and is architected to meet the following requirements and operational constraints:

Role and Responsibilities

  • Interface with the sensors and digitize the analog signals
  • Preprocess data using DSP filters
  • Perform closed-loop inferences
  • Interface with actuators
  • Provide protocol gateway services for sensor nodes to gateway communication
  • Provide IoT gateway services for communication with the outside world
  • Package, normalize, aggregate, and transmit data using lightweight messaging protocols
  • Respond to command messages and perform operations such as triggering a model OTA download
  • Minimize data loss
  • Ensure low latency between inference and actuation

Operating environment

  • Microcontroller, SoC
  • 8, 16, or 32 bit architecture
  • RTOS or Super Loop
  • Sensor or mote nodes


  • Low power consumption computational workloads
  • Limited on-device memory and storage
  • No scalability options
  • No file system
  • Power consumption - Peak milliwatts to microwatts, quiescent nanowatts
  • Power source - Battery, solar, or harvested
  • No on-board thermal management


Network and Communication

  • Wireless sensor networks between simple sensor nodes and the gateway
  • Star, tree, or mesh topologies
  • Use of low power and low bandwidth IoT protocols such as BLE, LoRa, or Zigbee
  • Limited bandwidth and intermittent connectivity


Security

  • Gateway initiated connections to the outside world with asymmetric key cryptography
  • Strict device identity and encryption using on-chip secure cryptoprocessors such as Trusted Platform Module (TPM)

Inference Tier

The inference tier hosts the Cognition services that analyze data coming from the Things Tier and generate real-time actionable insights and alerts. This tier is architected to meet the following requirements and operational constraints:

Role and Responsibilities

  • Respond to command events from the MLOps layer
  • Download the latest ML models in response to command events
  • Subscribe to various context enrichment event streams
  • Perform context specific inferences
  • Generate insights using event stream processing
  • Synthesize higher-order alert events by integrating inferences with event stream processing insights
  • Maximize data timeliness

Operating environment

  • Embedded Microprocessor or Single-board Computers
  • ARM architecture
  • Embedded Linux or RTOS operating systems


  • Moderately intensive computational workloads
  • Power consumption - Peak milliwatts, quiescent microwatts
  • Power source - Battery or external power supply
  • Passive thermal management such as heat sink


Network and Communication

  • Moderate bandwidth and throughput


Security

  • Data in transit secured using mutual TLS
  • No data at rest is allowed on this tier

Platform Tier

The platform tier hosts two categories of services - MLOps and Platform Services. It logically partitions training-related activities from platform services, enabling computationally intensive training jobs to run on dedicated AI accelerated devices. This tier is architected to meet the following requirements and operational constraints:

Role and responsibilities - MLOps Layer

  • Provide mechanisms to express MLOps workflows, pipelines, and dependencies as directed acyclic graphs (DAG)
  • Provide mechanisms to declaratively define AI accelerator aware workload placement strategies
  • Orchestrate MLOps pipelines for data collection, processing, validation, and training
  • Provide continuous deployment capabilities for embedded ML models
  • Produce command events to orchestrate various model deployment and training activities
  • Ingest streaming data, normalize and create training data
  • Detect drift in the models
  • Compress models and store them in the artifacts registry
  • Provide MLOps dashboard services
  • Maximize data quality

Role and responsibilities - Platform Service Layer

  • Coordinate workload orchestration with the local Control Agents
  • Manage deployment and monitoring of containerized workloads and services
  • Enable lightweight messaging to communicate with the IoT devices
  • Provide durable and reliable event streaming services
  • Bridge the messaging and streaming protocols
  • Provide private container registry services
  • Provide artifacts repository, metadata, and training datastore services
  • Store and serve quantized models
  • Provide embedded ML model over the air (Model OTA) services

Operating environment

  • Single-board Computers with AI Acceleration such as GPU or TPU
  • ARM or x86 architecture
  • Embedded Linux operating system


  • IOPS intensive workloads
  • Large high throughput storage
  • Shared file system
  • Computation and memory intensive workloads
  • Large on-device memory
  • Active thermal management such as conductive or Peltier cooling

Network and Communication

  • High bandwidth and throughput


Security

  • Data in transit secured using mutual TLS
  • Encrypt data at rest


In this article, we explored the AIoT problem landscape, the emergent behaviors, and architecturally significant use cases. We saw how, by using a tiered event-driven architecture and employing AIoT patterns in a reference architecture, we can achieve a clean separation of concerns, address emergent behaviors, and manage the ensuing complexity.

In part 2 of this series, we will see how to build a concrete infrastructure implementation of this reference architecture that is capable of hosting a real-world AIoT application.