2.1 Edge Programming Introduction

Introduction

Edge programming on embedded systems is a rapidly evolving field, driven by the increasing demand for real-time data processing, reduced latency, and enhanced privacy and security in applications ranging from IoT devices to autonomous vehicles. Some examples of edge programming applications include:

Autonomous Vehicles: Self-driving cars require extremely low-latency decision-making to safely navigate roads, identify obstacles, interpret traffic signs, and respond to unpredictable events. Relying on cloud servers would introduce dangerous delays and the risk of losing connectivity. Edge processing is therefore essential, using onboard computing platforms such as NVIDIA Jetson or custom hardware to process visual and sensor data locally in real time.
Industrial Automation in Smart Factories (Industry 4.0): Modern manufacturing plants use edge processing to ensure reliability and responsiveness in automated systems. Machines equipped with vibration, temperature, and acoustic sensors can detect anomalies like overheating or mechanical failure instantly. Edge devices such as Programmable Logic Controllers (PLCs) or rugged industrial gateways process this sensor data locally to maintain uptime, enforce safety constraints, and support predictive maintenance, all without relying on external connectivity or risking intellectual property exposure.
Remote Healthcare Monitoring: Wearable medical devices and home health systems frequently use edge processing to analyse biometric data such as heart rate, ECG, and blood glucose, and to detect events such as falls, in real time. These systems often operate in settings with intermittent connectivity and must ensure patient privacy under regulations like the General Data Protection Regulation (GDPR). Devices such as smartwatches or embedded health monitors can perform on-device analysis, for example to identify a possible heart attack, and then trigger alerts or initiate local responses without streaming raw data to the cloud.

Many applications necessitate real-time decision-making with privacy safeguards. This often involves creating program code specifically optimised for hardware devices at the network edge to facilitate these decisions. This might mean we need to write code that ‘understands’ the hardware on which it is deployed and might also mean that the code is heavily optimised for that device, for example, to greatly extend the battery life of the device.

The Evolving Landscape of Embedded Intelligence

Embedded systems are specialised computing platforms designed to perform one or a few dedicated functions within a larger mechanical, mechatronic, or electrical system. Unlike general-purpose computers, these systems are purpose-built, with their software intricately tied to specific hardware specifications and operational constraints. In the modern technological landscape, embedded systems function as the “invisible drivers” behind a myriad of smart devices and industrial machinery, integrated into everyday products to enhance efficiency, safety, and user experience. These systems are fundamentally composed of both hardware (processing units, memory, and input/output interfaces) and software, which often runs on a Real-Time Operating System (RTOS) that schedules its tasks.

The recent growth of data generated at the periphery of networks, particularly from Internet of Things (IoT) devices, has necessitated a fundamental shift in computational paradigms. This shift has given rise to edge computing, a distributed framework that brings computation and data storage geographically closer to the data sources or end-users. This approach stands in stark contrast to traditional cloud computing models, where data is transmitted to centralised data centres for processing, often incurring significant latency and bandwidth overhead. The primary drivers for adopting edge computing are varied, and include the need to lower latency, reduce bandwidth consumption, improve network reliability, enhance data privacy, and reduce the costs associated with data transmission.

Figure 1 below shows an example of an edge device that is capable of making very sophisticated decisions in real time. The YouTube link in the caption leads to a live video demonstration.

Figure 1. Boston Dynamics Atlas Robot (https://www.youtube.com/watch?v=I44_zbEwz_w) The Boston Dynamics Atlas robot is a highly agile, bipedal humanoid robot designed for dynamic movement and advanced mobility tasks. It can walk, run, jump, and perform complex parkour routines using real-time perception and control.

The convergence of artificial intelligence (AI) with edge computing has culminated in what is known as Edge AI, a pivotal technological advancement. Edge AI empowers machine learning models to execute directly on local edge devices, enabling real-time data processing and analysis even without constant cloud connectivity. This capability facilitates near-instantaneous feedback, often within milliseconds, which is critical for time-sensitive applications.

The increasing computational power of IoT devices, coupled with the unprecedented volumes of data they generate, has outpaced the capabilities of traditional network infrastructures, making Edge AI an indispensable solution for extracting actionable intelligence efficiently. The evolution of embedded systems is thus not merely an incremental improvement but a huge transformation from isolated, task-specific devices to interconnected, intelligent, and often autonomous agents. This progression is fundamentally driven by the sheer scale of data generated at the network’s periphery and the critical demand for real-time, low-latency decision-making. This indicates a fundamental architectural shift in how computing resources are distributed and managed globally, moving towards a massively distributed network that provides “computing as a service” and promises to revolutionise various industries, from healthcare to transportation.

Core Concepts of Edge Programming

Fundamentals of Embedded Software Development

Embedded software development is inherently distinct from general-purpose software engineering due to its purpose-built nature, meticulously tailored for specific hardware and functions. A defining characteristic is its operation within stringent real-time constraints, where the precise timing of operations is as critical as their functional correctness. This necessitates a focus on efficiency and performance optimisation, including minimal memory footprint, highly efficient code execution, and low power consumption, given that embedded systems often operate with limited resources. Reliability and stability are paramount, as system failures in many embedded applications, such as medical devices or automotive systems, are simply not permissible.

Programming languages such as C remain widely adopted due to their low-level control, enabling direct interaction with hardware and efficient memory management. C++ is also an important language, offering a blend of efficiency, performance, and flexibility that makes it well suited to resource-constrained embedded systems that require complex yet maintainable code. The ability to work close to the hardware, down to the assembly level where necessary, coupled with optimised data structures, allows critical tasks to execute in real time. A core aspect of hardware-software co-design involves leveraging low-level hardware features such as Direct Memory Access (DMA) and hardware interrupts. DMA allows peripheral devices (like sensors or network controllers) to transfer data directly to memory without constant CPU intervention, significantly reducing overhead. Similarly, well-designed interrupt-driven architectures ensure the processor responds only to critical events, enabling the system to remain in ultra-low-power sleep states for the majority of its operation. As should be clear from this module’s contents, Rust is an emerging language in this space that focuses on the safety and robustness of deployed applications.

Edge Computing Architectures

Edge computing encompasses a wide range of devices, from basic IoT sensors to powerful edge servers, all processing data at or near the network’s edge. A prevalent architectural model for complex edge deployments is a three-layered framework: edge devices, edge servers, and cloud platforms, as illustrated in Figure 2.

Edge devices serve as the initial point of interaction, performing real-time inferencing directly at the data source. These are also often called leaf nodes or endpoint devices.
Edge servers act as computational intermediaries, aggregating data from multiple devices and executing more complex AI workloads than individual edge devices can handle. Smart edge devices running embedded Linux (e.g., Raspberry Pi/BeagleBone) are often used for this task.
The cloud, at the highest layer, provides centralised analytics, deeper data analysis, and long-term storage. This hybrid architecture strategically distributes computational tasks, leveraging the edge for low latency and reduced bandwidth, while utilising the cloud for comprehensive insights and large-scale model training.

Figure 2. Generalised description of the role of edge devices, edge servers (computers) and the Cloud in Edge Computing (Adapted from: https://www.akamai.com/glossary/what-is-edge-computing)
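
The division of labour between the three tiers can be made concrete with a small sketch. The Python below is illustrative only: the function names, the anomaly threshold, and the sample readings are all assumptions made for this example and do not come from any particular framework.

```python
from statistics import mean

# Hypothetical threshold on a normalised vibration reading (assumption).
ANOMALY_THRESHOLD = 0.8


def edge_device_infer(reading):
    """Tier 1 (edge device): decide locally and emit a compact result."""
    return {"value": reading, "anomaly": reading > ANOMALY_THRESHOLD}


def edge_server_aggregate(results):
    """Tier 2 (edge server): summarise the results from many devices."""
    return {
        "devices": len(results),
        "mean_value": mean(r["value"] for r in results),
        "anomalies": sum(1 for r in results if r["anomaly"]),
    }


class Cloud:
    """Tier 3 (cloud): long-term storage of summaries for later analysis."""

    def __init__(self):
        self.history = []

    def store(self, summary):
        self.history.append(summary)


readings = [0.2, 0.5, 0.9, 0.4]  # one made-up reading per device
summary = edge_server_aggregate([edge_device_infer(r) for r in readings])
cloud = Cloud()
cloud.store(summary)
print(summary)
```

Only the compact summary travels upwards in this sketch. The raw readings never leave the lower tiers, which is where the latency and bandwidth savings described above come from.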

A critical aspect of the edge computing paradigm is the distinction that much of the literature draws between “endpoint” devices and more capable “edge” devices such as edge servers. While both fall under the umbrella of edge computing, their operational constraints and capabilities differ significantly.

Endpoint devices, such as embedded cameras or industrial sensors, are typically designed for ultra-low-power AI inferencing, operating with severe resource limitations.
In contrast, more powerful edge servers are deployed as intermediaries, capable of handling heavier AI workloads before relaying data to the cloud. These can be small embedded devices (e.g., Raspberry Pi) or very capable server class devices (e.g., blade server) depending on the application.

This differentiation highlights that a universal programming approach for the “edge” is often inadequate. Developers must understand the specific tier of edge device they are targeting, whether it is a microcontroller, a specialised AI accelerator, or a robust edge server, in order to make informed decisions regarding hardware selection, optimisation techniques, and software frameworks. This choice directly influences the feasibility of deploying certain AI models and the overall system architecture, underscoring the need for flexible and adaptive software and hardware co-design tailored to specific resource constraints and performance requirements.

Quiz

In the three-layer edge architecture, which tier aggregates data from multiple endpoint devices and handles AI workloads too demanding for individual endpoint hardware?

derekmolloy.ie

Edge AI: Bringing Machine Learning to the Device

Edge AI represents the integration of machine learning capabilities directly into edge devices, enabling real-time data processing and analysis without continuous reliance on cloud infrastructure. This paradigm offers compelling advantages, including dramatically improved latency for real-time responses, enhanced data privacy by keeping sensitive information on the device, and significant reductions in bandwidth usage and energy consumption by minimising data transmission to the cloud.

The applications of Edge AI are vast and rapidly expanding. They include autonomous vehicles making split-second decisions, smart home appliances with integrated voice and vision recognition, and industrial sensors performing predictive maintenance directly on the factory floor. For instance, a predictive maintenance node can run a small neural network on the device to detect anomalies in motor vibration, alerting maintenance teams without requiring an internet connection. Similarly, a wearable device can perform real-time fall detection using accelerometer data locally. This on-device intelligence is transforming various sectors by enabling faster, more secure, and more efficient operations.

As Edge AI systems transition from development to long-term deployment, they face the additional challenge of Model Drift. This occurs when the real-world data distribution changes over time (e.g., a sensor degrading or seasonal lighting changes), causing the model’s accuracy to decline. Robust edge applications must include local monitoring mechanisms to detect these performance drops and trigger alerts or request model retraining, often through a hybrid edge-cloud feedback loop.
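
A local monitoring mechanism of this kind can be very small. The following Python is a minimal sketch, assuming that drift shows up as a shift in the rolling mean of some monitored value (a sensor reading or a model confidence score). The class name `DriftMonitor`, the baseline, the tolerance, and the window size are all hypothetical choices for illustration; real deployments may use more robust statistical tests.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flag drift when the rolling mean moves away from a known baseline."""

    def __init__(self, baseline_mean, tolerance, window=50):
        self.baseline_mean = baseline_mean  # mean observed at training time
        self.tolerance = tolerance          # allowed deviation (assumption)
        self.recent = deque(maxlen=window)  # fixed memory, suits small devices

    def update(self, value):
        """Record one observation; return True if the full window has drifted."""
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet to judge
        return abs(mean(self.recent) - self.baseline_mean) > self.tolerance


monitor = DriftMonitor(baseline_mean=20.0, tolerance=2.0, window=5)
stable = [monitor.update(v) for v in [20, 21, 19, 20, 20]]
drifted = [monitor.update(v) for v in [25, 25, 25, 25, 25]]
print(stable, drifted)
```

When `update` returns True, the device could raise a local alert or request retraining through the edge-cloud feedback loop described above.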

Edge Computing: This distributed computing paradigm moves computation and data storage as close as possible to the data source (the “edge” of the network), minimising the need to send all data to a centralised cloud and reducing latency, network load, and privacy risk.
Embedded Systems: These are specialised computer systems designed for specific functions within a larger mechanical, mechatronic, or electrical system, often with real-time constraints and limited resources. They typically consist of microcontrollers (MCUs) or microprocessors (MPUs), memory, I/O interfaces, sensors, and actuators.
Real-time Processing: A fundamental requirement for many edge applications, ensuring that data is processed and decisions are made with minimal delay. This is crucial for applications like industrial control, autonomous systems, and medical devices.
Decentralisation: Shifting from a centralised cloud model to a distributed network of edge devices for processing and decision-making.
Local Processing: Processing data directly on the edge device, reducing reliance on constant cloud connectivity and improving responsiveness.

Optimising AI Models for Edge Deployment

To overcome the severe hardware limitations of edge devices — restricted processing power, limited memory, and finite battery life — AI models must undergo aggressive optimisation before deployment. Traditional deep neural networks demand significant computational resources; the techniques below make them feasible for edge hardware.

Quantisation: Reduces the precision of model parameters (weights and activations), typically converting them from 32-bit floating-point numbers to lower-precision formats such as 8-bit integers. This significantly shrinks model size and accelerates computation, as lower-precision arithmetic uses less memory and leverages efficient integer math on microcontrollers. Post-training quantisation and quantisation-aware training (QAT) can minimise accuracy degradation.
Pruning: Systematically removes unnecessary neurons, connections, or filters within a neural network, producing a sparser model that requires fewer computations. Structured pruning (eliminating entire channels or layers) often provides better hardware acceleration compatibility than unstructured pruning. Pruned models typically require a fine-tuning step to recover any lost accuracy.
Knowledge Distillation: Trains a compact “student” model to emulate the behaviour of a larger, more complex “teacher” model. The student learns to approximate the teacher’s outputs, achieving a substantial portion of the teacher’s accuracy at a significantly reduced size — a particularly effective approach for TinyML.

These techniques are most powerful when applied in combination. Studies have reported reductions of up to 75% in model size and up to 95% in parameter count when pruning and quantisation are applied. Table 1 below summarises these key optimisation techniques.

Table 1: AI Model Optimisation Techniques for Edge Deployment

Technique	Purpose	How it Works	Impact on Model	Considerations
Quantisation	Reduce model size and accelerate computation	Reduces precision of weights/activations (e.g., 32-bit float to 8-bit int)	Smaller size, faster computation, lower memory/energy use	Potential accuracy drop; mitigated by QAT
Pruning	Eliminate redundant parameters, reduce computation	Removes unnecessary neurons, connections, or filters	Sparser model, fewer computations, smaller size	Often requires fine-tuning to recover accuracy; structured pruning better for hardware acceleration
Knowledge Distillation	Transfer knowledge from a large model to a small one	Trains a compact “student” to mimic a “teacher” model’s outputs	Smaller model, similar accuracy to teacher	Requires a pre-trained teacher; can be complex to implement
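
To illustrate the quantisation row of Table 1, the sketch below maps a list of floating-point weights to 8-bit integers using an affine (scale and zero-point) scheme. It is a simplified, pure-Python illustration under stated assumptions: the function names and example weights are made up, a single scale is used for the whole list, and production toolchains add calibration and per-channel handling that are omitted here.

```python
def quantise_int8(weights):
    """Map floats to int8 values using one scale and zero-point for the list."""
    w_min = min(min(weights), 0.0)  # the representable range must include 0
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255
    if scale == 0:
        scale = 1.0  # all weights are zero; any scale works
    zero_point = round(-128 - w_min / scale)
    quantised = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantised, scale, zero_point


def dequantise(quantised, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(q - zero_point) * scale for q in quantised]


weights = [-0.62, -0.1, 0.0, 0.33, 0.91]  # made-up example weights
q, scale, zero_point = quantise_int8(weights)
restored = dequantise(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)
```

Each weight now occupies 8 bits instead of 32, a 75% reduction in weight storage, at the cost of a rounding error that stays within about one quantisation step (`scale`) in this sketch.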

Fine-tuning is a recovery step applied after quantisation or pruning. Because both techniques can degrade accuracy, fine-tuning continues training the compressed model on a representative dataset, adjusting the remaining weights to recover what was lost. It is substantially less expensive than training from scratch, since the compressed model already encodes a useful approximation of the target task. Fine-tuning is often the step that determines whether a compressed model is production-ready.
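
The pruning step that precedes fine-tuning can be sketched in a few lines. The example below shows magnitude-based pruning, one common criterion, in its simplest unstructured form: the weights with the smallest absolute values are set to zero. The function name, the weights, and the 50% sparsity target are assumptions for illustration. Structured pruning, which removes whole channels or filters, follows the same idea at a coarser granularity.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.03]  # made-up values
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

The surviving weights are unchanged at this point. Fine-tuning would then adjust them to compensate for the connections that were removed.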

Practical Applications and Use Cases of Edge AI

Edge AI is transforming a wide range of industries by enabling intelligent, real-time decision-making directly on devices, as illustrated in Figure 3:

Autonomous Systems: Edge AI is critical for autonomous vehicles, processing massive volumes of sensor data (4–19 TB/hour) in real time for obstacle detection, navigation, and safety. Autonomous drones and industrial robots similarly leverage edge AI for obstacle avoidance and dynamic motion control.
Smart Environments: In smart homes, edge AI powers voice and vision recognition, adjusting parameters such as lighting or heating based on learned user patterns. Smart cities use edge-driven traffic monitoring for real-time flow optimisation. In agriculture, TinyML applications on cattle wearables can track vital signs to predict disease outbreaks.
Industrial IoT: Edge AI boosts production efficiency through real-time quality control on assembly lines via computer vision, predictive maintenance by learning vibration signatures of machinery, and energy consumption optimisation across factory operations.
Healthcare: Wearable devices use edge AI for real-time monitoring of patient vitals and fall detection. Implantable devices can analyse biometric data to alert clinicians to emergencies. Federated learning enables diagnostic AI to be trained collaboratively across hospitals while preserving patient privacy.
Security: Edge computing enables autonomous Security Operations Centres (SOCs) to monitor real-time events and respond immediately to threats by analysing data locally, including sensitive surveillance data for defence applications.

Figure 3. Visual representation of different Edge AI use cases.
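
The sensor data volumes quoted for autonomous vehicles (4–19 TB/hour) are easier to appreciate when converted into a sustained network rate. The short calculation below does this, assuming decimal terabytes. The 100 Mbit/s uplink used for comparison is a hypothetical figure chosen only to give a sense of scale.

```python
def tb_per_hour_to_gbit_per_s(tb_per_hour):
    """Convert terabytes per hour to gigabits per second (decimal units)."""
    bits_per_hour = tb_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9


ASSUMED_UPLINK_GBIT_S = 0.1  # hypothetical 100 Mbit/s uplink (assumption)

for rate in (4, 19):
    needed = tb_per_hour_to_gbit_per_s(rate)
    ratio = needed / ASSUMED_UPLINK_GBIT_S
    print(f"{rate} TB/hour is about {needed:.1f} Gbit/s, "
          f"{ratio:.0f} times the assumed uplink")
```

The quoted range corresponds to roughly 9 to 42 Gbit/s of sustained throughput, far beyond the assumed uplink, which is why this data has to be processed on the vehicle.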

Resource Challenges

Developing and deploying solutions for edge embedded systems presents a unique set of resource challenges that necessitate innovative engineering approaches.

Computational and Memory Constraints

Edge devices are inherently resource-constrained, typically possessing limited processing power, memory (often measured in kilobytes rather than gigabytes), and battery life. This necessitates aggressive optimisation of AI models, as complex neural networks designed for cloud servers are generally impractical for direct deployment. For TinyML devices, these constraints are even more pronounced, with memory often in the hundreds of kilobytes and processors operating at megahertz speeds. Specific challenges include “catastrophic forgetting,” where a model may lose previously learned information while attempting to learn new data due to limited memory, and the volatility of SRAM, which means any training progress on the device is lost upon power-off or reset. Furthermore, the compact form factors of these devices lead to thermal constraints, which can limit sustained processing and force trade-offs between computational speed and system reliability.
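
A quick back-of-the-envelope check shows how tight these memory budgets are. The sketch below estimates the storage needed for a model's weights at different precisions and compares it with a device budget. The parameter count and the 256 KiB budget are hypothetical values, and the estimate deliberately ignores activation buffers, the runtime, and application code, all of which need memory too.

```python
def model_bytes(n_params, bits_per_param):
    """Storage needed for the weights alone, in bytes."""
    return n_params * bits_per_param // 8


def fits(n_params, bits_per_param, budget_bytes):
    """True if the weights fit within the given memory budget."""
    return model_bytes(n_params, bits_per_param) <= budget_bytes


N_PARAMS = 100_000          # hypothetical small model
BUDGET_BYTES = 256 * 1024   # hypothetical 256 KiB microcontroller budget

for bits in (32, 8):
    size_kib = model_bytes(N_PARAMS, bits) / 1024
    verdict = "fits" if fits(N_PARAMS, bits, BUDGET_BYTES) else "does not fit"
    print(f"{bits}-bit weights: {size_kib:.1f} KiB, {verdict}")
```

Under these assumptions the 32-bit model needs about 391 KiB and does not fit, while the 8-bit version needs about 98 KiB and does.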

Power Consumption and Energy Efficiency

Workloads such as AI processing are computationally intensive and consume substantial energy, posing significant challenges for battery-powered embedded systems like drones or wearables. A critical design consideration is balancing the need for high-performance AI computations with the imperative for extended battery life and reduced overall energy consumption. To address this, various power management techniques are employed, including implementing sleep modes to minimise power usage during idle times, utilising dynamic voltage and frequency scaling (DVFS) to match supply voltage and clock frequency to the workload, and carefully selecting low-power microcontrollers, memory types, and energy-efficient sensors. The variability in power consumption across diverse TinyML devices also complicates standardised benchmarking, making it difficult to evaluate accuracy consistently.
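
The effect of sleep modes on battery life can be estimated with simple duty-cycle arithmetic. The sketch below uses a first-order model in which the average current is a weighted mix of the active and sleep currents. All figures (20 mA active, 5 µA asleep, a 1000 mAh battery, a 1% duty cycle) are hypothetical, and the model ignores wake-up overhead, battery self-discharge, and regulator losses.

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Average current when the device is active for a fraction of the time."""
    return duty_cycle * active_ma + (1 - duty_cycle) * sleep_ma


def battery_life_hours(capacity_mah, avg_current_ma):
    """Idealised battery life: capacity divided by average current draw."""
    return capacity_mah / avg_current_ma


ACTIVE_MA = 20.0       # hypothetical current while sensing and inferring
SLEEP_MA = 0.005       # hypothetical 5 microamp sleep current
CAPACITY_MAH = 1000.0  # hypothetical battery capacity

always_on = battery_life_hours(
    CAPACITY_MAH, average_current_ma(ACTIVE_MA, SLEEP_MA, 1.0))
duty_cycled = battery_life_hours(
    CAPACITY_MAH, average_current_ma(ACTIVE_MA, SLEEP_MA, 0.01))
print(f"always on: {always_on:.0f} h, 1% duty cycle: {duty_cycled:.0f} h")
```

With these assumed numbers, waking for 1% of the time extends the estimated life from about 50 hours to roughly 4,880 hours (around 200 days).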

Beyond individual device longevity, the industry is increasingly focused on Sustainability and Green AI. This involves not only minimising the energy consumption of AI inference but also considering the entire lifecycle of edge hardware, from carbon-neutral manufacturing to the reduction of electronic waste (e-waste) through modular designs and longer support lifecycles for deployed firmware.

Connectivity and Network Reliability

Edge locations frequently operate in environments characterised by unpredictable, intermittent, or bandwidth-constrained network links, leading to issues such as packet loss, jitter, and variable bandwidth. Remote and disparate deployment sites often have limited or unreliable connectivity options. This necessitates a fundamental shift in architectural design, moving away from continuous reliance on cloud connectivity. Applications must be engineered to gracefully handle extended periods of disconnection (lasting minutes, hours, or even days). Achieving resilience in such conditions demands diverse connectivity options, including wired Ethernet, Wi-Fi, 5G, and even satellite links, often combined intelligently to mitigate single points of failure. An “offline-first” application design, incorporating local data caching and processing, may be important to ensure continuity of critical operations even when network access is unavailable.
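
An offline-first design is often built around a store-and-forward buffer: readings are always written to a local queue first and uploaded only when a link is available. The following is a minimal sketch of that pattern. The class names and the `send` callback interface are assumptions for this example, and the queue lives in memory here, whereas a real device would usually persist it to flash so that data survives a reset.

```python
from collections import deque


class StoreAndForward:
    """Buffer readings locally and upload them when the link allows."""

    def __init__(self, send, capacity=1000):
        self._send = send                     # callable: True if upload worked
        self._queue = deque(maxlen=capacity)  # oldest entries dropped when full

    def record(self, reading):
        """Always succeeds locally, whether or not the network is up."""
        self._queue.append(reading)

    def flush(self):
        """Upload queued readings in order; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self):
        return len(self._queue)


class FakeLink:
    """Stand-in for a network connection that can be switched on and off."""

    def __init__(self):
        self.online = False
        self.received = []

    def send(self, item):
        if self.online:
            self.received.append(item)
        return self.online


link = FakeLink()
buffer = StoreAndForward(link.send)
for reading in (21.5, 21.7, 22.0):
    buffer.record(reading)
sent_while_offline = buffer.flush()  # link is down, nothing leaves the device
link.online = True
sent_after_reconnect = buffer.flush()
```

A reading is removed from the queue only after the upload succeeds, so a dropped connection in the middle of a flush does not lose data.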

Security and Privacy Implications

The distributed nature of edge networks significantly expands the potential attack surface, requiring robust and comprehensive security strategies. Edge devices frequently process and store sensitive data locally, such as voice recordings or health metrics, making them attractive targets for cyberattacks. Challenges include the difficulty of implementing sophisticated security measures on resource-constrained hardware, the vulnerability to physical tampering in geographically dispersed locations, and the emergence of new attack vectors relevant in distributed training paradigms like federated learning. Solutions involve integrating hardware-based security features (e.g., Trusted Platform Modules and secure boot mechanisms), deploying AI-driven anomaly detection systems, ensuring end-to-end encryption for data in transit and at rest, and employing privacy-enhancing technologies such as differential privacy and homomorphic encryption.
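
Of the privacy-enhancing technologies mentioned above, differential privacy is the simplest to illustrate. The sketch below applies the Laplace mechanism to a mean: values are clipped to a known range and noise scaled to sensitivity divided by epsilon is added before the result leaves the device. The function names, the heart-rate values, the clipping range, and the choice of epsilon are all assumptions for illustration. This is a teaching sketch and has not been hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Return the mean of the clipped values plus calibrated Laplace noise."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Changing one value moves the mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)                   # seeded only to make the demo repeatable
heart_rates = [72, 75, 68, 80, 77] * 20  # made-up readings
noisy = private_mean(heart_rates, lower=40, upper=180, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller epsilon adds more noise and gives stronger privacy, while a larger number of readings reduces the sensitivity and therefore the noise needed.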

Managing Heterogeneity and Complexity

Embedded systems and edge devices exhibit immense heterogeneity across various dimensions, including power profiles, computational capabilities, memory capacities, and interconnect technologies. This inherent diversity poses significant challenges for standardisation and consistent deployment across a large fleet of devices. Designing and integrating intelligent edge systems requires multidisciplinary expertise and the careful balancing of performance with interoperability across disparate components. Furthermore, integrating modern edge computing solutions with existing legacy industrial equipment, which was not originally designed for contemporary communication protocols, presents substantial integration hurdles.

The interplay between resource constraints and the operational environment creates a complex multi-dimensional optimisation problem for edge embedded systems. The limitations in computational power, memory, and battery life, coupled with the demands of operating in harsh, remote, or intermittently connected environments, are not isolated challenges; they are deeply interconnected. For instance, the need for complex AI models on-device directly impacts power consumption and thermal management. Similarly, unreliable network connectivity necessitates greater on-device processing capabilities to ensure continuity, which in turn places higher demands on local compute resources. This creates a continuous design challenge where optimising one aspect often leads to compromises in another. For example, achieving real-time responsiveness might require more powerful hardware, thereby increasing cost and power requirements. Similarly, implementing robust security features can strain already limited hardware resources. Therefore, successful edge programming extends beyond meeting individual requirements; it demands finding the optimal balance across these competing constraints for a given application. This inherently necessitates a holistic, co-design approach from the initial stages of system development.

Summary of Key Resource Challenges in Edge Programming:

Edge devices are inherently resource-constrained, presenting significant challenges:

Limited Compute Power: Edge devices often have less powerful CPUs, MPUs, GPUs, or specialised AI accelerators compared to desktop or cloud servers.
Memory Constraints: RAM and persistent storage are typically limited, requiring careful memory management and optimised code.
Power Consumption: Many edge devices are battery-powered, necessitating highly efficient power management techniques (e.g., low-power sleep modes, external wake circuitry, and dynamic voltage and frequency scaling).
Network Reliability and Bandwidth: Edge nodes can have unreliable or slow network connections, requiring robust communication protocols and offline capabilities.
Heterogeneity: Diverse hardware architectures and communication protocols from multiple vendors across different edge devices complicate development and deployment.
Security: Decentralised architectures introduce new trust management challenges, requiring secure communication, access control, and robust threat detection mechanisms.

🧩Knowledge Check

Quiz

What are some of the primary computational and memory constraints mentioned for TinyML devices?

Which power management techniques are commonly used to extend battery life in edge devices?

What does an 'offline-first' application design imply for edge devices?
