2.2 State-of-the-art Hardware

State-of-the-art Hardware
The landscape of edge AI hardware is characterised by a diverse array of specialised processors, each engineered to address the unique computational and power requirements of on-device AI.
Specialised Processors: GPUs, NPUs, FPGAs, and ASICs
- Graphics Processing Units (GPUs): Although traditionally associated with graphics rendering, GPUs are widely used in edge AI because their massively parallel architecture suits deep learning and video analytics workloads. Platforms such as the NVIDIA Jetson series are flagship edge AI computers designed for demanding applications like autonomous robotics, complex computer vision systems, and even generative AI, with performance measured in TOPS (trillions of operations per second); the Jetson AGX Orin, for example, is rated at up to 275 TOPS. See Figure 3.
- Neural Processing Units (NPUs): These are dedicated co-processors designed to accelerate AI tasks, particularly the matrix multiplications and tensor operations at the heart of neural networks, with high efficiency and low power consumption. Examples include the NPUs integrated into recent PC processors, such as Intel AI Boost and AMD XDNA, and discrete accelerators such as the Hailo-8.
- Field-Programmable Gate Arrays (FPGAs): FPGAs combine flexibility, in-field upgradability, and parallel processing, making them an important option for edge AI. Their reprogrammable logic fabric can be configured into custom, highly parallel datapaths that mirror a network's structure, which makes them effective targets for neural network inference. Developers can configure FPGAs for specific AI tasks, adapting applications and optimising them for efficiency and reliability. FPGAs are also effective at accelerating data ingestion and overcoming I/O bottlenecks. Toolkits such as AMD (formerly Xilinx) Vitis AI support FPGA-based edge AI development.
- Application-Specific Integrated Circuits (ASICs): ASICs are custom-built for particular AI workloads, offering superior price-to-performance ratios and exceptional power efficiency for specific, stable inference tasks. They are gaining market share in AI inference, including for large language models (LLMs) at the edge, where for fixed workloads they can be simpler, cheaper, and less power-hungry than general-purpose GPUs.
- Digital Signal Processors (DSPs): Modern DSPs increasingly integrate neural network accelerators, offering strong performance-per-watt for real-time signal processing tasks such as audio analysis, image recognition, and sensor fusion. DSP-based SoCs are well suited to applications that must balance performance, latency, and energy efficiency without relying on the cloud. Boards such as the BeagleY-AI, built by BeagleBoard.org around a Texas Instruments SoC, integrate DSPs alongside Arm cores, a GPU, and a vision accelerator. See Figure 4.
Figure 4. The BeagleY-AI single board computer (SBC) is a low-cost, open-source, community-supported development platform for developers and hobbyists, in a form factor compatible with accessories made for other popular SBCs. It can run AI applications on a dedicated 4 TOPS co-processor while handling real-time I/O tasks on a dedicated 800 MHz microcontroller. See: https://www.beagleboard.org/boards/beagley-ai
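TOPS ratings like those quoted above are peak figures. As a rough illustration of where they come from, the sketch below assumes each multiply-accumulate (MAC) counts as two operations, so peak TOPS = MAC units × 2 × clock frequency, and then estimates per-frame latency from a model's operation count and an assumed sustained utilisation. The MAC count, clock rate, model cost (8 GOPs per frame) and 30% utilisation are hypothetical values chosen for illustration, not specifications of any product.

```python
def peak_tops(mac_units: int, clock_hz: float) -> float:
    """Peak throughput in TOPS, counting each multiply-accumulate (MAC) as 2 operations."""
    return mac_units * 2 * clock_hz / 1e12


def inference_latency_ms(gops_per_inference: float, tops: float, utilisation: float) -> float:
    """Estimated latency if the accelerator sustains `utilisation` (0-1) of its peak."""
    effective_ops_per_s = tops * 1e12 * utilisation
    return gops_per_inference * 1e9 / effective_ops_per_s * 1e3


# Hypothetical NPU: 2048 INT8 MAC units clocked at 1 GHz.
tops = peak_tops(mac_units=2048, clock_hz=1e9)
print(f"Peak: {tops:.2f} TOPS")  # Peak: 4.10 TOPS

# Hypothetical vision model costing 8 GOPs per frame, sustained at 30% of peak.
latency = inference_latency_ms(gops_per_inference=8.0, tops=tops, utilisation=0.3)
print(f"Estimated latency: {latency:.2f} ms per frame")  # Estimated latency: 6.51 ms per frame
```

Real accelerators rarely sustain their peak rating, which is why the utilisation factor matters; latency measured on the target device remains the figure to trust.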
Emerging Architectures: Neuromorphic Computing
Inspired by the biological brain, neuromorphic computing is one example of an emerging architecture and represents a paradigm shift in AI hardware: chips such as Intel’s Loihi 2 employ event-driven architectures and spiking neural networks (SNNs). Unlike conventional processors, which process data continuously at fixed, clock-driven intervals, neuromorphic chips activate only when specific “spikes” or events occur (see Figure 5 for an example of a neuromorphic camera). This can yield very high energy efficiency, with reported consumption as low as 1% to 10% of the power used by traditional processors. Asynchronous processing also enables real-time responsiveness, with latencies in the range of tens of microseconds, making these chips well suited to dynamic, latency-sensitive applications such as robotics, autonomous systems, and next-generation IoT devices. Furthermore, neuromorphic systems integrate memory and processing in a single architecture, minimising the energy lost to constant data movement, a significant advantage for battery-powered edge devices.
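The event-driven principle can be illustrated with a leaky integrate-and-fire (LIF) neuron, a standard textbook model used in spiking neural networks. In the minimal sketch below, work is done only when an input spike arrives; the membrane decay between events is applied lazily in a single step, so quiet periods cost nothing. The leak, threshold and weight values are hypothetical, and this is a generic LIF model, not the neuron model or programming interface of Loihi 2 or any other chip.

```python
def lif_event_driven(event_times, leak=0.9, threshold=1.0, weight=0.4):
    """Simulate one LIF neuron driven by input spikes at the given integer time steps.

    Work is done only at event times; the per-step decay factor `leak` is applied
    lazily as leak ** (elapsed steps), so idle steps require no computation.
    Returns the time steps at which the neuron emits an output spike.
    """
    potential = 0.0
    last_t = 0
    output_spikes = []
    for t in sorted(event_times):
        potential *= leak ** (t - last_t)  # decay accumulated since the previous event
        potential += weight                # integrate the incoming spike
        last_t = t
        if potential >= threshold:
            output_spikes.append(t)        # fire an output spike ...
            potential = 0.0                # ... and reset the membrane potential
    return output_spikes


# A burst of closely spaced inputs drives the neuron over threshold twice;
# the same number of widely spaced inputs leaks away without any output.
burst = [0, 1, 2, 50, 51, 52, 53]
sparse = [0, 20, 40, 60, 80, 100, 120]
print(lif_event_driven(burst))   # [2, 52]
print(lif_event_driven(sparse))  # []
```

For the burst, only 7 updates are performed across a 54-step window instead of one per step, which is the kind of saving event-driven hardware aims to exploit.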

Leading Edge AI Hardware Platforms and Their Capabilities
The market for edge AI hardware spans a diverse range of platforms, each optimised for distinct power, performance, and form factor requirements. This proliferation of specialised AI hardware signals a strategic shift from general-purpose computing to domain-specific architectures at the edge, a shift that is crucial for overcoming resource constraints and achieving the performance and energy efficiency that edge AI applications demand. As a result, hardware selection is no longer a simple choice but an engineering decision that must be matched to the specific AI workload, power budget, latency requirements, and the flexibility needed for future updates. For instance, a high-volume, fixed-function application might favour an ASIC for its cost and power advantages, whereas a rapidly evolving or multi-functional system may be better served by an FPGA or a powerful GPU-based SoC because of their flexibility and adaptability. This trend makes close hardware-software co-design necessary from the earliest stages of development, and the emergence of specialised chips capable of running generative AI and large language models at the edge reinforces it. Table 1 below provides a comparative overview of some leading edge AI hardware platforms, illustrating their varied capabilities and target applications.
Table 1: Comparison of Some Modern Edge AI Hardware Platforms
| Platform Name | Primary Processor Type | Peak AI Performance | Power Consumption | Key Use Cases/Strengths | Notable Features |
|---|---|---|---|---|---|
| NVIDIA Jetson AGX Orin | GPU | Up to 275 TOPS | Varies (e.g., 15-60W) | High-end robotics, autonomous vehicles, advanced computer vision, GenAI | 12-core Arm CPU, 2048-core NVIDIA Ampere GPU, 64 Tensor Cores |
| Google Coral Dev Board | NPU (Edge TPU) | 4 TOPS | Low (Edge TPU: ~2W at 4 TOPS, i.e. 2 TOPS/W) | Low-power vision-based IoT, smart cameras, portable ML devices | Optimised for TensorFlow Lite, small form factor |
| Intel Neural Compute Stick 2 | VPU (Movidius Myriad X) | ~1.2 TOPS | Low (USB powered) | Prototyping Edge AI on PCs/Raspberry Pi, offloading inference tasks | Plug-and-play USB 3.0 device |
| AMD Xilinx Kria K26 SOM | FPGA | Varies by configuration | Moderate | Computer vision in industrial/smart city, automated optical inspection | Adaptive, in-field reconfigurability, low latency |
| Qualcomm Robotics RB5 | SoC (CPU+GPU+AI Engine) | 15 TOPS | Moderate | High-performance robotics, drones, multi-camera setups | Octa-core Kryo 585 CPU, Adreno 650 GPU, Hexagon Tensor Accelerator |
| Intel Loihi 2 | Neuromorphic | Not rated in TOPS (reported up to 50x faster than conventional ANN hardware on some workloads) | Extremely low (~1W) | Event-driven AI, robotics, autonomous systems, brain-inspired computing | Combines processing and memory, asynchronous operation |
AI Decisions on Edge Devices
The fundamental advantage of Edge AI lies in its capacity to enable real-time decision-making by processing data locally, eliminating the latency of transmitting data to and from distant cloud servers. This is essential for applications where immediate responses are critical, such as autonomous vehicles that must make split-second decisions to prevent accidents, industrial automation systems that perform predictive maintenance to avert costly downtime, and healthcare monitoring devices that provide instant diagnostics and alerts. Furthermore, because edge AI systems can operate without constant internet connectivity, they continue to function reliably in environments with intermittent or unreliable network access, or during critical events such as natural disasters.
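The cost of latency can be made concrete with simple arithmetic: the distance a vehicle covers while waiting for a decision is speed × latency. The sketch below compares two latency budgets at 100 km/h; the 20 ms on-device and 150 ms cloud round-trip values are hypothetical assumptions for illustration, and real cloud latency also varies and can disappear entirely when connectivity drops.

```python
def distance_during_latency_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance (metres) travelled at `speed_kmh` during `latency_ms` milliseconds."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    return speed_m_per_s * latency_ms / 1000


# Hypothetical latency budgets, for illustration only.
scenarios = {"on-device inference": 20, "cloud round trip": 150}
for label, latency_ms in scenarios.items():
    metres = distance_during_latency_m(100, latency_ms)
    print(f"{label}: {latency_ms} ms -> {metres:.2f} m travelled at 100 km/h")
```

At 100 km/h (about 27.8 m/s) this gives roughly 0.56 m versus 4.17 m of travel before any action can be taken.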
Optimising AI Models for Edge Deployment
To overcome the severe hardware limitations of edge devices discussed earlier (restricted processing power, limited memory, and finite battery life), AI models must undergo aggressive optimisation. This is a complex engineering challenge, because traditional AI models, particularly deep neural networks, often demand significant computational resources.
Several key techniques are employed to make AI models feasible for edge deployment:
- Quantisation: This technique reduces the numerical precision of model parameters (weights) and activations, typically converting them from 32-bit floating-point numbers to lower-precision formats such as 8-bit integers, or even lower. Moving from 32-bit floats to 8-bit integers alone cuts storage by a factor of four, and lower-precision arithmetic needs less memory bandwidth and can exploit efficient integer hardware on microcontrollers and NPUs. While some loss of precision is inherent, calibrated post-training quantisation or quantisation-aware training (QAT), which simulates quantisation during training, can minimise accuracy degradation.
- Pruning: This involves systematically removing unnecessary weights, neurons, or filters from a neural network, yielding a sparser model that requires fewer computations. Techniques range from unstructured magnitude pruning (zeroing individual weights whose magnitude falls below a threshold) to structured pruning (eliminating entire neurons, channels, or filters); structured pruning is usually easier to accelerate in hardware because the result remains a smaller dense model. Pruned models typically require a fine-tuning step to recover lost accuracy.
- Knowledge Distillation: This method trains a compact “student” model to emulate the behaviour of a larger, more complex “teacher” model. By learning to approximate the teacher’s outputs (typically its full, softened probability distributions rather than just the final labels), the student often retains a substantial portion of the teacher’s accuracy at a much smaller size. This is particularly powerful for TinyML, because a large, advanced architecture can be used during training while only a lean model is deployed.
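The following pure-Python sketch illustrates quantisation and both styles of pruning on a toy 4×3 layer. It uses symmetric per-tensor INT8 quantisation (w ≈ scale × q) because it is the simplest scheme; production toolchains typically add refinements such as per-channel scales, zero points for activations and calibration data, and operate on real tensors. The layer values, the 0.1 pruning threshold and the number of rows kept are hypothetical.

```python
import math
import struct


def quantise_symmetric_int8(weights):
    """Symmetric per-tensor INT8 quantisation: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Map INT8 codes back to approximate float values."""
    return [v * scale for v in q]


def magnitude_prune(weights, threshold):
    """Unstructured pruning: zero every weight whose magnitude is below `threshold`."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def structured_prune_rows(matrix, keep):
    """Structured pruning: keep the `keep` rows (neurons) with the largest L2 norm."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [matrix[i] for i in kept]


# Toy 4x3 layer (hypothetical values): rows 1 and 3 contribute little.
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [-0.5, 0.6, 0.4],
    [0.05, -0.03, 0.02],
]
flat = [w for row in layer for w in row]

# 1. Unstructured pruning keeps the shape but introduces zeros.
sparse = magnitude_prune(flat, threshold=0.1)
print(f"unstructured: {sparse.count(0.0)}/{len(flat)} weights zeroed")

# 2. Structured pruning removes whole rows, giving a smaller dense layer.
kept, dense = structured_prune_rows(layer, keep=2)
remaining = [w for row in dense for w in row]

# 3. Quantise the surviving weights to INT8 and compare storage.
q, scale = quantise_symmetric_int8(remaining)
restored = dequantise(q, scale)
max_err = max(abs(a - b) for a, b in zip(remaining, restored))
fp32_bytes = len(struct.pack(f"{len(flat)}f", *flat))  # original: 4 bytes per weight
int8_bytes = len(struct.pack(f"{len(q)}b", *q))        # pruned + quantised: 1 byte per weight
print(f"kept rows {kept}; {fp32_bytes} B -> {int8_bytes} B; max error {max_err:.4f}")
```

Unstructured pruning zeroes half of the weights here but leaves the 4×3 shape intact, so it only saves memory or time when the runtime supports sparse formats. Structured pruning instead yields a genuinely smaller 2×3 dense layer, and storing its six surviving weights as 8-bit integers takes 6 bytes instead of the original 48, an 87.5% reduction that shows how combined techniques compound. Rounding keeps each reconstruction error within half a quantisation step (scale / 2). In a real workflow each step would be followed by fine-tuning to recover accuracy.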
Applied in combination, these optimisation techniques can dramatically reduce model size and computational complexity while maintaining accuracy. Combining them is often necessary to reach the efficiency edge AI requires, since no single method may suffice on its own, so a holistic strategy is needed that balances accuracy against resource constraints. Empirical results, such as studies reporting up to a 75% reduction in model size with structured pruning and a 95% reduction in parameters with dynamic quantisation, demonstrate their effectiveness for real-world edge deployment. These results suggest that the techniques do more than make models fit: they also make them run faster and more efficiently on target hardware. Table 2 below summarises these key AI model optimisation techniques:
Table 2: AI Model Optimisation Techniques for Edge Deployment
| Technique | Purpose | How it Works | Impact on Model | Considerations |
|---|---|---|---|---|
| Quantisation | Reduce model size and accelerate computation | Reduces precision of weights and activations (e.g., 32-bit float to 8-bit int) | Smaller size, faster computation, lower memory/energy use | Possible small accuracy drop; calibration data or QAT help limit it |
| Pruning | Eliminate redundant parameters, reduce computation | Removes unnecessary neurons, connections, or filters | Sparser model, fewer computations, smaller size | Often requires fine-tuning to recover accuracy; structured pruning better for hardware acceleration |
| Knowledge Distillation | Transfer knowledge from large model to small, retain accuracy | Trains a compact “student” model to mimic a larger “teacher” model’s outputs | Smaller model, often similar accuracy to teacher, lower computational demands | Requires a pre-trained teacher model; can be complex to implement effectively |
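As Table 2 notes, distillation needs a pre-trained teacher. The core of a common training objective, the temperature-scaled soft-target loss popularised by Hinton and colleagues, can be sketched as follows: the student is penalised both for diverging (KL divergence) from the teacher's softened output distribution and for misclassifying the true label. This sketch only computes the loss for fixed, hypothetical logits; in practice a deep learning framework would produce the student's logits from its network and minimise this loss by backpropagation. The temperature of 4 and weighting `alpha` of 0.5 are illustrative choices, not recommended values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[true_label])
    # T^2 keeps the soft-target term on a comparable scale as the temperature changes.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce


teacher = [6.0, 2.5, 1.0]        # hypothetical teacher logits for 3 classes
close_student = [5.0, 2.0, 0.5]  # student that already mimics the teacher
poor_student = [1.0, 1.0, 4.0]   # student that favours the wrong class
print(distillation_loss(close_student, teacher, true_label=0))
print(distillation_loss(poor_student, teacher, true_label=0))
```

The student that mirrors the teacher's relative preferences receives a much lower loss, which is the signal that pushes a compact model towards the teacher's behaviour during training.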
Practical Applications and Use Cases of Edge AI
Edge AI is transforming various industries by enabling intelligent, real-time decision-making directly on devices. For example (as illustrated in Figure 6):
- Autonomous Systems: Edge AI is critical for autonomous vehicles, processing massive volumes of sensor data (e.g., 4-19 TB/hour) in real-time for obstacle detection, navigation, and safety, mitigating risks associated with cloud connectivity delays. Similarly, autonomous drones and industrial robots leverage edge AI for obstacle avoidance and enhanced motion capabilities.
- Smart Environments: In smart homes, edge AI powers voice and vision recognition, and adjusts parameters like lighting or heating based on learned user patterns, optimising energy consumption. Smart cities utilise edge-driven traffic monitoring for real-time flow optimisation and public safety cameras for on-site object and behaviour detection. In agriculture, TinyML applications on cattle wearables can track vital signs to predict disease outbreaks.
- Industrial IoT: Edge AI boosts efficiency and predictive capability through real-time computer-vision quality control on production lines, predictive maintenance that learns machines' vibration signatures to avoid costly downtime, and optimisation of energy consumption in factories.
- Healthcare: Wearable devices use edge AI for real-time monitoring of patient vitals and fall detection, providing instant feedback and alerts. Implantable devices can analyse biometric data to alert doctors to emergencies, and diagnostic AI models can be trained collaboratively via federated learning while preserving patient privacy.
- Security: Edge computing enables autonomous Security Operations Centres (SOCs) to monitor real-time events, providing quicker decisions and immediate action against threats by analysing data locally. This includes secure local processing of sensitive surveillance data for defence operations.
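The sensor data volumes quoted above for autonomous vehicles show why raw data cannot simply be streamed to the cloud. Converting them to a sustained bit rate (assuming decimal units, 1 TB = 10^12 bytes) and comparing with a hypothetical 50 Mbit/s mobile uplink makes the gap explicit; the uplink figure is an assumption for illustration only.

```python
def tb_per_hour_to_gbit_per_s(tb_per_hour: float) -> float:
    """Convert terabytes per hour (decimal, 1 TB = 10**12 bytes) to gigabits per second."""
    bits_per_hour = tb_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9


UPLINK_GBIT_S = 0.05  # hypothetical 50 Mbit/s mobile uplink

for tb_h in (4, 19):
    rate = tb_per_hour_to_gbit_per_s(tb_h)
    share = UPLINK_GBIT_S / rate
    print(f"{tb_h} TB/hour = {rate:.1f} Gbit/s; the uplink could carry {share:.2%} of it")
```

The quoted range corresponds to roughly 8.9 to 42.2 Gbit/s sustained, so even a fast uplink could carry well under 1% of the raw stream, which is why perception must run on the vehicle.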

The ability of AI to enable adaptive control systems and predictive analytics directly on embedded devices marks a fundamental shift from reactive embedded systems to proactive and autonomous ones. This shift not only enhances efficiency and safety but also unlocks new business models and revenue streams, so organisations adopting edge AI are not merely implementing a new technology but transforming their operations and offerings. Moving from static programming to systems that learn from data also demands a new mindset in embedded software engineering, one that emphasises data-driven development and continuous model refinement and that ultimately drives competitive advantage and new monetisation opportunities.
© 2026 Derek Molloy, Dublin City University. All rights reserved.