2.2 State-of-the-art Hardware

The landscape of edge AI hardware is characterised by a diverse array of specialised processors, each engineered to address the unique computational and power requirements of on-device AI.

Specialised Processors: GPUs, NPUs, FPGAs, and ASICs

  • Graphics Processing Units (GPUs): While traditionally known for graphics rendering, GPUs are widely utilised in edge AI for their parallel processing capabilities, which are highly effective for deep learning and video analytics workloads. Platforms such as the NVIDIA Jetson series are flagship edge AI computers designed for demanding applications like autonomous robotics, complex computer vision systems, and even generative AI, delivering significant AI performance measured in TOPS (trillions of operations per second). See Figure 3.

Figure 3. An example NVIDIA development platform: the Jetson Orin Nano Super Developer Kit. See: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/

  • Neural Processing Units (NPUs): These are dedicated co-processors specifically designed to accelerate AI tasks, particularly the matrix multiplication and tensor operations essential for neural networks, directly on the processor with high efficiency and low power consumption. Examples include NPUs integrated into PC processors, such as Intel AI Boost and AMD XDNA, and standalone accelerator chips like the Hailo-8.
  • Field-Programmable Gate Arrays (FPGAs): FPGAs offer a unique blend of flexibility, in-field upgradability, and parallel processing capabilities, making them an essential tool for contextual edge AI. Their reprogrammable logic fabric resembles neural wiring, making them excellent targets for neural networks. FPGAs can be configured to perform specific AI tasks, allowing developers to adapt applications and optimise for maximum efficiency and reliability. They are also effective in accelerating data ingestion and overcoming I/O bottlenecks. Frameworks like Xilinx (now AMD) Vitis AI provide comprehensive toolkits for FPGA-based edge computing.
  • Application-Specific Integrated Circuits (ASICs): ASICs are custom-made for particular AI workloads, offering superior price-to-performance ratios and exceptional power efficiency for specific, stable inference tasks. They are increasingly gaining market share for AI inference, particularly for large language models (LLMs) at the edge, where they can be simpler, cheaper, and consume less power than general-purpose GPUs for fixed workloads.
  • Digital Signal Processors (DSPs): Modern DSPs are evolving to integrate neural network accelerators, offering significant performance-per-watt advantages for real-time signal processing tasks such as audio analysis, image recognition, and sensor fusion. These ultra-efficient SoCs are ideal for applications requiring a balance of performance, latency, and energy efficiency without cloud reliance. Boards like the BeagleY-AI, built around a Texas Instruments SoC, integrate DSPs alongside ARM cores, a GPU and a vision accelerator. See Figure 4.

Figure 4. The BeagleY-AI Single Board Computer (SBC) is a low-cost, open-source, community-supported development platform for developers and hobbyists, in a form factor compatible with accessories available for other popular SBCs. It can run AI applications on a dedicated 4 TOPS co-processor alongside real-time I/O tasks on a dedicated 800 MHz microcontroller. See: https://www.beagleboard.org/boards/beagley-ai

Emerging Architectures: Neuromorphic Computing

Inspired by the biological brain, neuromorphic computing represents a paradigm shift in AI hardware. Chips such as Intel’s Loihi 2 employ event-driven architectures and spiking neural networks (SNNs). Unlike conventional processors, which process data continuously at fixed intervals, neuromorphic chips activate only when specific “spikes” or events occur (see Figure 5 for an example of a neuromorphic camera), leading to exceptional energy efficiency (consuming as little as 1% to 10% of the power used by traditional processors). This asynchronous processing enables real-time responsiveness, with latencies in the range of tens of microseconds, making these chips well suited to dynamic, latency-sensitive applications like robotics, autonomous systems, and next-generation IoT devices. Furthermore, neuromorphic systems integrate memory and processing in a single architecture, minimising the energy lost to constant data movement, a significant advantage for battery-powered edge devices.

Figure 5. A simple side-by-side demo of a conventional and neuromorphic camera.  A neuromorphic camera, also known as an event camera, silicon retina, or dynamic vision sensor (DVS), is a bio-inspired sensor that captures changes in light intensity rather than full frames at fixed intervals. It mimics the human eye’s ability to detect changes in luminance, rather than capturing static scenes. Please see the following link for a very clear video demonstration: https://www.youtube.com/watch?v=W5mUwBitFtg 
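The event-based sensing idea described in the Figure 5 caption can be illustrated with a small sketch. It compares two frames and emits an event only where the brightness changed enough, instead of outputting every pixel. The function name `events_from_frames`, the use of log intensity, and the threshold value are assumptions made for illustration; real event cameras do this asynchronously per pixel in hardware rather than by differencing frames.

```python
import math

def events_from_frames(prev, curr, threshold=0.2):
    """Return (row, col, polarity) events where log intensity changed enough.

    prev, curr: 2D lists of positive pixel intensities with the same shape.
    threshold: hypothetical contrast threshold on the log-intensity change.
    """
    events = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            delta = math.log(q) - math.log(p)
            if abs(delta) >= threshold:
                # Polarity +1 means brighter, -1 means darker.
                events.append((r, c, 1 if delta > 0 else -1))
    return events

prev_frame = [[100, 100, 100],
              [100, 100, 100]]
curr_frame = [[100, 150, 100],
              [100, 100, 60]]

events = events_from_frames(prev_frame, curr_frame)
print(events)
```

Only the two pixels that changed produce events, and a static scene produces none, which is where the data and energy savings of event-driven sensing come from.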

Leading Edge AI Hardware Platforms and Their Capabilities

The market for edge AI hardware is characterised by a diverse range of platforms, each optimised for distinct power, performance, and form factor requirements. This proliferation of highly specialised AI hardware signifies a strategic shift from general-purpose computing to domain-specific architectures at the edge, a specialisation that is crucial for overcoming inherent resource constraints and achieving the performance and energy efficiency demanded by diverse edge AI applications.

Hardware selection is therefore no longer a simple choice, but a complex engineering decision that must take into account the specific AI workload, power budget, latency requirements, and desired flexibility for future updates. For instance, a high-volume, fixed-function application might prioritise an ASIC for its cost and power advantages, whereas a rapidly evolving or multi-functional system may find FPGAs or powerful GPU-based SoCs more suitable due to their flexibility and adaptability. This trend drives the need for close hardware-software co-design from the earliest stages of development. The emergence of specialised chips capable of handling generative AI and large language models at the edge further underscores the move towards highly tailored hardware solutions for new AI paradigms. Table 1 below provides a comparative overview of some leading edge AI hardware platforms, illustrating their varied capabilities and target applications.

Table 1: Comparison of Some Modern Edge AI Hardware Platforms

| Platform Name | Primary Processor Type | Performance (TOPS/TFLOPS/GOPS) | Power Consumption | Key Use Cases/Strengths | Notable Features |
|---|---|---|---|---|---|
| NVIDIA Jetson AGX Orin | GPU | Up to 275 TOPS | Varies (e.g., 15-60 W) | High-end robotics, autonomous vehicles, advanced computer vision, GenAI | 12-core Arm CPU, 2048-core NVIDIA Ampere GPU, 64 Tensor Cores |
| Google Coral Dev Board | NPU (Edge TPU) | 4 TOPS | Low (e.g., ~0.5-1 W) | Low-power vision-based IoT, smart cameras, portable ML devices | Optimised for TensorFlow Lite, small form factor |
| Intel Neural Compute Stick 2 | VPU (Movidius Myriad X) | ~1.2 TOPS | Low (USB powered) | Prototyping Edge AI on PCs/Raspberry Pi, offloading inference tasks | Plug-and-play USB 3.0 device |
| AMD Xilinx Kria K26 SOM | FPGA | Varies by configuration | Moderate | Computer vision in industrial/smart city, automated optical inspection | Adaptive, in-field reconfigurability, low latency |
| Qualcomm Robotics RB5 | SoC (CPU+GPU+AI Engine) | 15 TOPS | Moderate | High-performance robotics, drones, multi-camera setups | Octa-core Kryo 585 CPU, Adreno 650 GPU, Hexagon Tensor Accelerator |
| Intel Loihi 2 | Neuromorphic | Up to 50x faster than ANNs | Extremely low (1 W) | Event-driven AI, robotics, autonomous systems, brain-inspired computing | Combines processing and memory, asynchronous operation |

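
As a rough illustration of how the figures in Table 1 can be compared, the sketch below computes peak throughput per watt for the two platforms whose rows give both a TOPS figure and a numeric power range. Taking the upper end of each quoted power range is an assumption made here, and peak TOPS quoted by different vendors are not necessarily measured at the same numeric precision, so the result is indicative only.

```python
# (platform, peak TOPS, power in watts). Figures are taken from Table 1;
# using the upper end of each quoted power range is an assumption.
platforms = [
    ("NVIDIA Jetson AGX Orin", 275.0, 60.0),
    ("Google Coral Dev Board", 4.0, 1.0),
]

def tops_per_watt(tops, watts):
    """Peak throughput per watt, a crude efficiency indicator."""
    return tops / watts

efficiency = {name: tops_per_watt(tops, watts) for name, tops, watts in platforms}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPS/W")
```

Under these assumptions the two platforms land in a similar efficiency range even though their absolute throughput and power budgets differ by well over an order of magnitude, which is why the power budget and workload, rather than peak TOPS alone, tend to drive the choice.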

The fundamental advantage of Edge AI lies in its capacity to enable real-time decision-making by processing data locally, thereby eliminating the latency associated with transmitting data to and from distant cloud servers. This capability is important for applications where immediate responses are critical, such as autonomous vehicles requiring split-second decisions to prevent accidents, industrial automation systems performing predictive maintenance to avert costly downtime, and healthcare monitoring devices providing instant diagnostics and alerts. Furthermore, the ability of edge AI systems to operate independently, without constant internet connectivity, ensures continuous functionality and enhanced reliability in environments with intermittent or unreliable network access, or during critical events like natural disasters.

To overcome the severe hardware limitations of edge devices (restricted processing power, limited memory, and finite battery life, as discussed earlier), AI models must undergo aggressive optimisation. This is a complex engineering challenge, given that traditional AI models, particularly deep neural networks, often demand significant computational resources.

Several key techniques are employed to make AI models feasible for edge deployment:

  • Quantisation: This technique reduces the precision of model parameters (weights and activations), typically converting them from 32-bit floating-point numbers to lower-precision formats like 8-bit integers or even lower. This significantly shrinks model size and accelerates computation, as lower-precision arithmetic uses less memory and leverages efficient integer math on microcontrollers. While some precision loss is inherent, techniques like post-training quantisation and quantisation-aware training (QAT) can minimise accuracy degradation.
  • Pruning: This involves systematically removing unnecessary neurons, connections, or filters within a neural network, resulting in a sparser model that requires fewer computations. Techniques range from magnitude pruning (removing weights below a threshold) to structured pruning (eliminating entire neurons or channels), with the latter often offering better hardware acceleration compatibility. Pruned models typically require a fine-tuning step to recover any lost accuracy.
  • Knowledge Distillation: This method involves training a compact “student” model to emulate the behaviour of a larger, more complex “teacher” model. The student model learns to approximate the teacher’s outputs, often achieving a substantial portion of the teacher’s accuracy at a significantly reduced size. This is particularly powerful for TinyML, allowing the benefits of advanced architectures during training while deploying a lean model.
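
To make the first two techniques concrete, here is a minimal sketch that applies magnitude pruning followed by symmetric 8-bit quantisation to a small list of weights. The function names, the example weights, and the pruning threshold are hypothetical, and the fine-tuning step that normally follows pruning is omitted. Real toolchains operate on whole tensors and usually calibrate activations as well.

```python
def magnitude_prune(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantise_int8(weights):
    """Symmetric per-tensor quantisation of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical weights for illustration only.
weights = [0.52, -1.27, 0.003, 0.9, -0.33, 0.01, -0.02, 0.75]

pruned = magnitude_prune(weights, threshold=0.05)
sparsity = pruned.count(0.0) / len(pruned)

q, scale = quantise_int8(pruned)
restored = dequantise(q, scale)
max_error = max(abs(p - r) for p, r in zip(pruned, restored))

print("sparsity:", sparsity)
print("int8 values:", q)
print("max round-trip error:", max_error)
```

Each quantised weight can be stored in one byte instead of the four bytes of a 32-bit float, and the round-trip error of any weight is bounded by half the quantisation step (`scale / 2`). The zeros introduced by pruning survive quantisation unchanged, which is one reason the two techniques combine well.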

Applied in combination, these techniques can yield dramatic reductions in model size and computational complexity while maintaining accuracy. Individual methods might not suffice on their own, so a holistic optimisation strategy is required, in which the techniques are applied together to balance accuracy against resource constraints. Empirical evidence, such as studies showing up to 75% reduction in model size with structured pruning and 95% reduction in parameters with dynamic quantisation, demonstrates their effectiveness for real-world edge deployment. This supports the combined approach and indicates that these techniques are not just about making models fit, but also about making them perform better on target hardware. Table 2 below summarises these key AI model optimisation techniques:

Table 2: AI Model Optimisation Techniques for Edge Deployment

| Technique | Purpose | How it Works | Impact on Model | Considerations |
|---|---|---|---|---|
| Quantisation | Reduce model size and accelerate computation | Reduces precision of weights and activations (e.g., 32-bit float to 8-bit int) | Smaller size, faster computation, lower memory/energy use | Potential minimal accuracy drop; requires careful implementation |
| Pruning | Eliminate redundant parameters, reduce computation | Removes unnecessary neurons, connections, or filters | Sparser model, fewer computations, smaller size | Often requires fine-tuning to recover accuracy; structured pruning better for hardware acceleration |
| Knowledge Distillation | Transfer knowledge from large model to small, retain accuracy | Trains a compact “student” model to mimic a larger “teacher” model’s outputs | Smaller model, often similar accuracy to teacher, lower computational demands | Requires a pre-trained teacher model; can be complex to implement effectively |
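
The knowledge distillation entry describes the student mimicking the teacher's outputs. One common way to express this, sketched below, is a loss that measures the KL divergence between temperature-softened teacher and student output distributions. This particular formulation, the temperature value, and the example logits are assumptions for illustration and are not taken from the text above. In practice this term is usually combined with an ordinary loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumes moderate logit values so that no probability underflows to zero.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student_far = [1.0, 3.0, 0.5]
student_close = [3.8, 1.1, 0.4]

loss_far = distillation_loss(student_far, teacher)
loss_close = distillation_loss(student_close, teacher)
print(loss_far, loss_close)
```

The loss is zero when the student reproduces the teacher's logits exactly and shrinks as the student's outputs approach the teacher's, which is the signal the student is trained to minimise.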

Practical Applications and Use Cases of Edge AI

Edge AI is transforming various industries by enabling intelligent, real-time decision-making directly on devices. For example (as illustrated in Figure 6):

  • Autonomous Systems: Edge AI is critical for autonomous vehicles, processing massive volumes of sensor data (e.g., 4-19 TB/hour) in real-time for obstacle detection, navigation, and safety, mitigating risks associated with cloud connectivity delays. Similarly, autonomous drones and industrial robots leverage edge AI for obstacle avoidance and enhanced motion capabilities.
  • Smart Environments: In smart homes, edge AI powers voice and vision recognition, and adjusts parameters like lighting or heating based on learned user patterns, optimising energy consumption. Smart cities utilise edge-driven traffic monitoring for real-time flow optimisation and public safety cameras for on-site object and behaviour detection. In agriculture, TinyML applications on cattle wearables can track vital signs to predict disease outbreaks.
  • Industrial IoT: Edge AI boosts efficiency and predictive capabilities through real-time quality control on production lines via computer vision, predictive maintenance to avoid costly downtimes by learning vibration signatures, and energy consumption optimisation in factories.
  • Healthcare: Wearable devices use edge AI for real-time monitoring of patient vitals and fall detection, providing instant feedback and alerts. Implantable devices can analyse biometric data to alert doctors to emergencies, and diagnostic AI models can be trained collaboratively via federated learning while preserving patient privacy.
  • Security: Edge computing enables autonomous Security Operations Centres (SOCs) to monitor real-time events, providing quicker decisions and immediate action against threats by analysing data locally. This includes secure local processing of sensitive surveillance data for defence operations.
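
To put the sensor data volume quoted for autonomous vehicles above (4-19 TB/hour) into perspective, the short calculation below converts it into a sustained data rate. It assumes decimal units (1 TB = 10^12 bytes), and the function name is chosen here for illustration.

```python
def tb_per_hour_to_gbit_per_s(tb_per_hour):
    """Convert terabytes per hour to gigabits per second (decimal units)."""
    bytes_per_hour = tb_per_hour * 1e12
    bits_per_second = bytes_per_hour * 8 / 3600
    return bits_per_second / 1e9

low = tb_per_hour_to_gbit_per_s(4)
high = tb_per_hour_to_gbit_per_s(19)
print(f"{low:.1f} to {high:.1f} Gbit/s")
```

This works out to roughly 9 to 42 Gbit/s of continuous raw data per vehicle, which gives a sense of why processing on the device is preferred to streaming everything to the cloud.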

Figure 6. Visual representation of different Edge AI use cases.

The ability of AI to enable adaptive control systems and predictive analytics directly on embedded devices represents a fundamental shift from reactive to proactive and autonomous embedded systems. This not only enhances efficiency and safety but also unlocks entirely new business models and revenue streams. This indicates that organisations adopting edge AI are not just implementing a new technology but are fundamentally transforming their operations and offerings. The shift from static programming to dynamic learning systems requires a new mindset in embedded software engineering, emphasising data-driven development and continuous model refinement, which ultimately drives competitive advantage and new monetisation opportunities.
