NECSTFridayTalk – Low-latency Random Forest Inference on embedded devices

Speaker: Alessandro Verosimile
PhD Student in Information Technology
DEIB - NECSTLab Meeting Room (Bld. 20)
Online by Zoom
September 26th, 2025 | 11.30 am
Contact: Prof. Marco Santambrogio
Summary
On September 26th, 2025 at 11.30 am a new appointment of the #NECSTFridayTalk series, titled "Low-latency Random Forest Inference on embedded devices", will take place in the DEIB NECSTLab Meeting Room (Building 20) and online via Zoom.
During this talk, we will have, as speaker, Alessandro Verosimile, PhD student at the Dipartimento di Elettronica, Informazione e Bioingegneria.
The convergence of Artificial Intelligence (AI) and the Internet of Things (IoT) is driving the need for real-time, low-latency architectures that support the inference of complex Machine Learning (ML) models in critical applications such as autonomous vehicles and smart healthcare. While traditional cloud-based solutions introduce latency because data must be transmitted to and from centralized servers, edge computing offers lower response times by processing data locally. In this context, Random Forests (RFs) are well suited to hardware acceleration on resource-constrained edge devices thanks to their inherent parallelism. Nevertheless, maintaining low latency as the size of the RF grows remains a challenge for state-of-the-art (SoA) approaches.
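For readers unfamiliar with the baseline, the following is a minimal, illustrative Python sketch of conventional axis-aligned RF inference; the array layout and function names are assumptions made for this example, not the speaker's implementation. It shows why latency grows with tree depth (one comparison per level) and where the forest-level parallelism that hardware accelerators exploit comes from (trees are traversed independently):

```python
# Illustrative node-by-node traversal of one axis-aligned decision tree.
# Each internal node tests a single feature against a threshold, so the
# traversal takes one step per tree level -- latency scales with depth.

def predict_tree(nodes, x):
    """nodes[i] = (feature, threshold, left, right) for internal nodes,
    or ('leaf', class_label) for leaves; traversal starts at index 0."""
    i = 0
    while nodes[i][0] != 'leaf':
        feature, threshold, left, right = nodes[i]
        i = left if x[feature] <= threshold else right
    return nodes[i][1]

def predict_forest(trees, x):
    """Majority vote over independently traversed trees -- the inherent
    parallelism a hardware accelerator can exploit."""
    votes = [predict_tree(tree, x) for tree in trees]
    return max(set(votes), key=votes.count)
```

A hardware accelerator evaluates all trees concurrently, so the forest latency is dominated by the depth of the deepest tree rather than the number of trees.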
To address this challenge, we developed a hardware-software codesign framework for memory-centric RF inference that tailors the architecture to the target ML model, employing RFs whose Decision Trees (DTs) have multiple depths and exploring several architectural variations to find the best-performing configuration. The framework is complemented by a resource estimation model based on the most relevant architectural features, enabling effective Design Space Exploration. Moreover, to further reduce latency, we introduce a stump-wise inference strategy that, unlike implementations in the literature that process DTs node by node, processes two consecutive tree levels in parallel. This approach lowers both latency and hardware usage, enabling the inference of larger and more accurate RFs. Lastly, we adapt the original framework to support Oblique Random Forests (ORFs), a variant of traditional RFs that employs hyperplane-based splits, offering a more expressive alternative and improving classification accuracy. We introduce a new training technique for ORFs that mitigates both training complexity and overfitting while improving inference latency, along with a new memory structure for the architecture that maximizes performance and optimizes resource usage.
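The two ideas above can be sketched in plain Python; the names and the complete-binary-tree layout (children of node i at 2i+1 and 2i+2) are assumptions chosen for the example, not the actual accelerator design. The first function consumes two tree levels per iteration, mirroring how a stump-wise datapath could evaluate a node and both of its children's tests concurrently; the second shows an oblique (hyperplane) split in place of a single-feature comparison:

```python
def predict_stumpwise(tree, x):
    """tree: dict mapping index -> (feature, threshold) for internal
    nodes or ('leaf', label). Complete-binary-tree layout: children of
    node i are 2*i+1 and 2*i+2. Two levels are consumed per iteration,
    roughly halving the number of sequential steps versus node-by-node."""
    i = 0
    while tree[i][0] != 'leaf':
        feature, threshold = tree[i]
        # In hardware, this test and both children's tests are evaluated
        # in the same step; here we select the child, then the grandchild.
        child = 2 * i + (2 if x[feature] > threshold else 1)
        if tree[child][0] == 'leaf':
            i = child
        else:
            cf, ct = tree[child]
            i = 2 * child + (2 if x[cf] > ct else 1)
    return tree[i][1]

def oblique_test(w, t, x):
    """Oblique (hyperplane) split: compares a weighted sum of all
    features to a threshold, instead of a single feature -- a more
    expressive decision boundary than an axis-aligned test."""
    return sum(wi * xi for wi, xi in zip(w, x)) > t
```

An ORF simply replaces each node's single-feature test with `oblique_test`, at the cost of one multiply-accumulate per feature per node, which is why the memory structure holding the weight vectors becomes a key design point.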
The NECSTLab is a DEIB laboratory with several research lines on advanced topics in computing systems: from architectural characteristics, to hardware-software codesign methodologies, to security and dependability issues of complex system architectures.
Every week, the “NECSTFridayTalk” invites researchers, professionals, or entrepreneurs to share their work experience and the projects they are carrying out in the computing systems field.