As Machine Learning has become a major topic in research and industry, large efforts aim to accelerate ML applications on computing devices such as GPUs and FPGAs. While training acceleration has been widely researched, prediction acceleration is usually limited to single, heavy operators such as neural networks or classification trees. However, many prediction applications, typically structured as pipelines of operators, employ simpler prediction operators, each of which accounts for only a small portion of the execution time. This talk shows that in these common scenarios a more systemic approach is needed, with the goal of running entire pipelines, or large parts of them, on the accelerator device. Starting from an example pipeline, we will motivate our approach, explain our design choices, and present the initial results achieved during a research period at Microsoft Corporation in Redmond.
The NECSTLab is a DEIB laboratory with several research lines on advanced topics in computing systems: from architectural characteristics, to hardware-software codesign methodologies, to security and dependability issues of complex system architectures.
Every week, the "NECST Friday Talk" invites researchers, professionals, or entrepreneurs to share their work experiences and the projects they are carrying out in the computing systems field.