Towards Accelerator-Rich Architectures and Systems
Dr. Zhenman Fang
Simon Fraser University - Canada
DEIB - Seminar Room (building 20, ground floor)
October 30th, 2019
12.00 pm
Contacts:
Christian Pilato
Research Line:
System architectures
Abstract
With Intel’s $16.7B acquisition of Altera and the deployment of FPGAs by major cloud service providers including Microsoft, Amazon, Alibaba, and Huawei, we are entering a new era of customized computing. Future architectures and systems are anticipated to feature a sea of heterogeneous accelerators customized for important application domains, such as machine learning, big data analytics, and personalized healthcare, to provide better performance and energy efficiency. Many research problems remain open, such as how to efficiently integrate accelerators into future processor chips and commodity datacenters, and how to program such accelerator-rich architectures and systems.
In this talk, I will first give a brief overview of my research and explain how customized accelerators can achieve orders-of-magnitude performance improvements. Second, I will present our initial work on CPU-accelerator co-design, where we provide efficient and unified address translation support between CPU cores and accelerators. It shows that a simple two-level TLB design for accelerators, plus the host core MMU for accelerator page walks, can be very efficient: on average, it achieves a 7.6x speedup over the naïve IOMMU, with only a 6.4% performance gap to ideal address translation. Finally, I will present the open-source Blaze system, which provides programming and runtime support to enable easy and efficient FPGA accelerator deployment in datacenters. Blaze abstracts accelerators-as-a-service and bridges the gap between big data applications (e.g., Apache Spark programs) and emerging accelerators (e.g., FPGAs). By plugging a PCIe-based FPGA board into each CPU server, it can improve system throughput severalfold for a range of applications.
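The two-level translation scheme mentioned above can be illustrated with a toy model (a minimal sketch with hypothetical structure and sizes, not the paper's exact design): each accelerator checks a small private L1 TLB, misses fall through to a shared L2 TLB, and remaining misses are resolved by a page walk on the host core's MMU.

```python
# Toy model of two-level TLB translation for accelerators (hypothetical
# structure and sizes, for illustration only): L1 is a small per-accelerator
# TLB, L2 is shared across accelerators, and misses trigger a page walk
# handled by the host core's MMU (modeled here as a plain dict lookup).

PAGE = 4096  # 4 KB pages

class TwoLevelTLB:
    def __init__(self, page_table, l1_size=32, l2_size=512):
        self.page_table = page_table      # stands in for the host MMU's page table
        self.l1, self.l2 = {}, {}
        self.l1_size, self.l2_size = l1_size, l2_size
        self.walks = 0                    # number of host-MMU page walks

    def translate(self, vaddr):
        vpn, off = divmod(vaddr, PAGE)
        if vpn in self.l1:                # L1 hit: private per-accelerator TLB
            return self.l1[vpn] * PAGE + off
        if vpn in self.l2:                # L2 hit: TLB shared by accelerators
            ppn = self.l2[vpn]
        else:                             # miss: page walk on the host MMU
            self.walks += 1
            ppn = self.page_table[vpn]
            if len(self.l2) >= self.l2_size:
                self.l2.pop(next(iter(self.l2)))  # crude FIFO eviction
            self.l2[vpn] = ppn
        if len(self.l1) >= self.l1_size:
            self.l1.pop(next(iter(self.l1)))      # crude FIFO eviction
        self.l1[vpn] = ppn
        return ppn * PAGE + off

# Toy page table: virtual page n maps to physical page n + 100.
pt = {vpn: vpn + 100 for vpn in range(64)}
tlb = TwoLevelTLB(pt)
paddr1 = tlb.translate(5 * PAGE + 12)   # cold miss -> one page walk
paddr2 = tlb.translate(5 * PAGE + 40)   # L1 hit -> no extra walk
```

The point of the sketch is the fall-through order: most accelerator accesses hit in the small L1, the shared L2 absorbs capacity misses, and only the residue pays the cost of a host-side page walk.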
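The accelerators-as-a-service idea can likewise be sketched in miniature (hypothetical names; the actual Blaze API is Spark-oriented and differs): a runtime keeps a registry of deployed accelerators, applications request a computation by name, and the runtime transparently falls back to a CPU implementation when no accelerator is available.

```python
# Minimal sketch of the accelerators-as-a-service idea (names are hypothetical,
# not the actual Blaze API): applications request a kernel by name, and the
# runtime dispatches to a registered accelerator or falls back to the CPU path.

class AcceleratorRegistry:
    def __init__(self):
        self._impls = {}  # kernel name -> accelerator callable

    def register(self, name, impl):
        """Register an accelerator (e.g., an FPGA kernel wrapper) for a kernel."""
        self._impls[name] = impl

    def run(self, name, data, cpu_fallback):
        """Dispatch to an accelerator if one is deployed, else run on the CPU."""
        impl = self._impls.get(name, cpu_fallback)
        return impl(data)

# CPU reference implementation of a toy kernel.
def scale_cpu(xs):
    return [2 * x for x in xs]

# Stand-in for an FPGA-accelerated version of the same kernel.
def scale_fpga(xs):
    return [2 * x for x in xs]  # same result, notionally offloaded

registry = AcceleratorRegistry()
result_sw = registry.run("scale", [1, 2, 3], scale_cpu)  # no accelerator yet
registry.register("scale", scale_fpga)
result_hw = registry.run("scale", [1, 2, 3], scale_cpu)  # now dispatched to FPGA
```

The application code is identical in both calls; only the registry's contents decide whether the work lands on an accelerator, which is the decoupling that lets big data programs run unchanged as FPGAs are added to servers.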
Biography
Dr. Zhenman Fang is a Tenure-Track Assistant Professor in the School of Engineering Science at Simon Fraser University, Canada, where he founded and directs the HiAccel lab. From September 2017 to March 2019, Zhenman worked in the SDx group at Xilinx in San Jose. From July 2014 to September 2017, he was a postdoc at UCLA under the supervision of Prof. Jason Cong and Prof. Glenn Reinman. While at UCLA, he was also a member of two multi-university centers: the Center for Domain-Specific Computing (CDSC) and the Center for Future Architectures Research (C-FAR).
Zhenman's recent research focuses on customizable computing with specialized hardware acceleration, which aims to sustain the ever-increasing performance and energy-efficiency demands of important application domains in the post-Moore's-law era. It spans the entire computing stack, including emerging application drivers, novel computer architectures, and the corresponding programming, runtime, and tool support. Zhenman has published over 20 papers in top conferences and journals, including two best paper awards (the TCAD 2019 Donald O. Pederson Best Paper Award and MEMSYS 2017), two best paper nominations (HPCA 2017 and ISPASS 2018), and an invited paper in the Proceedings of the IEEE (2019).