NECSTSpecial Talk - A High-performance FPGA Hardware Design for Approximate Sparse Embedding Similarity

NECSTSpecialTalk
Alberto Parravicini
DEIB PhD Student
Event will be online from Facebook
November 29th, 2021
1.00 pm
Contacts:
Marco Santambrogio
Research Line:
System architectures
Summary
On November 29th, 2021, at 1.00 pm, a new NECSTSpecialTalk titled "A High-performance FPGA Hardware Design for Approximate Sparse Embedding Similarity" will be held online via Facebook by Alberto Parravicini, PhD student in Information Technology at Politecnico di Milano.
Computing the similarity of sparse embeddings quickly and efficiently is a critical step in many modern recommender systems, which propose music to listen to, videos to watch, and products to buy. Top-K sparse matrix-vector multiplication (SpMV) is a common workhorse behind these computations, although its performance on general-purpose NUMA systems with traditional caching strategies is lackluster at best. Modern FPGA accelerator cards, by contrast, have a few tricks up their sleeve. In this talk, we introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power efficiency.
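For readers unfamiliar with the operation, Top-K SpMV can be sketched in a few lines of plain Python. This is only a minimal software illustration of the computation itself, assuming a standard CSR layout; it does not represent the FPGA design, the reduced-precision arithmetic, or the packet-wise compression presented in the talk. The function name `topk_spmv` and the toy data are hypothetical.

```python
import heapq


def topk_spmv(indptr, indices, data, x, k):
    """Multiply a CSR matrix by a dense vector x and keep the k best rows.

    indptr, indices, data: standard CSR arrays of the sparse embedding matrix.
    Returns a list of (score, row) pairs, highest score first.
    """
    scores = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        # Only the non-zero entries of this row contribute to the dot product.
        for j in range(indptr[row], indptr[row + 1]):
            acc += data[j] * x[indices[j]]
        scores.append((acc, row))
    # Select the k rows most similar to the query vector.
    return heapq.nlargest(k, scores)


# Toy 4x5 sparse matrix (illustrative values only).
indptr = [0, 2, 3, 5, 6]
indices = [0, 3, 1, 0, 4, 2]
data = [1.0, 2.0, 3.0, 0.5, 1.5, 4.0]
x = [1.0, 0.0, 2.0, 1.0, 2.0]

top2 = topk_spmv(indptr, indices, data, x, 2)
print(top2)  # [(8.0, 3), (3.5, 2)]
```

Each row is read once and its column indices trigger scattered reads of `x`, which hints at why the operation tends to be limited by memory bandwidth rather than arithmetic.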
The NECSTLab is a DEIB laboratory, with different research lines on advanced topics in computing systems: from architectural characteristics, to hardware-software codesign methodologies, to security and dependability issues of complex system architectures.
Streaming via Facebook will be available at the following link