NECSTFridayTalk – On Improving Performance and Flexibility of HaplotypeCaller: a GPU Approach
Beatrice Branchini
DEIB PhD Student
DEIB - NECSTLab Meeting Room (Building 20)
On Line via Facebook
May 5th, 2023
12.30 pm
Contacts:
Marco Santambrogio
Research Line:
System architectures
Abstract
On May 5th, 2023, at 12.30 pm, a new NECSTFridayTalk appointment, "On Improving Performance and Flexibility of HaplotypeCaller: a GPU Approach", will be held by Beatrice Branchini, PhD Student in Information Technology at NECSTLab, Politecnico di Milano, in the DEIB NECSTLab Meeting Room.
Genome Analysis ToolKit HaplotypeCaller is the de facto standard for highlighting genetic mutations. However, this tool requires long runtimes on real-life datasets, and most of that time is spent computing the Pair Hidden Markov Model (PairHMM) algorithm. Because the algorithm is so compute-intensive, general-purpose CPUs fail to deliver the performance required to analyze large genomic datasets. Offloading this task to hardware accelerators such as GPUs is therefore a suitable way to speed up execution. However, the solutions available in the literature cannot align long sequences, which jeopardizes their practical adoption. This work proposes a novel GPU-based accelerator for the PairHMM, featuring a dynamic memory swap methodology that allows analyzing sequences of any length while also improving performance by up to 3× compared to the literature.
We also provide our system with a stream partitioning mechanism that enables seamless integration into HaplotypeCaller. Experimental results demonstrate our approach’s effectiveness, removing the State-of-the-Art limitations on sequence length and achieving more than 100 Giga Cell Updates Per Second on an NVIDIA A100. In addition, we present the first work that provides an end-to-end performance evaluation of HaplotypeCaller when relying on GPU acceleration, including the Intel Genomic Kernel Library (GKL) in the comparison. Using an NVIDIA A100, our solution improves the tool’s execution time by 1.7× compared to GKL running on a State-of-the-Art Intel Xeon Platinum with the best thread configuration. Finally, we also propose a new workload to test our solution, closer to a real-life scenario than the previously employed 10s benchmark, on which our design outperforms the baselines by up to 31.34×.
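As a rough illustration of the Giga Cell Updates Per Second (GCUPS) metric quoted above, the sketch below counts one cell update for every entry of the read-by-haplotype dynamic-programming matrix of each PairHMM alignment and divides the total by the elapsed time. The function name, the sequence lengths, and the timing are hypothetical values chosen for illustration; they are not taken from the talk.

```python
def gcups(pairs, seconds):
    """Return Giga Cell Updates Per Second for a batch of alignments.

    pairs   -- iterable of (read_length, haplotype_length) tuples
    seconds -- elapsed wall-clock time for the whole batch
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each read/haplotype pair fills a read_length x haplotype_length matrix.
    cells = sum(read_len * hap_len for read_len, hap_len in pairs)
    return cells / seconds / 1e9


# Hypothetical batch: 1,000,000 pairs, each a 100-base read against a
# 250-base haplotype, processed in 0.25 s (made-up numbers).
batch = [(100, 250)] * 1_000_000
print(gcups(batch, 0.25))  # prints 100.0
```

Because the metric normalizes by the matrix size, it allows comparing throughput across datasets with different sequence lengths.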
The NECSTLab is a DEIB laboratory, with different research lines on advanced topics in computing systems: from architectural characteristics, to hardware-software codesign methodologies, to security and dependability issues of complex system architectures.
Every week, the “NECSTFridayTalk” invites researchers, professionals or entrepreneurs to share their work experiences and the projects they are working on in the field of “Computing Systems”.
The event will be held online via Facebook.