Giulia Guidi
Graduate student in Biomedical Engineering - Politecnico di Milano
DEIB - NECST Meeting Room (Edificio 20, piano seminterrato)
October 27th, 2017
12.00 pm
Research line:
System architecture
Knowledge of the complete DNA makeup of an organism is crucial to provide the most detailed resolution of genetic and epigenetic variations. Nowadays, third-generation sequencers provide long and redundant substrings of the DNA, called reads, with an average length of 10 kb, allowing sequences to be aligned unambiguously and yielding high-quality de novo genome assemblies. However, long reads still have high error rates, up to 15% for Pacific Biosciences technology, which makes it hard to assemble a whole high-quality genome.
In this work, we propose a novel approach to handling erroneous data in the initial step of the assembly pipeline, which consists of finding overlapping reads. The approach is based on short substrings shared between reads, called k-mers. To discover the overlaps efficiently, we exploit sparse matrix multiplication, achieving true positive rates greater than 90% for several genomes.
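To give an idea of how shared k-mers and sparse matrix multiplication fit together, here is a minimal sketch. It is not the speaker's implementation. It assumes a binary reads-by-k-mers matrix A, with A[i, m] = 1 when read i contains k-mer m, so that entry (i, j) of A·Aᵀ counts the k-mers shared by reads i and j. The function names, the `min_shared` threshold and the toy reads are all hypothetical, and no error-tolerance strategy is modeled.

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of distinct k-mers (length-k substrings) of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def build_columns(reads, k):
    """Sparse binary reads-by-k-mers matrix A, stored column by column.

    columns[kmer] lists, in increasing order, the read indices i
    for which A[i, kmer] = 1.
    """
    columns = defaultdict(list)
    for i, read in enumerate(reads):
        for kmer in kmers(read, k):
            columns[kmer].append(i)
    return columns


def shared_kmer_counts(reads, k):
    """Strict upper triangle of C = A * A^T; C[i, j] = number of shared k-mers.

    The product is accumulated as a sum of outer products of the
    columns of A, so only nonzero entries are ever touched.
    """
    counts = defaultdict(int)
    for rows in build_columns(reads, k).values():
        for i, j in combinations(rows, 2):
            counts[(i, j)] += 1
    return dict(counts)


def candidate_overlaps(reads, k, min_shared):
    """Read pairs sharing at least min_shared k-mers (assumed threshold)."""
    pairs = shared_kmer_counts(reads, k)
    return sorted((i, j, n) for (i, j), n in pairs.items() if n >= min_shared)


# Toy example: reads 0 and 1 overlap on "TACGGA"; read 2 is unrelated.
toy_reads = ["ACGTACGGA", "TACGGATTC", "GGGCCCAAA"]
print(candidate_overlaps(toy_reads, k=4, min_shared=2))
```

On real long reads, sequencing errors destroy many k-mers, so the choice of k and of the threshold matters much more than in this toy case. A sparse linear algebra library would replace the dictionary-based product used here.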
Every week, the "NECST Friday Talk" invites researchers, professionals or entrepreneurs to share their work experience and the projects they are carrying out in the "Computing Systems" area.