We are all made of Big Data - Opportunities and Challenges in Modern Computational Genomics
Giulio Pavesi
Associate Professor, Department of Biosciences, University of Milano
Politecnico di Milano - Room L.26.16 (Via Golgi, 20 - Milano)
November 5th, 2019
10.30 am - 12.00 pm
Contacts: Marco Masseroli
Research line: Data, web and society
Summary
Within the course “Bioinformatics and Computational Biology”, part of the Master's Degree in Bioinformatics for Computational Genomics, on Tuesday, November 5th, 2019, from 10.30 to 12.00 in room L.26.16, Giulio Pavesi, Associate Professor, Department of Biosciences, University of Milano, will give a seminar titled “We are all made of Big Data – Opportunities and Challenges in Modern Computational Genomics”, whose summary follows.
A single living cell contains a wealth of information that is essential for it to live and reproduce. In all living beings, the information encoded in DNA (3.2 billion base pairs in humans) is continuously read and interpreted by the cell, which transcribes it into RNA; the RNA can in turn be translated into functional proteins. Technological improvements in the high-throughput platforms used to study and characterize the molecules at the basis of life, above all next-generation sequencing (NGS), now make it possible to acquire all the information contained in cells and to study how it is organized and decoded. We can sequence DNA at single-base-pair resolution, and thus see the genome that the cell is reading. We can identify which genes are active in the cell in a given condition, and how their expression changes across different conditions. We can investigate the different mechanisms that read and decode DNA into functional proteins. We can identify how errors in this process can lead to disease. A conservative estimate of sequencing capacity alone, based on the manufacturers' specifications for the various instruments in use today, suggests that NGS platforms, if used at full capacity, can generate more than 35 petabases of data per year. At the current rate, worldwide sequencing capacity could reach the zettabase scale within the next 10 years, corresponding to between 100 million and 2 billion complete human genome sequences. Adding the data produced by studies of gene expression and its regulation, as well as the secondary datasets derived from analysis of the raw data, can increase these figures by several orders of magnitude. It is thus evident how traditional lab-based sciences such as genetics and molecular biology are rapidly turning into information sciences, where the catchword “big data” is used more and more frequently.
It is equally evident how the possibility of studying pathologies such as cancer and other genetic diseases in depth at the molecular level has opened new avenues for therapy, giving rise to “personalized medicine”. Researchers in the field today thus need to combine the data science skills essential for organizing, retrieving, and analyzing data with in-depth knowledge of the molecular basis of life. This makes the “genome data scientist” a professional figure who will be central to the rapidly evolving field of modern biology.
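The summary describes DNA being transcribed into RNA, which is then translated into proteins. As a rough illustration of that flow, here is a minimal Python sketch. It assumes that the input is the coding (sense) DNA strand, so the mRNA is the same sequence with T replaced by U, and that translation starts at the first base with no introns or splicing. It uses the standard genetic code. The function names `transcribe` and `translate` are hypothetical, chosen for this example.

```python
# Standard genetic code, compactly encoded: codons are enumerated with bases
# in U, C, A, G order (first base outermost, third base innermost), and each
# character of AMINO_ACIDS is the one-letter amino acid for that codon.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: mRNA = coding DNA strand with T -> U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the sequence end."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(translate(transcribe(dna)))  # prints MAIVMGR
```

Real pipelines would also locate start codons, handle the template strand, and account for splicing. Libraries such as Biopython provide tested versions of these operations.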
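To put the quoted throughput in perspective, one can convert raw bases into whole-genome equivalents. The following is a back-of-the-envelope sketch, not the method behind the estimates in the summary. It uses the 3.2 billion base pair genome size and the 35 petabases per year figure from the text. It also assumes a mean sequencing coverage of 30x per genome, a typical value for human whole-genome sequencing that the original text does not state. The name `genome_equivalents` is hypothetical.

```python
HUMAN_GENOME_BP = 3.2e9  # human genome size quoted in the summary
PETABASE = 1e15          # 1 petabase = 10^15 bases


def genome_equivalents(total_bases: float, coverage: float = 30.0) -> float:
    """Genomes that total_bases could cover at the given mean coverage.

    The default of 30x is an assumption, not a figure from the seminar text.
    """
    return total_bases / (HUMAN_GENOME_BP * coverage)


if __name__ == "__main__":
    yearly = genome_equivalents(35 * PETABASE)
    print(f"{yearly:,.0f} genomes per year at 30x")  # prints 364,583 genomes per year at 30x
```

The coverage term dominates the result. At 1x, the same 35 petabases would correspond to about 10.9 million genome-lengths of sequence.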
Biography
Giulio Pavesi graduated and received his PhD in Computer Science from the University of Milan. He is currently Associate Professor at the Department of Biosciences of the same University, where he leads the research group on Bioinformatics, Evolution and Comparative Genomics and teaches Bioinformatics and Computational Biology in several degree courses in Biology and Biotechnology.
Since his PhD studies, his research has focused on the development and application of algorithms and computational tools for the analysis of genomic data and of the regulation of gene expression, including, in the last few years, the analysis of next-generation sequencing data (ChIP-Seq, RNA-Seq, RIP-Seq). He has taken part in several research projects and studies. These cover both the development of bioinformatics tools and methods, freely available to the biological community and considered state of the art in their respective fields of application, and results obtained from the analysis of data produced in collaboration with several research groups focused more on wet-lab biology.