Data-Driven Genomic Computing: Making Sense of the Signals from the Genome
DEIB - Conference Room
May 18th, 2016
4.30 pm - 6.00 pm
Contact:
Stefano Ceri
Research Line:
Data, web, and society
Summary
On May 18th at 16.30, in the DEIB Conference room, Stefano Ceri will give a talk about the research topic that led him to win an ERC Advanced Grant.
Stefano Ceri is one of the two computer scientists in Europe who have reached this milestone twice: he won his first ERC Advanced Grant in 2008 with the SeCo project (Search Computing) and has now won his second with the GeCo project, "Data-Driven Genomic Computing".
The abstract of his seminar follows.
Genomic computing is a new science focused on understanding how the genome functions, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) can produce the entire human genome sequence at a cost of about US$1,000; many algorithms exist for extracting genome features, or "signals", including peaks (enriched regions), mutations, and gene expression (intensity of transcription activity). What is still missing is a system that supports data integration and exploration, giving a “biological meaning” to all the available information; such a system can be used, e.g., to better understand cancer or how the environment influences cancer development.
The GeCo Project (Data-Driven Genomic Computing, an ERC Advanced Grant currently in contract preparation) aims to revisit genomic computing through the lens of basic data management, by means of models, languages, and instruments; the DEIB research group is among the few that focus on genomic data integration. Starting from an abstract model, we have already developed a system that can be used to query processed data produced by several large genomic consortia, including ENCODE and TCGA; the system internally employs the Spark, Flink, and SciDB data engines, and prototypes can already be accessed on Cineca servers or downloaded from PoliMi servers. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient.
Most diseases have a genetic component, hence a system capable of integrating the “big data” of genomics is of paramount importance. One of the objectives of the project is the creation of an “open source” system available to biological and clinical research. While the GeCo project will provide public services that use only public data (anonymized and made available for secondary use, i.e., knowledge discovery), the use of the GeCo system within protected clinical contexts will enable personalized medicine, i.e., the adaptation of therapies to the specific genetic features of patients. The most ambitious objective is the development, during the five-year ERC project, of an “Internet for Genomics”, i.e., a protocol for collecting data from consortia and individual researchers, and a “Google for Genomics”, supporting indexing and search over huge collections of genomic datasets.