Data Science and Bioinformatics Lab
Responsible:
Web site:
http://datascience.deib.polimi.it
http://www.bioinformatics.deib.polimi.it/geco
Activities
The Data Science and Bioinformatics Lab at Politecnico di Milano is concerned with the study of all aspects of data science and bioinformatics.
The laboratory studies data science theory as a sound scientific discipline and develops methods, tools, technologies and applications for its effective deployment in real-world problems, with special interest in problems related to genomic computing. Its main scientific interests in data science currently include crowdsourcing, data extraction and scraping, streaming data management, social engagement, extraction of emerging knowledge from social content, user-centered data integration and exploration, and machine learning / deep learning applications.
On the teaching side, the group promotes Data-Shack, an innovative project for master’s students jointly managed with the Institute for Applied Computational Science (IACS) at Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS). Additional initiatives include doctoral and summer courses.
The group is currently funded by many EU and private projects; in the past, its main project was Search Computing, an Advanced ERC Grant (2.5 million Euro, 2008-2013, http://searchcomputing.deib.polimi.it/). The project focused on developing languages and methods for data integration, guided by their ranking; current applications range from fashion management to smart cities, and from social analytics to knowledge extraction using social sources.
Research in bioinformatics is concerned with genomic computing; the aim is to construct a powerful computational infrastructure that can process the data generated by DNA and RNA sequencing machines and that makes it easy to create visualizations, queries, analyses, mining and searches on genomic data collections that are distributed and available worldwide.
On the teaching side, the group is promoting a new joint master’s degree in Bioinformatics and Computational Genomics with Università Statale di Milano.
The research is funded by Data-Driven Genomic Computing (GeCo), an Advanced ERC Grant (2.5 million Euro, 2016-2021) focused on the management of big genomic data generated by NGS (Next Generation Sequencing) technology (http://www.bioinformatics.deib.polimi.it/geco/?home). The project aims to construct a powerful computational infrastructure that can process the data generated by DNA and RNA sequencing and support visualizations, queries, analyses, mining and searches on genomic data collections that are distributed and available worldwide. The goal is a standard computational infrastructure that is highly efficient, extensible and easy to use, a step towards the “Internet of genomes”, to support scientists in genomic research.
The project’s main outcome is GMQL (GenoMetric Query Language), a language and system for querying genomic big data, currently installed at Politecnico di Milano, CINECA and the Broad Institute (http://www.bioinformatics.deib.polimi.it/GMQLsystem/). It provides parallel computation in the cloud and thereby supports queries over thousands of samples, such as those provided by the ENCODE and TCGA consortia.
Other developed bioinformatics tools include:
GeMSE - GenoMetric Space Explorer,
MuSERA - Multiple Sample Enriched Region Assessment,
GPKB - Genomic and Proteomic Knowledge Base
and Bio-SeCo - Bio Search Computing.
Service information
The laboratory is located at the DEIB (Department of Electronics, Information and Bioengineering) headquarters, building 20, and is subject to the same opening and closing times.