Focus
The research in this area addresses technologies, design methods, and tools for data management systems, information management and querying on the Web, and multimedia and multichannel communication. The group's approach is to embody research results in innovative applications and prototype demonstrators.
The research mainly addresses the following lines:
- Big Data and Data Science: big data is an emerging phenomenon of our society, with enormous economic, social and cultural implications. In this context, the challenge is collecting, organizing, analyzing, searching, sharing and displaying data in multiple application domains. The term "data science" refers to the science and methodologies for extracting information from big data for diagnostic and predictive purposes. The ongoing research exploits the integration of data from heterogeneous sources (for example, electricity and water consumption, telephone communications, journalistic information, exchanges on the Web and on social media) to understand and better organize cities (smart cities) and their events (for example, the design week and the fashion week in Milan), to better exploit their potential (for example, tourism in Como), or to predict interesting events. The group also works on the Exploratory Computing paradigm, which supports users without a technical or statistical background in data analysis by highlighting the noteworthy aspects of data sets too large and complex to be read in full. Since data are often used within critical decision processes (e.g., staff evaluation, college admission, criminal sentencing), ethics-aware data processing has become a pressing need. The group proposes a vision for enforcing ethical principles such as fairness, non-discrimination, transparency, data protection, and diversity throughout the data analysis lifecycle.
- Genomic Computing: supported by a new ERC-funded project, this research is centered on the management of genomic big data. From a technological point of view, the research aims at creating open, cloud-based systems for querying and managing heterogeneous and distributed genomic data. This makes it possible to integrate the different types of signals in the human genome (mutations, levels of gene expression, expression peaks) to understand complex biological phenomena, with particular attention to the study of tumors (cancer genomics).
- Human Computation and Social Analytics: the so-called "wisdom of the crowd" is exploited to execute difficult tasks that are not easily delegated to automatic methods, such as the discovery of new knowledge from social signals or the analysis and tagging of multimedia content, with approaches that mix social network interaction, social data mining, and the design of games with a purpose.
- Information management in pervasive systems: these systems, where information processing is often embedded into everyday objects and daily activities, require special treatment of data: a dedicated query language, the ability to adapt to the current context (context-awareness), and the analysis and execution of flexible, semantics-based queries enable the fusion of heterogeneous data coming from diverse sources such as mobile devices, sensors and Web APIs.
- Stream Reasoning: stream-based query languages are studied in the context of the RDF framework, addressing issues such as the formal semantics of stream query languages, query optimization, and the optimal materialization of data inferred by the reasoners.
- Multimedia and multichannel communication: definition of new paradigms enabling efficient and effective communication across media and devices, customized to specific usage contexts and user profiles.
- eLearning: designing and evaluating new forms of teaching and learning that exploit the potential of interactive digital technologies.
- Unconventional human-machine interfaces: modeling, development and evaluation of interaction paradigms based on the interpretation of gestures, movements, voice, facial expressions, gaze, on the manipulation of digitally enriched physical objects ("smart objects"), and on the use of multimedia content presented on large screens, wearable devices, or environmental projections.
- Big data for the environment and sustainability: this line designs, implements and assesses systems that apply big data analytics to environmental and sustainability issues, such as the analysis of snow cover to predict the available water supply, and the monitoring of energy and water consumption to identify consumer behavior patterns and predict demand.
- Big data and smart society: research at DEIB exploits digital technologies to collect data from the environment and from people in order to improve their quality of life, for example through: the analysis of the mobility of fragile users to facilitate accessibility and pedestrian walkability; the monitoring of elderly people and persons with mild cognitive disabilities in domestic environments to foster their autonomy; and the identification of daily activities and of deviations or drifts from usual behavior, with the aim of guaranteeing the well-being of both the person and the caregiver.
- Big data and process mining: the research aims at analyzing huge quantities of data collected in different scenarios, such as companies or healthcare. Through this analysis, formal models of the processes and activities that generated those data can be identified. Once a process model has been identified, further analyses can be performed, e.g. checking its correctness and its compliance with the rules, recommendations, and best practices of the specific application domain.
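The first step of process mining, discovering a model from an event log, can be illustrated by mining a directly-follows graph, i.e. counting how often one activity immediately follows another across process instances. The log below is a hypothetical healthcare example for illustration only, not actual project data:

```python
from collections import defaultdict

# A toy event log: one list of activities per process instance (trace).
# These traces are hypothetical, standing in for data collected from a
# company or healthcare information system.
event_log = [
    ["register", "triage", "treat", "discharge"],
    ["register", "triage", "test", "treat", "discharge"],
    ["register", "triage", "treat", "discharge"],
]

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = defaultdict(int)
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg)

dfg = directly_follows(event_log)
```

Edge counts such as these form the simplest process model; conformance checking can then compare new traces against the discovered edges to detect deviations from the rules and best practices of the domain.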
Most relevant research achievements
Situational and context-aware knowledge access:
ContextADDICT is a methodological framework for integrating, tailoring and delivering context-aware information.
Flexible and semantic-based structured and semistructured data analysis and querying:
(1) NYAYA is a tool for the management of Semantic-Web data that couples a general-purpose storage mechanism with efficient ontology reasoning and querying capabilities.
(2) The TreeRuler prototype enables the extraction of intensional, approximate information on the structure and content of relational databases or XML documents.
Pervasive Language Definition and Development:
PerLa is an SQL-like language for the interaction with a pervasive system as if it were a database. It allows the user to interact with logical objects that wrap physical devices, which can become part of the system at run-time with a “Plug and Play” behaviour.
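The "Plug and Play" behaviour described above can be sketched in plain Python: devices are wrapped as logical objects, join the system at run time, and are queried through a uniform, declarative interface. The class and method names below are illustrative assumptions, not the actual PerLa API:

```python
class LogicalObject:
    """Wraps a physical device behind a uniform attribute interface."""
    def __init__(self, device_id, attributes):
        self.device_id = device_id
        self.attributes = attributes  # e.g. {"temperature": 21.5}

    def get(self, attribute):
        # Devices that do not expose the attribute simply return None.
        return self.attributes.get(attribute)

class PervasiveSystem:
    """Registry that lets devices join at run time ("Plug and Play")."""
    def __init__(self):
        self.devices = []

    def plug(self, device):
        self.devices.append(device)

    def select(self, attribute, predicate):
        """Declarative, SQL-like selection over all plugged devices."""
        return [(d.device_id, d.get(attribute))
                for d in self.devices
                if d.get(attribute) is not None and predicate(d.get(attribute))]

system = PervasiveSystem()
system.plug(LogicalObject("sensor-1", {"temperature": 19.0}))
system.plug(LogicalObject("sensor-2", {"temperature": 23.5}))
# Roughly: SELECT device_id, temperature FROM devices WHERE temperature > 20
hot = system.select("temperature", lambda t: t > 20)
```

A new sensor plugged in later is immediately visible to subsequent queries, which is the essence of treating the pervasive system as a single, evolving database.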
Stream Reasoning:
The BOTTARI application, winner of the Semantic Web Challenge 2011, exploits social media and context to provide recommendations to users in a specific geographic location [http://challenge.semanticweb.org/].
Genomic Data Management:
GFINDer (http://www.bioinformatics.polimi.it/GFINDer/) is a system for discovering, using, and mining large amounts of genomic information from heterogeneous and distributed online databases, supporting the biomedical interpretation of high-throughput biomolecular experiments.
Infrastructure for multi-domain queries:
The SeCo query engine supports queries expressed in a declarative language over service interfaces. Queries are translated into acyclic invocation workflows, and then into physical execution plans interpreted by the query engine.
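The translation of a declarative multi-domain query into an acyclic workflow of service invocations can be illustrated with a minimal pipeline. The two services and the join logic below are hypothetical stand-ins, not SeCo's actual interfaces:

```python
# Two mock service interfaces for different domains. A real engine would
# invoke remote search services returning ranked results; these stand-ins
# are hypothetical and not part of SeCo.
def restaurant_service(city):
    return [{"city": city, "name": "Trattoria A", "rating": 4.5},
            {"city": city, "name": "Osteria B", "rating": 4.0}]

def hotel_service(city):
    return [{"city": city, "hotel": "Hotel C", "stars": 4}]

def execute_plan(city):
    """A tiny physical plan for the multi-domain query "restaurants and
    hotels in the same city": each service is invoked once (an acyclic
    workflow), then the two result sets are joined on `city`."""
    restaurants = restaurant_service(city)   # first invocation
    hotels = hotel_service(city)             # second invocation
    return [{**r, **h} for r in restaurants for h in hotels
            if r["city"] == h["city"]]

results = execute_plan("Como")
```

The point of the acyclic workflow is that each service is called exactly once and results flow in one direction, so the engine can schedule and optimize the invocations independently of the declarative query text.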
Answering queries with ranking:
Traditional rank join algorithms have shortcomings in solving the proximity rank join problem, as they may read more input than needed. A tight bound is therefore defined to guarantee that the I/O cost is always within a constant factor of the optimum.
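The flavor of rank-join processing with a bound on unread input can be sketched as follows. This is a simplified, HRJN-style rank join over two ranked lists, not the actual proximity rank join algorithm; the data and the alternation policy are illustrative:

```python
# Two ranked inputs, each sorted by descending score and joined on a key.
# The aggregate score of a join result is the sum of the two scores.
# Assumes both inputs are non-empty; data here is illustrative.
left  = [("a", 0.9), ("b", 0.8), ("c", 0.3)]
right = [("b", 0.95), ("c", 0.7), ("a", 0.2)]

def rank_join_top1(left, right):
    """Read the inputs incrementally, joining as matches appear, and stop
    as soon as the best result found cannot be beaten by unread input."""
    seen_l, seen_r = {}, {}
    best = None  # (aggregate score, key)
    i = j = 0
    while i < len(left) or j < len(right):
        # Alternate between the two inputs while both have unread tuples.
        if j >= len(right) or (i <= j and i < len(left)):
            k, s = left[i]; i += 1
            seen_l[k] = s
            if k in seen_r:
                cand = (s + seen_r[k], k)
                best = cand if best is None else max(best, cand)
        else:
            k, s = right[j]; j += 1
            seen_r[k] = s
            if k in seen_l:
                cand = (s + seen_l[k], k)
                best = cand if best is None else max(best, cand)
        # Upper bound on any join result not yet formed: it must combine
        # an unread tuple of one input with (at best) the top of the other.
        next_l = left[i][1] if i < len(left) else 0.0
        next_r = right[j][1] if j < len(right) else 0.0
        threshold = max(left[0][1] + next_r, next_l + right[0][1])
        if best is not None and best[0] >= threshold:
            return best  # early stop: remaining input cannot do better
    return best
```

On this input the algorithm stops after reading only three tuples, since the threshold proves that no unread combination can beat the pair joined on "b"; the research quoted above goes further, bounding such early-stopping I/O within a constant factor of the optimum.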
Human computation architecture and tools:
The CUbRIK project has defined a process-based architecture for defining workflows of automatic and human tasks, capable of exporting activities for execution by large pools of humans in social networks, crowdsourcing platforms and games with a purpose.
Web Mashups models and tools:
New models, methods, and tools have been defined for the composition of Web mashups. We have proposed visual composition paradigms suited to end users, new component and composition models, and Web-based composition environments equipped with model-to-code generative techniques that transform high-level visual abstractions into execution models addressing multiple devices.
Multimedia, multichannel communication:
An innovative toolkit (1001stories) has been created for the authoring and delivery of multimedia content over various channels and devices; it has been used in more than 40 professional applications and by more than 20,000 students of all ages at school.
eLearning:
Innovative formats for supporting learning through technologies, as well as innovative learning experiences, have been delivered to more than 9,000 students in 18 European countries plus Israel and the USA; playful and tangible interaction has been investigated to understand how it promotes learning, particularly in children with special needs.
Haptic interaction:
High-level haptic interaction models and tools have been developed and applied in visual-haptic interactions supporting chemistry learning.
Exploratory information access:
A new exploratory paradigm, based on highly reactive interfaces for handling and making sense of huge data sets, has been developed and applied in various contexts (eLearning, eCulture…). More than 100 real-life actors have been involved.
Social Business Process Management:
Social BPM is about designing and executing processes cooperatively. We explore the organizational and technological implications of Social BPM and propose a model-driven approach to the design of social business processes, extending the BPMN standard to incorporate social interactions into process models and to generate application code from them.