DEIB - Events

From supervised learning to causal inference in large dimensional settings

Gianluca Bontempi
Université Libre de Bruxelles, Belgium

Politecnico di Milano - DEIB
This event will be held online via Microsoft Teams
June 26th, 2020
9.00 am - 1.00 pm

Contacts:
Giacomo Boracchi
Cesare Alippi
Matteo Matteucci

Summary

On June 26th, 2020, from 9.00 am to 1.30 pm, the seminar “From supervised learning to causal inference in large dimensional settings” will take place online as part of the PhD Course on Machine Learning for Non-Matrix Data, organized by Profs. Giacomo Boracchi, Cesare Alippi, and Matteo Matteucci.

“We are drowning in data and starving for knowledge” is an old adage among data scientists that nowadays should be rephrased as “we are drowning in associations and starving for causality”. The democratization of machine learning software and big data platforms is increasing the risk of ascribing causal meaning to simple and sometimes brittle associations. This risk is particularly evident in settings (such as bioinformatics, social sciences, and economics) characterised by high dimension, multivariate interactions, and dynamic behaviour, where direct manipulation is not only unethical but also impractical. The conventional ways to recover a causal structure from observational data are score-based and constraint-based algorithms. Their limitations, mainly in high dimension, opened the way to alternative learning algorithms that pose the problem of causal inference as the classification of probability distributions. The rationale of those algorithms is that the existence of a causal relationship induces a constraint on the observational multivariate distribution. In other words, causality leaves footprints in the data distribution that can hopefully be used to reduce the uncertainty about the causal structure. The first part of the presentation will introduce some basics of causal inference and will discuss the state of the art in machine learning for causality (notably causal feature selection) and some applications to bioinformatics. The second part of the talk will focus on the D2C approach, which featurizes observed data by means of asymmetric information-theoretic measures to extract meaningful hints about the causal structure.
The D2C algorithm performs three steps to predict the existence of a directed causal link between two variables in a multivariate setting: (i) it estimates the Markov blankets of the two variables of interest and ranks their components in terms of their causal nature, (ii) it computes a number of asymmetric descriptors, and (iii) it learns a classifier (e.g. a Random Forest) that returns the probability of a causal link given the descriptor values. The final part of the presentation is more prospective and will introduce some recent work on implementing counterfactual prediction in a data-driven setting.
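
As an illustration of what an asymmetric descriptor in step (ii) might look like, the following is a minimal sketch and not the D2C implementation. It rests on several assumptions that are not part of the abstract: a toy linear-Gaussian system c_i -> z_i -> z_j <- c_j, Markov blanket members (c_i, c_j) that are known by construction rather than estimated as in step (i), and partial correlation swapped in as a simple stand-in for the information-theoretic measures. All function and variable names are hypothetical. Step (iii), the classifier, is not shown.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n, seed=0):
    """Assumed toy system: c_i -> z_i -> z_j <- c_j with unit Gaussian noise."""
    rng = random.Random(seed)
    c_i = [rng.gauss(0, 1) for _ in range(n)]  # a cause of z_i
    c_j = [rng.gauss(0, 1) for _ in range(n)]  # another cause of z_j
    z_i = [a + rng.gauss(0, 1) for a in c_i]
    z_j = [a + b + rng.gauss(0, 1) for a, b in zip(z_i, c_j)]
    return c_i, c_j, z_i, z_j


c_i, c_j, z_i, z_j = simulate(5000)

# The same pair (c_i, c_j) is tested twice, conditioning once on the
# cause z_i and once on the effect z_j. The two values differ, and that
# difference is the kind of asymmetry a descriptor can encode.
given_cause = partial_corr(c_i, c_j, z_i)
given_effect = partial_corr(c_i, c_j, z_j)

print(f"dep(c_i, c_j | z_i) = {given_cause:+.3f}")
print(f"dep(c_i, c_j | z_j) = {given_effect:+.3f}")
```

In this toy system the first quantity is zero in the population, while the second is -1/3, because conditioning on the common effect z_j makes its otherwise independent causes dependent. Swapping the roles of z_i and z_j changes the pattern, which is why such quantities can carry information about the direction of a link.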

This event will be online and organized via Microsoft Teams.

To attend the seminar, please use the following link: Join Microsoft Teams Meeting.