The project Discount Quality for Responsible Data Science: Human-in-the-Loop for Quality Data, coordinated by Prof. Barbara Pernici from the Department of Electronics, Information and Bioengineering – Politecnico di Milano and carried out in collaboration with the University of Modena and Reggio Emilia, the University of Milano-Bicocca and the University of Rome "La Sapienza," officially kicked off on November 29, 2023.
The project, funded by PRIN, was designed against a backdrop of several attempts to build data 'spaces' or 'ecosystems' that support the publication and reuse of scientific data for feeding pipelines, i.e. the processes that data scientists specify and execute to prepare, transform, enrich and analyse data. However, assessing and controlling the quality of data and results can be very expensive in terms of computational resources and human costs, since completely automated pipelines present significant weaknesses in monitoring the data life cycle and often make it very difficult to control the results in terms of quality, uncertainty and explainability.
That’s why the project intends to exploit a Human-in-the-Loop approach – i.e. an approach involving human intervention in the most delicate phases of the data transformation process – to increase the overall sustainability of the pipeline, both from a computational point of view and in terms of human effort. In particular, the project focuses on data preparation, which normally takes up to 80% of the overall time needed to complete the process, and aims to balance the need for high-quality data against the need to reduce the work involved in preparing it.