
The team from Politecnico di Milano has secured first place in the prestigious HD-EPIC VQA Challenge, held as part of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which took place in Nashville, Tennessee, from June 11 to 15, 2025.
The competition challenged participants to develop advanced Video Question Answering (VQA) systems for egocentric videos recorded with wearable devices. The team from Milan proposed an innovative two-stage approach:
- Symbolic representation of the video using a semantic graph, allowing for a clear and compact structuring of key information.
- Automated reasoning using Large Language Models (LLMs) applied to the graph to accurately answer questions.
The method showed that a symbolic representation can improve both the interpretability and the accuracy of a VQA system, supporting a more efficient and robust understanding of video content.
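The two stages can be illustrated with a minimal Python sketch. It is not the team's implementation: the `Edge` fields, the text serialization, and the multiple-choice prompt layout are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One timestamped relation in a hypothetical semantic graph of the video."""
    subject: str
    relation: str
    obj: str
    start_s: float
    end_s: float


def graph_to_text(edges):
    """Stage 1 output: serialize the graph as compact text, ordered by start time."""
    ordered = sorted(edges, key=lambda e: e.start_s)
    return "\n".join(
        f"[{e.start_s:.1f}-{e.end_s:.1f}s] {e.subject} --{e.relation}--> {e.obj}"
        for e in ordered
    )


def build_prompt(edges, question, choices):
    """Stage 2 input: combine graph, question and answer options for an LLM.

    The prompt wording and the multiple-choice format are assumptions.
    """
    options = "\n".join(
        f"{chr(ord('A') + i)}. {choice}" for i, choice in enumerate(choices)
    )
    return (
        "Video graph:\n"
        f"{graph_to_text(edges)}\n\n"
        f"Question: {question}\n"
        f"{options}\n"
        "Answer with a single letter."
    )


if __name__ == "__main__":
    # Hand-written example edges; a real system would extract them from video.
    edges = [
        Edge("person", "picks_up", "knife", 12.0, 13.5),
        Edge("person", "opens", "fridge", 3.0, 4.2),
    ]
    print(build_prompt(edges, "What did the person open first?", ["fridge", "drawer"]))
```

Because the model only sees the serialized graph, every answer can be traced back to specific edges, which is the kind of interpretability the symbolic representation is meant to provide.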
The winning project originated as the Master’s thesis of Agnese Taluzzi and Davide Gesualdi, students in Computer Engineering, and was developed within the Smart Eyewear Lab, a joint research initiative between Politecnico di Milano and EssilorLuxottica, with the collaboration of AIRLab.
In addition to Taluzzi and Gesualdi, the winning team included Riccardo Santambrogio (PhD student in Information Engineering), Chiara Plizzari (postdoctoral researcher), Simone Mentasti (researcher), and Francesca Palermo (researcher at EssilorLuxottica), under the supervision of Prof. Matteo Matteucci from the Department of Electronics, Information and Bioengineering.