Grounding Intelligence: The Rise of Vision-Language-Action Models in Robotics

March 9th, 2026 | 10:00 am
DEIB - Alpha Room (Bld. 24)

Contact: Riccardo Andrea Izzo

Abstract

On March 9th, 2026, at 10:00 am, Riccardo Andrea Izzo, PhD student in Information Technology, will hold a seminar on "Grounding Intelligence: The Rise of Vision-Language-Action Models in Robotics" in the DEIB Alpha Room (Building 24).

The integration of Generative AI into robotics marks a paradigm shift in how machines perceive, reason, and interact with the physical world. This seminar provides a comprehensive overview of this evolution, tracing the architectural trajectory from Large Language Models (LLMs) to Vision-Language Models (VLMs), and finally to Vision-Language-Action (VLA) models.
We will begin by contextualizing the role of GenAI in robotics, exploring how the grounding of language in visual data serves as the foundation for embodied intelligence. The core of the presentation will focus on the characteristics of VLA models, analyzing the critical architectural choices required to bridge the gap between semantic understanding and motor control. We will discuss methods for encoding multimodal information, compare learning paradigms ranging from imitation learning to reinforcement learning, and examine different strategies for action representation.
To conclude, the seminar will survey the state-of-the-art VLAs, highlighting the capabilities and innovations of recent models such as Gemini Robotics, π*0.6 and SmolVLA.
