
Speaker: Marcello Martini
April 24th, 2026 | 11:30 am
DEIB - NECSTLab Meeting Room (Bld. 20)
Online by Zoom
Contact: Prof. Marco Santambrogio
Abstract
On Friday, April 24th, 2026, we will have a new talk in the #NECSTFridayTalk series. Our speaker will be Marcello Martini, MSc graduate from Politecnico di Milano.
As modern cloud-native applications increasingly rely on distributed microservices, the complexity of system observability has grown dramatically, making incident response a significant bottleneck for Site Reliability Engineering (SRE) teams. Although Large Language Models (LLMs) are increasingly explored for Root Cause Analysis (RCA), deploying them in production environments introduces critical challenges: high token costs, context saturation, and hallucinations.
This work presents a novel Multi-Agent System (MAS) designed for automated RCA in Kubernetes environments. The architecture adopts a divide-and-conquer strategy, decomposing complex diagnostic tasks into manageable sub-problems handled by specialized agents. Key contributions include a hybrid deterministic-generative Triage agent, a Datagraph to ground reasoning in the cluster topology, and a custom Model Context Protocol (MCP) server that distills telemetry data to optimize token efficiency and reduce hallucinations.
Experimental validation across 17 fault scenarios shows that this approach, using the cost-effective GPT-5-mini model, outperforms state-of-the-art baselines running on GPT-4, achieving a Localization Accuracy of 82% versus 61.54%, a relative improvement of over 33%. Further analysis reveals that Planner-guided tool selection reduces token usage by up to 42% and execution time by up to 29%. These findings suggest that architectural guardrails and task decomposition can outweigh raw model scale for RCA, enabling strong performance at lower cost.
The NECSTLab is a DEIB laboratory with several research lines on advanced topics in computing systems, ranging from architectural characteristics to hardware-software codesign methodologies to security and dependability issues of complex system architectures.
#NECSTLab #Computerscience
Every week, the “NECSTFridayTalk” invites researchers, professionals or entrepreneurs to share their work experience and the projects they are carrying out in the field of computing systems.
