NECSTFridayTalk – Beyond AI in the Wild: Optimizing Root Cause Analysis with Multi-Agent Systems

24 APRIL 2026

Speaker: Marcello Martini

April 24, 2026 | 11:30
DEIB - NECSTLab Meeting Room (Building 20)
Online via Zoom

Contact: Prof. Marco Santambrogio

Summary

On Friday, April 24th, 2026, a new talk will be held in the #NECSTFridayTalk series.

The speaker will be Marcello Martini, an MSc graduate of Politecnico di Milano.

As modern cloud-native applications increasingly rely on distributed microservices, the complexity of system observability has grown dramatically, making incident response a significant bottleneck for Site Reliability Engineering (SRE) teams. Although Large Language Models (LLMs) are increasingly explored for Root Cause Analysis (RCA), deploying them in production environments introduces critical challenges: high token costs, context saturation, and hallucinations.
This work presents a novel Multi-Agent System (MAS) designed for automated RCA in Kubernetes environments. The architecture adopts a divide-and-conquer strategy, decomposing complex diagnostic tasks into manageable sub-problems handled by specialized agents. Key contributions include a hybrid deterministic-generative Triage agent, a Datagraph to ground reasoning in the cluster topology, and a custom Model Context Protocol (MCP) server that distills telemetry data to optimize token efficiency and reduce hallucinations.
Experimental validation across 17 fault scenarios shows that this approach, using the cost-effective GPT-5-mini model, outperforms state-of-the-art baselines running on GPT-4, achieving a Localization Accuracy of 82% versus 61.54%, a relative improvement of over 33%. Further analysis reveals that Planner-guided tool selection reduces token usage by up to 42% and execution time by up to 29%. These findings suggest that architectural guardrails and task decomposition can outweigh raw model scale for RCA, enabling strong performance at lower cost.

The NECSTLab is a DEIB laboratory with several research lines on advanced topics in computing systems, ranging from architectural characteristics to hardware-software codesign methodologies to the security and dependability of complex system architectures.
#NECSTLab #Computerscience

Every week, the NECSTFridayTalk series invites researchers, professionals, and entrepreneurs to share their work experience and the projects they are carrying out in the field of computing systems.