LLMs in Incident Management Still Need Human SREs for Success


Sofia Rossi
October 9, 2025

Using large language models (LLMs) in incident management is attracting growing interest. A recent ClickHouse study finds that, while LLMs show promise, they are not yet ready to replace Site Reliability Engineers (SREs) in incident management tasks. AI capabilities are advancing, but the models still fall short when asked to diagnose production issues autonomously. This article looks at the current limitations of LLMs in incident management and at the advantages they offer as supporting tools.

Understanding the Role of LLMs in Incident Management

The study, led by researchers Lionel Palacin and Al Brown, tested several LLMs against real-world observability data. Although LLMs such as OpenAI’s models and Gemini show substantial potential, the results demonstrate a clear need for human oversight. The central question remains: can these models enhance incident management processes, and if so, how?

LLMs can assist in preliminary diagnostics.
They excel in drafting root cause analysis reports.

While LLMs have made strides, their limitations are evident: none of the models could consistently identify root causes without human intervention. The evaluation also found that each model approached the investigation differently, which points to the need for human engineers and AI tools to work together.

Challenges Faced by LLMs in Autonomous Diagnosis

The researchers tested leading models with a straightforward prompt that asked them to identify issues using the observability data they had access to. Although some LLMs, including Claude Sonnet 4 and OpenAI o3, managed to identify specific problems, they struggled with more complicated scenarios that required a nuanced understanding of the context.

For example, one issue involved payment failures affecting users at specific loyalty levels. Certain models provided useful insights, but their findings were not consistent. The researchers observed that LLMs tend to follow a single line of reasoning without adequately exploring alternative explanations.

This inconsistency highlights several challenges faced by LLMs:

Propensity for inaccuracies, sometimes referred to as “hallucinations.”
Difficulty comprehensively analyzing complex problems without guidance.

The study also demonstrated stark differences in cost and efficiency among the tested models. Token usage varied dramatically, which complicates cost predictions. Investigation times ranged widely, leading to expenses anywhere from $0.10 to nearly $6 per incident. This variability underscores the unpredictable nature of LLM performance in real-world applications.
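To make the link between token usage and per-incident cost concrete, here is a minimal sketch of the arithmetic. The prices and token counts are invented for illustration; they are not vendor rates or figures from the study, and `TokenPricing` and `investigation_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-million-token prices in USD (not real vendor rates)."""
    input_per_million: float
    output_per_million: float


def investigation_cost(input_tokens: int, output_tokens: int,
                       pricing: TokenPricing) -> float:
    """Cost of one investigation: tokens times price, scaled per million."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Assumed prices and token counts, chosen only to show how widely cost can swing.
pricing = TokenPricing(input_per_million=2.0, output_per_million=8.0)
short_run = investigation_cost(40_000, 2_500, pricing)      # brief investigation
long_run = investigation_cost(2_500_000, 60_000, pricing)   # many tool calls, long context

print(f"short run: ${short_run:.2f}")
print(f"long run:  ${long_run:.2f}")
```

Because each tool call tends to feed more observability data back into the context, input tokens usually dominate the bill in a sketch like this, which is one plausible reason costs are hard to predict in advance.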

The Evolution of LLMs and Their Utility in Incident Management

While LLMs may not be ready to replace human engineers entirely, they can meaningfully improve incident management processes. Their ability to draft comprehensive root cause analysis reports frees teams to focus on strategic decision-making and more complex problem-solving. A related form of automation is explored in our article discussing automating alert triage.

The recommendation from the researchers is clear: integrating LLMs as assistive tools, rather than standalone solutions, lets engineers retain control during incident investigations. With LLMs summarizing verbose logs and suggesting potential investigation directions, human experts can validate critical findings and refine the overall incident response strategy.
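One way to picture this assistive pattern is a small sketch in which verbose logs are condensed before any model sees them, and a model-suggested lead only becomes actionable once an engineer signs off. Everything here is an assumption for illustration: `condense_logs` and `Finding` are hypothetical names, the log lines are invented, and the LLM call itself is left out and replaced by a hard-coded suggestion.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def condense_logs(lines, top_n=3):
    """Mask digit runs so near-identical lines collapse, then count patterns."""
    patterns = Counter(
        re.sub(r"\d+", "<n>", line.strip()) for line in lines if line.strip()
    )
    return patterns.most_common(top_n)


@dataclass
class Finding:
    """A model-suggested lead; not actionable until an engineer signs off."""
    summary: str
    validated_by: Optional[str] = None

    @property
    def actionable(self) -> bool:
        return self.validated_by is not None


# Illustrative log lines, invented for this example.
logs = [
    "payment failed for user 1042: timeout",
    "payment failed for user 2211: timeout",
    "cache miss key 17",
    "payment failed for user 3090: timeout",
]
digest = condense_logs(logs)

# In practice the digest would be sent to an LLM; here the suggestion is hard-coded.
finding = Finding(summary="Payment timeouts dominate the log digest")
before = finding.actionable           # no human has reviewed it yet
finding.validated_by = "on-call SRE"  # the engineer confirms the lead
after = finding.actionable

print(digest)
print(before, after)
```

The design choice worth noting is that validation is part of the data model: nothing downstream should act on a `Finding` whose `validated_by` field is empty.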

Real-World Applications of LLMs

Several studies report that LLMs can improve documentation and shorten incident resolution time. Notably, routine and repetitive tasks can be delegated to AI, allowing human engineers to concentrate on high-level oversight and complex challenges.

Combining LLM capabilities with human expertise is an approach reminiscent of the strategies discussed in understanding AI vulnerabilities. Keeping human oversight in AI-driven incident management ensures that the more intricate aspects of problem resolution are handled effectively.

Future Directions for LLMs in Incident Management

The ClickHouse study also points to where LLMs in incident management can improve. The researchers concluded that present-day technology may not fully replace human engineers, and that future enhancements in context and tool capabilities will be crucial.

Similar findings emerged from a report by Tomasz Szandała, which evaluated models like GPT-4o and Gemini-1.5 in conducting root cause analysis. While LLMs achieved moderate success, human SREs consistently outperformed them, reinforcing the necessity of human involvement in decision-making processes. The study highlighted that prompt engineering can indeed boost model accuracy but still falls short of human intuition and judgment.

Taken together, these findings suggest that the path forward involves structured prompting and a close partnership between human experts and AI. This allows for efficient incident management while maintaining the human oversight needed for strategic decision-making.
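As a rough illustration of what structured prompting might look like, the sketch below builds a prompt that asks for several competing hypotheses instead of a single line of reasoning. The function name `build_rca_prompt`, the wording of the template, and the example inputs are all assumptions; neither study's actual prompts are reproduced here.

```python
def build_rca_prompt(symptom, signals, min_hypotheses=3):
    """Assemble a root cause analysis prompt that demands competing hypotheses."""
    signal_lines = "\n".join(f"- {s}" for s in signals)
    return (
        f"Symptom: {symptom}\n"
        f"Available signals:\n{signal_lines}\n"
        f"List at least {min_hypotheses} distinct candidate root causes.\n"
        "For each, cite the supporting evidence and name one query that could disprove it.\n"
        "Mark anything you could not verify as UNVERIFIED."
    )


# Invented example inputs; a real prompt would be filled from live incident data.
prompt = build_rca_prompt(
    symptom="Checkout error rate rising",
    signals=["payment service logs", "trace latency by endpoint"],
)
print(prompt)
```

Asking for a disproving query per hypothesis gives the reviewing engineer something concrete to run, which keeps the final judgment with the human.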

Conclusion: The Hybrid Approach to Incident Management

As organizations continue to explore innovative technologies, the role of LLMs in incident management should be viewed as complementary to human expertise. The clear recommendation from recent studies is to use LLMs to assist human engineers, not replace them. A balanced approach can improve response times and documentation quality, strengthening the overall efficiency of incident management processes.

To explore these themes further, following the rapid development of AI-driven tools can offer more insight into this hybrid model of incident management, including challenges similar to those outlined in our piece on enhancing AI with key features.
