How LLMs Can Automate Ticket Escalation: Inside ByteBrain’s TickIt System

This article introduces TickIt, a ByteBrain system that uses large language models to identify and escalate critical Oncall tickets automatically. It covers the system's three modules (multi‑class escalation, escalation deduplication, and category‑guided fine‑tuning), the experimental results, and the operational impact on cloud services.

Volcano Engine Developer Services

Background

As cloud computing grows rapidly, Oncall tickets serve as a crucial bridge between customers and the technical support and SRE teams at Volcano Engine. Thousands of tickets arrive daily, covering diverse issues such as usage queries, feature requests, and system failures.

Traditional manual escalation relies heavily on the on‑call engineer's experience, leading to inconsistent standards and missed critical issues, which can degrade stability and customer satisfaction.

Challenges

Oncall tickets vary widely in type and severity, requiring different expertise for resolution. Existing feature‑engineered classifiers struggle with semantic understanding, cannot handle dynamic clarification during conversations, and fail to detect escalation opportunities in real time.

Identifying relationships among tickets is also important; similar tickets from multiple customers should be grouped to streamline handling.

TickIt Overview

TickIt employs ByteDance’s Doubao LLM to dynamically track and understand Oncall dialogues, enabling timely escalation of severe issues and discovery of semantic links between tickets.

Multi‑class Escalation

TickIt frames escalation as a multi‑class classification problem over predefined categories such as system failure, customer complaint, and asset loss; tickets that fit none of them fall into an "Other" category. The system prompt combines role assignment, chain‑of‑thought reasoning, and few‑shot examples to improve classification accuracy and interpretability.
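The prompt structure described above can be sketched roughly as follows. This is an illustrative sketch only, not TickIt's actual prompt: the label strings, the wording of the instructions, the reply format, and the helper names (`build_system_prompt`, `parse_category`, `should_escalate`) are all assumptions, and the call to the Doubao model is left out.

```python
from dataclasses import dataclass

# Category names follow the article; the exact label strings are assumptions.
CATEGORIES = ["System Failure", "Customer Complaint", "Asset Loss", "Other"]


@dataclass
class FewShotExample:
    dialogue: str
    reasoning: str
    category: str


def build_system_prompt(examples: list[FewShotExample]) -> str:
    """Combine role assignment, a chain-of-thought instruction, and few-shot examples."""
    lines = [
        # Role assignment (hypothetical wording).
        "You are an experienced on-call engineer triaging support tickets.",
        "Classify the dialogue into exactly one of: " + ", ".join(CATEGORIES) + ".",
        # Chain-of-thought instruction: reasoning first, label last.
        "First explain your reasoning step by step, then give the category.",
        "Answer in the form:\nReasoning: <text>\nCategory: <one category>",
    ]
    for i, ex in enumerate(examples, start=1):
        lines.append(
            f"\nExample {i}\nDialogue: {ex.dialogue}\n"
            f"Reasoning: {ex.reasoning}\nCategory: {ex.category}"
        )
    return "\n".join(lines)


def parse_category(model_output: str) -> str:
    """Read the last 'Category:' line of a reply; fall back to 'Other'."""
    for line in reversed(model_output.strip().splitlines()):
        if line.lower().startswith("category:"):
            label = line.split(":", 1)[1].strip().lower()
            for cat in CATEGORIES:
                if cat.lower() == label:
                    return cat
    return "Other"


def should_escalate(category: str) -> bool:
    """Every predefined category except 'Other' triggers escalation."""
    return category != "Other"
```

Asking for the reasoning before the label keeps the explanation available for the notification card and for later fine-tuning, while the parser only needs the final line.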

Escalation Deduplication

When a ticket is flagged for escalation, TickIt checks pending tickets for similarity using Doubao‑embedding vectors and cosine similarity with a threshold (θ=0.88). Similar tickets are linked, avoiding duplicate alerts, and the model rewrites problem descriptions to capture common features.
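A minimal sketch of the similarity check, assuming the embedding vectors have already been obtained from the embedding model. The function names, the ticket IDs, and the choice to treat a similarity exactly equal to θ as a match are assumptions for illustration; the description-rewriting step is not shown.

```python
import math

THRESHOLD = 0.88  # θ from the article


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(new_vec, pending, threshold=THRESHOLD):
    """Return the id of the most similar pending ticket at or above the threshold.

    `pending` maps ticket id -> embedding vector. Returns None when no pending
    ticket is similar enough, meaning the new ticket raises its own alert.
    """
    best_id, best_sim = None, threshold
    for ticket_id, vec in pending.items():
        sim = cosine_similarity(new_vec, vec)
        if sim >= best_sim:  # assumption: equality counts as a match
            best_id, best_sim = ticket_id, sim
    return best_id


# Toy 2-dimensional vectors; real embeddings have many more dimensions.
pending = {"T-1": [1.0, 0.0], "T-2": [0.0, 1.0]}
print(find_duplicate([0.9, 0.1], pending))  # T-1
print(find_duplicate([1.0, 1.0], pending))  # None
```

A linear scan is enough when only pending escalations are compared; a vector index would be the natural replacement if that set grew large.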

Category‑guided Fine‑tuning

After escalation, TickIt sends a notification card with like/dislike buttons and a link to the original chat. Interaction data serve as feedback for supervised fine‑tuning (SFT), where each training example contains the original dialogue, the LLM's reasoning, and the final category.
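One way such a training example might be assembled is sketched below. The record layout (`prompt` / `completion`), the JSONL output, and the rule of keeping only liked escalations are assumptions; the article does not say how disliked escalations are handled, so this sketch simply skips them.

```python
import json


def to_sft_example(dialogue, reasoning, category, liked):
    """Build one SFT record from a feedback event, or None if it was disliked.

    Assumption: a "like" confirms the model's reasoning and category, so the
    pair can be reused as-is. Disliked escalations would need a corrected
    label before use and are skipped here.
    """
    if not liked:
        return None
    return {
        "prompt": dialogue,
        "completion": f"Reasoning: {reasoning}\nCategory: {category}",
    }


def to_jsonl(records):
    """Serialize records one JSON object per line, dropping skipped (None) entries."""
    return "\n".join(
        json.dumps(r, ensure_ascii=False) for r in records if r is not None
    )


events = [
    ("Customer reports all API calls failing", "Service-wide outage", "System Failure", True),
    ("Customer asks how to rotate a key", "Routine usage query", "Asset Loss", False),
]
print(to_jsonl(to_sft_example(*e) for e in events))
```

Keeping the reasoning inside the completion means the fine-tuned model learns to produce the explanation together with the category, matching the three fields the article lists.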

Experimental Validation

Deployed online at Volcano Engine, TickIt has processed tens of thousands of Oncall tickets and achieved 81% accuracy in its escalation decisions. In comparative experiments, LLM‑based methods reached roughly 82% precision and recall, and in‑context learning raised recall to 89.2%.

Supervised fine‑tuning further improved performance: after SFT, recall rose to 91.2% with precision around 81.8% and an F1 score of 86.2%, outperforming other prompt‑based approaches.
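The reported F1 score follows from the precision and recall figures, since F1 is their harmonic mean. A quick check, using the rounded values quoted above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Post-SFT figures from the article: precision 81.8%, recall 91.2%.
print(round(f1_score(0.818, 0.912) * 100, 1))  # 86.2
```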

Conclusion and Limitations

TickIt reduces mean time to recovery (MTTR) by about 26% and cuts labor costs. However, it can be misled by exaggerated or understated issue descriptions and by ambiguous service contexts, which leads to occasional false escalations.

Future work will focus on refining prompt designs and expanding domain‑specific knowledge to mitigate these limitations.

TickIt architecture diagram
Tags: LLM, incident management, Supervised Fine‑Tuning, cloud operations, Oncall analysis, ticket escalation
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
