Boosting Oncall Interception from 15% to 55%: KOncall’s AI‑Driven Evolution at Kuaishou

Kuaishou’s R&D efficiency team built KOncall, an intelligent on‑call platform that combines LLM‑based retrieval‑augmented generation, Redis Pub/Sub streaming, OCR‑based multimodal parsing, FAQ knowledge operations, and a custom reranker. The platform raised the automated query interception rate from 15% to 55% and has processed more than 116,000 requests, turning on‑call from a bottleneck into a starting point for building new capabilities.

Kuaishou Tech

Apr 29, 2026

In large‑scale software development, on‑call duty is essential for service stability, but fragmented support channels and weak automated interception often turn it into a drain on productivity. Kuaishou’s R&D efficiency team created KOncall, an AI‑native on‑call system that has raised the automated interception rate (the share of queries resolved without a human on‑call engineer) from 15% to 55% and handled more than 116,000 queries.

1. Problem Background

Internal surveys showed that engineers spend 5%–15% of their time on on‑call tasks, incurring hidden costs. Two core challenges were identified:

Channel fragmentation – different products used private chats, document platforms, or ad‑hoc tools, forcing engineers to hunt for the right contact.

Lack of smart interception – keyword‑based routing could only handle ~15% of cases, leaving most queries to human agents.

2. Architecture Evolution

2.1 LLM + RAG foundation – The team replaced the early NLP pipeline with a large language model (LLM) backed by retrieval‑augmented generation (RAG). Knowledge assets from Docs, Techlink, and historical on‑call chats were indexed, and results were served through a Redis Pub/Sub streaming layer that provided low latency and supported reconnection from multiple clients.

Benchmarking showed Redis Pub/Sub latency under 1 ms, versus 10–20 ms for Kafka, and four threads were enough for the streaming layer to keep pace with LLM inference.

2.2 Knowledge Operations

The team built a closed‑loop knowledge pipeline:

Extract FAQs from historical on‑call group chats and documents.

Store them as QA pairs; apply a similarity threshold (0.9) to avoid duplication.

Human reviewers confirm or edit entries before they are added to the knowledge base.
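The deduplication step in this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not KOncall's implementation: the `cosine` and `dedupe_candidates` helpers, the dictionary layout, and the toy two‑dimensional vectors are all hypothetical, and real vectors would come from an embedding model. Only the 0.9 threshold comes from the text, and treating "at or above 0.9" as a duplicate is also an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe_candidates(candidates, existing, threshold=0.9):
    """Keep candidates whose similarity to every stored or already-kept
    QA pair is below the threshold (assumed rule: >= threshold is a duplicate)."""
    kept = list(existing)
    fresh = []
    for cand in candidates:
        if all(cosine(cand["vec"], entry["vec"]) < threshold for entry in kept):
            kept.append(cand)
            fresh.append(cand)
    return fresh

# Toy data: hand-written vectors stand in for real question embeddings.
existing = [{"q": "How do I restart an instance?", "vec": [1.0, 0.0]}]
candidates = [
    {"q": "How to restart my instance?", "vec": [0.99, 0.05]},  # near-duplicate of stored entry
    {"q": "How do I request more quota?", "vec": [0.0, 1.0]},   # new topic
    {"q": "How can I get extra quota?", "vec": [0.05, 0.99]},   # near-duplicate of the new topic
]

# Only non-duplicates are forwarded to human reviewers.
review_queue = dedupe_candidates(candidates, existing)
print([c["q"] for c in review_queue])
```

Comparing each candidate against entries kept earlier in the same batch, not only against the stored knowledge base, prevents two near‑identical new FAQs from both reaching the reviewers.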

FAQ coverage grew from 5% to 67%, and the interception rate rose from 15% to about 25% after this stage.

2.3 Core QA Improvements

2.3.1 OCR multimodal parsing – An OCR step extracts text from screenshots (error codes, stack traces) and appends the result to the user query with the phrase “simultaneously the user provided an image, OCR result is: …”. This dramatically improved LLM understanding of image‑only questions.
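A rough sketch of the query augmentation described in 2.3.1 is shown below. The prefix string is the one quoted in the text; everything else (the `augment_query` function, the `ocr` callable, and the `fake_ocr` stub with its sample output) is a hypothetical stand‑in for whatever OCR service is actually used.

```python
OCR_PREFIX = "simultaneously the user provided an image, OCR result is: "

def augment_query(query, images, ocr):
    """Append the OCR text of each image to the user's query.

    `ocr` is any callable that maps an image to a string; it is an
    assumption standing in for a real OCR service.
    """
    parts = []
    if query.strip():
        parts.append(query.strip())
    for image in images:
        text = ocr(image).strip()
        if text:  # skip images with no recognizable text
            parts.append(OCR_PREFIX + text)
    return "\n".join(parts)

def fake_ocr(image):
    """Hypothetical OCR stub keyed by file name, for demonstration only."""
    return {"error.png": "ErrImagePull: image not found"}.get(image, "")

# An image-only question still yields a textual query for the LLM.
print(augment_query("", ["error.png"], fake_ocr))
```

Because the augmented query is plain text, the downstream retrieval and generation steps need no image‑specific handling.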

2.3.2 Intent‑aware FAQ matching – Three‑stage matching was introduced:

Keyword search (ES) – proved too brittle.

Embedding‑based semantic search (bge‑m3) with a 0.8 similarity threshold.

Rerank using a cross‑encoder (bge‑m3‑rerank) to eliminate false positives such as “restart instance” vs. “rebuild instance”.

After fine‑tuning on 3,672 manually verified pairs, the reranker lowered the FAQ recommendation rate from 33.5% to 21.9% while raising post‑click interception from 50.4% to 66.4%: fewer FAQs were recommended, but the ones shown resolved the query more often.
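The recall‑then‑rerank flow of 2.3.2 (its second and third stages) might look roughly like the sketch below. It is illustrative only: `match_faq`, the toy vectors, and the 0.6 rerank cutoff are assumptions, and `toy_rerank` is a simple word‑overlap score standing in for the fine‑tuned cross‑encoder, which scores the query and the FAQ jointly. Only the 0.8 recall threshold comes from the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_faq(query, query_vec, faqs, rerank,
              recall_threshold=0.8, rerank_threshold=0.6):
    """Embedding recall followed by pairwise reranking.

    rerank_threshold is an assumed value; the article does not state one.
    """
    # Stage 1: semantic recall by embedding similarity.
    recalled = [f for f in faqs if cosine(query_vec, f["vec"]) >= recall_threshold]
    # Stage 2: score each (query, FAQ) pair and drop weak matches.
    scored = [(rerank(query, f["q"]), f["q"]) for f in recalled]
    scored = [(score, q) for score, q in scored if score >= rerank_threshold]
    return [q for score, q in sorted(scored, reverse=True)]

def toy_rerank(query, question):
    """Stand-in scorer: fraction of the FAQ's words that appear in the query."""
    query_words = set(query.lower().split())
    faq_words = question.lower().split()
    return sum(w in query_words for w in faq_words) / len(faq_words)

faqs = [
    {"q": "restart instance", "vec": [1.0, 0.1]},
    {"q": "rebuild instance", "vec": [1.0, 0.2]},  # close in embedding space, different intent
    {"q": "request quota", "vec": [0.0, 1.0]},
]
query = "how do i restart an instance"
print(match_faq(query, [1.0, 0.1], faqs, toy_rerank))
```

In this toy setup both "restart instance" and "rebuild instance" pass the embedding threshold, and only the pairwise scorer separates them, which mirrors the false‑positive case the reranker was introduced to handle.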

2.4 Dynamic Knowledge Integration

For scenarios requiring real‑time data (e.g., container‑cloud deployment failures), a dynamic tool chain was added. An agent automatically injects context such as pod name, IP, and environment tags, calls monitoring APIs, and merges the results with static knowledge before the LLM generates the final answer. In a container‑cloud failure diagnosis case, the interception rate reached 99%.
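One way to picture the merge step in 2.4 is the sketch below, which assembles injected context, live tool results, and static knowledge into a single prompt. All names here (`build_prompt`, `fake_pod_events`, the prompt layout, and the sample pod data) are hypothetical; the article does not describe the actual agent interface or monitoring APIs.

```python
def build_prompt(question, context, tools, static_docs):
    """Merge injected context, live tool output, and static knowledge
    into one prompt string for the LLM (layout is an assumption)."""
    lines = ["Context: " + ", ".join(f"{k}={v}" for k, v in sorted(context.items()))]
    for name, tool in tools.items():
        try:
            result = tool(context)
        except Exception as exc:  # a failing monitoring call should not block the answer
            result = f"unavailable ({exc})"
        lines.append(f"Live data [{name}]: {result}")
    lines.extend(f"Knowledge: {doc}" for doc in static_docs)
    lines.append("Question: " + question)
    return "\n".join(lines)

def fake_pod_events(context):
    """Hypothetical stand-in for a monitoring API call."""
    return f"pod {context['pod']} restarted 3 times, last reason OOMKilled"

# Context the agent would inject automatically (values invented for the demo).
context = {"pod": "web-1", "ip": "10.0.0.8", "env": "staging"}
prompt = build_prompt(
    "Why does my deployment keep failing?",
    context,
    {"pod_events": fake_pod_events},
    ["OOMKilled means the container exceeded its memory limit."],
)
print(prompt)
```

Keeping each tool as a plain callable that receives the injected context means new data sources can be added without changing how the prompt is assembled.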

2.5 Reducing Repetitive Issues

A problem‑clustering pipeline groups on‑call tickets by embedding similarity (threshold 0.8), summarizes each cluster with the LLM, and pushes high‑frequency pain points to product teams so they can be fixed at the source. This “shift‑left” approach turned on‑call data into a product‑improvement signal.
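The grouping step can be illustrated with a simple greedy clustering pass. This is a sketch under assumptions: the article gives only the 0.8 threshold, so the leader‑based algorithm, the `cluster_tickets` helper, and the toy vectors are all hypothetical, and the LLM summarization of each cluster is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_tickets(tickets, threshold=0.8):
    """Greedy leader clustering: a ticket joins the first cluster whose
    leader (its first member) is at least `threshold` similar; otherwise
    it starts a new cluster. Clusters are returned largest first."""
    clusters = []
    for ticket in tickets:
        for cluster in clusters:
            if cosine(ticket["vec"], cluster[0]["vec"]) >= threshold:
                cluster.append(ticket)
                break
        else:
            clusters.append([ticket])
    return sorted(clusters, key=len, reverse=True)

# Toy tickets with hand-written vectors in place of real embeddings.
tickets = [
    {"id": "A", "vec": [1.0, 0.0]},
    {"id": "B", "vec": [0.0, 1.0]},
    {"id": "C", "vec": [0.1, 1.0]},
    {"id": "D", "vec": [0.0, 0.9]},
]
clusters = cluster_tickets(tickets)
print([[t["id"] for t in c] for c in clusters])
```

Sorting clusters by size puts the most frequent problem types first, which is the ordering a product team would need to prioritize fixes.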

3. Outlook

The team envisions three future directions:

Experience as system capability – automatically extract expert troubleshooting steps into reusable “Oncall Skills”.

Source‑level problem elimination – use clustering insights to drive product redesign and reduce on‑call volume.

Knowledge retention – build team‑specific knowledge bases that new engineers can query, preventing loss of tacit expertise.

Overall, KOncall demonstrates how AI‑augmented operations can transform on‑call from a cost center into a starting point for building new capabilities, freeing engineers to focus on value‑creating work.
