Backend Architecture and Performance Optimization of an AI Interview Robot
This article details the backend architecture, dialogue engine design, resource estimation, and performance optimization of an AI interview robot used on 58.com's recruitment platform. It shows how multi-branch dialogue flows, RTP communication, session management, and fine-grained monitoring combine to deliver a scalable, stable, and efficient online interview service.
Introduction
The AI interview robot leverages the Lingxi intelligent voice‑semantic platform to simulate multi‑round voice interactions between recruiters and candidates, achieving fully online interviews. Since its launch, it has handled millions of interview requests, significantly improving recruitment efficiency and candidate experience.
Project Background
58.com’s lifestyle service platform connects a massive number of C‑end users with B‑end merchants. The surge in online interview requests during the 2020 pandemic highlighted the need for an intelligent interview tool. The AI interview robot was built to address low link success rates caused by one‑to‑one video interview constraints, enabling recruiters to handle many candidates simultaneously and allowing candidates to interview anytime, anywhere.
Backend Architecture
The system consists of four layers:
Access Layer: Handles communication protocols with upstream/downstream components, extracts user profiles, and distributes interview timeline information.
Logic Layer: Manages dialogue interaction, converting robot prompts to speech via TTS, performing VAD-based sentence segmentation, streaming speech-to-text recognition, and generating robot responses based on a directed-acyclic dialogue graph.
Data Layer: Stores dialogue graphs, conversation records, and annotation data.
Web System: Provides visual configuration of dialogue structures, strategies, and annotation data.
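To make the division of responsibilities concrete, here is a minimal sketch of how a single candidate utterance could traverse the access, logic, and data layers. All class names, method names, and return values are hypothetical; the article does not publish the system's internal APIs.

```python
# Hypothetical sketch of the layered split; names and payloads are illustrative.

class AccessLayer:
    """Terminates the upstream protocol and extracts the user profile."""
    def receive(self, raw_packet: bytes) -> dict:
        return {"user_id": "c_123", "audio": raw_packet}

class DataLayer:
    """Stores the dialogue graph and conversation records."""
    def next_prompt(self, user_text: str) -> str:
        # Placeholder for a real dialogue-graph lookup.
        return "Great. When can you start?"

class LogicLayer:
    """Runs VAD + streaming ASR, then consults the dialogue graph."""
    def __init__(self, data_layer: DataLayer):
        self.data = data_layer

    def transcribe(self, audio: bytes) -> str:
        # Placeholder for VAD segmentation + streaming ASR.
        return "yes, I am available"

    def handle(self, request: dict) -> str:
        text = self.transcribe(request["audio"])
        return self.data.next_prompt(text)

access = AccessLayer()
logic = LogicLayer(DataLayer())
print(logic.handle(access.receive(b"...pcm audio...")))  # Great. When can you start?
```

The point of the sketch is the one-way dependency: the access layer knows nothing about dialogue logic, and the logic layer reaches persistent state only through the data layer.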
Interaction Flow
The interview process has three stages: pre-interview, interview, and post-interview. The pre-interview stage establishes UDP communication, dynamically allocates an IP address and port via the SCF RPC framework, and registers a session. During the interview, the robot plays TTS-generated prompts, receives the candidate's speech, performs VAD segmentation and streaming ASR, and replies according to the dialogue graph. The post-interview stage releases the allocated resources, builds a candidate profile, and stores the conversation record.
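The three stages map naturally onto an allocate / run / release pattern. The sketch below illustrates that shape with real UDP port allocation but stubbed TTS and ASR; the session structure, prompts, and transcript format are assumptions for illustration, and the real system performs registration through the SCF RPC framework.

```python
# Illustrative three-stage session lifecycle: pre-interview (allocate),
# interview (dialogue loop), post-interview (release). TTS/ASR are stubbed.

import socket

def allocate_udp_port() -> socket.socket:
    """Pre-interview: bind an ephemeral UDP port for the media stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 0))  # port 0 -> let the OS pick a free port
    return sock

def run_interview(session: dict) -> list:
    """Interview: alternate TTS prompts out and (stubbed) ASR results in."""
    transcript = []
    for prompt in ["Hello, shall we begin?", "What is your expected salary?"]:
        transcript.append(("robot", prompt))            # TTS prompt out
        transcript.append(("candidate", "<asr text>"))  # VAD + streaming ASR in
    return transcript

def interview_session(user_id: str) -> list:
    sock = allocate_udp_port()
    session = {"user_id": user_id, "port": sock.getsockname()[1]}
    try:
        return run_interview(session)  # interview stage
    finally:
        sock.close()                   # post-interview: release resources

transcript = interview_session("c_123")
print(len(transcript))  # 4 turns in this stub
```

Putting the release step in a `finally` block mirrors the article's emphasis on resource cleanup: the port is returned even if the dialogue loop fails mid-interview.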
Dialogue Engine Core Functions
The engine drives conversation using a multi-branch dialogue graph. The initial two-branch graph yielded only about a 20% interview completion rate; after the redesign to a multi-branch structure with node-level strategy chains, the completion rate rose above 50%.
Data Layer Design
New data entities such as dialogue tables, nodes, and edges were introduced. Nodes bind to dialogue IDs and store text, while edges define topology, regex matching, and corpus rules. Strategy chains are linked to nodes for fine‑grained control.
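The entity relationships described above can be sketched as simple in-memory records. The field names below are hypothetical (the article does not publish its schema); the essential points are that nodes bind to a dialogue ID and carry prompt text, while edges carry the topology plus a regex rule.

```python
# Hypothetical shapes for the dialogue-graph entities; field names illustrative.

import re
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    dialogue_id: int  # each node binds to a dialogue
    text: str         # prompt the robot speaks at this node

@dataclass
class Edge:
    src: int
    dst: int
    pattern: str      # regex rule deciding whether this edge fires

    def matches(self, user_text: str) -> bool:
        return re.search(self.pattern, user_text) is not None

@dataclass
class Dialogue:
    dialogue_id: int
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: dict = field(default_factory=dict)  # src node_id -> [Edge]

yes_edge = Edge(src=1, dst=2, pattern=r"(yes|sure|ok)")
print(yes_edge.matches("yes, I can start on Monday"))  # True
```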
Code Layer Design
Classes representing edges, nodes, dialogues, and dialogue states were abstracted. The dialogue graph is implemented as an adjacency list, enabling the engine to traverse from the current node to appropriate edges based on user input.
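A minimal version of that traversal step can be written as follows. The graph contents, node names, and regex patterns are invented for illustration; the sketch only demonstrates the "current node, match an outgoing edge, advance" mechanic over an adjacency list, not the production engine.

```python
# Minimal adjacency-list dialogue graph and traversal step; a sketch of the
# engine's "current node -> matching edge -> next node" mechanic.

import re

# adjacency list: node_id -> list of (edge regex, next node_id)
GRAPH = {
    "ask_available": [(r"\b(yes|yeah|sure)\b", "ask_salary"),
                      (r"\b(no|not)\b", "say_goodbye")],
    "ask_salary":    [(r"\d+", "wrap_up")],
}

PROMPTS = {
    "ask_available": "Are you available for an interview this week?",
    "ask_salary":    "What is your expected salary?",
    "say_goodbye":   "Thanks for your time, goodbye!",
    "wrap_up":       "Great, we will be in touch.",
}

def step(current: str, user_text: str) -> str:
    """Advance from `current` along the first edge whose regex matches."""
    for pattern, nxt in GRAPH.get(current, []):
        if re.search(pattern, user_text, re.IGNORECASE):
            return nxt
    return current  # no edge matched: stay on this node and re-prompt

state = "ask_available"
state = step(state, "Yes, I am free")        # -> ask_salary
state = step(state, "about 8000 per month")  # -> wrap_up
print(PROMPTS[state])
```

Returning the current node on a failed match is one plausible fallback strategy; in practice this is where the article's node-level strategy chains (re-ask, rephrase, skip) would plug in.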
Performance Optimization Practices
To support thousands of concurrent online interviews, four areas were optimized:
Resource Management: Sessions encapsulate temporary resources (ports, threads, codecs). Resources are released when a session ends, and a monitoring thread reclaims stale sessions.
Resource Estimation: Limits on temporary resources, network bandwidth, disk cache, and thread pool metrics were defined and monitored.
Performance Testing: Stress tests at 2500 requests/min identified heap memory as a bottleneck. Memory usage was reduced by shrinking audio buffers, leading to stable garbage collection.
Fine-Grained Monitoring: Key metrics (service KPIs, resource KPIs, flow KPIs, thread-pool KPIs, ASR/VAD KPIs) were tracked to ensure stability.
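The stale-session reclamation described under resource management can be sketched as a session table keyed by last-activity time, which a periodic monitoring thread would sweep. The timeout value and data structure here are assumptions for illustration, not the production design; `reclaim_stale` is called directly below so the example is deterministic.

```python
# Sketch of session bookkeeping with stale-session reclamation, in the spirit
# of the monitoring thread described above. Timeout and structure illustrative.

import time
import threading

SESSION_TIMEOUT_S = 30  # illustrative; the real timeout is a tuned parameter

class SessionTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}  # session_id -> last-activity timestamp

    def touch(self, session_id):
        """Record activity for a session (called on every packet/turn)."""
        with self._lock:
            self._sessions[session_id] = time.monotonic()

    def reclaim_stale(self, now=None):
        """Drop sessions idle longer than the timeout; returns reclaimed ids.

        A background monitoring thread would call this periodically; the
        delete below is where ports, threads, and codecs would be freed.
        """
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [sid for sid, ts in self._sessions.items()
                     if now - ts > SESSION_TIMEOUT_S]
            for sid in stale:
                del self._sessions[sid]
            return stale

table = SessionTable()
table.touch("s1")
future = time.monotonic() + 60       # simulate a sweep 60s later...
table._sessions["s2"] = future       # ...with s2 still active at that time
print(sorted(table.reclaim_stale(now=future)))  # ['s1']
```

Guarding the table with a lock matters because the sweep runs on a different thread than the handlers calling `touch`.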
Conclusion
The article presented the AI interview robot’s backend architecture, full interaction flow, dialogue engine core, and performance optimization methods. Ongoing work will continue to iterate on features and further improve scalability for broader business adoption.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.