How an AI Interview Bot Scaled 20× Faster with Backend Architecture Optimizations
This article details the design of an AI interview robot for 58.com, covering its backend architecture, dialogue engine, resource‑management strategies, performance‑testing methodology, and the optimizations that boosted concurrent interview capacity by twenty times while improving user experience.
Project Background
58.com’s lifestyle platform connects millions of users and merchants, and the pandemic sharply increased demand for online interview services. To improve recruiter efficiency and candidate experience, the AI Lab built an AI interview robot that simulates multi‑turn voice conversations using the Lingxi speech platform.
Backend Architecture
The system is divided into four layers:
Access Layer: Handles communication with audio‑video endpoints, negotiates the UDP IP/port pair for each session, and allocates resources via the SCF RPC framework.
Logic Layer: Generates robot prompts, synthesizes speech (TTS), performs VAD‑based sentence segmentation, streams ASR results, and drives the dialogue engine.
Data Layer: Stores script graphs, dialogue records, and annotation data.
Web System: Provides visual configuration of scripts, strategies, and labeling tools.
Interaction Flow
Pre‑Interview
When an audio‑video client sends an interview request through SCF, the service retrieves a free port pair from a queue, returns the IP/port to the client, and establishes UDP communication.
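The free-port-pair queue described above can be sketched as follows. This is a minimal illustration, not the production SCF service; the class and method names (`PortPool`, `acquire`, `release`) are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of pre-interview port allocation: a bounded queue of free UDP port
// pairs, handed out per session and returned on teardown. Names are illustrative.
public class PortPool {
    // Each session gets a pair of adjacent UDP ports (RTP/RTCP style).
    public record PortPair(int rtpPort, int rtcpPort) {}

    private final BlockingQueue<PortPair> free;

    public PortPool(int basePort, int pairs) {
        this.free = new ArrayBlockingQueue<>(pairs);
        for (int i = 0; i < pairs; i++) {
            free.offer(new PortPair(basePort + 2 * i, basePort + 2 * i + 1));
        }
    }

    // Called when an interview request arrives; null means capacity is exhausted
    // and the request should be rejected or queued upstream.
    public PortPair acquire() { return free.poll(); }

    // Called on session teardown so the pair becomes reusable.
    public void release(PortPair pair) { free.offer(pair); }
}
```

Because the queue is bounded and thread-safe, exhaustion is detected immediately (a null return) instead of by a failed UDP bind later in the flow.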
During Interview
The robot sends a TTS‑generated opening line, encodes and streams it, receives user speech, applies VAD and streaming ASR to obtain text, and the dialogue engine selects a response based on the script graph.
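The VAD-based sentence segmentation step above can be sketched as an energy-threshold pass over audio frames. The thresholds and frame model here are assumed for illustration; the real system's VAD is not specified in the article.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative VAD-style segmentation: split a stream of per-frame energies
// into utterances, ending an utterance after a run of silent frames.
public class VadSegmenter {
    // Returns [start, end) frame ranges for each detected utterance.
    public static List<int[]> segment(double[] frameEnergy,
                                      double silenceThreshold,
                                      int minSilentFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1, silentRun = 0;
        for (int i = 0; i < frameEnergy.length; i++) {
            boolean voiced = frameEnergy[i] > silenceThreshold;
            if (voiced) {
                if (start < 0) start = i;     // utterance begins
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilentFrames) {
                // Enough trailing silence: close the utterance, excluding it.
                utterances.add(new int[]{start, i - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) utterances.add(new int[]{start, frameEnergy.length});
        return utterances;
    }
}
```

Each closed segment would then be forwarded to streaming ASR, whose text output feeds the dialogue engine.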
Post‑Interview
Upon interview termination, the robot releases allocated ports and threads, builds a candidate profile (e.g., availability, experience, age), and stores the recorded dialogue for recruiter review.
Dialogue Engine Core Functions
The engine drives a directed acyclic graph of interview scripts. An initial two‑branch script yielded a 20% completion rate. By redesigning the graph to support multi‑branch nodes (≥3 edges) and adding node‑level strategy chains, completion rose above 50%.
Data and Code Structure
New data entities include script tables, nodes, edges, and strategy chains. Nodes bind to scripts and hold text; edges define topology and contain regex or corpus rules. The code maps edges and nodes to adjacency lists, enabling fast lookup of outgoing edges for the current node.
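The adjacency-list lookup described above can be sketched as follows, assuming edges carry a regex rule (the corpus-rule variant is omitted). All class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of script-graph traversal: an adjacency list maps each node to its
// outgoing edges; the first edge whose rule matches the ASR text wins.
public class ScriptGraph {
    public record Edge(String toNode, Pattern rule) {}

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    public void addEdge(String from, String to, String regexRule) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>())
                 .add(new Edge(to, Pattern.compile(regexRule)));
    }

    // Returns the next node whose edge rule matches the recognized text, or
    // null if nothing matches (a fallback/clarification branch handles that).
    public String nextNode(String current, String asrText) {
        for (Edge e : adjacency.getOrDefault(current, List.of())) {
            if (e.rule().matcher(asrText).find()) return e.toNode();
        }
        return null;
    }
}
```

Multi-branch nodes fall out naturally: a node simply has three or more outgoing edges, each with its own matching rule.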
Service Performance Optimization
Resource Management
Each interview session is represented by a session object (backed by a thread). Resources such as ports, codecs, and threads are registered to the session and automatically released when it ends. A broadcast message over the internal WMB queue ensures every service instance can clean up, even when the termination request lands on a different instance than the one that allocated the resources.
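The register-then-release pattern above can be sketched with an `AutoCloseable` session that runs cleanup actions in reverse registration order (so dependents are freed before the resources they depend on). Names are illustrative, and the WMB broadcast path is out of scope here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of per-session resource registration with automatic release.
public class InterviewSession implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();

    // Register a release action (free a port pair, close a codec, stop a thread).
    public void register(Runnable releaseAction) {
        cleanups.push(releaseAction);   // LIFO: last registered, first released
    }

    @Override
    public void close() {
        while (!cleanups.isEmpty()) {
            // One failing cleanup must not prevent the rest from running.
            try { cleanups.pop().run(); } catch (RuntimeException ignored) {}
        }
    }
}
```

Using try-with-resources around the session guarantees release on both normal termination and exceptions, which is what prevents the port and thread leaks the article guards against.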
Resource Estimation
Key constraints include temporary session resources, network bandwidth (the stated 1000 MB/s budget far exceeds 2500 sessions × 32 KB/s ≈ 80 MB/s), disk LRU eviction for custom questions, and thread‑pool metrics.
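The bandwidth constraint can be checked with back-of-envelope arithmetic; the per-session 32 KB/s stream rate and 2500-session target come from the article, while the helper names are illustrative.

```java
// Back-of-envelope check of the bandwidth constraint:
// 2500 concurrent sessions at 32 KB/s each versus the stated budget.
public class CapacityCheck {
    public static double totalKBps(int sessions, int perSessionKBps) {
        return (double) sessions * perSessionKBps;
    }

    public static void main(String[] args) {
        double demandKBps = totalKBps(2500, 32);   // 80,000 KB/s
        double demandMBps = demandKBps / 1000.0;   // about 80 MB/s of aggregate voice traffic
        System.out.printf("demand ~ %.0f MB/s%n", demandMBps);
    }
}
```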
Performance Experiments
Stress tests at 2500 requests/min exposed a heap‑memory bottleneck: large allRobotVoiceBuffer and allUserVoiceBuffer objects (each 18.75 MB) caused 100% heap usage. Reducing the buffers to 0.47 MB and 0 MB respectively, while allowing dynamic expansion, eliminated OOM and stabilized GC.
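The fix above amounts to replacing large fixed pre-allocations with small buffers that grow on demand. A minimal sketch, assuming `ByteArrayOutputStream`-style growth; the class and method names are illustrative, and only the 0.47 MB initial size mirrors the article.

```java
import java.io.ByteArrayOutputStream;

// Sketch: start each session's voice buffers small and let them expand only
// when actually written to, instead of reserving 18.75 MB per session up front.
public class VoiceBuffers {
    static final int ROBOT_INITIAL = 480 * 1024;   // ~0.47 MB initial capacity

    // ByteArrayOutputStream grows its backing array only when capacity is
    // exceeded, so short interviews never pay the worst-case allocation.
    private final ByteArrayOutputStream robotVoice = new ByteArrayOutputStream(ROBOT_INITIAL);
    private final ByteArrayOutputStream userVoice  = new ByteArrayOutputStream(); // starts near zero

    public void appendUserAudio(byte[] frame) {
        userVoice.write(frame, 0, frame.length);
    }

    public int userBytes() { return userVoice.size(); }
}
```

At 2500 sessions, trimming ~18 MB of idle headroom per session removes tens of gigabytes of heap pressure, which is why the change eliminated the OOM and stabilized GC.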
Fine‑Grained Monitoring
Metrics were defined across service health, resource availability, workflow success, thread‑pool usage, position‑question handling, ASR latency, and VAD call counts, enabling proactive alerts and capacity planning.
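One of the metric families above, thread-pool usage, can be sketched as a sampled ratio with an alert threshold. The threshold value and helper names are assumptions, not from the article.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of a fine-grained thread-pool metric for proactive alerting.
public class ThreadPoolMonitor {
    // Fraction of the pool's maximum capacity currently busy, in [0, 1].
    public static double usageRatio(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    // True when usage crosses the alert threshold (e.g. 0.8 for 80%).
    public static boolean shouldAlert(ThreadPoolExecutor pool, double threshold) {
        return usageRatio(pool) >= threshold;
    }
}
```

Sampling this ratio periodically turns thread-pool saturation from a post-mortem finding into an early capacity-planning signal.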
Results
After applying the above optimizations, the AI interview robot’s concurrent handling capacity increased by 20×, supporting over a thousand simultaneous interview sessions with stable latency and reduced failure rates.
Conclusion
The article presented the AI interview robot’s backend architecture, end‑to‑end interaction flow, dialogue‑engine redesign, and a systematic performance‑optimization practice that dramatically improved scalability and user experience. Future work will continue iterating on features and further tuning performance for broader business adoption.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.