
Backend Architecture and Performance Optimization of an AI Interview Robot

This article details the backend architecture, dialogue-engine design, resource estimation, and performance-optimization techniques of the AI interview robot used on 58.com's recruitment platform, and shows how multi-branch dialogue flows, RTP communication, session management, and monitoring enable a scalable, stable, and efficient online interview service.

58 Tech

Introduction

The AI interview robot leverages the Lingxi intelligent voice‑semantic platform to simulate multi‑round voice interactions between recruiters and candidates, achieving fully online interviews. Since its launch, it has handled millions of interview requests, significantly improving recruitment efficiency and candidate experience.

Project Background

58.com’s lifestyle service platform connects a massive number of C‑end users with B‑end merchants. The surge in online interview requests during the 2020 pandemic highlighted the need for an intelligent interview tool. The AI interview robot was built to address low link success rates caused by one‑to‑one video interview constraints, enabling recruiters to handle many candidates simultaneously and allowing candidates to interview anytime, anywhere.

Backend Architecture

The system consists of four layers:

Access Layer: Handles communication protocols with upstream and downstream components, extracts user profiles, and distributes interview-timeline information.

Logic Layer: Manages dialogue interaction: converts robot prompts to speech via TTS, performs VAD-based sentence segmentation and streaming speech-to-text recognition, and generates robot responses based on a directed acyclic dialogue graph.

Data Layer: Stores dialogue graphs, conversation records, and annotation data.

Web System: Provides visual configuration of dialogue structures, strategies, and annotation data.
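The VAD-based sentence segmentation performed in the logic layer can be illustrated with a simple energy-threshold sketch. This is a hypothetical simplification for intuition only; the article does not describe the platform's actual VAD algorithm, and the threshold and silence-window parameters here are invented:

```python
def segment_utterances(frames, energy_threshold=0.01, max_silence_frames=25):
    """Split a stream of audio frames into utterances with a toy energy VAD.

    An utterance ends once `max_silence_frames` consecutive low-energy
    frames are observed; trailing silence is trimmed from each segment.
    """
    utterances, current, silence = [], [], 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            current.append(frame)
            silence = 0
        elif current:
            current.append(frame)
            silence += 1
            if silence >= max_silence_frames:
                # Close the utterance, dropping the trailing silent frames.
                utterances.append(current[:-silence])
                current, silence = [], 0
    if current:
        trimmed = current[:len(current) - silence] if silence else current
        if trimmed:
            utterances.append(trimmed)
    return utterances
```

In the real pipeline each closed segment would be handed to the streaming ASR service; here the function simply returns the segmented frames.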

Interaction Flow

The interview process is divided into three stages: pre-interview, interview, and post-interview. The pre-interview stage establishes UDP communication (carrying RTP media), dynamically allocates an IP and port via the SCF RPC framework, and registers a session. During the interview, the robot sends TTS-generated prompts, receives user speech, performs VAD and streaming ASR, and replies according to the dialogue graph. Post-interview, the system releases the allocated resources, builds a candidate profile, and stores the conversation.
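The allocate-register-release lifecycle above maps naturally onto a context-managed session. The sketch below is a hypothetical illustration (the production system allocates resources through the SCF RPC framework, which is not shown; the registry structure and names are invented), but it captures the key guarantee that post-interview cleanup always runs:

```python
import socket
from contextlib import contextmanager

@contextmanager
def interview_session(session_registry, session_id):
    """Pre-interview: bind a UDP port and register the session.
    Post-interview: close the socket and deregister, even on error."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 0))  # port 0: the OS assigns a free port dynamically
    session_registry[session_id] = {"port": sock.getsockname()[1]}
    try:
        yield sock  # the interview stage uses this socket for media
    finally:
        sock.close()  # release the temporary resource
        session_registry.pop(session_id, None)
```

Wrapping the interview stage in `with interview_session(...)` ensures ports are never leaked when a call ends abnormally, which matters at thousands of concurrent sessions.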

Dialogue Engine Core Functions

The engine drives the conversation using a multi-branch dialogue graph. Initially, a two-branch graph yielded only a 20% completion rate; after the graph was redesigned as a multi-branch structure with node-level strategy chains, the completion rate rose above 50%.

Data Layer Design

New data entities such as dialogue tables, nodes, and edges were introduced. Nodes bind to dialogue IDs and store text, while edges define topology, regex matching, and corpus rules. Strategy chains are linked to nodes for fine‑grained control.
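The edge entity's matching rules can be sketched as follows. This is a hypothetical data model (the article names regex and corpus rules but not their schema; field names here are invented): an edge fires if the user's reply matches its regex or appears in its configured corpus.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Edge:
    """Hypothetical edge record: topology plus matching rules."""
    from_node: str
    to_node: str
    pattern: str = ""                          # regex matching rule
    corpus: set = field(default_factory=set)   # exact-phrase corpus rule

    def matches(self, user_text: str) -> bool:
        """An edge fires on a regex hit or an exact corpus hit."""
        if self.pattern and re.search(self.pattern, user_text):
            return True
        return user_text in self.corpus
```

Keeping both rule types on the edge lets operators configure coarse regex routing and then patch misrecognized phrases into the corpus without touching the pattern.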

Code Layer Design

Classes representing edges, nodes, dialogues, and dialogue states were abstracted. The dialogue graph is implemented as an adjacency list, enabling the engine to traverse from the current node to appropriate edges based on user input.
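The adjacency-list traversal can be sketched like this. It is a minimal illustration, not the production code: node and edge classes are collapsed into dictionaries, edge matching is reduced to a predicate function, and all identifiers are invented.

```python
class DialogueGraph:
    """Adjacency-list dialogue graph: node id -> list of (predicate, next node)."""

    def __init__(self):
        self.adjacency = {}   # node id -> outgoing edges
        self.prompts = {}     # node id -> robot prompt text

    def add_node(self, node_id, prompt):
        self.prompts[node_id] = prompt
        self.adjacency.setdefault(node_id, [])

    def add_edge(self, src, predicate, dst):
        self.adjacency[src].append((predicate, dst))

    def step(self, current, user_text):
        """Follow the first outgoing edge whose predicate matches the
        user's input; stay on the current node if nothing matches."""
        for predicate, dst in self.adjacency.get(current, []):
            if predicate(user_text):
                return dst
        return current
```

A two-edge fragment shows the multi-branch behavior: a "yes" reply advances to the salary question, anything unrecognized keeps the conversation on the current node so the robot can re-prompt.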

Performance Optimization Practices

To support thousands of concurrent online interviews, four areas were optimized:

Resource Management: Sessions encapsulate temporary resources (ports, threads, codecs). Resources are released when a session ends, and a monitoring thread reclaims stale sessions.

Resource Estimation: Limits on temporary resources, network bandwidth, disk cache, and thread-pool metrics were defined and monitored.

Performance Testing: Stress tests at 2,500 requests per minute identified heap memory as the bottleneck; shrinking the audio buffers reduced memory usage and stabilized garbage collection.

Fine-Grained Monitoring: Key metrics (service KPIs, resource KPIs, flow KPIs, thread-pool KPIs, and ASR/VAD KPIs) are tracked to ensure stability.
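The monitoring thread that reclaims stale sessions (mentioned under Resource Management) can be sketched as a periodic sweep over last-activity timestamps. The class below is a hypothetical illustration with invented names and a single-sweep method for clarity; in production the sweep would run in a loop on a dedicated thread and release the session's ports, threads, and codecs.

```python
import threading
import time

class SessionReaper:
    """Reclaims sessions idle longer than `ttl` seconds."""

    def __init__(self, ttl=1800.0):
        self.ttl = ttl
        self.sessions = {}        # session id -> last-active timestamp
        self.lock = threading.Lock()

    def touch(self, session_id):
        """Record activity; called on every media packet or dialogue turn."""
        with self.lock:
            self.sessions[session_id] = time.monotonic()

    def reap_once(self):
        """One sweep: drop every session whose idle time exceeds the TTL."""
        now = time.monotonic()
        with self.lock:
            stale = [sid for sid, ts in self.sessions.items()
                     if now - ts > self.ttl]
            for sid in stale:
                del self.sessions[sid]  # release port/thread/codec here
        return stale
```

Using `time.monotonic()` rather than wall-clock time keeps the TTL check immune to system clock adjustments, and the lock keeps the sweep safe against concurrent `touch` calls from session threads.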

Conclusion

This article presented the AI interview robot's backend architecture, end-to-end interaction flow, dialogue-engine core, and performance-optimization methods. Future work will continue to iterate on features and further improve scalability for broader business adoption.

Tags: Performance Optimization, Backend Architecture, AI, Resource Management, RTP, Dialogue Engine, Interview Robot
Written by 58 Tech, the official tech channel of 58.com, a platform for tech innovation, sharing, and communication.