How an AI Interview Bot Scaled 20× Faster with Backend Architecture Optimizations
This article details the design of an AI interview robot for 58.com, covering its backend architecture, dialogue engine, resource‑management strategies, performance‑testing methodology, and the optimizations that boosted concurrent interview capacity by twenty times while improving user experience.
Project Background
58.com’s lifestyle platform connects millions of users and merchants, and the pandemic sharply increased demand for online interview services. To improve recruiter efficiency and candidate experience, the AI Lab built an AI interview robot that simulates multi‑turn voice conversations using the Lingxi speech platform.
Backend Architecture
The system is divided into four layers:
Access Layer: Handles communication with audio‑video endpoints, negotiates the UDP IP/port pair for each session, and allocates resources via the SCF RPC framework.
Logic Layer: Generates robot prompts, synthesizes speech (TTS), performs VAD‑based sentence segmentation, streams ASR results, and drives the dialogue engine.
Data Layer: Stores script graphs, dialogue records, and annotation data.
Web System: Provides visual configuration of scripts, strategies, and labeling tools.
Interaction Flow
Pre‑Interview
When an audio‑video client sends an interview request through SCF, the service retrieves a free port pair from a queue, returns the IP/port to the client, and establishes UDP communication.
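The free-port-pair queue described above can be sketched as follows. This is a minimal illustration, not the production SCF service; the class and method names (`PortPool`, `acquire`, `release`) are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of pre-interview port allocation: a bounded queue of free UDP port
// pairs, handed out per session and returned on teardown. Names are illustrative.
public class PortPool {
    // Each session gets a pair of adjacent UDP ports (RTP/RTCP style).
    public record PortPair(int rtpPort, int rtcpPort) {}

    private final BlockingQueue<PortPair> free;

    public PortPool(int basePort, int pairs) {
        this.free = new ArrayBlockingQueue<>(pairs);
        for (int i = 0; i < pairs; i++) {
            free.offer(new PortPair(basePort + 2 * i, basePort + 2 * i + 1));
        }
    }

    // Called when an interview request arrives; null means capacity is exhausted
    // and the request should be rejected or queued upstream.
    public PortPair acquire() { return free.poll(); }

    // Called on session teardown so the pair becomes reusable.
    public void release(PortPair pair) { free.offer(pair); }
}
```

Because the queue is bounded and thread-safe, exhaustion is detected immediately (a null return) instead of by a failed UDP bind later in the flow.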
During Interview
The robot sends a TTS‑generated opening line, encodes and streams it, receives user speech, applies VAD and streaming ASR to obtain text, and the dialogue engine selects a response based on the script graph.
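The VAD-based sentence segmentation step above can be sketched as an energy-threshold pass over audio frames. The thresholds and frame model here are assumed for illustration; the real system's VAD is not specified in the article.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative VAD-style segmentation: split a stream of per-frame energies
// into utterances, ending an utterance after a run of silent frames.
public class VadSegmenter {
    // Returns [start, end) frame ranges for each detected utterance.
    public static List<int[]> segment(double[] frameEnergy,
                                      double silenceThreshold,
                                      int minSilentFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1, silentRun = 0;
        for (int i = 0; i < frameEnergy.length; i++) {
            boolean voiced = frameEnergy[i] > silenceThreshold;
            if (voiced) {
                if (start < 0) start = i;     // utterance begins
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilentFrames) {
                // Enough trailing silence: close the utterance, excluding it.
                utterances.add(new int[]{start, i - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) utterances.add(new int[]{start, frameEnergy.length});
        return utterances;
    }
}
```

Each closed segment would then be forwarded to streaming ASR, whose text output feeds the dialogue engine.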
Post‑Interview
Upon interview termination, the robot releases allocated ports and threads, builds a candidate profile (e.g., availability, experience, age), and stores the recorded dialogue for recruiter review.
Dialogue Engine Core Functions
The engine drives a directed acyclic graph of interview scripts. An initial two‑branch script yielded a 20% completion rate. By redesigning the graph to support multi‑branch nodes (≥3 edges) and adding node‑level strategy chains, completion rose above 50%.
Data and Code Structure
New data entities include script tables, nodes, edges, and strategy chains. Nodes bind to scripts and hold text; edges define topology and contain regex or corpus rules. The code maps edges and nodes to adjacency lists, enabling fast lookup of outgoing edges for the current node.
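The adjacency-list lookup described above can be sketched as follows, assuming edges carry a regex rule (the corpus-rule variant is omitted). All class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of script-graph traversal: an adjacency list maps each node to its
// outgoing edges; the first edge whose rule matches the ASR text wins.
public class ScriptGraph {
    public record Edge(String toNode, Pattern rule) {}

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    public void addEdge(String from, String to, String regexRule) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>())
                 .add(new Edge(to, Pattern.compile(regexRule)));
    }

    // Returns the next node whose edge rule matches the recognized text, or
    // null if nothing matches (a fallback/clarification branch handles that).
    public String nextNode(String current, String asrText) {
        for (Edge e : adjacency.getOrDefault(current, List.of())) {
            if (e.rule().matcher(asrText).find()) return e.toNode();
        }
        return null;
    }
}
```

Multi-branch nodes fall out naturally: a node simply has three or more outgoing edges, each with its own matching rule.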
Service Performance Optimization
Resource Management
Each interview session is represented by a session object (backed by a thread). Resources such as ports, codecs, and threads are registered to the session and automatically released when it ends. A broadcast message over the internal WMB queue ensures every service instance can clean up, even when the termination request lands on a different instance than the one that allocated the resources.
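The register-then-release pattern above can be sketched with an `AutoCloseable` session that runs cleanup actions in reverse registration order (so dependents are freed before the resources they depend on). Names are illustrative, and the WMB broadcast path is out of scope here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of per-session resource registration with automatic release.
public class InterviewSession implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();

    // Register a release action (free a port pair, close a codec, stop a thread).
    public void register(Runnable releaseAction) {
        cleanups.push(releaseAction);   // LIFO: last registered, first released
    }

    @Override
    public void close() {
        while (!cleanups.isEmpty()) {
            // One failing cleanup must not prevent the rest from running.
            try { cleanups.pop().run(); } catch (RuntimeException ignored) {}
        }
    }
}
```

Using try-with-resources around the session guarantees release on both normal termination and exceptions, which is what prevents the port and thread leaks the article guards against.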
Resource Estimation
Key constraints include temporary session resources, network bandwidth (the stated 1000 MB/s budget far exceeds 2500 sessions × 32 KB/s ≈ 80 MB/s), disk LRU eviction for custom questions, and thread‑pool metrics.
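The bandwidth constraint can be checked with back-of-envelope arithmetic; the per-session 32 KB/s stream rate and 2500-session target come from the article, while the helper names are illustrative.

```java
// Back-of-envelope check of the bandwidth constraint:
// 2500 concurrent sessions at 32 KB/s each versus the stated budget.
public class CapacityCheck {
    public static double totalKBps(int sessions, int perSessionKBps) {
        return (double) sessions * perSessionKBps;
    }

    public static void main(String[] args) {
        double demandKBps = totalKBps(2500, 32);   // 80,000 KB/s
        double demandMBps = demandKBps / 1000.0;   // about 80 MB/s of aggregate voice traffic
        System.out.printf("demand ~ %.0f MB/s%n", demandMBps);
    }
}
```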
Performance Experiments
Stress tests at 2500 requests/min exposed a heap‑memory bottleneck: large allRobotVoiceBuffer and allUserVoiceBuffer objects (each 18.75 MB) caused 100% heap usage. Reducing the buffers to 0.47 MB and 0 MB respectively, while allowing dynamic expansion, eliminated OOM and stabilized GC.
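The fix above amounts to replacing large fixed pre-allocations with small buffers that grow on demand. A minimal sketch, assuming `ByteArrayOutputStream`-style growth; the class and method names are illustrative, and only the 0.47 MB initial size mirrors the article.

```java
import java.io.ByteArrayOutputStream;

// Sketch: start each session's voice buffers small and let them expand only
// when actually written to, instead of reserving 18.75 MB per session up front.
public class VoiceBuffers {
    static final int ROBOT_INITIAL = 480 * 1024;   // ~0.47 MB initial capacity

    // ByteArrayOutputStream grows its backing array only when capacity is
    // exceeded, so short interviews never pay the worst-case allocation.
    private final ByteArrayOutputStream robotVoice = new ByteArrayOutputStream(ROBOT_INITIAL);
    private final ByteArrayOutputStream userVoice  = new ByteArrayOutputStream(); // starts near zero

    public void appendUserAudio(byte[] frame) {
        userVoice.write(frame, 0, frame.length);
    }

    public int userBytes() { return userVoice.size(); }
}
```

At 2500 sessions, trimming ~18 MB of idle headroom per session removes tens of gigabytes of heap pressure, which is why the change eliminated the OOM and stabilized GC.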
Fine‑Grained Monitoring
Metrics were defined across service health, resource availability, workflow success, thread‑pool usage, position‑question handling, ASR latency, and VAD call counts, enabling proactive alerts and capacity planning.
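One of the metric families above, thread-pool usage, can be sketched as a sampled ratio with an alert threshold. The threshold value and helper names are assumptions, not from the article.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of a fine-grained thread-pool metric for proactive alerting.
public class ThreadPoolMonitor {
    // Fraction of the pool's maximum capacity currently busy, in [0, 1].
    public static double usageRatio(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    // True when usage crosses the alert threshold (e.g. 0.8 for 80%).
    public static boolean shouldAlert(ThreadPoolExecutor pool, double threshold) {
        return usageRatio(pool) >= threshold;
    }
}
```

Sampling this ratio periodically turns thread-pool saturation from a post-mortem finding into an early capacity-planning signal.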
Results
After applying the above optimizations, the AI interview robot’s concurrent handling capacity increased by 20×, supporting over a thousand simultaneous interview sessions with stable latency and reduced failure rates.
Conclusion
The article presented the AI interview robot’s backend architecture, end‑to‑end interaction flow, dialogue‑engine redesign, and a systematic performance‑optimization practice that dramatically improved scalability and user experience. Future work will continue iterating on features and further tuning performance for broader business adoption.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.