Evolution of Qingteng FM Real-Time Recommendation System: Architecture, Models, and Microservice Deployment

This article details Qingteng FM's transition from offline batch recommendation to a real-time, deep‑learning driven recommendation system, covering system architecture, recall, coarse‑ranking, ranking models, diversity enhancements, episode recommendation, and the micro‑service framework that streamlined feature engineering and model deployment.

DataFunTalk

01 Background

Qingteng FM is a leading mobile audio platform in China and the country's first audio media platform, with over 450 million users and 130 million monthly active users.

02 Recommendation System Evolution

The recommendation system has evolved from offline batch recommendation to near‑real‑time and finally millisecond‑level real‑time recommendation, improving user listening experience and traffic distribution efficiency.

03 Real‑time Recommendation Architecture

A good real‑time recommendation system should satisfy four requirements: handle massive data with low latency, allow rapid iteration of strategies and algorithms, degrade gracefully under failures, and record user feedback accurately and comprehensively.

The system consists of four layers: offline, pipeline, online, and business layers. The business layer includes multiple recommendation scenarios, most of which share the same framework, and each layer must ensure stability and real‑time performance.

Recall Layer Review

The recall stage narrows tens of thousands of items down to a few hundred using multi-strategy fusion. Traditional strategies such as collaborative filtering (CF) and ALS matrix factorization have been gradually replaced by deep recall models such as EGES, SRGNN, and DSSM, which enrich the recall sources and capture user interests better.
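
The article does not say how the recall sources are fused. As one possible illustration, the sketch below merges per-strategy candidate lists with a weighted reciprocal-rank score, which avoids comparing raw scores across different models. The function name `fuse_recall`, the weights, and the item ids are all hypothetical.

```python
from collections import defaultdict

def fuse_recall(sources, weights, k):
    """Merge candidate lists from several recall strategies into one top-k list.

    sources: dict of strategy name -> list of (item_id, score), best first.
    weights: dict of strategy name -> fusion weight (hypothetical values).
    """
    fused = defaultdict(float)
    for name, candidates in sources.items():
        w = weights.get(name, 1.0)
        for rank, (item_id, _score) in enumerate(candidates):
            # Reciprocal rank keeps contributions comparable across models
            # whose raw scores live on different scales.
            fused[item_id] += w / (rank + 1)
    # Highest fused score first; ties broken by item id for determinism.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:k]]

# Toy candidate lists standing in for the outputs of three recall models.
sources = {
    "eges": [("a1", 0.9), ("a2", 0.8), ("a3", 0.1)],
    "srgnn": [("a2", 0.7), ("a4", 0.6)],
    "dssm": [("a4", 0.95), ("a1", 0.5)],
}
weights = {"eges": 1.0, "srgnn": 1.0, "dssm": 1.0}
top = fuse_recall(sources, weights, k=3)
```

Items recalled by several strategies accumulate score, so `a3`, which only one strategy returns at a low rank, falls out of the top three.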

Challenges remain in improving recall accuracy.

Coarse Ranking Layer Review

A dual‑tower DSSM model is used for coarse ranking. The user tower and item tower are decoupled, enabling efficient inner‑product computation and easy online deployment. Real‑time user behavior features are incorporated to capture user interests.
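
Because the two towers are decoupled, item embeddings can be computed ahead of time and only the user vector needs to be produced per request. The sketch below shows the scoring step under that assumption, using random vectors in place of real tower outputs; the embedding size, candidate count, and the name `coarse_rank` are illustrative, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for item-tower outputs that would be computed offline.
item_ids = np.array(["album_%d" % i for i in range(1000)])
item_emb = rng.normal(size=(1000, 32)).astype(np.float32)
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def coarse_rank(user_vec, item_emb, item_ids, k):
    """Score all candidates with one matrix-vector product and keep the top k."""
    scores = item_emb @ user_vec
    top = np.argpartition(-scores, k - 1)[:k]  # unordered indices of the k best
    top = top[np.argsort(-scores[top])]        # sort only those k by score
    return item_ids[top], scores[top]

# Stand-in for the user-tower output computed at request time.
user_vec = rng.normal(size=32).astype(np.float32)
user_vec /= np.linalg.norm(user_vec)

ids, scores = coarse_rank(user_vec, item_emb, item_ids, k=100)
```

At larger candidate counts the same inner product is usually served through an approximate nearest-neighbor index rather than a dense product, but the article does not say which approach is used here.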

Online deployment on the homepage feeds increased average listening time by 5.44% and effective UV‑PTR by 3.26%.

Ranking Layer Review

Due to growing feature dimensions and latency constraints, the system moved from XGBoost to DeepFM. DeepFM combines a shallow FM component, which models pairwise feature crosses, with a deep neural network that learns higher-order interactions. Both components share the same feature embeddings, so the model handles large sparse ID features effectively.
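
To make the FM side of DeepFM concrete, the sketch below computes the standard second-order FM term with the usual linear-time identity and checks it against the naive pairwise sum. It is a generic illustration of factorization machines, not the production model; the feature vector and latent dimension are made up.

```python
import numpy as np

def fm_second_order(x, V):
    """Second-order FM term: sum over i<j of <V[i], V[j]> * x[i] * x[j].

    Uses the O(n*k) identity
    0.5 * sum_f((sum_i V[i,f] x[i])^2 - sum_i (V[i,f] x[i])^2).
    """
    xv = V * x[:, None]                  # shape (n_features, k)
    square_of_sum = xv.sum(axis=0) ** 2  # shape (k,)
    sum_of_square = (xv ** 2).sum(axis=0)
    return 0.5 * float((square_of_sum - sum_of_square).sum())

def fm_pairwise(x, V):
    """Naive O(n^2 * k) reference, used only to check the identity."""
    n = len(x)
    return sum(
        float(V[i] @ V[j]) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # sparse, one-hot style input
V = rng.normal(size=(5, 4))              # one latent vector per feature

fast = fm_second_order(x, V)
slow = fm_pairwise(x, V)
```

In DeepFM the rows of `V` are the same embeddings that feed the deep network, which is why no manual cross features are needed.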

After deployment, average listening duration improved by 10.94%, UV‑PTR by 9.26%, and Item‑PTR by 7.83% on the homepage feeds.

Episode Recommendation

Multiple recall strategies for episodes were developed: EGES graph‑embedding similarity to address cold‑start, BERT‑based semantic similarity, hot‑episode recall with a two‑week window, and a "follow‑up" strategy that recommends new episodes from albums the user is following.
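
Of these strategies, hot-episode recall is the simplest to sketch. The example below counts plays inside a trailing two-week window and returns the most-played episodes; the event format, the function name `hot_episodes`, and the use of raw play counts as the popularity signal are assumptions, since the article only states the window length.

```python
from collections import Counter
from datetime import datetime, timedelta

def hot_episodes(play_events, now, k, window=timedelta(days=14)):
    """Return the k most-played episode ids within the trailing window.

    play_events: iterable of (episode_id, timestamp) pairs.
    """
    cutoff = now - window
    counts = Counter(ep for ep, ts in play_events if cutoff <= ts <= now)
    # Most plays first; ties broken by episode id for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ep for ep, _ in ranked[:k]]

now = datetime(2021, 6, 15)
events = [
    ("e1", now - timedelta(days=1)),
    ("e1", now - timedelta(days=2)),
    ("e2", now - timedelta(days=3)),
    # e3 has the most plays overall, but all fall outside the window.
    ("e3", now - timedelta(days=20)),
    ("e3", now - timedelta(days=21)),
    ("e3", now - timedelta(days=22)),
]
hot = hot_episodes(events, now, k=2)
```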

Online experiments showed a 5% increase in average listening time and a 7% rise in effective UV‑PTR for combined album‑episode recommendations.

Ranking Layer with MMoE

A Multi-gate Mixture-of-Experts (MMoE) model was introduced to jointly predict play and completion rates. Because each task has its own gate over the shared experts, MMoE supports better task-specific learning than shared-bottom models.
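
The difference from a shared-bottom model is easiest to see in the forward pass: every task reads the same experts but mixes them with its own softmax gate. The NumPy sketch below shows that structure with untrained random weights; the layer sizes, the single-layer experts and towers, and the class name `TinyMMoE` are simplifications chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMMoE:
    """Forward pass only: shared experts, one gate and one tower per task."""

    def __init__(self, d_in, d_expert, n_experts, n_tasks, seed=0):
        rng = np.random.default_rng(seed)
        self.W_expert = rng.normal(scale=0.1, size=(n_experts, d_in, d_expert))
        self.W_gate = rng.normal(scale=0.1, size=(n_tasks, d_in, n_experts))
        self.w_tower = rng.normal(scale=0.1, size=(n_tasks, d_expert))

    def forward(self, x):
        # Shared experts with ReLU: (batch, n_experts, d_expert).
        experts = np.maximum(0.0, np.einsum("bi,eid->bed", x, self.W_expert))
        outputs = []
        for t in range(self.W_gate.shape[0]):
            gate = softmax(x @ self.W_gate[t])                # (batch, n_experts)
            mixed = np.einsum("be,bed->bd", gate, experts)    # task-specific mix
            outputs.append(sigmoid(mixed @ self.w_tower[t]))  # (batch,)
        return np.stack(outputs, axis=1)                      # (batch, n_tasks)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))
model = TinyMMoE(d_in=16, d_expert=8, n_experts=3, n_tasks=2)
probs = model.forward(x)  # column 0: play, column 1: completion (by convention here)
```

A shared-bottom model corresponds to the special case where every task uses the same fixed mixture, so tasks with conflicting gradients cannot route around each other.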

Future work includes adding more tasks, such as collection, comment, and download prediction, and exploring PLE (Progressive Layered Extraction) and dynamic loss weighting.

Diversity Enhancement in Re‑ranking

Three methods were applied to improve diversity: dynamic refresh (adjusting scores based on exposure frequency), Maximal Marginal Relevance (MMR) to balance relevance and diversity, and Determinantal Point Process (DPP) using pre‑computed embeddings.
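
As an illustration of the second method, the sketch below implements greedy MMR over ranking scores and item embeddings. The trade-off parameter `lam`, the cosine-similarity measure, and the toy data are assumptions; the article does not give the settings used in production.

```python
import numpy as np

def mmr(relevance, emb, k, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    relevance: (n,) ranking scores.
    emb: (n, d) L2-normalized item embeddings, so emb @ emb.T is cosine similarity.
    lam: 1.0 ranks purely by relevance; lower values favor diversity.
    """
    sim = emb @ emb.T
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Penalize similarity to the closest already-selected item.
            max_sim = max((sim[i, j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is different but less relevant.
relevance = np.array([0.9, 0.85, 0.6])
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

diverse = mmr(relevance, emb, k=2, lam=0.5)
greedy = mmr(relevance, emb, k=2, lam=1.0)
```

With `lam=0.5` the second slot goes to the dissimilar item 2 instead of the near-duplicate item 1, while `lam=1.0` reproduces the plain relevance order.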

AB tests demonstrated over 10% improvement in diversity metrics without harming conversion rates.

Feature Engineering and Model Microservices

A configuration‑driven feature service was built to ensure offline‑online consistency, support multi‑version and multi‑scenario deployment, and reduce model rollout time from weeks to days.
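
The core idea behind offline-online consistency is that a single configuration drives feature extraction in both the training pipeline and the serving path. The sketch below illustrates that idea in Python for consistency with the other sketches here; the config schema, operator names, and field names are hypothetical and do not reflect the actual service.

```python
import bisect
import math
import zlib

# Hypothetical feature config; it could equally be loaded from YAML or JSON.
FEATURE_CONFIG = [
    {"name": "play_count_log", "source": "play_count", "op": "log1p"},
    {"name": "category_id", "source": "category", "op": "hash_bucket", "buckets": 1000},
    {"name": "duration_bucket", "source": "duration_sec", "op": "bucketize",
     "bounds": [60, 300, 1800]},
]

# One implementation per operator, shared by training and serving code.
OPS = {
    "log1p": lambda v, spec: math.log1p(float(v)),
    # crc32 is stable across processes, unlike Python's built-in hash() for
    # strings, which matters when offline and online run in separate processes.
    "hash_bucket": lambda v, spec: zlib.crc32(str(v).encode("utf-8")) % spec["buckets"],
    "bucketize": lambda v, spec: bisect.bisect_right(spec["bounds"], v),
}

def extract_features(raw, config=FEATURE_CONFIG):
    """Apply every configured transform to one raw record."""
    return {spec["name"]: OPS[spec["op"]](raw[spec["source"]], spec) for spec in config}

raw = {"play_count": 0, "category": "history", "duration_sec": 300}
features = extract_features(raw)
```

Adding a feature or a new model version then means adding a config entry rather than writing a second, separately maintained extraction path.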

The unified service, implemented in Scala and deployed on a container cloud platform, has been adopted across recommendation and search scenarios.

Conclusion and Outlook

Qingteng FM's recommendation system continues to advance with AI techniques, real‑time processing, diversity strategies, and micro‑service architecture, aiming to further improve user satisfaction, retention, and overall listening experience.
