How to Build a Scalable, Smart Recommendation Slot for Short‑Video Apps

This article explains the background, design principles, high‑concurrency handling, storage optimization, rule‑engine implementation, and intelligent scheduling needed to create a universal, stable, extensible, and intelligent recommendation slot that enriches short‑video app ecosystems.

Baidu Geek Talk

Background

Short‑video platforms use recommendation slots that appear while a video is playing to surface related videos, articles, games, or activity links. Unlike search, where the user states an intent, recommendation must infer latent interests, so content has to be delivered at a fine granularity: per user and per video.

Key Capabilities of the Recommendation Slot

Universality: The slot serves multiple apps and product lines (e.g., video list pages, landing pages) from a single implementation.

Stability: It handles massive traffic on critical pages; any failure must not affect video playback.

Extensibility: The slot can host videos, articles, activities, game downloads, and product links, allowing low‑cost integration of new content types.

Intelligence: The system selects the most appropriate items and degrades gracefully when downstream services are impaired.

Design and Practice

2.1 Abstraction and Standardization

The slot is split into two abstractions:

Distribution capability – defines who sees the slot.

Card content definition – defines what is shown.

Product teams only configure card style, placement, and target audience; the platform handles delivery.
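The split between the two abstractions can be pictured as a small data model. The sketch below is illustrative only: the class names (`Distribution`, `Card`, `SlotConfig`), their fields, and `select_cards` are hypothetical and are not taken from the original system.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    """Who sees the slot: apps, pages, and audience segments (hypothetical fields)."""
    apps: frozenset
    pages: frozenset
    segments: frozenset

    def matches(self, app, page, user_segments):
        # All three dimensions must match for the slot to be shown.
        return (
            app in self.apps
            and page in self.pages
            and bool(self.segments & set(user_segments))
        )


@dataclass
class Card:
    """What is shown: content type, display style, and a payload for the client."""
    content_type: str              # e.g. "video", "article", "game", "activity"
    style: str
    payload: dict = field(default_factory=dict)


@dataclass
class SlotConfig:
    """One configured slot entry: a card plus the audience it is delivered to."""
    distribution: Distribution
    card: Card


def select_cards(configs, app, page, user_segments):
    """Return the cards whose distribution matches the request context."""
    return [c.card for c in configs if c.distribution.matches(app, page, user_segments)]
```

Under this model, adding a new content type means adding a new `Card` value, while the delivery code in `select_cards` stays unchanged.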

Abstraction diagram

2.2 High‑Concurrency Availability

The service must sustain at least 100k QPS and keep response latency under 50 ms to avoid impacting upstream product performance.

Degradation Strategy

Definition: When traffic spikes or server pressure rises, the service may become unavailable. Degradation prioritizes core business functions, allowing non‑core features to fail fast or return cached data.

Typical mechanisms:

Rate limiting (reject or cache)

Circuit breaking (fast‑fail or cache)

Feature‑level degradation switches
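As a rough illustration of two of these mechanisms, the sketch below combines a simple circuit breaker with a feature‑level switch and falls back to cached data in both cases. It is a minimal sketch under stated assumptions: `CircuitBreaker`, `fetch_with_degradation`, the thresholds, and the dict‑based cache are hypothetical names and values, not the article's implementation.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then allows calls again after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, calls are allowed again; a failure re-opens the circuit.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def fetch_with_degradation(fetch, key, breaker, cache, enabled=True):
    """Return fresh data when possible, else the last cached value (or None)."""
    if not enabled or not breaker.allow():
        return cache.get(key)       # switch off or circuit open: skip the call
    try:
        value = fetch(key)
    except Exception:
        breaker.record_failure()
        return cache.get(key)       # fail fast and serve cached data
    breaker.record_success()
    cache[key] = value
    return value
```

Returning `None` when nothing is cached lets the caller render the page without the slot, which matches the requirement that slot failures must not affect video playback.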

Storage Optimization

Because recommendation data is tightly coupled with video assets and does not require strong cross‑region consistency, a high‑write‑performance key‑value store is used as the primary store, complemented by a local second‑level cache. This architecture achieves over 99.9% availability.
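One way to picture this layering is a read‑through lookup: the local cache is checked first, and the key‑value store is queried only on a miss. The sketch below assumes the local cache is an in‑process cache with a short time‑to‑live; that reading, along with the names `LocalTTLCache`, `SlotStore`, and `kv_get`, is an assumption made for illustration.

```python
import time


class LocalTTLCache:
    """In-process cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=1.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]     # expired: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)


class SlotStore:
    """Read-through access: local cache first, then the key-value store."""

    def __init__(self, kv_get, cache):
        self.kv_get = kv_get        # hypothetical callable: video_id -> value or None
        self.cache = cache

    def get(self, video_id):
        cached = self.cache.get(video_id)
        if cached is not None:
            return cached
        value = self.kv_get(video_id)
        if value is not None:
            self.cache.put(video_id, value)
        return value
```

A short TTL bounds how stale a card can be while still absorbing repeated reads for hot videos, which fits data that does not need strong consistency.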

Storage architecture

Multi‑Path Data Fetching

Concurrent requests are issued to multiple downstream services with configurable timeout thresholds. Responses that arrive before the timeout are used; timed‑out responses are ignored. The number of parallel requests and timeout values must be balanced to maximize data coverage without overloading the system.
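The fan‑out described above could look roughly like the following, using Python's `asyncio`. This is a sketch rather than the production design: `fetch_all`, the source names, and the per‑source timeout values are hypothetical.

```python
import asyncio


async def fetch_all(sources):
    """Query every source concurrently; keep only answers that beat their timeout.

    `sources` maps a name to a pair (zero-argument coroutine function, timeout in seconds).
    """
    async def guarded(name, call, timeout_s):
        try:
            return name, await asyncio.wait_for(call(), timeout_s)
        except Exception:
            # Timed out or failed downstream: ignore this path.
            return name, None

    pairs = await asyncio.gather(
        *(guarded(name, call, timeout_s) for name, (call, timeout_s) in sources.items())
    )
    return {name: value for name, value in pairs if value is not None}
```

Because every path has its own timeout, the total wait is bounded by the largest configured value rather than by the slowest downstream service.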

2.3 Platform Toolization for Rapid Business Integration

Frequent addition of new content types creates a high operational burden for rule updates, A/B experiments, and regression testing. The platform addresses this by abstracting business rules into plug‑in‑compatible, hot‑updatable components.

Rule Engine Workflow

Create a rule‑engine instance.

Load or replace a rule set.

Apply pre‑filter rules to incoming data objects.

Fetch material data based on pre‑filter results.

Run post‑aggregation rules on the material.

Emit the final execution result.
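The six steps above can be sketched as a small engine. Everything here is a simplified assumption: rules are plain Python callables, `RuleEngine`, `load_rules`, and `execute` are hypothetical names, and hot updating is modeled as swapping the whole rule set in a single assignment.

```python
class RuleEngine:
    """Pre-filter requests, fetch material, then apply post-aggregation rules."""

    def __init__(self):
        self._rules = ((), ())      # (pre-filter rules, post-aggregation rules)

    def load_rules(self, pre_filters, post_rules):
        # Replace the whole rule set with one assignment so that a running
        # execute() keeps using the snapshot it already took.
        self._rules = (tuple(pre_filters), tuple(post_rules))

    def execute(self, requests, fetch_material):
        pre_filters, post_rules = self._rules               # snapshot of the rule set
        kept = [r for r in requests if all(rule(r) for rule in pre_filters)]
        material = [item for r in kept for item in fetch_material(r)]
        for rule in post_rules:
            material = rule(material)                       # each rule maps list -> list
        return material
```

For example, a pre‑filter might check the app version of a request, and a post‑aggregation rule might deduplicate the fetched items and keep the first few.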

Rule engine workflow

Core Rule Design

Each rule consists of two parts:

Emission condition – determines whether a resource should be emitted (e.g., based on app version, page, user segment).

Transformation definition – fills a template with parameters such as user ID, app version, or other context.

Rules can be combined with logical AND and organized in hierarchical parent‑child relationships. If any rule in the chain fails, the resource is not emitted.
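A rule of this shape might be modeled as below. The sketch assumes that a child rule is emitted only if its own condition and all of its ancestors' conditions hold; the `Rule` class, its fields, and the use of `string.Template` for the transformation step are illustrative choices, not details from the original system.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]       # emission condition
    template: str = ""                      # transformation definition
    parent: Optional["Rule"] = None         # parent rule in the hierarchy

    def passes(self, ctx):
        """AND this rule's condition with every ancestor's condition."""
        rule = self
        while rule is not None:
            if not rule.condition(ctx):
                return False
            rule = rule.parent
        return True

    def emit(self, ctx):
        """Return the filled template, or None if any rule in the chain fails."""
        if not self.passes(ctx):
            return None
        return Template(self.template).substitute(ctx)
```

Given a parent rule that requires `app_version >= 3` and a child rule that requires `page == "landing"`, a request from version 2 on the landing page emits nothing, because the parent condition fails.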

Recording and Replay

User behavior is captured via standardized logs, cleaned, aggregated, and replayed to reproduce cases, verify business logic, and reduce regression‑testing cost.
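A stripped‑down version of the record‑and‑replay idea is shown below. It assumes each log entry stores one request together with the response that was served, encoded as a JSON line; the functions `record` and `replay` and the log layout are hypothetical.

```python
import json


def record(log_lines, request, response):
    """Append one request/response pair as a JSON line (the standardized log)."""
    log_lines.append(json.dumps({"request": request, "response": response}, sort_keys=True))


def replay(log_lines, handler):
    """Re-run recorded requests and return the cases whose output changed."""
    seen, diffs = set(), []
    for line in log_lines:
        if line in seen:            # aggregate: skip exact duplicate cases
            continue
        seen.add(line)
        case = json.loads(line)
        actual = handler(case["request"])
        if actual != case["response"]:
            diffs.append({
                "request": case["request"],
                "expected": case["response"],
                "actual": actual,
            })
    return diffs
```

An empty result from `replay` means the new handler reproduces every recorded response, which is the regression signal; any entry in the list is a concrete case to inspect.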

Recording and replay architecture

2.4 Intelligent Scheduling for User Experience

Front‑end and back‑end telemetry collect impression, click, and watch‑time metrics for each recommendation card. These metrics feed automated quality evaluation and dynamic allocation algorithms that adjust distribution weights based on user profiles and service health.
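The article does not give the allocation algorithm, so the following is only one plausible shape for it: cards from unhealthy services get zero weight, and the remaining weight is shared in proportion to a smoothed click‑through rate. The function name, the smoothing constants, and the restriction to impressions and clicks (ignoring watch time and user profiles) are all simplifying assumptions.

```python
def distribution_weights(stats, prior_ctr=0.05, prior_weight=50):
    """Map card_id -> share of traffic.

    `stats` maps card_id -> {"impressions": int, "clicks": int, "healthy": bool}.
    """
    scores = {}
    for card_id, s in stats.items():
        if not s["healthy"]:
            scores[card_id] = 0.0   # degrade: stop sending traffic to impaired services
            continue
        # Smoothed CTR: cards with few impressions stay close to the prior.
        scores[card_id] = (s["clicks"] + prior_ctr * prior_weight) / (
            s["impressions"] + prior_weight
        )
    total = sum(scores.values())
    if total == 0:
        return {card_id: 0.0 for card_id in scores}
    return {card_id: score / total for card_id, score in scores.items()}
```

Smoothing toward a prior keeps a new card with a handful of impressions from being ranked above or below established cards on noise alone.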

Intelligent scheduling diagram

Conclusion

By establishing a universal, high‑availability slot layer, the platform connects content and services to users. Storage and performance optimizations guarantee stability, while a rule‑engine‑driven tooling framework accelerates business onboarding and reduces regression cost. Intelligent scheduling ensures that only high‑quality, well‑performing resources are served, preserving ecosystem health as the system scales.
