How NetEase Cloud’s Distributed Recording Cluster Ensures High Availability and Scalability

This article explains the architecture and key features of NetEase Cloud's local server‑side recording cluster, detailing how dynamic scaling, multi‑backup high availability, load‑balancing strategies, monitoring, and an embedded registration center enable secure, reliable, and scalable recording for data‑sensitive applications.

NetEase Smart Enterprise Tech+

Background

Based on NetEase Cloud's Linux recording SDK, developers can implement local server‑side recording on their Linux servers, achieving higher security by storing recordings locally and offering customization such as watermarks. This is especially important for finance and other data‑sensitive scenarios.

Most client developers are Java engineers who need to bridge the Linux SDK via JNI, handle load balancing, and ensure reliability, which raises integration costs. Building a distributed recording cluster reduces these costs while providing scalable, highly available recording services.

System Features

Dynamic scaling of the recording cluster.

Multi‑backup recording for high availability.

Highly available registration center.

Various load‑balancing strategies (CPU load, concurrency, round‑robin, random).

CPU usage threshold warnings.

Real‑time health monitoring and alerts during recording.

Extensible object storage options.

Cross‑platform REST API for invoking local server recording.

Non‑intrusive native Java SDK (JAR) for quick Java integration.

Overall Architecture

The system is divided into three layers.

Interface Layer

Business services can use NetEase Cloud's Java SDK directly to start recordings, update layouts, and stop recordings.

Service Layer

Provides standard REST APIs protected by secret key authentication. A scheduling component assigns recording tasks to appropriate nodes based on node weight and current load, routes layout updates and stop requests, and supports multi‑node recording for high availability. Each node registers with a highly available registration center, sends heartbeat messages, and participates in leader election for cluster‑wide scheduled tasks.

The registration center remains operational as long as a majority of its nodes are online.
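The majority rule above is the standard quorum condition. A minimal sketch of the check (the class and method names are illustrative, not part of NetEase's SDK):

```java
// Quorum check sketch: the registration center stays available while a
// strict majority of its nodes are reachable. Names are illustrative.
public class Quorum {
    static boolean hasQuorum(int totalNodes, int aliveNodes) {
        return aliveNodes > totalNodes / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(3, 2)); // 2 of 3 is a majority
        System.out.println(hasQuorum(4, 2)); // 2 of 4 is not
    }
}
```

This is why such clusters are typically deployed with an odd node count: a 3-node center tolerates one failure, while a 4-node center still tolerates only one.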

Leader election enables a single node to execute cluster‑wide timed jobs.

Nodes report load and concurrency metrics to the registration center for intelligent scheduling.
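Using those reported metrics, node selection can be sketched as picking the least-loaded nodes by a weighted score of CPU load and concurrency. The scoring formula, field names, and `Node` class here are assumptions for illustration, not the actual scheduling code:

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical load-based node selection: choose the `backups` registered
// nodes with the lowest combined CPU-load and concurrency score.
public class NodeScheduler {
    static class Node {
        final String id;
        final double cpuLoad;   // 0.0 - 1.0, reported via heartbeat
        final int concurrency;  // active recording tasks

        Node(String id, double cpuLoad, int concurrency) {
            this.id = id;
            this.cpuLoad = cpuLoad;
            this.concurrency = concurrency;
        }

        // Illustrative weighting; a real scheduler would tune this.
        double score() { return cpuLoad * 100 + concurrency; }
    }

    // Return the least-loaded nodes, one per requested backup copy.
    static List<Node> select(List<Node> nodes, int backups) {
        return nodes.stream()
                .sorted(Comparator.comparingDouble(Node::score))
                .limit(backups)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("node-a", 0.85, 12),
                new Node("node-b", 0.30, 4),
                new Node("node-c", 0.55, 9));
        // Multi-backup recording: record the same room on two nodes.
        for (Node n : select(nodes, 2)) System.out.println(n.id);
    }
}
```

Selecting more than one node per task is what gives the multi-backup high availability described earlier: the same room is recorded on several nodes at once.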

The recording executor invokes the native recording SDK via a dynamic library, managing a dedicated process per room. Communication with the executor occurs over sockets, allowing start/stop, layout updates, and health checks with immediate alerts on failures.
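The socket exchange with the executor process could look like the following self-contained sketch. The command format and the in-process stand-in "executor" are assumptions; the real executor protocol is internal to the SDK:

```java
import java.io.*;
import java.net.*;

// Sketch of command exchange with a recording executor over a local socket.
// A stand-in executor thread replies with a fixed OK status so the example
// runs on its own; the JSON-ish command shape is an assumption.
public class ExecutorClient {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread executor = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    in.readLine();                   // e.g. a start command
                    out.println("{\"status\":\"ok\"}");
                } catch (IOException ignored) { }
            });
            executor.start();

            // The node side: send a start command, read the status reply.
            try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                out.println("{\"cmd\":\"start\",\"room\":\"r1\"}");
                System.out.println(in.readLine());
            }
            executor.join();
        }
    }
}
```

A per-room process isolated behind a socket means one crashed recording cannot take down the node: a failed health check on that socket triggers an alert rather than a cascading failure.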

Comprehensive monitoring tracks overall service health, especially CPU usage, and triggers circuit‑breaker protection when CPU or concurrency exceeds configured thresholds, safeguarding cluster stability.
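The circuit-breaker condition reduces to a simple threshold gate on new tasks. The thresholds and class names below are illustrative assumptions:

```java
// Hypothetical circuit-breaker gate: refuse new recording tasks when CPU
// usage or concurrency exceeds configured thresholds.
public class RecordingBreaker {
    private final double cpuThreshold;
    private final int concurrencyThreshold;

    RecordingBreaker(double cpuThreshold, int concurrencyThreshold) {
        this.cpuThreshold = cpuThreshold;
        this.concurrencyThreshold = concurrencyThreshold;
    }

    // True when this node may accept another recording task.
    boolean allow(double cpuUsage, int activeRecordings) {
        return cpuUsage < cpuThreshold && activeRecordings < concurrencyThreshold;
    }

    public static void main(String[] args) {
        RecordingBreaker breaker = new RecordingBreaker(0.80, 50);
        System.out.println(breaker.allow(0.45, 10)); // healthy node
        System.out.println(breaker.allow(0.92, 10)); // CPU over threshold
    }
}
```

Rejecting tasks at the node keeps an overloaded machine from degrading every recording it already holds; the scheduler simply routes new work to nodes that still pass the gate.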

A configuration management module stores system and business configurations, propagating updates automatically across the cluster.

The object storage module handles recorded files, supporting uploads to FTP, NetEase NOS, Alibaba OSS, or other object storage services.

Two types of scheduled jobs run in the cluster: a leader‑only job that scans all active recordings for health issues and performs cleanup, and a distributed job that periodically removes expired recordings and logs to reclaim disk space.

Data Layer

The yellow area in the architecture diagram represents the core recording cluster. Application servers interact with the cluster transparently and receive recorded file copies after completion.

Sequence Flow

When a recording node starts, it registers with the registration center, which broadcasts the registration to other nodes. Application servers fetch the node list via the Java SDK, select nodes based on backup count and load‑balancing policies, and initiate recording. Layout updates are routed to the appropriate node using the recording ID. The cluster’s leader node periodically checks recording health and reports failures or CPU threshold breaches to the application servers.
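Routing a layout update by recording ID implies the SDK keeps a mapping from each recording to the node that owns it. A minimal sketch of that lookup (names are illustrative):

```java
import java.util.*;

// Sketch of routing by recording ID: remember which node owns each
// recording so layout updates and stop requests reach the right node.
public class RecordingRouter {
    private final Map<String, String> recordingToNode = new HashMap<>();

    // Called when a recording is successfully started on a node.
    void onStarted(String recordingId, String nodeId) {
        recordingToNode.put(recordingId, nodeId);
    }

    // Resolve the node that must receive an update or stop request.
    String route(String recordingId) {
        String node = recordingToNode.get(recordingId);
        if (node == null)
            throw new NoSuchElementException("unknown recording: " + recordingId);
        return node;
    }

    public static void main(String[] args) {
        RecordingRouter router = new RecordingRouter();
        router.onStarted("rec-1001", "node-b");
        System.out.println(router.route("rec-1001"));
    }
}
```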

Embedded Registration Center

To reduce external dependencies, an embedded registration center is deployed within each recording node, allowing nodes to act as both recording agents and registration servers. Nodes exchange heartbeats, elect a leader, and newly added nodes initially operate without registration capabilities until activated.
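The heartbeat exchange can be sketched as a periodic task on each node; in the real cluster each beat would also carry the load and concurrency metrics mentioned above. The interval and output format here are assumptions:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical heartbeat loop: each node periodically reports liveness
// (and, in practice, load metrics) to the embedded registration center.
public class HeartbeatSender {
    public static void main(String[] args) throws Exception {
        AtomicInteger beats = new AtomicInteger();
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(
                () -> System.out.println("heartbeat " + beats.incrementAndGet()),
                0, 50, TimeUnit.MILLISECONDS); // 50 ms interval for the demo
        Thread.sleep(180);                      // let a few beats fire
        ses.shutdown();
        ses.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

A node that misses several consecutive beats would be treated as offline by its peers, which is what feeds both leader election and the scheduler's view of available capacity.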

Conclusion

Local server‑side recording is suitable not only for finance but for any scenario demanding high data security. The article presented NetEase Cloud’s distributed recording cluster architecture, which systematically addresses concurrent recording and availability challenges, enabling customers to quickly implement secure and reliable recording solutions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
