
Overview of the SCF RPC Framework: Architecture, Call Modes, Serialization, Service Registration, and Monitoring

This article introduces the SCF RPC framework developed by 58, covering its overall architecture, synchronous and callback call modes, timeout handling, custom serialization techniques, service registration and discovery using etcd, as well as data collection, storage, and monitoring mechanisms for large‑scale distributed services.


Preface

RPC (Remote Procedure Call) is a protocol that allows a program on one computer to request services from a program on a remote computer without needing to understand the underlying network details. For example, an application on node A calls an interface provided by an application on node B; node A sends the request data over the network, node B executes the corresponding interface and returns the result.

The RPC framework encapsulates common capabilities such as network transmission, serialization, load balancing, and removal of faulty nodes, enabling node A to invoke remote interfaces as easily as it calls local methods.

SCF is a self‑developed RPC framework by 58, aiming to provide high‑performance, high‑reliability, and transparent remote call solutions in distributed environments.

The Service Management Platform, built on the SCF framework, offers automatic service node registration and discovery, load balancing, service authentication, comprehensive monitoring, and robust alerting.

Overall Architecture

SCF Service Provider: Uses SCF server capabilities to expose interfaces that can be remotely invoked.

SCF Consumer: Uses SCF client capabilities to call interfaces provided by service providers.

Control Center: Maintains the relationship between providers and consumers, generates configuration for consumers, and pushes updated configurations in real time when relationships change.

Monitoring Center: Collects traffic data from both providers and consumers, provides real‑time alerts, and helps improve service stability.

Visualization Management Platform: Offers a UI for viewing traffic metrics, configuring service relationships, and setting alerts.

The SCF provider and consumer constitute the core of the SCF framework, enabling basic RPC calls. The control center, monitoring center, and visualization platform supplement the core capabilities with governance features.

SCF Framework

SCF Call Modes

The most basic capability of an RPC framework is remote invocation. SCF provides two call modes: synchronous calls and callback calls.

Synchronous Call

Synchronous calls are the most commonly used mode and the default. The calling thread blocks until the remote method returns a result or a timeout occurs, at which point the thread is awakened to obtain the result or handle the timeout exception.
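A synchronous call is typically layered on an asynchronous transport: the caller parks on a future until the response arrives or the timeout elapses. A minimal sketch of that blocking step (class and method names here are illustrative, not SCF's actual API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of a synchronous call built on an async transport: the calling
// thread blocks on the pending response until a result or a timeout.
public class SyncCall {
    static String call(CompletableFuture<String> pendingResponse, long timeoutMillis)
            throws TimeoutException {
        try {
            // Block until the transport completes the future, or give up.
            return pendingResponse.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```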

Callback Call

In a callback call, the interface returns immediately and the calling thread does not wait for the server's response, so it never blocks. When the response arrives, or the timeout fires, a dedicated callback thread handles it; the caller must therefore supply a callback implementation class.
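The callback shape described above can be sketched as follows; the interface and pool names are illustrative assumptions, not SCF's real API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Callback-mode sketch: invoke() returns at once; a dedicated callback
// thread later delivers either the result or the timeout.
public class CallbackCall {
    interface Callback<T> {
        void onSuccess(T result);
        void onTimeout();
    }

    // Dedicated callback thread, separate from caller and I/O threads.
    private static final ExecutorService callbackPool =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "scf-callback");
                t.setDaemon(true);
                return t;
            });

    static void invoke(CompletableFuture<String> pendingResponse,
                       long timeoutMillis, Callback<String> cb) {
        pendingResponse
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            .whenCompleteAsync((result, err) -> {
                if (err != null) cb.onTimeout();   // timeout or transport failure
                else cb.onSuccess(result);
            }, callbackPool);
    }
}
```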

Timeout Handling

In production, server health and network conditions can be unpredictable, leading to failures. Therefore, calls must specify a timeout; if no response is received within the configured period, a timeout exception is raised.

SCF implements timeout handling using a classic TimeWheel algorithm.

The algorithm uses an array to simulate a circular clock structure; each slot represents a time interval and holds a linked list of tasks. When adding a task, its expiration slot is calculated based on the current time, and the number of full rotations (circles) required before triggering is recorded.

Key points:

Expiration time has an error margin equal to the slot duration.

The thread scanning for expired tasks should be separate from the thread executing expiration actions to avoid interference.
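The slot-plus-rounds bookkeeping can be sketched as below. This is a minimal, single-threaded illustration driven by an explicit tick() (a real implementation ticks from a timer thread and hands expired tasks to a separate worker, per the second point above); the API is an assumption, not SCF's actual code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Minimal time-wheel sketch: an array of slots, each holding tasks with a
// remaining-rotations counter. One tick() advances the cursor one slot.
public class TimeWheel {
    static class Task {
        int rounds;            // full rotations left before firing
        final Runnable action;
        Task(int rounds, Runnable action) { this.rounds = rounds; this.action = action; }
    }

    private final List<List<Task>> slots;
    private final int wheelSize;
    private int cursor = 0;

    TimeWheel(int wheelSize) {
        this.wheelSize = wheelSize;
        this.slots = new ArrayList<>();
        for (int i = 0; i < wheelSize; i++) slots.add(new LinkedList<>());
    }

    // Schedule a task delayTicks (>= 1) ticks from now.
    void add(int delayTicks, Runnable action) {
        int slot = (cursor + delayTicks) % wheelSize;
        int rounds = (delayTicks - 1) / wheelSize;  // rotations before it is due
        slots.get(slot).add(new Task(rounds, action));
    }

    // Advance one slot; fire tasks whose rotation count has reached zero.
    void tick() {
        cursor = (cursor + 1) % wheelSize;
        Iterator<Task> it = slots.get(cursor).iterator();
        while (it.hasNext()) {
            Task t = it.next();
            if (t.rounds == 0) { it.remove(); t.action.run(); }
            else t.rounds--;
        }
    }
}
```

Note the error margin from the first point above: a task is only as precise as one slot's duration, because everything in a slot fires on the same tick.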

Serialization

Network transmission requires binary data, while application data are objects. Serialization converts object state into a transmittable form; deserialization restores it.

SCF uses a custom serialization mechanism, supporting asymmetric serialization and generic serialization.

Asymmetric Serialization

When interfaces evolve, fields may be added or removed, leading to mismatched class definitions between provider and consumer. SCF assigns an ID to each field and writes data as id + length + value. During deserialization, the ID is read first; if the receiving class contains the field, the value is assigned; otherwise the length is used to skip the data segment, allowing forward and backward compatibility.

To reduce overhead, primitive types embed the type in a tag: tag = (id << 3) | type, where the type code fits in 3 bits. Because a primitive's size is implied by its type, no separate length field is needed.
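The bit packing above is straightforward to encode and decode; a small sketch (the concrete type codes are illustrative, not SCF's real assignments):

```java
// SCF-style field tags for primitive fields: a 3-bit type code is packed
// into the low bits of the tag, so the receiver knows the value's size
// without a length field. Type-code values here are illustrative.
public class FieldTag {
    static int encode(int fieldId, int typeCode) {
        if (typeCode < 0 || typeCode > 7)
            throw new IllegalArgumentException("type code must fit in 3 bits");
        return (fieldId << 3) | typeCode;
    }

    static int fieldId(int tag)  { return tag >>> 3; }   // high bits: field id
    static int typeCode(int tag) { return tag & 0b111; } // low 3 bits: type
}
```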

Generic Serialization

For fields of generic type (e.g., Object in Java), SCF generates a unique typeId by hashing the fully‑qualified class name. The typeId is written before the value; during deserialization, the typeId is read to locate the concrete class and then the value is read accordingly.
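A sketch of the typeId scheme follows. The hash function and registry shape are assumptions for illustration, not SCF's actual implementation; the point is that both sides derive the same id from the class name without coordinating:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative typeId registry: the id is derived from the fully-qualified
// class name, so serializer and deserializer agree without negotiation.
public class TypeRegistry {
    private final Map<Integer, Class<?>> byId = new HashMap<>();

    // Derive a typeId from the fully-qualified class name (illustrative hash).
    static int typeId(Class<?> cls) {
        return cls.getName().hashCode();
    }

    void register(Class<?> cls) {
        byId.put(typeId(cls), cls);
    }

    // During deserialization: map the typeId read from the stream to a class.
    Class<?> resolve(int typeId) {
        return byId.get(typeId);
    }
}
```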

Service Registration and Discovery

Consumers need to know the IP list of service nodes to invoke them. Hard‑coding this list in configuration files is inflexible and cannot adapt to dynamic scaling.

Service registration and discovery automatically publish node information and allow consumers to detect changes in real time, enabling seamless traffic shifting.

SCF uses an etcd cluster to store each service node as a key with a TTL. Heartbeats refresh the TTL to keep the node online. A proxy layer isolates etcd from business deployments and forwards heartbeats while maintaining state for both providers and consumers.

If a node goes offline, etcd notifies the proxy, which pushes the updated node list to consumers. Consumers also periodically pull the list based on timestamps to ensure eventual consistency.
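The register/heartbeat/expire cycle can be illustrated with a simulated in-memory TTL registry; in SCF the store is an etcd cluster fronted by the proxy layer, so this map and the key format are stand-ins, not the real system:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simulated TTL registry showing the registration/heartbeat pattern:
// a heartbeat pushes the node's deadline forward; a node whose TTL
// lapses disappears from the list consumers see.
public class TtlRegistry {
    private final Map<String, Long> expiry = new ConcurrentHashMap<>(); // key -> deadline (ms)
    private final long ttlMillis;

    TtlRegistry(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void register(String nodeKey, long nowMillis) {
        expiry.put(nodeKey, nowMillis + ttlMillis);
    }

    void heartbeat(String nodeKey, long nowMillis) {
        expiry.computeIfPresent(nodeKey, (k, deadline) -> nowMillis + ttlMillis);
    }

    // Consumers only see nodes whose TTL has not lapsed.
    List<String> aliveNodes(long nowMillis) {
        List<String> alive = new ArrayList<>();
        for (Map.Entry<String, Long> e : expiry.entrySet())
            if (e.getValue() > nowMillis) alive.add(e.getKey());
        Collections.sort(alive);
        return alive;
    }
}
```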

Monitoring Data Collection and Storage

Key operational questions include: Is the service healthy? What is the current traffic? Are there errors or timeouts?

Data Collection

Each service may have many methods deployed on multiple nodes, and many consumers may call them, resulting in a combinatorial explosion of data dimensions. The collection strategy distributes aggregation across layers to reduce pressure on central storage.

1. Collection plugins run on service nodes, performing local aggregation and reporting data per minute.

2. Plugins hash the service name to route data from the same service (across nodes) to the same collector, which performs a second aggregation step, further reducing load on the central storage.
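Step 2's routing can be sketched in a few lines; the hash choice is illustrative. The important property is that every node of a service lands on the same collector, so that collector can aggregate across nodes:

```java
// Illustrative collector routing: hash the service name so all nodes of
// one service report to the same collector instance.
public class CollectorRouter {
    static int route(String serviceName, int collectorCount) {
        // floorMod guards against negative hashCode values.
        return Math.floorMod(serviceName.hashCode(), collectorCount);
    }
}
```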

Data Storage

For each call, several fields are recorded: timestamp, count, and latency are the only traffic-related values; the others (service name, node IP, method name, caller, type) are static identifiers. To save space, these identifiers are concatenated into a unique dimension string (e.g., S[demo]SN[10.0.0.1]SF[Service.get()]C[callerdemo]) and mapped to a unique cid. Stored monitoring records replace the five identifiers with the cid.
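The mapping amounts to interning the dimension string to a small integer; a sketch (the key format follows the example above, but the id-assignment scheme is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the dimension-to-cid mapping: static identifier fields are
// collapsed into one key string and interned to an integer cid, so stored
// records carry only (cid, timestamp, count, latency).
public class DimensionInterner {
    private final Map<String, Integer> cids = new HashMap<>();

    static String dimensionKey(String service, String nodeIp, String method, String caller) {
        return "S[" + service + "]SN[" + nodeIp + "]SF[" + method + "]C[" + caller + "]";
    }

    // Same dimension string always yields the same cid.
    int cidFor(String dimensionKey) {
        return cids.computeIfAbsent(dimensionKey, k -> cids.size() + 1);
    }
}
```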

Initially, only raw call metadata was stored, and queries aggregated data on demand, leading to slow dashboards. The system switched to a write‑expansion model: for each call, pre‑computed aggregates are written to the database, allowing direct lookup by cid and dramatically improving query performance.
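The write-expansion idea is to maintain pre-summed rows keyed by (cid, minute) at write time, so a dashboard query is a key lookup rather than a scan. A minimal sketch (the schema and key format are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Write-expansion sketch: each call updates a pre-aggregated row keyed by
// (cid, minute), so reads fetch summed rows instead of scanning raw calls.
public class MinuteAggregator {
    static class Agg { long count; long latencySumMs; }

    private final Map<String, Agg> rows = new HashMap<>();

    void record(int cid, long epochMillis, long latencyMs) {
        String key = cid + "@" + (epochMillis / 60_000);  // one row per cid per minute
        Agg a = rows.computeIfAbsent(key, k -> new Agg());
        a.count++;
        a.latencySumMs += latencyMs;
    }

    long count(int cid, long minute) {
        Agg a = rows.get(cid + "@" + minute);
        return a == null ? 0 : a.count;
    }
}
```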

Summary

SCF is a core component of 58’s distributed architecture, supporting tens of thousands of nodes across the group. This article covered basic call mechanisms and monitoring. Other aspects such as load balancing, network management, fault removal, authentication, and rate limiting are not detailed here. SCF continues to evolve, and contributions from interested developers are welcome.

Tags: distributed systems, monitoring, RPC, serialization, service governance, SCF
Written by 58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.