Publishing, Registering, Discovering, Monitoring, Tracing and Governing RPC Services in Microservice Architecture
This article explains how to describe, publish, register, discover, invoke, monitor, trace, and govern RPC services in a microservice architecture, covering RESTful API, XML configuration, IDL files, registry principles, Zookeeper deployment, connection methods, server processing models, monitoring metrics, tracing concepts, and common governance techniques such as load balancing and fault tolerance.
1. How to Publish and Reference Services
Service description is the first step for service invocation and can be done via three common methods: RESTful API, XML configuration, and IDL files.
RESTful API
Advantage: Uses the public HTTP/HTTPS protocol, which has almost no learning cost for consumers.
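Disadvantage: Relatively low performance compared with private binary RPC protocols, because HTTP requests carry verbose text headers and usually text (JSON) payloads.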
XML Configuration
Typically used by private RPC frameworks, whose own protocols perform better than HTTP. The XML files only declare which interfaces are exposed or referenced; they are not the wire format. Publishing and referencing steps:
Provider defines and implements the interface.
Provider loads server.xml at startup to expose the interface.
Consumer loads client.xml at startup to import the interface.
Advantage: High performance, because calls travel over the private RPC protocol.
Disadvantage: Intrusive to business code, and whenever the XML configuration changes, both the provider and the consumer must be updated.
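The three XML steps above can be sketched in Python. The element and attribute names (`<service interface=... impl=...>`, `<reference interface=... timeout=...>`) and the class names are hypothetical; every private RPC framework defines its own schema. The sketch only shows the idea: the provider reads server.xml to decide which implementations to expose, and the consumer reads client.xml to learn which interfaces it imports.

```python
import xml.etree.ElementTree as ET

# Hypothetical server.xml: the provider maps each exposed interface to an implementation.
SERVER_XML = """
<server port="20880">
  <service interface="com.example.UserService" impl="UserServiceImpl"/>
</server>
"""

# Hypothetical client.xml: the consumer lists the remote interfaces it imports.
CLIENT_XML = """
<client>
  <reference interface="com.example.UserService" timeout="3000"/>
</client>
"""


class UserServiceImpl:
    """Step 1: the provider defines and implements the interface."""

    def get_user(self, user_id: int) -> dict:
        return {"id": user_id, "name": f"user-{user_id}"}


# Lookup table from the impl names used in server.xml to real classes.
IMPLEMENTATIONS = {"UserServiceImpl": UserServiceImpl}


def load_server_config(xml_text: str) -> dict:
    """Step 2: the provider loads server.xml at startup and instantiates each exposed service."""
    root = ET.fromstring(xml_text.strip())
    return {s.get("interface"): IMPLEMENTATIONS[s.get("impl")]() for s in root.findall("service")}


def load_client_config(xml_text: str) -> dict:
    """Step 3: the consumer loads client.xml at startup to learn which interfaces it imports."""
    root = ET.fromstring(xml_text.strip())
    return {r.get("interface"): int(r.get("timeout")) for r in root.findall("reference")}


exposed = load_server_config(SERVER_XML)
imported = load_client_config(CLIENT_XML)
for interface in imported:
    # A real framework would send this call over the network; the sketch calls locally.
    print(interface, exposed[interface].get_user(42))
```

The interface name appears in both files, so renaming it means editing and redeploying both sides, which is the maintenance cost that comes with this approach.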
IDL Files
Interface Description Language (IDL) provides a neutral way to describe interfaces across platforms and languages. Common IDL-based frameworks include Facebook's Thrift, which uses the Thrift IDL, and Google's gRPC, which uses Protocol Buffers.
Advantage: Enables cross‑language service calls.
Disadvantage: Large IDL files become hard to maintain, and when parameter definitions change, consumers must regenerate their code and update in step with the provider.
Summary: Choose XML configuration for simple internal Java services, IDL files for multi-language environments, and a RESTful API for external exposure.
2. How to Register and Discover Services
Registry Principle
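In microservice architecture there are three roles: the Service Provider (RPC Server), the Service Consumer (RPC Client), and the Registry. At startup, providers register the services declared in server.xml with the registry, and consumers subscribe to the services declared in client.xml. The registry returns the list of available provider nodes to each consumer and pushes updates when that list changes; the consumer then picks a node from the list, typically with a client-side load-balancing algorithm.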
Registry Implementation
Service registration API (register, deregister, heartbeat, subscribe, query, modify).
Cluster deployment for high availability, often using Zookeeper.
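To make the registration API concrete, here is a minimal single-process sketch of a registry supporting register, deregister, query, and modify. It is an in-memory stand-in for illustration, not a Zookeeper client; the class and method names are hypothetical, and heartbeat and subscribe are left out to keep it short. A production registry runs as a replicated cluster.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProviderNode:
    address: str                                    # "ip:port" of the provider
    metadata: dict = field(default_factory=dict)    # e.g. weight, version, IDC


class InMemoryRegistry:
    """Hypothetical single-process registry: service name -> {address: ProviderNode}."""

    def __init__(self) -> None:
        self._services: dict[str, dict[str, ProviderNode]] = {}

    def register(self, service: str, address: str, **metadata) -> None:
        self._services.setdefault(service, {})[address] = ProviderNode(address, dict(metadata))

    def deregister(self, service: str, address: str) -> None:
        self._services.get(service, {}).pop(address, None)

    def query(self, service: str) -> list[str]:
        # Return a sorted copy of the addresses so callers cannot mutate registry state.
        return sorted(self._services.get(service, {}))

    def modify(self, service: str, address: str, **metadata) -> None:
        # Update a node's metadata in place, e.g. lower its weight during maintenance.
        self._services[service][address].metadata.update(metadata)

    def metadata(self, service: str, address: str) -> dict:
        return dict(self._services[service][address].metadata)


registry = InMemoryRegistry()
registry.register("UserService", "10.0.0.1:20880", weight=100)
registry.register("UserService", "10.0.0.2:20880", weight=100)
registry.modify("UserService", "10.0.0.2:20880", weight=50)
registry.deregister("UserService", "10.0.0.1:20880")
print(registry.query("UserService"))
```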
Zookeeper Working Principle
Each server keeps a copy of data in memory; clients can read from any server.
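At startup the servers elect a leader using Zookeeper's own Paxos-like election algorithm; the leader handles all data updates and replicates them to the followers with ZAB (Zookeeper Atomic Broadcast), committing a write once a majority of servers acknowledge it.
As long as a majority of servers is alive, the ensemble stays available, and every server applies writes in the same order; a read served by a follower may briefly lag behind the leader.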
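Zookeeper commits a write only after a majority (quorum) of servers acknowledge it, which explains why ensembles usually have an odd number of servers. A minimal sketch of the arithmetic, assuming a simple majority quorum: an ensemble of n servers needs floor(n/2)+1 of them, so it survives the loss of the rest. Going from 3 to 4 servers raises the quorum without tolerating any extra failure.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```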
Directory Storage
Zookeeper stores service information in a hierarchical tree of znodes. Each znode has a unique path, can hold data as well as child znodes, and carries a version number that increases with every update.
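To illustrate hierarchical paths and versioned data, here is a toy in-memory tree. It is not the Zookeeper API; it only mimics two ideas: each node is addressed by a slash-separated path, and every write bumps a version number, so a conditional update carrying a stale expected version is rejected (Zookeeper's setData takes an expected version in a similar spirit). The layout /service/<name>/providers/<ip:port> and all class names are hypothetical.

```python
from __future__ import annotations


class VersionMismatch(Exception):
    """Raised when a conditional write carries a stale version."""


class ZNodeTree:
    """Toy hierarchical store (not the Zookeeper API): path -> [data, version]."""

    def __init__(self) -> None:
        self._nodes: dict[str, list] = {"/": [b"", 0]}

    def create(self, path: str, data: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = [data, 0]          # every node starts at version 0

    def get(self, path: str) -> tuple[bytes, int]:
        data, version = self._nodes[path]
        return data, version

    def set(self, path: str, data: bytes, expected_version: int) -> int:
        """Conditional write: succeeds only if nobody has updated the node since it was read."""
        node = self._nodes[path]
        if node[1] != expected_version:
            raise VersionMismatch(f"{path}: expected v{expected_version}, found v{node[1]}")
        node[0] = data
        node[1] += 1
        return node[1]

    def children(self, path: str) -> list[str]:
        prefix = path.rstrip("/") + "/"
        names = []
        for p in self._nodes:
            rest = p[len(prefix):]
            if p.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)


tree = ZNodeTree()
for p in ("/service", "/service/UserService", "/service/UserService/providers"):
    tree.create(p)
provider = "/service/UserService/providers/10.0.0.1:20880"
tree.create(provider, b"weight=100")
tree.create("/service/UserService/providers/10.0.0.2:20880", b"weight=100")
_, version = tree.get(provider)
tree.set(provider, b"weight=50", expected_version=version)   # succeeds; node is now at version 1
```

A second writer still holding version 0 would now get VersionMismatch instead of silently overwriting the newer weight.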
Health Check
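The registry monitors provider health through long-lived TCP sessions kept open by periodic heartbeats. If no heartbeat arrives within the session timeout, the session expires and the provider's node is removed; in Zookeeper, providers register as ephemeral znodes, which are deleted automatically when their session expires.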
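A minimal sketch of heartbeat-based health checking, assuming each provider holds a session with a fixed timeout. The current time is passed in explicitly so the logic can be followed without sleeping; the class name and the 10-second timeout are illustrative assumptions, not Zookeeper defaults.

```python
from __future__ import annotations


class SessionTracker:
    """Drops providers whose last heartbeat is older than the session timeout."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, address: str, now: float) -> None:
        # Each heartbeat renews the provider's session.
        self._last_seen[address] = now

    def expire(self, now: float) -> list[str]:
        """Remove and return every provider whose session has timed out."""
        expired = [a for a, t in self._last_seen.items() if now - t > self.timeout_s]
        for address in expired:
            del self._last_seen[address]
        return sorted(expired)

    def alive(self) -> list[str]:
        return sorted(self._last_seen)


tracker = SessionTracker(timeout_s=10)
tracker.heartbeat("10.0.0.1:20880", now=0)
tracker.heartbeat("10.0.0.2:20880", now=0)
tracker.heartbeat("10.0.0.1:20880", now=8)   # only the first provider keeps heartbeating
expired = tracker.expire(now=12)             # 10.0.0.2 has been silent for 12 s
```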
Change Notification
When a node is added or removed, the registry notifies all subscribed consumers through Zookeeper watches, and each consumer refreshes its local node list.
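Classic Zookeeper watches are one-shot: after a watch fires, the client must set it again to hear about the next change (release 3.6 added persistent watches as an alternative). A minimal sketch of that subscribe, notify, re-subscribe loop, using a hypothetical in-memory provider list rather than a real client library:

```python
from __future__ import annotations

from typing import Callable, Optional

Watcher = Callable[[list], None]


class WatchedProviderList:
    """Hypothetical stand-in for a watched znode: watchers fire once, then are dropped."""

    def __init__(self) -> None:
        self._providers: list[str] = []
        self._watchers: list[Watcher] = []

    def get_children(self, watcher: Optional[Watcher] = None) -> list[str]:
        if watcher is not None:
            self._watchers.append(watcher)
        return list(self._providers)

    def add(self, address: str) -> None:
        if address not in self._providers:
            self._providers.append(address)
            self._notify()

    def remove(self, address: str) -> None:
        if address in self._providers:
            self._providers.remove(address)
            self._notify()

    def _notify(self) -> None:
        # One-shot semantics: clear the watcher list before calling anyone.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher(list(self._providers))


class Consumer:
    def __init__(self, providers: WatchedProviderList) -> None:
        self.providers = providers
        self.notifications = 0
        self.cache = providers.get_children(self.on_change)

    def on_change(self, _snapshot: list) -> None:
        self.notifications += 1
        # Re-read the list and re-arm the watch in one call, or later changes are missed.
        self.cache = self.providers.get_children(self.on_change)


providers = WatchedProviderList()
consumer = Consumer(providers)
providers.add("10.0.0.1:20880")
providers.add("10.0.0.2:20880")
providers.remove("10.0.0.1:20880")
print(consumer.cache, consumer.notifications)
```

If on_change forgot to pass itself back to get_children, the consumer would hear only about the first change and keep a stale cache afterwards.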
Whitelist Mechanism
Only nodes on a whitelist are allowed to register, which keeps test nodes from accidentally registering in the production environment.
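A whitelist check can be as simple as matching the registering node's IP against allowed networks. A minimal sketch using Python's standard ipaddress module; the CIDR ranges are made-up examples standing in for production subnets, and the "ip:port" parsing assumes IPv4 addresses.

```python
import ipaddress

# Hypothetical production subnets; test machines live elsewhere.
PRODUCTION_WHITELIST = [ipaddress.ip_network(c) for c in ("10.1.0.0/16", "10.2.3.0/24")]


def may_register(address: str) -> bool:
    """address is an IPv4 'ip:port' string; only whitelisted IPs may register."""
    ip_text, _, _port = address.rpartition(":")
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in PRODUCTION_WHITELIST)


print(may_register("10.1.4.5:20880"), may_register("192.168.0.8:20880"))
```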
Summary: The registry is the glue that decouples providers and consumers, offering high-availability node management, health detection, and change notification.
3. How to Implement RPC Remote Calls
Client‑Server Network Connection
HTTP Communication
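Based on the application-layer HTTP protocol over TCP. Each new connection starts with a TCP three-way handshake and is closed with a four-way handshake; without keep-alive this happens for every request, while HTTP/1.1 keep-alive lets several requests reuse one connection.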
Socket Communication
Uses TCP/IP sockets. Steps:
Server binds a port with bind() and starts listening via listen().
Client connects using connect().
Server accepts the connection with accept().
Data exchange occurs via send() and recv().
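The four steps map directly onto the BSD socket calls, which Python's socket module exposes under the same names. A minimal sketch, assuming a one-shot echo server on an ephemeral localhost port and a client that signals the end of its request by shutting down its write side; a real RPC protocol would instead frame each message, for example with a length prefix.

```python
import socket
import threading


def recv_until_eof(sock: socket.socket) -> bytes:
    """TCP is a byte stream, so keep calling recv() until the peer closes its side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_once(server_sock: socket.socket) -> None:
    conn, _addr = server_sock.accept()          # step 3: accept() the client connection
    with conn:
        conn.sendall(recv_until_eof(conn))      # step 4: echo the request back


def echo_round_trip(message: bytes) -> bytes:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))          # step 1: bind() to any free port...
    server_sock.listen(1)                       # ...and listen()
    server_sock.settimeout(5.0)                 # avoid hanging forever if no client arrives
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))     # step 2: connect()
            client.sendall(message)                 # step 4: send() the request
            client.shutdown(socket.SHUT_WR)         # tell the server the request is complete
            return recv_until_eof(client)           # step 4: recv() the reply
    finally:
        worker.join()
        server_sock.close()


print(echo_round_trip(b"ping"))
```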
Network anomalies are handled with heartbeat-based link detection, which notices dead connections, and with reconnection retries at increasing back-off intervals.
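Reconnection with back-off can be sketched as a retry loop whose wait doubles after each failure, up to a cap. The connect and sleep functions are injected (hypothetical parameters) so the schedule is easy to observe; the 0.5 s base and 8 s cap are arbitrary. Real clients usually add random jitter so that many consumers do not reconnect in lockstep.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(base_s: float, cap_s: float, count: int) -> list[float]:
    """Exponential back-off: base, 2*base, 4*base, ... never exceeding cap."""
    return [min(cap_s, base_s * 2 ** i) for i in range(count)]


def reconnect(connect: Callable[[], T], max_attempts: int = 5, base_s: float = 0.5,
              cap_s: float = 8.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, waiting longer after each consecutive failure."""
    delays = backoff_delays(base_s, cap_s, max_attempts - 1)
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError as err:               # refused, reset, timed out, ...
            last_error = err
            if attempt < len(delays):        # no pointless wait after the final attempt
                sleep(delays[attempt])
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error


# Usage with a fake connector that is refused twice before succeeding.
pending_failures = [ConnectionRefusedError(), ConnectionRefusedError()]
waits: list[float] = []


def flaky_connect() -> str:
    if pending_failures:
        raise pending_failures.pop()
    return "connected"


result = reconnect(flaky_connect, sleep=waits.append)
```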
Server Request Handling Models
Synchronous Blocking (BIO)
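Each connection is served by a dedicated thread that blocks on I/O; simple, but suitable only for low-concurrency scenarios because the thread count grows with the connection count.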
Synchronous Non‑Blocking (NIO)
Uses I/O multiplexing (select, poll, or epoll) so that a single thread can watch many connections and handle only the ready ones; lower overhead than BIO but more complex to program.
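Python's standard selectors module wraps select, poll, epoll, or kqueue, so it can illustrate the NIO model: one thread waits on many sockets and services only those that are ready. A minimal echo-server sketch; the function names are hypothetical, and it writes with sendall on non-blocking sockets, which is acceptable for tiny messages, whereas a real server would buffer pending writes and wait for write readiness.

```python
from __future__ import annotations

import selectors
import socket
import threading


def selector_echo_server(server_sock: socket.socket, expected_clients: int) -> None:
    """One thread multiplexes the listening socket and every client socket."""
    sel = selectors.DefaultSelector()      # best mechanism the OS offers (epoll, kqueue, ...)
    server_sock.setblocking(False)
    sel.register(server_sock, selectors.EVENT_READ, data="listener")
    closed = 0
    while closed < expected_clients:
        events = sel.select(timeout=5)
        if not events:                     # nothing happened for 5 s: give up (demo only)
            break
        for key, _mask in events:
            if key.data == "listener":     # a new connection is ready to accept
                conn, _addr = server_sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
                continue
            conn = key.fileobj
            chunk = conn.recv(4096)
            if chunk:
                conn.sendall(chunk)        # echo; fine for tiny payloads only
            else:                          # peer closed its write side: done with this client
                sel.unregister(conn)
                conn.close()
                closed += 1
    sel.close()


def demo(n_clients: int = 3) -> list[bytes]:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(n_clients)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=selector_echo_server, args=(server_sock, n_clients))
    worker.start()
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(n_clients)]
    for i, client in enumerate(clients):   # all connections are open at the same time
        client.sendall(f"msg{i}".encode())
        client.shutdown(socket.SHUT_WR)
    replies = []
    for client in clients:
        with client:
            replies.append(b"".join(iter(lambda: client.recv(4096), b"")))
    worker.join()
    server_sock.close()
    return replies


print(demo())
```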
Asynchronous Non‑Blocking (AIO)
The application initiates an I/O operation and is notified by the operating system when it completes; best for high-concurrency, I/O-heavy workloads but the hardest to program.
Recommendation: Use mature frameworks such as Netty or Apache MINA.
4. How to Monitor Microservice Calls
Monitoring Objects
User‑side monitoring.
Interface (RPC) monitoring.
Resource monitoring (e.g., Redis).
Infrastructure monitoring (CPU, MEM, I/O, bandwidth).
Metrics
Request volume (QPS, PV).
Response time (average, percentile, slow‑request buckets).
Error rate.
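A minimal sketch of computing these metrics over one time window of raw call records. The record shape (latency in milliseconds plus a success flag), the nearest-rank percentile method, and the slow-request threshold are illustrative assumptions; the sketch assumes the window contains at least one call, and production agents usually pre-aggregate before shipping data.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Call:
    latency_ms: float
    ok: bool


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(calls: list[Call], window_s: float, slow_ms: float = 500.0) -> dict:
    latencies = sorted(c.latency_ms for c in calls)
    errors = sum(1 for c in calls if not c.ok)
    return {
        "qps": len(calls) / window_s,                              # request volume
        "avg_ms": sum(latencies) / len(latencies),                 # average response time
        "p99_ms": percentile(latencies, 99),                       # tail latency
        "slow_count": sum(1 for v in latencies if v > slow_ms),    # slow-request bucket
        "error_rate": errors / len(calls),
    }


# 100 calls over 10 s with latencies 1..100 ms; every 25th call fails.
calls = [Call(latency_ms=float(ms), ok=(ms % 25 != 0)) for ms in range(1, 101)]
stats = summarize(calls, window_s=10.0, slow_ms=90.0)
print(stats)
```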
Dimensions
Global, data‑center, machine, time, and core‑business dimensions.
Monitoring System Workflow
Data collection (agent or proxy).
Data transmission (UDP or Kafka).
Data processing (real‑time via Storm/Spark Streaming, offline via MapReduce/Spark).
Data storage (Elasticsearch for indexing, OpenTSDB for time‑series).
Data visualization (line charts, pie charts, heatmaps).
Sampling rate must balance real‑time accuracy with system overhead.
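One simple way to apply a sampling rate is to hash a request identifier and keep the request when the hash falls below a threshold. Because the decision depends only on the ID, every component that sees the same ID makes the same choice. A minimal sketch; zlib.crc32 as the hash and 10,000 buckets are arbitrary choices for illustration.

```python
import zlib


def should_sample(request_id: str, rate: float) -> bool:
    """Keep roughly `rate` (0.0 to 1.0) of requests, deciding deterministically per ID."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < rate * 10_000


kept = sum(should_sample(f"req-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 requests at a 10% rate")
```

Raising the rate gives more complete data at the price of more collection and transmission work, which is the trade-off described above.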
5. How to Trace Microservice Calls
Purpose of Tracing
Identify bottlenecks (network latency, gateway failures, service crashes, DB/cache issues).
Optimize call paths and reduce cross‑data‑center latency.
Generate topology graphs for dependency analysis.
Propagate business context (e.g., A/B testing flags) across services.
Tracing Fundamentals
traceId: unique identifier of one user request, shared by every call that request triggers.
spanId: identifies one RPC call and its position in the request's call tree.
annotation: custom business data attached to a span.
These concepts originate from Google's Dapper paper; modern implementations include Zipkin, Pinpoint, Alibaba EagleEye, etc.
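A minimal sketch of how traceId, spanId, and annotations fit together. It assumes a dotted, hierarchical spanId scheme ("0", "0.1", "0.1.1") chosen for readability; Zipkin, for example, uses random span IDs plus a parent ID instead. The class, function, and header names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str                 # shared by every span of one user request
    span_id: str                  # dotted position in the call tree, e.g. "0.2.1"
    name: str
    annotations: dict = field(default_factory=dict)   # custom business data
    _children: int = field(default=0, repr=False)

    def child(self, name: str) -> "Span":
        """Start a span for an outgoing RPC made while handling this span."""
        self._children += 1
        return Span(self.trace_id, f"{self.span_id}.{self._children}", name)

    def to_headers(self) -> dict:
        # Hypothetical header names; Zipkin's B3 propagation uses X-B3-TraceId / X-B3-SpanId.
        return {"X-Trace-Id": self.trace_id, "X-Span-Id": self.span_id}


def start_trace(name: str) -> Span:
    return Span(uuid.uuid4().hex, "0", name)


root = start_trace("GET /order")              # entry point of the user request
user = root.child("UserService.getUser")      # first downstream RPC
stock = root.child("StockService.check")      # second downstream RPC
cache = user.child("Redis.get")               # call made inside UserService
user.annotations["ab_group"] = "B"            # e.g. an A/B-testing flag
print(root.trace_id, [s.span_id for s in (root, user, stock, cache)])
```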
Tracing Architecture
Data collection layer – instrument code and report spans.
Data processing layer – aggregate and store spans (e.g., HBase, Hive).
Data presentation layer – visual call‑chain graphs and topology maps.
6. Service Governance Techniques
Node Management
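Failure sources: provider failures (crash, process exit) and network failures.
Removal mechanisms: the registry removes providers that stop sending heartbeats, and consumers also remove nodes they observe failing (client-side removal), which covers network failures between a consumer and a provider that the registry cannot detect.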
Load‑balancing algorithms: random, round‑robin, least‑active, consistent hash.
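The four algorithms can be sketched in a few lines each. These are minimal illustrations, not any framework's implementation: least-active assumes the client already tracks in-flight calls per node, and the consistent-hash ring uses MD5 with 100 virtual nodes per provider, both arbitrary choices.

```python
from __future__ import annotations

import bisect
import hashlib
import itertools
import random


def pick_random(nodes: list[str], rng: random.Random) -> str:
    return rng.choice(nodes)


class RoundRobin:
    def __init__(self, nodes: list[str]) -> None:
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        return next(self._cycle)


def pick_least_active(active_calls: dict[str, int]) -> str:
    """Choose the node with the fewest in-flight requests (ties: first in dict order)."""
    return min(active_calls, key=active_calls.get)


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        # Each provider appears `replicas` times on the ring to smooth the distribution.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas))
        self._points = [h for h, _ in self._ring]

    def pick(self, request_key: str) -> str:
        """Walk clockwise from the key's hash to the next virtual node."""
        idx = bisect.bisect(self._points, _hash(request_key)) % len(self._points)
        return self._ring[idx][1]


nodes = ["10.0.0.1:20880", "10.0.0.2:20880", "10.0.0.3:20880"]
ring = ConsistentHashRing(nodes)
print(RoundRobin(nodes).pick(), pick_least_active({"a": 3, "b": 1}), ring.pick("user-42"))
```

Consistent hashing keeps requests with the same key (for example a user ID) on the same node, and removing one node only remaps the keys that node owned.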
Service Routing
Static vs. dynamic routing rules (gray release, IDC‑aware routing).
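A routing rule is essentially a filter applied to the provider list before load balancing. A minimal sketch with two hypothetical rules: a gray-release rule that sends enrolled users only to gray nodes, and an IDC-aware rule that prefers nodes in the caller's data center and falls back to all nodes when none are local. The node attributes, IDC names, and user IDs are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    address: str
    idc: str            # data center the provider runs in
    gray: bool = False  # True if the node runs the gray-release build


GRAY_USERS = {"u-1001", "u-1002"}   # hypothetical users enrolled in the gray release


def gray_rule(nodes: list[Node], user_id: str) -> list[Node]:
    """Gray users see only gray nodes; everyone else sees only stable nodes."""
    wants_gray = user_id in GRAY_USERS
    return [n for n in nodes if n.gray == wants_gray]


def idc_rule(nodes: list[Node], caller_idc: str) -> list[Node]:
    """Prefer same-IDC nodes; fall back to every node rather than failing the call."""
    local = [n for n in nodes if n.idc == caller_idc]
    return local or nodes


def route(nodes: list[Node], user_id: str, caller_idc: str) -> list[str]:
    """Apply the rules in order; load balancing then picks from what is left."""
    return [n.address for n in idc_rule(gray_rule(nodes, user_id), caller_idc)]


NODES = [
    Node("10.0.0.1:20880", idc="bj"),
    Node("10.0.1.1:20880", idc="sh"),
    Node("10.0.0.9:20880", idc="bj", gray=True),
]
print(route(NODES, "u-2001", "bj"))
```

With static routing such rules are fixed in the consumer's configuration; with dynamic routing the rule data (here GRAY_USERS) would be pushed from the registry and replaced at runtime.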
Fault Tolerance
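FailOver – on failure or timeout, retry the call on another node.
FailBack – do not retry automatically; decide the next step based on the details of the failure.
FailCache – do not retry immediately; wait for a while, then retry the call.
FailFast – fail after a single attempt without retrying; suited to non-critical calls.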
Idempotent calls can use FailOver or FailCache; non‑idempotent calls should prefer FailBack or FailFast.
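Two of the strategies are easy to contrast in code. A minimal sketch, assuming each node is represented by a callable that either returns a result or raises; the function names, the fake nodes, and the default of three attempts are hypothetical. FailOver moves on to the next node up to a limit, while FailFast makes one attempt and passes the error straight to the caller.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fail_over(nodes: Sequence[Callable[[], T]], max_attempts: int = 3) -> T:
    """FailOver: try up to max_attempts different nodes; use only for idempotent calls."""
    tried = list(nodes)[:max_attempts]
    last_error: Exception | None = None
    for call_node in tried:
        try:
            return call_node()
        except Exception as err:   # a real client would retry only timeouts/connection errors
            last_error = err
    raise RuntimeError(f"no node succeeded in {len(tried)} attempts") from last_error


def fail_fast(nodes: Sequence[Callable[[], T]]) -> T:
    """FailFast: a single attempt on the first node; any error reaches the caller at once."""
    return nodes[0]()


def node_down() -> str:
    raise ConnectionError("node down")


def node_up() -> str:
    return "ok"


print(fail_over([node_down, node_down, node_up]))   # the third node answers: prints "ok"
```

Retrying a non-idempotent call such as "create order" on another node could execute it twice, which is why FailOver is paired with idempotent calls.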
Overall Summary: The article provides a comprehensive guide to service description, registration, discovery, RPC communication, monitoring, tracing, and governance in microservice systems, illustrating best-practice patterns and common open-source tools.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.