How to Assemble a Production‑Ready Service Registry from Scratch

This article walks through the complete design of a service registry—from requirement analysis and interface definition to push mechanisms, health‑check strategies, long‑connection technology choices, data storage options, and high‑availability considerations—providing a practical blueprint for building a production‑grade registry.

Xiao Lou's Tech Notes

Requirement Analysis

The registry must satisfy three core requirements: it must support service registration, support service discovery, and be highly available. High availability is a production-level prerequisite, not an optional feature.

Three roles are defined:

Provider: the service that offers functionality.

Consumer: the service that calls the provider.

Registry: the central component that stores provider lists and consumer relationships.

Interface Definition

The registry exposes three client (SDK) interfaces:

register: add a provider to the registry.

unregister: remove a provider from the registry.

subscribe: let a consumer subscribe to a service so that changes are pushed to it.

Before choosing concrete fields, protocols, or serialization formats, the author checks for existing standards. The OpenSergo standard covers service governance and discovery, but its discovery part is still under development, so a pragmatic JSON‑based definition is used.

Register request:

```json
{
  "application": "provider_test",  // application name
  "protocol": "http",              // protocol
  "addr": "127.0.0.1:8080",        // provider address
  "meta": {
    "cluster": "small",
    "idc": "shanghai",
    "tag": "read"
  }
}
```

Subscribe request:

```json
{
  "subscribes": [
    {
      "provider": "test_provider1",  // subscribed application
      "protocol": "http",
      "meta": {
        "cluster": "small",
        "idc": "shanghai",
        "tag": "read"
      }
    },
    {
      "provider": "test_provider2",
      "protocol": "http",
      "meta": {
        "cluster": "small",
        "tag": "read"
      }
    }
  ]
}
```

Pushed service list (versioned):

```json
{
  "version": "23des4f",
  "endpoints": [
    {
      "application": "provider_test",
      "protocol": "http",
      "addr": "127.0.0.1:8080",
      "meta": {
        "cluster": "small",
        "idc": "shanghai",
        "tag": "read"
      }
    },
    {
      "application": "provider_test",
      "protocol": "http",
      "addr": "127.0.0.2:8080",
      "meta": {
        "cluster": "small",
        "idc": "shanghai",
        "tag": "read"
      }
    }
  ]
}
```
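The three interfaces can be sketched as an in-memory registry that pushes a versioned snapshot to subscribers on every change. This is an illustrative sketch, not the article's implementation: the class names, the callback signature, and the omission of the `meta` fields are all assumptions made here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    application: str
    protocol: str
    addr: str  # meta fields (cluster/idc/tag) omitted in this sketch


class Registry:
    """Minimal in-memory registry: register, unregister, subscribe."""

    def __init__(self):
        self._endpoints = defaultdict(set)     # application -> {Endpoint}
        self._subscribers = defaultdict(list)  # application -> [callback]
        self._version = 0

    def register(self, ep: Endpoint):
        self._endpoints[ep.application].add(ep)
        self._notify(ep.application)

    def unregister(self, ep: Endpoint):
        self._endpoints[ep.application].discard(ep)
        self._notify(ep.application)

    def subscribe(self, application, callback):
        """Callback receives (version, endpoint list) now and on each change."""
        self._subscribers[application].append(callback)
        callback(self._version, self._snapshot(application))

    def _snapshot(self, application):
        return sorted(self._endpoints[application], key=lambda e: e.addr)

    def _notify(self, application):
        self._version += 1
        snapshot = self._snapshot(application)
        for cb in self._subscribers[application]:
            cb(self._version, snapshot)
```

A consumer would subscribe once and keep the latest pushed list as its routing table; a real registry would deliver the callback over the wire rather than in-process.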

Change Push & Service Health Check

Two key factors guide serialization choice: language compatibility (JSON works everywhere) and performance (CPU cost and payload size). Push mechanisms considered:

Periodic polling – simple but high resource consumption.

Long polling – moderate difficulty, high real‑time performance, medium resource use.

UDP push – high real‑time, low resources, but unreliable; needs fallback polling.

TCP long‑connection push – moderate difficulty, high real‑time, medium resources.
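Of these, long polling reduces to a version check: the consumer sends the version it last saw, and the server answers immediately if its version is newer, otherwise holds the request open until a change or a timeout (after which the client simply re-polls). The sketch below shows only that server-side blocking logic; the class name and timeout value are assumptions.

```python
import threading


class LongPollHub:
    """Server side of long polling: block until the list version changes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._endpoints = []

    def publish(self, endpoints):
        """Called by the registry when the provider list changes."""
        with self._cond:
            self._version += 1
            self._endpoints = list(endpoints)
            self._cond.notify_all()  # wake every held poll request

    def poll(self, client_version, timeout=30.0):
        """Return (version, endpoints) once version > client_version,
        or the current state after `timeout` seconds (the client re-polls)."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > client_version, timeout)
            return self._version, list(self._endpoints)
```

In an HTTP registry each `poll` call would back one held request; the version-compare trick is the same one the pushed service list's `version` field enables.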

Health‑check approaches compared:

Consumer passive probing – failures are observed from real calls, with no registry dependency, but a bad provider is only noticed after requests to it fail.

Consumer active probing – consumers send dedicated probes; largely the same drawbacks, plus probing load on every consumer.

Provider heartbeat – non‑intrusive to callers, but consumes server resources.

Registry proactive probing – no client requirements, but heavy on resources and less real‑time.

Provider‑registry session keep‑alive – good real‑time, low resources, requires a persistent TCP connection.

The author prefers the last two: registry-initiated probing and provider-registry session keep-alive, especially when a long connection is already in place.
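The heartbeat/keep-alive option reduces to a lease: each provider periodically renews a deadline, and the registry treats any entry past its deadline as dead. A minimal sketch, with the class name, TTL value, and injectable clock all being assumptions made here to keep it testable:

```python
import time


class LeaseTable:
    """Provider heartbeats as leases: entries expire unless renewed within TTL."""

    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._deadline = {}  # provider addr -> expiry timestamp

    def heartbeat(self, addr):
        """Renew (or create) the lease for one provider instance."""
        self._deadline[addr] = self.clock() + self.ttl

    def alive(self):
        """Drop expired leases and return the currently healthy addresses."""
        now = self.clock()
        self._deadline = {a: d for a, d in self._deadline.items() if d > now}
        return sorted(self._deadline)
```

With a persistent TCP connection the explicit heartbeat can be replaced by the connection's own liveness signal, which is what makes the keep-alive variant so cheap.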

Long‑Connection Technology Selection

Options evaluated include gRPC, RSocket, Netty, and Mina. gRPC scores highest on multi-language support and community activity, so it is the recommended choice.

Data Storage

Two categories of storage:

External components (MySQL, Redis, etc.) – mature scaling, but adds architectural complexity.

Internal registry storage – avoids extra components but requires the registry to replicate data itself. Consensus algorithms such as Raft guarantee strong consistency at some cost in performance, while AP designs (e.g., Nacos, Eureka) favor availability and eventual consistency, letting providers simply re-register after a failure.

High Availability

HA design is distributed across many details: replicated storage, a failure-oriented architecture, and consumer-side caching of service lists in memory so that existing calls survive even a full registry outage. Newly started services, however, still need the registry to be up.

Summary

Starting from requirement analysis, the article decomposes a registry into modular pieces, evaluates interface design, push and health‑check strategies, long‑connection technology, storage options, and HA techniques, and finally sketches a minimal, production‑ready registry that can be assembled with code.

Tags: high availability, service discovery, gRPC, service registry