Scaling a Nationwide ID Lookup Service with Minimal Resources

The article outlines a practical backend design for handling 20 million daily ID lookups across a billion records, showing how modest hardware—20 virtual machines with 16 GB RAM each—can meet the load using simple sharding, in‑memory storage, and basic networking techniques.

Programmer DD

Assume the worst case, in which all 20 million daily requests are packed into a single hour (3,600 seconds). Even then, the peak load stays under 10,000 requests per second.

If the dataset contains one billion records and each record is stored in 16 bytes (a 48‑bit integer for the ID, plus fields for test time, health code, and vaccine information), the entire dataset fits into 16 GB of RAM. A typical PC already has 32 GB, and servers with 256 GB or even 1 TB have been commonplace for years.
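The 16-byte figure can be checked with a small packing sketch. The field layout below is an assumption made for illustration (the article only names the fields, not their widths), and `pack_record` and `unpack_record` are hypothetical helper names.

```python
import struct

# Assumed layout, 16 bytes total (little-endian, no padding):
#   6 bytes  ID packed into a 48-bit integer
#   4 bytes  test time as Unix seconds
#   1 byte   health code
#   1 byte   vaccine dose count
#   4 bytes  reserved
RECORD = struct.Struct("<6sIBB4x")

def pack_record(id48, test_time, health_code, vaccine_doses):
    """Pack one record into exactly RECORD.size bytes."""
    return RECORD.pack(id48.to_bytes(6, "little"), test_time, health_code, vaccine_doses)

def unpack_record(buf):
    """Inverse of pack_record; returns (id48, test_time, health_code, vaccine_doses)."""
    raw_id, test_time, health_code, vaccine_doses = RECORD.unpack(buf)
    return int.from_bytes(raw_id, "little"), test_time, health_code, vaccine_doses

# One billion records at 16 bytes each.
TOTAL_BYTES = 1_000_000_000 * RECORD.size
```

With this layout `TOTAL_BYTES` comes to 16,000,000,000 bytes, which is where the 16 GB estimate comes from.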

Data can be sharded across multiple servers based on the first three or six digits of the ID (regional code), so each server only needs to store 100–200 million records. Twenty virtual machines with 16 GB RAM each provide ample capacity.
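Routing by regional prefix might look like the sketch below. The modulo mapping is an assumption chosen for brevity; a real deployment would more likely use a lookup table sized by regional population, since modulo does not balance the shards. `shard_for` is a hypothetical name, and the ID in the usage line is made up.

```python
NUM_SHARDS = 20  # the twenty VMs from the text

def shard_for(id_number: str, prefix_len: int = 3) -> int:
    """Map an ID to a shard index using its leading regional digits."""
    if prefix_len not in (3, 6):
        raise ValueError("prefix_len must be 3 or 6")
    prefix = id_number[:prefix_len]
    if len(prefix) != prefix_len or not prefix.isdigit():
        raise ValueError("ID must start with a numeric regional code")
    # Assumed mapping: simple modulo over the numeric prefix.
    return int(prefix) % NUM_SHARDS

# Made-up ID: every ID sharing the prefix "110" lands on the same shard.
shard = shard_for("110101199001010000")
```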

The startup sequence for each server is:

1. Load all records belonging to the server (100–200 million rows) from the database. This is the relatively slow step.

2. Begin serving requests.
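The two startup steps can be sketched as a load phase followed by a readiness flag. The `Shard` class and the `(id48, payload)` row shape are assumptions for illustration; `rows` stands in for a database cursor, and sorting in Python is only practical for a demo (a real loader would ask the database for rows already ordered by ID).

```python
from array import array

class Shard:
    """Holds one server's records in two parallel, ID-sorted arrays."""

    def __init__(self):
        self.ids = array("Q")       # 8 bytes per ID
        self.payloads = array("Q")  # 8 bytes of packed fields per record
        self.ready = False          # requests are refused until loading finishes

    def load(self, rows):
        """Step 1: pull every row for this shard into memory, sorted by ID."""
        for id48, payload in sorted(rows):
            self.ids.append(id48)
            self.payloads.append(payload)
        # Step 2: only now may the server begin accepting requests.
        self.ready = True

shard = Shard()
shard.load([(5, 50), (1, 10), (3, 30)])
```

Keeping IDs and payloads in 8-byte unsigned arrays preserves the 16 bytes per record assumed earlier in the article.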

Even if all 20 million requests arrive within one hour, the system only needs to serve about 5,556 requests per second (20,000,000 / 3,600).

At 2 KB of response data per request (1 KB is already sufficient unless QR codes are generated on the server), the required bandwidth is under 12 MB per second, or roughly 90 Mbps across the whole service.
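The request rate and bandwidth figures follow from a few lines of arithmetic. The sketch assumes decimal units (2 KB = 2,000 bytes) and ignores protocol overhead.

```python
DAILY_REQUESTS = 20_000_000
WINDOW_SECONDS = 3_600      # worst case: the whole day's load in one hour
RESPONSE_BYTES = 2_000      # assumed: 2 KB per response, decimal units

requests_per_second = DAILY_REQUESTS / WINDOW_SECONDS
megabytes_per_second = requests_per_second * RESPONSE_BYTES / 1_000_000
megabits_per_second = megabytes_per_second * 8

print(f"{requests_per_second:.0f} req/s, "
      f"{megabytes_per_second:.1f} MB/s, "
      f"{megabits_per_second:.0f} Mbit/s")
```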

The service is essentially a short‑lived connection lookup: the client performs a TCP handshake, sends the ID, receives the data, and closes the connection.
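A minimal sketch of that request cycle, using Python's `asyncio` streams. The newline-delimited wire format, the `RECORDS` table, and the sample ID are all assumptions for illustration; the article does not specify a protocol.

```python
import asyncio

# Hypothetical in-memory data; the ID is made up.
RECORDS = {b"110101199001010000": b"green"}

async def handle(reader, writer):
    """Server side: read one ID, send one answer, close the connection."""
    id_line = await reader.readline()
    writer.write(RECORDS.get(id_line.strip(), b"not found") + b"\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def query(host, port, id_number):
    """Client side: connect, send the ID, read the reply, disconnect."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(id_number + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.strip()

async def demo():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await query("127.0.0.1", port, b"110101199001010000")
    finally:
        server.close()
        await server.wait_closed()
```

Calling `asyncio.run(demo())` starts the server, performs one lookup against it, and shuts it down again.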

Because the workload is lightweight, there is no need to worry about the classic C10k problem; a 100 Mbps link per server is more than enough, and standard OS mechanisms (IOCP on Windows, epoll on Linux, or libevent for cross‑platform) suffice.

Searching the in‑memory array by ID can achieve O(1) lookup if a direct index is built; otherwise, a binary search (O(log N)) requires at most 30 comparisons for a billion entries, which is negligible compared to network latency.
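The binary-search variant can be sketched with the standard `bisect` module over an ID-sorted array. The `find` helper and the sample values are illustrative assumptions.

```python
import math
from array import array
from bisect import bisect_left

def find(ids, payloads, target):
    """Return the payload stored for target, or None if the ID is absent."""
    i = bisect_left(ids, target)          # O(log N) probe into the sorted IDs
    if i < len(ids) and ids[i] == target:
        return payloads[i]
    return None

# Tiny sample shard: parallel arrays sorted by ID.
ids = array("Q", [1001, 1005, 1042, 2077])
payloads = array("Q", [10, 50, 42, 77])

# Upper bound on halving steps for one billion sorted entries.
MAX_STEPS = math.ceil(math.log2(1_000_000_000))
```

`MAX_STEPS` evaluates to 30, matching the bound quoted above.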

In summary, twenty 16 GB virtual machines, simple array or binary‑search lookups, and roughly 100 Mbps of total outbound bandwidth can comfortably handle 20 million queries per hour against a billion records, leaving ample performance headroom.

With a total bandwidth of 1 Gbps, ten times that figure, the system could serve about 200 million queries per hour.

In practice, redundancy is needed: additional VMs can act as backups managed by ZooKeeper, so if one node fails, another takes over immediately.
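The takeover rule itself is simple, as the sketch below suggests. It does not use ZooKeeper: the `is_alive` callable is a stand-in for whatever liveness information the coordination service provides, and the node names are hypothetical.

```python
def pick_node(replicas, is_alive):
    """Return the first replica that the liveness check reports as healthy."""
    for node in replicas:
        if is_alive(node):
            return node
    raise RuntimeError("no healthy replica for this shard")

# Hypothetical node names; the primary is listed first, backups after it.
replicas = ["shard07-primary", "shard07-backup"]
down = {"shard07-primary"}

# With the primary marked as down, the backup is selected.
active = pick_node(replicas, lambda node: node not in down)
```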

Data updates are not time‑critical; test results can be reflected in the query interface within an hour. A database trigger can notify downstream nodes of changes, prompting them to refresh their caches.
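On the receiving side, a node might apply a batch of changed records to its sorted in-memory arrays as sketched here. The `(id48, payload)` change format and the `apply_changes` name are assumptions; how the trigger delivers the batch is left out.

```python
from array import array
from bisect import bisect_left

def apply_changes(ids, payloads, changes):
    """Merge changed records into the ID-sorted arrays, in place."""
    for id48, payload in changes:
        i = bisect_left(ids, id48)
        if i < len(ids) and ids[i] == id48:
            payloads[i] = payload        # existing ID: overwrite its fields
        else:
            ids.insert(i, id48)          # new ID: insert at the sorted position
            payloads.insert(i, payload)

ids = array("Q", [1, 5])
payloads = array("Q", [10, 50])
apply_changes(ids, payloads, [(5, 55), (3, 30)])
```

Inserting into a large array is linear in its length, which is acceptable here only because updates are infrequent and not time-critical; a heavy update stream would call for batching or a periodic rebuild.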

The overall logic is straightforward and does not require sophisticated techniques; a programmer who can implement quicksort is already overqualified for this task.

From experience, such a “large, complex, unprecedented” system can be implemented with just a few thousand lines of C and JavaScript code.

Therefore, involving developers in the design from the start is essential; otherwise, the final product may be overpriced and under‑delivered.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
