Self-Developed HTTPDNS Service: Cost Estimation, Architecture, Optimization, and Lessons Learned
To cut the hundreds‑of‑thousands‑yuan monthly bill of a commercial HTTPDNS service, the team built a multi‑region, self‑hosted HTTPDNS platform estimated to slash costs by up to 90%. After resolving unexpected TLS bandwidth waste by improving connection reuse, it ultimately achieved over 80% savings, with a hybrid‑cloud deployment planned next.
The company currently uses a commercial HTTPDNS service that charges per request. Although the per‑request price appears low, the massive user base and billions of daily queries result in monthly costs of several hundred thousand yuan, enough to buy a home in a major city. Reducing these costs became a key cost‑saving initiative.
After evaluating the feasibility, the team decided to develop an in‑house HTTPDNS service. The first step was a cost‑benefit estimate. A single core can handle roughly 7,000 QPS; supporting an anticipated 500,000 QPS would therefore require fewer than 100 cores and about 200 GB of memory (at a 1:2 CPU‑to‑memory ratio). Each DNS response is under 600 bytes, leading to an estimated bandwidth of roughly 2.4 Gbps (budgeted as ~3 Gbps) at 500 k QPS. The internal estimate suggested an 80‑90% cost reduction compared with the commercial service.
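The arithmetic behind this estimate can be checked in a few lines. The figures are the ones quoted above; the ~100‑core and ~200 GB numbers in the text include headroom over the raw calculation:

```python
# Back-of-envelope capacity estimate for the self-hosted HTTPDNS service,
# using the figures quoted in the text (illustrative only).
QPS_PER_CORE = 7_000      # measured single-core throughput
TARGET_QPS = 500_000      # anticipated peak load
RESPONSE_BYTES = 600      # upper bound on one DNS response

cores = -(-TARGET_QPS // QPS_PER_CORE)        # ceiling division
memory_gb = 100 * 2                           # ~100 cores with headroom, 1:2 ratio
bandwidth_gbps = TARGET_QPS * RESPONSE_BYTES * 8 / 1e9

print(f"{cores} cores (<100 with headroom), {memory_gb} GB RAM, "
      f"{bandwidth_gbps:.1f} Gbps (~3 Gbps budgeted)")
```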
The initial architecture leveraged cloud resources with BGP lines, which were benchmarked against the commercial provider’s ANYCAST IPs. Tests showed comparable availability and latency.
The server side adopts a multi‑region, multi‑vendor deployment behind Layer‑7 load balancers to mitigate single‑machine failures and simplify protocol handling. To prevent DNS service disruption from domain hijacking, the server returns a list of IP addresses to the client.
Key server modules include:
Authentication: verifies request legitimacy before proceeding.
Domain probing: collects weighted IPs per region and operator, using c‑ares (modified to achieve >30 k QPS per core).
Cache: stores pre‑loaded domain results and updates them periodically.
State machine: decides whether to serve from cache or query authoritative DNS based on TTL.
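The cache and state‑machine modules above boil down to a TTL check. A minimal sketch, assuming a simple domain → (IPs, expiry) store; the class and method names are illustrative, not the production API:

```python
import time

class DomainCache:
    """TTL-based cache: serve fresh entries, otherwise re-query upstream."""

    def __init__(self):
        self._entries = {}  # domain -> (ips, expires_at)

    def put(self, domain, ips, ttl):
        self._entries[domain] = (ips, time.time() + ttl)

    def resolve(self, domain, query_authoritative):
        entry = self._entries.get(domain)
        if entry and time.time() < entry[1]:
            return entry[0]                      # fresh: serve from cache
        ips, ttl = query_authoritative(domain)   # expired/missing: re-query
        self.put(domain, ips, ttl)
        return ips
```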
The client‑side architecture (original figure omitted) is summarized below.
Client modules include:
API parsing and configuration.
Cache management.
Internal business logic (cost‑saving, IPv6, and overseas strategies).
Source adaptation (local DNS hijacking, Ali certificate checks).
Cronet adaptation and language/protocol adapters.
The client provides HTTPDNS/DNS resolution, supports both self‑built and third‑party services, and allows fallback to local DNS.
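The fallback behavior can be sketched as follows: try HTTPDNS first (self‑built or third‑party), then fall back to the OS resolver. `httpdns_resolve` is a hypothetical hook standing in for the real client API:

```python
import socket

def resolve(domain, httpdns_resolve):
    """Resolve via HTTPDNS, falling back to local DNS on failure."""
    try:
        ips = httpdns_resolve(domain)
        if ips:
            return ips
    except Exception:
        pass  # HTTPDNS unreachable or returned nothing
    # Local DNS fallback via the OS resolver.
    return [info[4][0] for info in socket.getaddrinfo(domain, None)]
```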
During the first month of operation, the expected cost savings were not fully realized. Bandwidth accounted for 90% of the total cost, far above the estimate. Packet captures revealed that TLS handshakes dominated traffic: at the same QPS, HTTPS consumed about 80% more bandwidth than HTTP. The root cause was a low connection‑reuse rate; the client performed simple round‑robin polling across IPs, establishing a new connection, and therefore a new TLS handshake, for each unseen IP.
To address this, the team temporarily disabled HTTPS and used HTTP, reducing bandwidth. However, because the entire site must remain HTTPS, the long‑term solution focused on improving TLS connection reuse. Additionally, the team evaluated replacing expensive BGP lines with a single‑line solution, which could cut bandwidth costs by ~70% while maintaining acceptable latency.
Further optimization involved a domain‑hijacking strategy: the server uses existing domain certificates (eliminating the need for per‑IP certificates), and the client performs local DNS hijacking to map the domain to the optimal IP. The client maintains long‑lived connections to selected IPs, reusing them whenever possible. If a connection fails, the client selects another IP from the operator‑specific list; if the network changes, caches are refreshed.
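The failover loop described above can be sketched as a per‑operator IP rotation. This is a minimal sketch under assumed names and structure, not the real client implementation:

```python
class IpSelector:
    """Keep one long-lived connection target per operator; rotate on failure."""

    def __init__(self, ips_by_operator):
        self._ips = ips_by_operator   # operator -> ordered list of candidate IPs
        self._current = {}            # operator -> index of IP currently in use

    def current_ip(self, operator):
        idx = self._current.setdefault(operator, 0)
        return self._ips[operator][idx]

    def mark_failed(self, operator):
        # Connection failed: move to the next IP in the operator-specific list.
        idx = (self._current.get(operator, 0) + 1) % len(self._ips[operator])
        self._current[operator] = idx
        return self._ips[operator][idx]

    def on_network_change(self):
        # Network changed: drop state so cached selections are refreshed.
        self._current.clear()
```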
After applying these measures, TLS connection reuse increased from ~20% to over 90%, and overall HTTPDNS costs dropped by more than 80%, meeting the project's goals.
The development and migration took about two months, and the team plans to adopt a hybrid‑cloud model, combining cloud and self‑built resources to further reduce expenses.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.