How NetEase Cloud IM SDK Prevents DNS Hijacking with HttpDNS High‑Availability
This article explains the DNS hijacking threat, shares a real incident affecting NetEase Cloud IM, and details a comprehensive high‑availability architecture—including HttpDNS, laddered HTTP requests, caching strategies, and SNI handling—that protects the SDK from DNS attacks and ensures reliable service.
Guide: DNS hijacking tampers with DNS resolution so that a domain resolves to an incorrect IP, which either disrupts the service or redirects users to a malicious host. The NetEase Cloud IM SDK is a B2B product that runs in complex network environments, so it must mitigate this risk.
In a past incident, the domain netease.im was hijacked and applications using the IM SDK could not log in, which prompted an analysis of the SDK’s login flow.
When the "Update LBS" step encounters DNS hijacking, the SDK may time out or receive wrong responses, so it fails to obtain the correct Link server address. To prevent this, NetEase Cloud adopts a high‑availability solution based on HttpDNS.
Preventing DNS Hijacking
Common mitigation methods include:
Using HttpDNS to obtain a list of correct, optimal IPs directly from a management system.
Implementing a laddered HTTP request mechanism that tries multiple IPs with short timers before falling back.
Together, these approaches reduce exposure to hijacking: resolution no longer depends solely on LocalDNS, and a single bad address does not block the request.
LocalDNS Hijacking Types
Router compromise: a compromised broadband router alters the user’s LocalDNS setting so that queries return forged IPs.
Response injection: an attacker intercepts DNS queries and injects a fake response before the legitimate answer arrives.
Cache poisoning: LocalDNS caches a tampered result and serves it to later queries.
HttpDNS Implementation
Step 1: The client calls the HttpDNS API to retrieve a list of correct, fastest IPs for a domain.
Step 2: The client sends business protocol requests directly to the obtained IPs, setting the Host header (and handling SNI for HTTPS).
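Step 2 can be sketched for plain HTTP as follows. This is a minimal illustration, not the SDK's code: the DirectRequest type, the BuildDirectRequest function, and the example domain are hypothetical. It swaps the domain in the URL for the resolved IP and keeps the original authority for the Host header. For HTTPS, rewriting the URL in this way would break SNI and certificate host checks, which is why the SNI section pins the IP instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the URL to connect to, plus the Host header that
// still carries the original domain.
struct DirectRequest {
    std::string url;         // URL with the domain replaced by the resolved IP
    std::string hostHeader;  // e.g. "Host: lbs.example.com"
};

// Replaces the domain in `url` with `ip` and records the original authority.
// Returns an empty DirectRequest if `url` has no "scheme://" prefix or no host.
// Assumption: the original host is a domain name (no IPv6 literal, no userinfo).
DirectRequest BuildDirectRequest(const std::string& url, const std::string& ip) {
    DirectRequest out;
    const std::string sep = "://";
    const std::size_t schemeEnd = url.find(sep);
    if (schemeEnd == std::string::npos || ip.empty()) return out;

    const std::size_t hostBegin = schemeEnd + sep.size();
    // The authority (host[:port]) ends at the first '/', '?' or '#'.
    std::size_t authEnd = url.find_first_of("/?#", hostBegin);
    if (authEnd == std::string::npos) authEnd = url.size();
    const std::string authority = url.substr(hostBegin, authEnd - hostBegin);

    const std::size_t colon = authority.find(':');
    if (authority.empty() || colon == 0) return out;  // no host present
    // Keep an explicit port (including the leading ':') in the rewritten URL.
    const std::string port =
        (colon == std::string::npos) ? std::string() : authority.substr(colon);

    out.url = url.substr(0, hostBegin) + ip + port + url.substr(authEnd);
    out.hostHeader = "Host: " + authority;
    return out;
}
```

A caller would send the request to `url` and attach `hostHeader` as a request header, so the server can still route by the original domain.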
High‑Availability Strategy
The IM SDK integrates HttpDNS to achieve high availability. The component supports cross‑platform native development (Windows, macOS, iOS, Android) and implements the following core functions:
HttpDNS service interface updates and cache maintenance.
Domain query result caching with TTL (default 5 minutes) and redundancy time.
HTTP request handling, including a laddered multi‑address request mechanism.
Example of laddered HTTP request code:
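#include <list>
#include <string>

const int kRequestTimeout = 30 * 1000; // 30 seconds per request, in milliseconds
std::list<std::string> lstURLs = {
    "https://192.168.1.1/xxx",
    "https://192.168.1.2/xxx",
    // ...
    "https://192.168.1.n/xxx",
    "https://192.168.1.n+1/xxx"
};
// Upper bound on the total wait if every address is tried in turn and times out.
const int nMaxTimeout = static_cast<int>(lstURLs.size()) * kRequestTimeout;

The component also maintains a built‑in (hard‑coded) IP list to fall back on if HttpDNS itself is hijacked, and it refreshes the service addresses periodically (TTL ≈ 1 hour).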
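The laddered request idea can be sketched as a loop that walks the address list and stops at the first success. This is a simplified, sequential illustration under stated assumptions: LadderResult, SendFn, and RequestWithLadder are hypothetical names, and the actual transport is injected as a callback so the control flow can be exercised without a network. The SDK's real mechanism may overlap attempts or use different timers.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type for one laddered request.
struct LadderResult {
    bool ok = false;   // true if any address answered successfully
    std::string url;   // the URL that succeeded; empty on failure
    int attempts = 0;  // how many addresses were tried
};

// Hypothetical transport callback: performs one request with the given
// timeout (milliseconds) and returns true on success.
using SendFn = std::function<bool(const std::string& url, int timeoutMs)>;

// Tries each URL in order with the same per-attempt timeout and returns as
// soon as one succeeds. The worst case takes urls.size() * perAttemptTimeoutMs.
LadderResult RequestWithLadder(const std::vector<std::string>& urls,
                               int perAttemptTimeoutMs,
                               const SendFn& sendOnce) {
    LadderResult result;
    for (const std::string& url : urls) {
        ++result.attempts;
        if (sendOnce(url, perAttemptTimeoutMs)) {
            result.ok = true;
            result.url = url;
            break;
        }
    }
    return result;
}
```

Injecting the transport keeps the fallback order testable on its own; in practice `sendOnce` would wrap the HTTP client call.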
Domain Query Cache
Cache entries have a TTL of 5 minutes and a redundancy window (≈ 75 % of TTL). Depending on the cache state, the component may return cached results, trigger background updates, or refresh the cache after expiration.
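One way to read the three cache outcomes is as a classification by entry age. The sketch below assumes that the redundancy window means "refresh in the background once the entry is older than 75 % of its TTL"; the source does not spell this out, so treat the thresholds, the CacheState enum, and ClassifyEntry as hypothetical.

```cpp
#include <chrono>

// Hypothetical cache states matching the three outcomes described above.
enum class CacheState {
    Fresh,                // return the cached result as-is
    RefreshInBackground,  // return the cached result and trigger an async update
    Expired               // refresh the cache before answering
};

// Classifies an entry by age. Assumption: background refresh starts at
// `refreshPercent` of the TTL (75 by default), and the entry expires at the TTL.
CacheState ClassifyEntry(std::chrono::seconds age,
                         std::chrono::seconds ttl = std::chrono::minutes(5),
                         int refreshPercent = 75) {
    if (age >= ttl) return CacheState::Expired;
    // Integer arithmetic avoids floating-point rounding at the boundary.
    if (age.count() * 100 >= ttl.count() * refreshPercent)
        return CacheState::RefreshInBackground;
    return CacheState::Fresh;
}
```

With the defaults, an entry is served directly for the first 225 seconds, served with a background refresh until 300 seconds, and treated as expired after that.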
HTTP Access Flow Design
Suspected Hijack Event Reporting
If an HTTP request fails for non‑network reasons and triggers an HttpDNS domain query, the component treats the domain as potentially hijacked and reports diagnostic data (request URL, resolved IP, platform, error code, timestamp, latency, business token) to NetEase’s data platform for further analysis.
request_url: Requested URL
host_ip: LocalDNS resolution result
platform: Platform identifier
error_code: Error code on failure
timestamp: Time of request
consumed: Request latency
business_token: Business identifier
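The report fields above could be carried in a small record like the one below. This is only a sketch: the field names come from the list above, but the HijackReport struct, the millisecond units, and the JSON encoding are assumptions, since the source does not describe the wire format used by the data platform.

```cpp
#include <cstdint>
#include <string>

// Hypothetical in-memory form of one suspected-hijack report.
struct HijackReport {
    std::string requestUrl;        // request_url
    std::string hostIp;            // host_ip: LocalDNS resolution result
    std::string platform;          // platform
    int errorCode = 0;             // error_code
    std::int64_t timestampMs = 0;  // timestamp (assumed Unix time in ms)
    std::int64_t consumedMs = 0;   // consumed (assumed latency in ms)
    std::string businessToken;     // business_token
};

// Escapes only quotes and backslashes; control characters are not handled.
std::string EscapeJson(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Serializes the report as a flat JSON object (assumed format).
std::string ToJson(const HijackReport& r) {
    return std::string("{") +
        "\"request_url\":\"" + EscapeJson(r.requestUrl) + "\"," +
        "\"host_ip\":\"" + EscapeJson(r.hostIp) + "\"," +
        "\"platform\":\"" + EscapeJson(r.platform) + "\"," +
        "\"error_code\":" + std::to_string(r.errorCode) + "," +
        "\"timestamp\":" + std::to_string(r.timestampMs) + "," +
        "\"consumed\":" + std::to_string(r.consumedMs) + "," +
        "\"business_token\":\"" + EscapeJson(r.businessToken) + "\"}";
}
```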
SNI Handling
To allow multiple virtual hosts on a single IP, the client includes the target hostname in the TLS Client Hello via the Server Name Indication (SNI) extension. This enables the server to present the correct certificate.
Example libcurl configuration code:
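#include <curl/curl.h>
#include <string>

// Configures a libcurl easy handle for one request. When `ip` is given for an
// HTTPS URL, the domain is pinned to that IP through CURLOPT_RESOLVE, so the
// URL, the SNI value, and the certificate host check all keep the original domain.
// NE_NET::NimNetUtil is an SDK-internal helper that is not shown here.
bool configureCURLRequest(CURL *curl, const std::string& url, unsigned int timeOut = 7000, const std::string& ip = "", unsigned short port = 443) {
    bool ret = false;
    do {
        // Reject a missing handle or an empty URL.
        if (curl == nullptr || url.empty())
            break;
        if (curl_easy_setopt(curl, CURLOPT_URL, url.c_str()) != CURLE_OK)
            break;
        // libcurl expects a long for numeric options.
        if (curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, static_cast<long>(timeOut)) != CURLE_OK)
            break;
        if (curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L) != CURLE_OK)
            break;
        if (NE_NET::NimNetUtil::IsHttpsScheme(url)) {
            // Kept from the original: 0L skips certificate chain verification,
            // while 2L still requires the certificate to match the host name.
            if (curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L) != CURLE_OK)
                break;
            if (curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L) != CURLE_OK)
                break;
            if (!ip.empty()) {
                // Deprecated option with no effect in libcurl 7.62.0 and later; kept from the original.
                if (curl_easy_setopt(curl, CURLOPT_DNS_USE_GLOBAL_CACHE, 0L) != CURLE_OK)
                    break;
                std::string domain = NE_NET::NimNetUtil::GetDomainFromURL(url);
                // CURLOPT_RESOLVE entries use the form "HOST:PORT:ADDRESS".
                std::string dns = domain + ":" + std::to_string(port) + ":" + ip;
                struct curl_slist *dnsInfo = curl_slist_append(NULL, dns.c_str());
                if (dnsInfo == NULL)
                    break;
                if (curl_easy_setopt(curl, CURLOPT_RESOLVE, dnsInfo) != CURLE_OK) {
                    curl_slist_free_all(dnsInfo);
                    break;
                }
                // dnsInfo must outlive the transfer. This function does not hand it
                // back, so real code should store it and call curl_slist_free_all()
                // once the transfer has finished.
            }
        }
        ret = true;
    } while (false);
    return ret;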
}
NetEase Smart Enterprise Tech+
