How We Resolved a Sudden DNS Outage That Took Down Our Website and App

A Saturday early-morning outage left the company’s website and mobile app inaccessible to many users. The team traced the problem to an unpaid domain fee that caused DNS resolution failures. This write-up covers the investigation steps, the temporary fixes, and the lessons learned about DNS processes and operational readiness.

ITPUB

Incident Overview

In the early hours of a Saturday, users reported that the public website and mobile application were unreachable. Initial checks on the support representative’s network showed no problem, but the issue quickly spread, especially among users of China Unicom.

Investigation Timeline

At 09:00 the support team convened. Recent deployments (new module, bug fixes, HTTPS configuration) were reviewed and ruled out. Network and operations staff confirmed that web servers, database servers, and monitoring systems were all operational.

Ping tests from the office showed that the domain name could not be resolved, while direct IP access worked, indicating a DNS resolution failure. External checks from public networks reproduced the same symptom.
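The check described above, name resolution failing while the IP address still answers, can be sketched in a few lines of Python. This is a minimal illustration rather than the team's actual tooling: the function name diagnose, the default port 443, and the injectable resolve and connect parameters are all assumptions made for the example.

```python
import socket


def diagnose(hostname, known_ip, port=443, timeout=3.0,
             resolve=socket.gethostbyname,
             connect=socket.create_connection):
    """Classify an outage as a DNS problem, a server problem, or neither.

    `known_ip` is the server address you already have on record
    (hypothetical here); `resolve` and `connect` can be swapped out
    so the logic can be exercised without network access.
    """
    # Step 1: can the name be resolved at all?
    try:
        resolved_ip = resolve(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        resolved_ip = None

    # Step 2: does the server answer when addressed directly by IP?
    try:
        conn = connect((known_ip, port), timeout=timeout)
        conn.close()
        ip_reachable = True
    except OSError:  # covers refused connections and timeouts
        ip_reachable = False

    if resolved_ip is None and ip_reachable:
        return "dns_failure"
    if resolved_ip is None:
        return "dns_and_server_failure"
    if not ip_reachable:
        return "server_unreachable"
    return "ok"
```

Calling diagnose("www.example.com", "203.0.113.10") with your own domain and server address (both values here are placeholders) returns "dns_failure" in the situation this incident describes: the name does not resolve, but the server itself is up.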

Root Cause Identification

The renewal fee for the domain had not been paid, so the registrar (Wanwang) marked the domain as unpaid overnight and suspended its DNS service. Because ISP resolvers cache DNS records and those caches expire at different times, some users could still resolve the domain while others could not.

The registrar confirmed that full DNS service would be restored within 48 hours after payment, but an immediate fix was required.

Temporary Mitigation

Changed local DNS resolvers to Google Public DNS (8.8.8.8), which restored access for many users.

Released a quick mobile‑app build that used the server’s IP address instead of the domain name.

Provided iOS users with instructions to change their device DNS settings.

Used external monitoring tools (17ce and 360 QiYun) to track DNS propagation across ISPs, confirming that the outage persisted mainly for China Unicom in Beijing.

Ran ipconfig /flushdns on Windows workstations to clear stale records from the local DNS cache.
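To compare what different resolvers return, as the mitigation steps above did with Google Public DNS and the external monitoring tools, a resolver can be queried directly instead of going through the operating system's configured one. The following is a rough sketch using only the Python standard library. It sends a single A-record query over UDP and does no retries, TCP fallback, or response-ID validation, so treat it as an illustration of the idea rather than a production client. All function names are made up for this example.

```python
import random
import socket
import struct


def build_query(hostname, query_id):
    """Build a DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends name
            return offset + 2
        offset += 1 + length


def parse_response(packet):
    """Return (rcode, list of IPv4 addresses) from a DNS response."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack(
        "!HHHHHH", packet[:12])
    rcode = flags & 0x000F  # 0 = no error, 3 = name does not exist
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record with an IPv4 address
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return rcode, addresses


def query_resolver(hostname, resolver_ip, timeout=3.0):
    """Ask one specific resolver for hostname's A records over UDP."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, query_id), (resolver_ip, 53))
        packet, _ = sock.recvfrom(512)
    return parse_response(packet)


# Example (requires network access; the hostname is a placeholder):
#   query_resolver("www.example.com", "8.8.8.8")
```

Running query_resolver against 8.8.8.8 and against an ISP's resolver for the same hostname shows whether the two disagree, which is the pattern seen during this outage.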

Final Resolution

After the domain fee was paid, the registrar re‑enabled DNS. Propagation completed over the next day, and by the following Monday the website and app were reachable for all users.

DNS Resolution Process Explained

DNS translates human‑readable domain names into IP addresses through a hierarchical lookup:

The browser checks its own cache for a stored IP address.

If not cached, the operating system checks the local hosts file.

The local DNS resolver cache is consulted next.

If still unresolved, the query is sent to the configured local DNS server.

The local DNS server returns a cached answer if it has one.

Otherwise it queries the root servers, then the top‑level domain (TLD) servers, and finally the authoritative name servers until the IP address is obtained.

If the local DNS server is set to forward queries, it passes the request to an upstream DNS server, which repeats the process.

Because several layers in this chain cache answers, a change at the authoritative level, such as a suspension or a restored record, reaches users at different times as each cache expires.
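The effect of caching on this lookup chain can be modeled with a small simulation. The sketch below is a simplification made for illustration: the CachingLayer class, the make_authoritative helper, and the 600-second TTL are invented for the example, and real resolvers handle many more cases (negative caching, multiple record types, and so on).

```python
import time


class CachingLayer:
    """One lookup layer (browser, OS, ISP resolver) that caches answers."""

    def __init__(self, name, upstream, clock=time.monotonic):
        self.name = name
        self.upstream = upstream  # callable: hostname -> (ip, ttl_seconds)
        self.clock = clock
        self.cache = {}           # hostname -> (ip, expires_at)

    def lookup(self, hostname):
        now = self.clock()
        entry = self.cache.get(hostname)
        if entry and entry[1] > now:
            ip, expires_at = entry
            return ip, expires_at - now    # cache hit: remaining TTL
        ip, ttl = self.upstream(hostname)  # cache miss: ask the next layer
        self.cache[hostname] = (ip, now + ttl)
        return ip, ttl


def make_authoritative(zone):
    """Return a lookup function backed by a dict of hostname -> (ip, ttl)."""
    def lookup(hostname):
        if hostname not in zone:
            raise LookupError(f"no record for {hostname}")
        return zone[hostname]
    return lookup


# Two ISP resolvers that cached the same record at different times.
now = [0.0]
zone = {"example.com": ("192.0.2.10", 600)}
authoritative = make_authoritative(zone)
isp_a = CachingLayer("isp-a", authoritative, clock=lambda: now[0])
isp_b = CachingLayer("isp-b", authoritative, clock=lambda: now[0])

isp_a.lookup("example.com")  # cached at t=0, expires at t=600
now[0] = 500.0
isp_b.lookup("example.com")  # cached at t=500, expires at t=1100

zone.clear()                 # the registrar suspends the record
now[0] = 700.0
# isp_b still answers from its cache; isp_a's entry has expired, so it
# asks the authoritative source and the lookup fails.
```

Because lookup returns the same (ip, ttl) shape that it expects from its upstream, layers can be chained, for example a browser layer whose upstream is an ISP layer's lookup method.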

Lessons Learned

Process gaps: Inadequate handover when staff left resulted in a missed domain renewal.

Crisis response: Lack of a mature incident‑handling procedure affected the company’s reputation.

Monitoring deficiency: Absence of proactive DNS health checks allowed the outage to go unnoticed until users reported it.
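One way to close the renewal gap is a scheduled check that warns well before a domain expires. The sketch below assumes the expiry dates are kept in a small inventory maintained by the team. The function name, the 30-day threshold, and the sample domains and dates are all hypothetical.

```python
from datetime import date


def renewal_status(expiry, today, warn_days=30):
    """Classify a domain by how close it is to its expiry date."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "expired"
    if days_left <= warn_days:
        return "renew_soon"
    return "ok"


def domains_needing_attention(inventory, today, warn_days=30):
    """Return {domain: status} for every domain that is not 'ok'."""
    report = {}
    for domain, expiry in inventory.items():
        status = renewal_status(expiry, today, warn_days)
        if status != "ok":
            report[domain] = status
    return report


# Hypothetical inventory: domain name -> registration expiry date.
inventory = {
    "example.com": date(2030, 6, 1),
    "example.net": date(2030, 1, 20),
    "example.org": date(2029, 12, 15),
}
report = domains_needing_attention(inventory, today=date(2030, 1, 1))
```

Run daily from a scheduler and wired to an alert channel that does not depend on one person, a check like this keeps the renewal from resting on an individual's memory or on a handover document.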

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
