How to Build a Resilient, High‑Traffic Web Infrastructure: A Step‑by‑Step Ops Guide
This guide outlines a complete, practical workflow for acquiring multiple domains, configuring DNS, deploying CDN and image caches, selecting data‑center locations, setting up redundant servers, implementing monitoring, handling DDoS attacks, planning capacity, securing systems, and organizing an operations team to ensure high availability for large‑scale web services.
Domain Acquisition and DNS Management
Purchase a large number of domains (50‑100) from GoDaddy, keeping primary domains separate from promotional ones. Enable domain (WHOIS) privacy protection to hide the registrant's details. This does not hide server IPs; those are shielded by the CDN layer. Delegate DNS resolution to Cloudflare, DNSPod, or a self‑hosted DNS server (e.g., ZNDNS) that can return different IPs depending on the visitor's location and lets you push DNS changes quickly.
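A location-aware DNS server essentially maps the visitor's region to the nearest pool of server IPs, and "faster DNS updates" means you can swap a pool's answers yourself. The following is a minimal Python sketch under stated assumptions: the region names and addresses are hypothetical, and the addresses come from the RFC 5737 documentation ranges. Managed services such as Cloudflare or DNSPod perform this lookup internally.

```python
# Hypothetical region -> IP pool table; addresses are RFC 5737 documentation
# ranges, not real servers.
REGION_POOLS = {
    "cn-south": ["192.0.2.10", "192.0.2.11"],   # e.g. Hong Kong nodes
    "us-west": ["198.51.100.20"],              # e.g. US nodes
}
DEFAULT_POOL = ["203.0.113.30"]                # answer for unknown regions


def resolve(region: str) -> list[str]:
    """Return the A records a location-aware DNS server would answer with."""
    return REGION_POOLS.get(region, DEFAULT_POOL)


def repoint(region: str, new_ips: list[str]) -> None:
    """Replace a region's answers, e.g. to move users off an attacked server.

    How quickly clients follow the change still depends on the record TTL.
    """
    REGION_POOLS[region] = list(new_ips)
```

Keeping TTLs short on these records is what makes a `repoint` take effect quickly in practice.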
CDN Deployment
Buy a CDN service (preferably Cloudflare). Point the domain at the CDN, which forwards traffic to a shield server (a “meat shield” proxy) and from there to the core server. The CDN provides global caching and should be able to absorb attacks of at least 200 Gbps.
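The layering only protects the core server if each hop refuses traffic that skips a layer. The shield should accept connections only from CDN edges, and the core should accept connections only from the shield. The sketch below uses Python's standard `ipaddress` module, and the CIDR ranges are hypothetical placeholders. In practice you would load the CDN's published edge ranges and enforce the rule in iptables or the provider's firewall rather than in application code.

```python
import ipaddress

# Hypothetical placeholder ranges; substitute the CDN's published edge ranges
# and the shield servers' real addresses.
CDN_EDGES = [ipaddress.ip_network("198.51.100.0/24")]
SHIELD_HOSTS = [ipaddress.ip_network("192.0.2.50/32")]

# Which upstream layer each tier may accept connections from.
ALLOWED_UPSTREAM = {
    "shield": CDN_EDGES,    # shield accepts only CDN edge nodes
    "core": SHIELD_HOSTS,   # core accepts only the shield server(s)
}


def accept(tier: str, client_ip: str) -> bool:
    """Return True if client_ip is allowed to connect to the given tier."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_UPSTREAM[tier])
```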
Image Server Setup
Deploy a few servers inside mainland China as image‑cache nodes; Nginx, configured as a caching reverse proxy, can serve as the image cache server.
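An image-cache node serves repeat requests from its own storage and contacts the origin only on a miss. Nginx does this through its `proxy_cache` directives. The Python sketch below models only the hit/miss/evict logic with a bounded LRU cache, and `fetch_from_origin` is a hypothetical stand-in for the HTTP request to the core server.

```python
from collections import OrderedDict
from typing import Callable


class ImageCache:
    """Bounded in-memory LRU cache modelling an image-cache node."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch_from_origin   # hypothetical origin fetch function
        self.store = OrderedDict()       # path -> image bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self.store:
            self.store.move_to_end(path)     # mark as most recently used
            self.hits += 1
            return self.store[path]
        self.misses += 1
        data = self.fetch(path)              # only misses reach the origin
        self.store[path] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used entry
        return data
```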
Data Center and Server Selection
Select data‑center locations close to the user base. For high‑bandwidth needs, consider US data centers. Test ping latency nationwide using tools like chinaz. Choose providers with strong DDoS protection, reliable service, and responsive support. Use multiple locations (e.g., Hong Kong for core services, US for shield servers) to avoid a single point of failure.
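After you have collected ping samples from several cities (with chinaz or your own probes), ranking the candidate sites is simple arithmetic. The sketch below uses made-up sample values and hypothetical site names. It ranks sites by nearest-rank 90th-percentile latency instead of the average, so a site with occasional large spikes is not favoured.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of latency samples in ms."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rank_datacenters(latency_ms: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort candidate sites by 90th-percentile ping latency, best first."""
    scored = [(site, percentile(s, 90)) for site, s in latency_ms.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical ping samples (ms) gathered from several cities.
samples = {
    "hong-kong": [38, 42, 45, 40, 120],
    "los-angeles": [150, 160, 155, 158, 162],
    "singapore": [70, 75, 72, 80, 78],
}
```

With only five samples the 90th percentile equals the worst sample, so in this data the single 120 ms outlier pushes Hong Kong behind Singapore. Collect many more samples per site before drawing conclusions.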
Website Frontend and Backend Separation
Host the public website and the internal admin interface on separate machines to prevent interference. Other services can share a virtual machine to reduce cost. Use Gmail for corporate email and optionally build an internal chat system.
Monitoring and Logging
Deploy a monitoring system that checks server health, records traffic spikes, and alerts on anomalies. Forward logs to a central syslog server and graph traffic and resource metrics with Cacti. Analyze where traffic comes from and set alarm triggers for abnormal patterns.
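One common way to turn "alert on anomalies" into a concrete trigger is to compare each new reading against a rolling baseline. The sketch below assumes per-minute request counts. The window of 10 readings and the threshold of three standard deviations are hypothetical values that you would tune to your own traffic.

```python
import statistics
from collections import deque


class SpikeDetector:
    """Flag readings far above a rolling baseline (mean + k * stdev)."""

    def __init__(self, window: int = 10, k: float = 3.0):
        self.history = deque(maxlen=window)   # most recent readings only
        self.k = k

    def observe(self, value: float) -> bool:
        """Record a reading; return True if it should raise an alarm."""
        alarm = False
        if len(self.history) >= 2:             # stdev needs two points
            mean = statistics.mean(self.history)
            stdev = statistics.stdev(self.history)
            # Floor the stdev at 1.0 so a perfectly flat baseline is not hypersensitive.
            alarm = value > mean + self.k * max(stdev, 1.0)
        self.history.append(value)             # spikes also enter the baseline
        return alarm
```

Because spikes are appended to the history, a sustained new traffic level stops alarming after a few readings. Whether that is desirable depends on how you handle alerts.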
Attack Mitigation Strategies
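Small attacks can be blocked with Nginx (for example, per-IP request rate limiting) and iptables. Large‑scale attacks require upstream DDoS protection rated for at least 200 Gbps. If an attack originates from only a few IPs, ask the data center to block them upstream. During an attack, quickly repoint the domain to a backup server, or even to Baidu, so that traffic aimed at the domain is diverted away from your servers.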
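Nginx's `limit_req` module can cap the request rate per client IP, which is often enough for small floods. The sketch below models that idea in Python as a per-IP token bucket, while Nginx itself documents its method as a leaky bucket. The rate and burst values are hypothetical, and the injectable clock exists only so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpRateLimiter:
    """Token bucket per client IP: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_time)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(ip, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Requests for which `allow` returns False would be rejected before they reach the application. Each IP has its own bucket, so one noisy client does not throttle everyone else.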
Redundancy and Capacity Planning
Design for at least double the expected concurrent users (e.g., 2 000 concurrent users for a 1 000‑user peak). Ensure the architecture can scale horizontally.
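The doubling rule turns into a server count once you know what one server can sustain. The worked sketch below assumes a hypothetical measured capacity of 500 concurrent users per web server. It also keeps one spare server (N+1), so a single failure does not consume the headroom.

```python
import math


def servers_needed(peak_users: int, per_server: int,
                   headroom: float = 2.0, spares: int = 1) -> int:
    """Web servers required to carry `headroom` x peak load, plus spares."""
    target = peak_users * headroom          # e.g. 1,000 peak -> plan for 2,000
    return math.ceil(target / per_server) + spares


print(servers_needed(peak_users=1000, per_server=500))  # 2000 / 500 = 4, plus 1 spare -> 5
```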
Network Interface and Hardware Configuration
Equip each server with three NICs: one for external user traffic, one for internal server‑to‑server traffic, and one for SSH management. Assign multiple IPs per NIC. Use RAID‑1 disks, dual CPUs, and dual power supplies so that no single component failure takes the server down. The shield server can be a low‑spec machine as long as its network is robust.
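Hardware redundancy can be checked mechanically against an inventory. The sketch below assumes a hypothetical inventory dictionary that records how many independent units of each redundant component a machine has. It reports any component with fewer than two units as a single point of failure.

```python
def single_points_of_failure(inventory: dict[str, int], minimum: int = 2) -> list[str]:
    """Return components present in fewer than `minimum` independent units."""
    return sorted(part for part, count in inventory.items() if count < minimum)


# Hypothetical inventory for one server: RAID-1 pair, two CPUs, one PSU.
server = {"disks": 2, "cpus": 2, "power_supplies": 1}
print(single_points_of_failure(server))
```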
Database Replication and Service Separation
Implement master-slave replication with off-site backups. Load-balance requests across application servers with an Nginx upstream cluster. Separate front-end and back-end services onto different machines.
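With master-slave replication, writes must go to the master, but reads can be spread across the replicas. This resembles the way an Nginx `upstream` group distributes HTTP requests round-robin by default. The routing sketch below uses hypothetical host names and classifies queries naively. It ignores replication lag, so any read that must see a just-committed write should also go to the master.

```python
import itertools


class ReplicatedDb:
    """Route writes to the master and round-robin reads across replicas."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.master
```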
Environment Segmentation
Maintain three environments: developer machines, a LAN testing environment (with dedicated rack hardware), and an internet‑facing testing environment. Use SVN or Git for version control and only promote to production after thorough testing.
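The promotion rule can be enforced as an ordered gate: a build moves from the developer machine to LAN testing, then to internet-facing testing, and only then to production. The sketch below uses hypothetical stage names. In practice this check would live in your build tooling, keyed by an SVN revision or Git commit.

```python
# Hypothetical stage names, in promotion order.
STAGES = ["dev", "lan_test", "internet_test", "production"]


def can_promote(passed: set[str], target: str) -> bool:
    """A build may enter `target` only after passing every earlier stage."""
    required = STAGES[:STAGES.index(target)]
    return all(stage in passed for stage in required)
```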
Security and Password Policies
Rotate all passwords (especially domain and email accounts) every three months.
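A small scheduled check makes it easier to enforce the three-month rule. The sketch below approximates three months as 90 days and uses hypothetical account names. The last-changed dates would come from wherever you already track them, such as your password manager.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)   # "every three months", approximated


def overdue(last_rotated: dict[str, date], today: date) -> list[str]:
    """Return accounts whose password is older than the rotation period."""
    return sorted(acct for acct, changed in last_rotated.items()
                  if today - changed > ROTATION_PERIOD)


# Hypothetical accounts and last-changed dates.
accounts = {"domain-registrar": date(2024, 1, 1), "mail-admin": date(2024, 3, 15)}
print(overdue(accounts, date(2024, 4, 15)))
```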
LAN Stability and Bandwidth
Keep the office network stable: at least two 10 Mbps internet lines, a backup Wi‑Fi for mobile devices, and redundant cabling.
Team Organization and Tools
At least two ops engineers (or one ops manager plus one engineer) should share documentation and be on 24‑hour standby. Standardize tools: SQLyog for database access, SecureCRT for SSH, KeePass for password management, and WinSCP for file transfers. Continuous learning and English proficiency are essential.
Disaster Recovery and Pre‑plan
Develop and rehearse a disaster‑recovery plan: switch to standby servers, verify backup restores, and conduct regular drills to ensure backups are usable.
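A restore drill should prove two things: the backup file is intact, and the restored data can actually be used. The sketch below uses SQLite from Python's standard library as a stand-in for the production database, since real MySQL restores differ. It also assumes a SHA-256 checksum was recorded at backup time. The table name and minimum row count are hypothetical sanity parameters.

```python
import hashlib
import sqlite3
from pathlib import Path


def sha256(path: Path) -> str:
    """Stream the file through SHA-256 so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_drill(backup: Path, expected_sha256: str, table: str, min_rows: int) -> bool:
    """Verify the checksum, restore into a scratch DB, and run a sanity query."""
    if sha256(backup) != expected_sha256:      # corrupted or truncated backup
        return False
    src = sqlite3.connect(backup)
    scratch = sqlite3.connect(":memory:")      # throwaway restore target
    try:
        src.backup(scratch)
        # `table` is a trusted internal parameter, never user input.
        (rows,) = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        return rows >= min_rows
    finally:
        src.close()
        scratch.close()
```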
High‑Concurrency Testing
Simulate 2 000 concurrent users to evaluate load. Invest in appropriate bandwidth, IP addresses, and data‑center locations. Spend where necessary, save where possible.
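Dedicated tools such as ab, wrk, or JMeter are the usual way to generate this load. The asyncio sketch below shows mechanically what "2 000 concurrent users" means: one request per simulated user, all in flight at once. `fake_request` is a hypothetical stand-in that only sleeps. To test a real endpoint, you would replace it with an actual HTTP call.

```python
import asyncio
import random
import time


async def fake_request(user_id: int) -> float:
    """Hypothetical stand-in for one HTTP request; returns latency in seconds."""
    start = time.perf_counter()
    await asyncio.sleep(random.uniform(0.01, 0.05))   # simulated server time
    return time.perf_counter() - start


async def load_test(concurrent_users: int) -> list[float]:
    """Start one request per simulated user, all running concurrently."""
    tasks = [fake_request(uid) for uid in range(concurrent_users)]
    return await asyncio.gather(*tasks)


latencies = sorted(asyncio.run(load_test(2000)))
print(f"requests={len(latencies)} p95={latencies[int(0.95 * len(latencies))]:.3f}s")
```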
Operational Best Practices
Share all operational knowledge among team members, maintain detailed logs of every action with timestamps, perform risk assessments before production changes, and follow a structured workflow covering monitoring, capacity planning, process standardization, knowledge management, and automation.
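Timestamped action logs are easiest to share and search when every entry is machine-readable. The sketch below appends one JSON line per action, with a UTC timestamp and the result of the pre-change risk assessment. The field names and the log path are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_action(logfile: Path, operator: str, host: str, action: str,
               risk: str = "low") -> dict:
    """Append one timestamped JSON entry per operational action."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "operator": operator,
        "host": host,
        "action": action,
        "risk": risk,   # outcome of the pre-change risk assessment
    }
    with logfile.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```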
Useful Commands
netstat -ant | grep -F "$ip:80" | wc -l              # all TCP connections involving $ip:80
netstat -ant | grep -F "$ip:80" | grep EST | wc -l   # ESTABLISHED connections only
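The same connection counts can be computed from captured `netstat -ant` output, which is useful for feeding them into the monitoring system. The Python sketch below assumes the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). Unlike a substring grep, it matches only the exact local address, so port 8080 and outbound connections to another host's port 80 are not counted.

```python
from collections import Counter


def port_connections(netstat_output: str, ip: str, port: int = 80) -> Counter:
    """Count TCP connection states whose local address is exactly ip:port."""
    target = f"{ip}:{port}"
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 <local> <foreign> <state>
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[3] == target:
            states[fields[5]] += 1
    return states


# Illustrative captured output (documentation-range addresses).
sample = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51514      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      TIME_WAIT
tcp        0      0 192.0.2.10:8080         198.51.100.9:40023      ESTABLISHED
"""
counts = port_connections(sample, "192.0.2.10")
print(sum(counts.values()), counts["ESTABLISHED"])
```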
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
