How to Build a Resilient, High‑Traffic Web Infrastructure: A Step‑by‑Step Ops Guide

This guide outlines a complete, practical workflow for acquiring multiple domains, configuring DNS, deploying CDN and image caches, selecting data‑center locations, setting up redundant servers, implementing monitoring, handling DDoS attacks, planning capacity, securing systems, and organizing an operations team to ensure high availability for large‑scale web services.

dbaplus Community

Mar 18, 2024

Domain Acquisition and DNS Management

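Purchase a large pool of domains (50‑100) from GoDaddy, keeping primary domains separate from promotional ones. Enable domain privacy protection to hide the registrant's contact details in WHOIS. Note that privacy protection does not hide server IPs; those are masked by placing the origin behind a CDN or proxy. Delegate DNS resolution to Cloudflare, DNSPod, or a self‑hosted DNS server (e.g., ZNDNS) that can return multiple IPs based on client proximity and lets you push DNS changes quickly.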
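As a rough illustration of proximity-based answers, the sketch below picks which A records to return for a client region. The region names, the `RECORDS` table, and the `resolve` helper are hypothetical, and the IPs come from documentation ranges. In practice this logic lives in the DNS provider's configuration, not in application code.

```python
# Hypothetical sketch: choose which A records to return based on client region.
# Region names and IPs (RFC 5737 documentation ranges) are illustrative only.
RECORDS = {
    "asia": ["192.0.2.10", "192.0.2.11"],
    "us": ["198.51.100.10", "198.51.100.11"],
}
DEFAULT_REGION = "us"  # assumption: unknown regions fall back to the US pool


def resolve(client_region: str) -> list[str]:
    """Return every IP for the client's region, or the default pool."""
    return list(RECORDS.get(client_region, RECORDS[DEFAULT_REGION]))
```

Returning several IPs per region lets clients fail over to the next address if one node is down.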

CDN Deployment

Buy a CDN service (preferably Cloudflare). Point the domain at the CDN, which forwards traffic to a shield server (a sacrificial “meat shield” proxy) and then on to the core server. The CDN provides global caching and should be able to absorb attacks of at least 200 Gbps.

Image Server Setup

Deploy a few domestic servers as image‑cache nodes; Nginx can serve as an image cache server.

Data Center and Server Selection

Select data‑center locations close to the user base. For high‑bandwidth needs, consider US data centers. Test ping latency nationwide using tools like chinaz. Choose providers with strong DDoS protection, reliable service, and responsive support. Use multiple locations (e.g., Hong Kong for core services, US for shield servers) to avoid a single point of failure.
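One way to compare candidate locations is to time TCP handshakes from several vantage points and pick the lowest median. The sketch below assumes you collect the samples yourself; `tcp_connect_ms` and `best_location` are illustrative names, and TCP connect time is only an approximation of ICMP ping latency.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def best_location(samples: dict[str, list[float]]) -> str:
    """Pick the location with the lowest median latency."""
    return min(samples, key=lambda name: statistics.median(samples[name]))
```

The median is used rather than the mean so that a single slow probe does not disqualify an otherwise fast location.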

Website Frontend and Backend Separation

Host the public website and the internal admin interface on separate machines to prevent interference. Other services can share a virtual machine to reduce cost. Use Gmail for corporate email and optionally build an internal chat system.

Monitoring and Logging

Deploy a monitoring system that checks server health, records load spikes, and alerts on anomalies. Forward logs to a central syslog server and visualize them with Cacti. Analyze traffic sources and set up alarm triggers for abnormal patterns.
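An alarm trigger for abnormal patterns can be as simple as comparing each new sample with a moving average. The following is a minimal sketch, not a replacement for a real monitoring stack; the `SpikeDetector` class, the window size, and the factor are assumptions to tune against your own traffic.

```python
from collections import deque


class SpikeDetector:
    """Flag a sample that exceeds `factor` times the recent moving average."""

    def __init__(self, window: int = 5, factor: float = 3.0) -> None:
        self.history = deque(maxlen=window)
        self.factor = factor

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it looks like a spike."""
        is_spike = False
        # Only judge once the window is full, so early samples never alert.
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            is_spike = value > baseline * self.factor
        self.history.append(value)
        return is_spike
```

Feeding it requests per minute, for example, would raise an alert when traffic suddenly triples relative to the last five minutes.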

Attack Mitigation Strategies

Small attacks can be blocked with Nginx and iptables. Large‑scale attacks require upstream DDoS protection (at least 200 Gbps of mitigation capacity). If an attack originates from a few IPs, ask the data center to block them. During an attack, quickly repoint the domain to a backup server or even to Baidu.
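For the small-attack case, a sketch like the one below can turn a list of remote IPs (for example, pulled from connection tables or access logs) into candidate iptables rules. The helper names and the threshold are hypothetical, and the commands are only rendered as text here so they can be reviewed before anyone runs them.

```python
from collections import Counter


def top_offenders(remote_ips: list[str], threshold: int) -> list[str]:
    """Return IPs that appear more than `threshold` times, busiest first."""
    counts = Counter(remote_ips)
    return [ip for ip, n in counts.most_common() if n > threshold]


def drop_rules(ips: list[str]) -> list[str]:
    """Render one iptables DROP command per offending IP (not executed here)."""
    return [f"iptables -I INPUT -s {ip} -j DROP" for ip in ips]
```

Reviewing the rendered rules first reduces the risk of blocking a legitimate proxy or CDN address that fronts many users.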

Redundancy and Capacity Planning

Design for at least double the expected concurrent users (e.g., 2 000 concurrent users for a 1 000‑user peak). Ensure the architecture can scale horizontally.
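The doubling rule translates into a simple server-count estimate. In this sketch, `per_server_capacity` and the `spare` failover node are assumptions you would replace with figures from your own load tests.

```python
import math


def servers_needed(peak_users: int, per_server_capacity: int,
                   headroom: float = 2.0, spare: int = 1) -> int:
    """Servers required to carry headroom x peak, plus spare failover nodes."""
    target = peak_users * headroom
    return math.ceil(target / per_server_capacity) + spare
```

For a 1 000-user peak and servers that each handle 500 concurrent users, the target is 2 000 users, which needs four servers plus one spare.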

Network Interface and Hardware Configuration

Equip each server with three NICs: one for external user traffic, one for internal server‑to‑server traffic, and one for SSH management. Assign multiple IPs per NIC. Use RAID‑1 disks, dual CPUs, and dual power supplies so that no single component is a point of failure. The shield server can be a low‑spec machine if the network is robust.

Database Replication and Service Separation

Implement master‑slave replication with off‑site backups. Configure Nginx upstream clustering. Separate front‑end and back‑end services onto different machines.
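To keep upstream definitions consistent across machines, some teams generate them from a server list. The sketch below renders an Nginx `upstream` block with primary and `backup` servers; the `render_upstream` helper, the pool name, and the addresses are hypothetical.

```python
def render_upstream(name: str, primaries: list[str], backups: list[str]) -> str:
    """Render an Nginx upstream block; backups only receive traffic on failure."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {addr};" for addr in primaries]
    lines += [f"    server {addr} backup;" for addr in backups]
    lines.append("}")
    return "\n".join(lines)
```

The output is meant to be written to an included config file and checked with `nginx -t` before reloading.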

Environment Segmentation

Maintain three environments: developer machines, a LAN testing environment (with dedicated rack hardware), and an internet‑facing testing environment. Use SVN or Git for version control and only promote to production after thorough testing.

Security and Password Policies

Rotate all passwords (especially domain and email accounts) every three months.
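A small check can flag accounts that have passed the rotation deadline. This sketch treats three months as 90 days and assumes you keep a record of when each password was last changed; the account names in the usage note are placeholders.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # assumption: "three months" as 90 days


def overdue(last_changed: dict[str, date], today: date) -> list[str]:
    """Return account names whose password is older than the rotation period."""
    return sorted(name for name, changed in last_changed.items()
                  if today - changed > ROTATION_PERIOD)
```

For example, calling `overdue({"registrar": date(2024, 1, 1)}, date.today())` lists the registrar account once more than 90 days have passed since January 1, 2024.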

LAN Stability and Bandwidth

Ensure a stable LAN with at least two 10 Mbps lines, a backup Wi‑Fi for mobile devices, and redundant cabling.

Team Organization and Tools

At least two ops engineers (or one ops manager plus one engineer) should share documentation and be on 24‑hour standby. Standardize tools: SQLyog for DB access, SecureCRT for SSH, KeePass for password management, WinSCP for file transfers. Continuous learning and English proficiency are essential.

Disaster Recovery and Pre‑plan

Develop and rehearse a disaster‑recovery plan: switch to standby servers, verify backup restores, and conduct regular drills to ensure backups are usable.
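One way to confirm that a restored file is usable is to compare its checksum with the original. The sketch below covers file-level backups only; the helper names are illustrative, and a database restore would still need its own consistency checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """Return True when the restored file is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Running this as part of each drill gives a recorded pass or fail instead of an assumption that the backup works.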

High‑Concurrency Testing

Simulate 2 000 concurrent users to evaluate load. Invest in appropriate bandwidth, IP addresses, and data‑center locations. Spend where necessary, save where possible.
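A basic concurrency test can be sketched with a thread pool. Here `request` is any callable you supply (for instance, one that fetches a page and returns True on success); `run_load` and its parameters are hypothetical. Dedicated load-testing tools will give more realistic results than this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(request: Callable[[], bool], users: int, concurrency: int) -> dict[str, int]:
    """Call `request` once per simulated user, at most `concurrency` at a time."""
    def safe_call(_: int) -> bool:
        try:
            return bool(request())
        except Exception:
            return False  # any exception counts as a failed request

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(safe_call, range(users)))
    ok = sum(results)
    return {"ok": ok, "failed": users - ok}
```

To approximate 2 000 simultaneous users, set both `users` and `concurrency` to 2000 and run the test from a machine outside the production network.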

Operational Best Practices

Share all operational knowledge among team members, maintain detailed logs of every action with timestamps, perform risk assessments before production changes, and follow a structured workflow covering monitoring, capacity planning, process standardization, knowledge management, and automation.

Useful Commands

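# Count all TCP connections involving $ip on port 80 (set ip to the server's address first)
netstat -ant | grep -F "$ip:80" | wc -l

# Count only the established connections
netstat -ant | grep -F "$ip:80" | grep ESTABLISHED | wc -l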
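The same counts can be computed from captured `netstat -ant` output, which makes it easier to break connections down by state. This sketch assumes the Linux column layout (local address in the fourth column, state in the sixth); `count_states` is an illustrative helper. It matches the local address exactly, so a port such as 8080 is not counted as port 80.

```python
from collections import Counter


def count_states(netstat_output: str, local_addr: str) -> Counter:
    """Count TCP states for rows whose local address equals local_addr."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed Linux layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[3] == local_addr:
            states[fields[5]] += 1
    return states
```

A sudden rise in `SYN_RECV` or `TIME_WAIT` relative to `ESTABLISHED` is often worth a closer look during a suspected attack.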
