
How Autohome Scaled Its 818 Global Car Night to Millions of QPS: A Technical Deep Dive

The article details how Autohome tackled a severe market downturn by launching the 818 Global Car Night, describing the background, massive technical challenges, infrastructure scaling, high‑availability architecture, full‑link stress testing, monitoring, security measures, and the lessons learned for future large‑scale online events.

HomeTech

1. Background

China's car market, which had grown for nearly 30 years, entered a downturn in 2018 with its first year of negative growth. In the first seven months of 2019, production and sales fell a further 13.5% and 11.4% year over year. The shift from National V to National VI emission standards caused a short‑term surge in vehicle registrations that pulled demand forward, prompting concerns that the second half of 2019 would be even tougher.

Autohome, the leading automotive internet platform, positioned the 818 Global Car Night as a "Double 11"‑style stimulus for the industry. It drew on its large user base, data assets, and partnerships with manufacturers and TV stations to create a festival‑like event aimed at reversing the market slump.

2. Technical Challenges and Guarantees

The 818 event required the largest and most demanding technical guarantee in Autohome's history, with four major challenges:

High guarantee requirements: no downtime, no bottlenecks, estimated QPS of 4 million.

Broad guarantee scope: 8 business units, over 50 systems, demanding cross‑department collaboration.

Very short preparation time: less than two months.

Little prior experience with events of this scale.

To meet these challenges, the team adopted a philosophy of simplicity, elasticity, intelligent scheduling, multi‑layer self‑healing, and cost control, executing the following steps:

Step 1: New System Development & Legacy System Optimization

Developed new business systems for red‑packet, VR, and interactive features.

Built a full‑link pressure‑testing tool.

Created a unified operations‑monitoring platform.

Optimized more than 40 existing business systems.

Step 2: Elastic Infrastructure Expansion

Added nearly a thousand servers.

Provisioned over 2,300 public‑cloud VMs.

Expanded CDN capacity by 1 Tbps.

Increased Yizhuang data‑center bandwidth to 40 Gbps.

Step 3: Full‑Link Stress Testing & Security Assurance

Conducted six rounds of full‑link stress testing, twelve test runs in total.

Involved more than 170 applications and 1,000+ interfaces.

Identified and resolved over 150 system issues.

Step 4: On‑site Guarantee

Established an on‑site technical command center.

Prepared battle plans, flow‑control schemes, security plans, and a one‑stop monitoring dashboard.

2.1 Infrastructure Elastic Expansion

Autohome traditionally used self‑built data centers. For the 818 event, the team shifted the most traffic‑intensive services (registration, login, red‑packet, AR) to public cloud, launching more than 2,300 cloud VMs and integrating public‑cloud services such as Redis, object storage, load balancers, and CDN.

CDN was critical. Three providers were provisioned, each able to absorb a peak of 1 Tbps, with DNS‑based failover to switch traffic between them quickly. Byte hit rates improved from 80% to 99.9%.
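The DNS‑based failover across three CDN providers can be illustrated with a minimal sketch. The provider names, CNAME targets, and traffic weights below are hypothetical and not taken from the source; the sketch only shows the idea of redistributing traffic shares over the providers that are still healthy.

```python
from dataclasses import dataclass


@dataclass
class CdnProvider:
    name: str             # hypothetical provider label
    cname: str            # hypothetical CNAME target returned by DNS
    healthy: bool = True  # would be set by an external health check


def resolve(providers, weights):
    """Return {cname: traffic share} over healthy providers only."""
    live = [p for p in providers if p.healthy]
    if not live:
        raise RuntimeError("no healthy CDN provider")
    total = sum(weights[p.name] for p in live)
    # Renormalise so the remaining providers absorb the failed one's share.
    return {p.cname: weights[p.name] / total for p in live}


providers = [
    CdnProvider("cdn-a", "a.cdn.example"),
    CdnProvider("cdn-b", "b.cdn.example"),
    CdnProvider("cdn-c", "c.cdn.example"),
]
weights = {"cdn-a": 40, "cdn-b": 40, "cdn-c": 20}  # assumed split
shares = resolve(providers, weights)
```

Marking one provider unhealthy and calling `resolve` again yields shares for the remaining two only. How quickly real clients follow such a change depends on DNS TTLs, which the source does not state.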

Dedicated 100 Gbps private lines were added between the self‑built data center and public‑cloud facilities, expanding bandwidth tenfold.

2.2 Key Business Development & Architecture Optimization

2.2.1 TV‑Web Interaction

The event featured seven real‑time interactive sessions, similar to Baidu's Spring Festival red‑packet, but with higher complexity because of live TV integration and a larger prize pool. The system had to support up to 5 million concurrent participants.

To achieve this, the team:

Built a high‑availability dual‑data‑center architecture with automatic failover.

Implemented DDoS‑resistant domains for rapid traffic switching.

Sharded red‑packet data across multiple clusters to increase TCP connection capacity and avoid single‑point failures.

The red‑packet flow was redesigned to keep the central Redis off the hot path. Each server stored the packet files locally and fetched number segments from the central Redis asynchronously, so most claims could be served from local state. Connections were released as soon as a packet was claimed, which dramatically reduced Redis load.
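A rough sketch of the number‑segment idea follows. It is an illustration under assumptions, not Autohome's implementation: `CentralCounter` is an in‑memory stand‑in for the central Redis, the class and method names are invented, and the segment is fetched synchronously when the local one runs out, whereas the source describes asynchronous fetching.

```python
import threading


class CentralCounter:
    """In-memory stand-in for the central Redis counter (assumption)."""

    def __init__(self, total):
        self._next = 0
        self._total = total
        self._lock = threading.Lock()

    def reserve(self, size):
        """Atomically hand out the next [start, end) range of packet numbers."""
        with self._lock:
            start = self._next
            end = min(start + size, self._total)
            self._next = end
            return start, end


class PacketServer:
    """Serves claims from a locally held segment; hypothetical class."""

    def __init__(self, counter, segment_size):
        self._counter = counter
        self._segment_size = segment_size
        self._start, self._end = 0, 0
        self._lock = threading.Lock()

    def claim(self):
        """Return a packet number, or None when the pool is exhausted."""
        with self._lock:
            if self._start >= self._end:
                # One central round trip per segment, not per claim.
                self._start, self._end = self._counter.reserve(self._segment_size)
                if self._start >= self._end:
                    return None
            number = self._start
            self._start += 1
            return number


counter = CentralCounter(total=10)
servers = [PacketServer(counter, 4), PacketServer(counter, 4)]
claimed = []
for i in range(12):
    n = servers[i % 2].claim()
    if n is not None:
        claimed.append(n)
```

With a segment size of 4, ten packets cost four central reservations instead of ten per‑claim round trips. The trade‑off is that one server can report exhaustion while another still holds unclaimed numbers.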

2.2.2 Registration & Login

These interfaces were classified into three priority levels:

Level 1: Login/registration (highest priority).

Level 2: User token decryption.

Level 3: Other user‑info queries.

Architecture included Web, Redis, and Kafka clusters. Web handled external requests, Redis cached login data, and Kafka pre‑heated registration data as a “seed queue”. Both hot and cold clusters were deployed for rapid failover.

Level 1 login used high‑availability Web clusters and hot/cold Redis caches to keep response times under 10 ms even at 30× normal QPS. Level 2 token decryption employed feature‑toggle degradation and scaling to keep 99th‑percentile latency below 10 ms. Level 3 interfaces were scaled to keep latency under 200 ms and used a custom API gateway for rate‑limiting and circuit‑breaking.
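The three priority levels suggest a load‑shedding policy in which lower‑priority interfaces are degraded first. The sketch below assumes hypothetical load thresholds (70% and 90%); the source gives the levels but not the cut‑off values or the mechanism.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOGIN = 1      # Level 1: login/registration
    TOKEN = 2      # Level 2: user token decryption
    USER_INFO = 3  # Level 3: other user-info queries


# Hypothetical thresholds: a level is shed once load reaches this fraction.
# Level 1 has no entry, so it is never shed by this policy.
SHED_AT = {Priority.USER_INFO: 0.70, Priority.TOKEN: 0.90}


def admit(priority, load):
    """Return True if a request at this priority is served at load (0.0 to 1.0)."""
    threshold = SHED_AT.get(priority)
    return threshold is None or load < threshold
```

For example, at 75% load `admit(Priority.USER_INFO, 0.75)` rejects user‑info queries while token decryption and login are still served.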

2.2.3 VR Car Expo

The 20‑day online VR expo required sustained high availability. Autohome built a three‑layer heterogeneous framework:

Layer 1: Hybrid cloud + multi‑CDN supporting 10 million QPS with API static‑caching and active degradation.

Layer 2: Application gateway handling millions of QPS, built on Vert.x, each VM handling 25 k QPS.

Layer 3: API cluster supporting 100 k QPS, using JVM‑level and Redis caches for high‑frequency APIs and Redis for low‑frequency ones.
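The Layer 3 combination of an in‑process cache in front of Redis can be sketched as a two‑level lookup. This is an assumption‑laden illustration in Python rather than the JVM: `remote` is a plain dict standing in for Redis, and `TwoLevelCache`, `load_series`, and the one‑second local TTL are hypothetical.

```python
import time


class TwoLevelCache:
    def __init__(self, remote, loader, local_ttl=1.0, clock=time.monotonic):
        self._local = {}        # key -> (expires_at, value), in-process level
        self._remote = remote   # dict-like stand-in for Redis (assumption)
        self._loader = loader   # falls through to the backing API or database
        self._ttl = local_ttl
        self._clock = clock

    def get(self, key):
        now = self._clock()
        hit = self._local.get(key)
        if hit and hit[0] > now:
            return hit[1]                  # served without leaving the process
        value = self._remote.get(key)
        if value is None:                  # miss in both levels
            value = self._loader(key)
            self._remote[key] = value
        self._local[key] = (now + self._ttl, value)
        return value


calls = []


def load_series(key):
    calls.append(key)
    return f"payload-for-{key}"


remote = {}
cache = TwoLevelCache(remote, load_series, local_ttl=1.0)
first = cache.get("series:42")
second = cache.get("series:42")
```

The second `get` is answered from the in‑process level, so the loader runs once. A short local TTL bounds staleness for high‑frequency APIs; low‑frequency ones could skip the local level and use the remote cache alone.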

Custom CDN pre‑fetching automatically pushed large VR assets to CDN nodes, reducing origin pull pressure.

2.2.4 Core Software Optimizations

To handle an estimated 30× traffic surge, the team isolated resources (LB, SCS, RabbitMQ, Redis, OpenStack VMs) per business line and applied rate‑limiting, degradation, and circuit‑breaking policies. Multi‑level caching across CDN, SCS, and Redis achieved a 99% cache hit rate. Database reads and writes were split across master and replica nodes using master‑slave replication.
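Circuit breaking with a degraded fallback can be sketched as follows. The thresholds, the `CircuitBreaker` class, and the fallback value are assumptions for illustration; the source does not describe how its policies were implemented.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, fn, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()        # open: fail fast, skip the backend
            self._opened_at = None       # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()            # degrade instead of propagating
        self._failures = 0
        return result


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(max_failures=2, reset_after=5.0, clock=lambda: now[0])
attempts = []


def flaky():
    attempts.append(now[0])
    raise TimeoutError("backend slow")


def degraded():
    return "cached-default"


results = [breaker.call(flaky, degraded) for _ in range(4)]
```

Only the first two calls reach the failing backend; the remaining two are answered by the fallback while the breaker is open. Once the reset interval passes, a single trial call decides whether the breaker closes again.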

Monitoring dashboards aggregated QPS, RTT, and other metrics via Kafka, enabling real‑time anomaly detection.
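As a rough illustration of this kind of aggregation, the sketch below groups request events into per‑second QPS and mean RTT and flags slow buckets. The event tuple layout, the service names, and the 200 ms threshold are assumptions; a plain list stands in for the Kafka stream.

```python
from collections import defaultdict


def aggregate(events):
    """Group (timestamp_s, service, rtt_ms) events into per-second stats."""
    buckets = defaultdict(list)
    for ts, service, rtt_ms in events:
        buckets[(int(ts), service)].append(rtt_ms)
    return {
        key: {"qps": len(rtts), "avg_rtt_ms": sum(rtts) / len(rtts)}
        for key, rtts in buckets.items()
    }


def anomalies(stats, max_rtt_ms):
    """Return the (second, service) buckets whose mean RTT exceeds the limit."""
    return sorted(key for key, s in stats.items() if s["avg_rtt_ms"] > max_rtt_ms)


events = [  # hypothetical sample, standing in for messages read from Kafka
    (100.1, "login", 8.0),
    (100.5, "login", 12.0),
    (100.9, "redpacket", 30.0),
    (101.2, "login", 250.0),
]
stats = aggregate(events)
slow = anomalies(stats, max_rtt_ms=200.0)
```

Here `slow` contains the single bucket `(101, "login")`, which is the sort of signal a dashboard could highlight.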

2.3 Full‑Link Stress Testing

Autohome's internal SaaS pressure‑testing platform simulated massive user traffic. Six rounds of testing, twelve test runs in total, covered 170+ applications and 1,000+ interfaces, and remediation reduced the failure rate from 71% to 4%.
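A toy version of a load‑test run is sketched below to show how a failure rate can be measured. It is not the platform described in the source: `run_load_test` and `fake_interface` are hypothetical, the target is simulated in process, and the metric is counted per request, which may differ from how the 71% and 4% figures were computed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load_test(call, requests, workers=8):
    """Invoke call(i) for i in range(requests) concurrently; return failure rate."""

    def attempt(i):
        try:
            call(i)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(attempt, range(requests)))
    return outcomes.count(False) / requests


def fake_interface(i):
    # Hypothetical target: every fourth request fails.
    if i % 4 == 0:
        raise RuntimeError("simulated overload")


failure_rate = run_load_test(fake_interface, requests=100)
```

Re‑running the same scenario after a fix and comparing the two rates gives a simple before‑and‑after measure for each remediation round.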

Public‑cloud testing was performed on Alibaba Cloud, which offered nationwide traffic injection but incurred higher cost and required manual intervention. Recommendations included limiting public‑cloud tests to scenarios that passed internal validation and using CDN to distribute test traffic globally.

2.4 Monitoring & Security

Eight‑dimensional monitoring (overview, CDN, dedicated lines, public‑exit, LVS, Nginx, containers, middleware) was displayed on a central operations screen. DNS routing directed users to the nearest CDN node, which then routed to IDC or public‑cloud back‑ends.

Security teams performed risk assessments across six business lines and deployed DDoS protection for five main domains. During the event they recorded over 250,000 alerts and 80,000 blocks, and no security incidents occurred.

2.5 Emergency & Disaster Recovery

Components (L4‑L7 proxies, caches, MQ, databases, storage) were tuned and stress‑tested to determine performance limits. A fast‑response rate‑limiting tool allowed per‑domain, per‑IP, and per‑Nginx‑cluster throttling to protect critical services during traffic spikes.
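Per‑domain and per‑IP throttling is commonly built on token buckets, and a minimal sketch of that approach follows. The source does not say which algorithm its tool used, so the token bucket, the class names, and the limits are assumptions; the per‑Nginx‑cluster dimension is omitted.

```python
import time


class TokenBucket:
    def __init__(self, rate, burst, clock):
        self.rate, self.burst, self._clock = rate, burst, clock
        self.tokens = burst
        self.updated = clock()

    def allow(self):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttle:
    """Keeps one bucket per (domain, ip) pair; limits are hypothetical."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate, burst, clock
        self._buckets = {}

    def allow(self, domain, ip):
        key = (domain, ip)
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(self._rate, self._burst, self._clock)
        return self._buckets[key].allow()


now = [0.0]  # fake clock so the example is deterministic
throttle = Throttle(rate=1, burst=2, clock=lambda: now[0])
decisions = [throttle.allow("a.example", "10.0.0.1") for _ in range(3)]
```

The third request from the same client within the same instant is rejected, while a different IP on the same domain still has a full bucket. Keying on the domain alone would turn the same structure into a per‑domain limit.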

3. Summary & Outlook

The 818 event transformed Autohome's technical guarantee system, establishing a reusable large‑event architecture, hybrid‑cloud deployment experience, self‑developed pressure‑testing platform, and a monitoring ecosystem comparable to top‑tier internet companies. Autohome plans to evolve 818 into a global automotive IP, extending the platform to overseas users and positioning the event as an industry‑wide cultural and marketing hub.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
