
Inside Stack Overflow’s Redundant Architecture: How It Scales to 170 Million Daily Visits

This article dissects Stack Overflow’s end-to-end architecture: dual-data-center redundancy, physical and logical server layout, load balancing, web and service tiers, caching strategy, push system, search cluster, database design, and monitoring. Together, these show how the platform achieves massive scalability and high availability.

Efficient Ops

Architecture Overview

Stack Overflow, the well-known programming Q&A community founded by Jeff Atwood and Joel Spolsky in 2008, ranks among the world's most-visited sites, with over 170 million page views per day. Its architecture combines outsourced services with many open-source components and can be broken down into eight layers:

Internet

Load Balancing

Web Tier

Service Tier

Cache

Push

Search

Database

Figure: Stack Overflow architecture diagram

Architecture Principles

Everything is redundant. All critical components are duplicated across two data centers (New York and Colorado) with continuous backup.

Physical Architecture

4 Microsoft SQL Server instances (2 on new hardware)

11 IIS web servers (new hardware)

2 Redis servers (new hardware)

3 tag-engine servers (2 on new hardware)

3 Elasticsearch nodes (new hardware)

4 HAProxy load balancers (2 added for CloudFlare support)

2 networks (each a Nexus 5596 core + 2232TM Fabric Extenders, upgraded to 10 Gbps)

2 Fortinet 800C firewalls (replacing Cisco ASA)

2 Cisco ASR-1001 routers (replacing Cisco 3945)

2 Cisco ASR-1001-x routers

Logical Architecture

The Internet

DNS is outsourced to CloudFlare, with an in-house DNS server kept as a backup.

Load Balancers

HAProxy 1.5.15 running on CentOS 7, which also terminates TLS traffic. The team planned to evaluate HAProxy 1.7 with HTTP/2 support in mind.
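To make the load balancer's job concrete, here is a minimal Python sketch of round-robin selection that skips unhealthy web servers. The source does not say which balancing algorithm Stack Overflow configures in HAProxy. This only illustrates round-robin with health checks and is not HAProxy's implementation. The Backend and RoundRobinBalancer classes and the server names are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    """One web server behind the balancer (hypothetical names)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Round-robin selection that skips backends marked unhealthy.

    Illustrative only: mimics the idea of round-robin plus health
    checks, not HAProxy's actual code.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(range(len(self.backends)))

    def mark(self, name, healthy):
        # A real balancer updates this from periodic health checks.
        for backend in self.backends:
            if backend.name == name:
                backend.healthy = healthy

    def pick(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._cycle)]
            if backend.healthy:
                return backend.name
        raise RuntimeError("no healthy backends")
```

For example, with `ny-web02` marked down, four consecutive `pick()` calls return `ny-web01`, `ny-web03`, `ny-web01`, `ny-web03`. Traffic flows around the failed server without any change on the client side.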

Web Tier

IIS 8.5, ASP.NET MVC 5.2.3, .NET 4.6.1.

Service Tier

IIS, ASP.NET MVC 5.2.3, .NET 4.6.1, and HTTP.SYS.

Cache

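Caching has two levels. L1 is an in-memory HTTP cache on each web server, and L2 is Redis, shared by all servers. A lookup checks L1 first, then L2. If both miss, the value is fetched from the database and written to both levels, so the next server that misses its own L1 finds the value in Redis. Invalidations are broadcast over Redis publish/subscribe so that every web server drops its stale L1 copy. Redis CPU usage stays below 2%.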
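The read path and invalidation flow can be sketched in Python as follows. This is a simplified model, not Stack Overflow's code. A shared dict stands in for Redis, and an in-process PubSub class stands in for Redis pub/sub. `load_from_db` is a hypothetical loader. The sketch also assumes that invalidating a key removes it from L2 as well as from every server's L1.

```python
from typing import Callable, Dict, List


class PubSub:
    """In-process stand-in for Redis pub/sub (hypothetical)."""

    def __init__(self):
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, key: str) -> None:
        for handler in self._subscribers:
            handler(key)


class TwoLevelCache:
    """Per-server L1 dict in front of a shared L2 dict standing in for Redis."""

    def __init__(self, shared_l2: Dict[str, object], bus: PubSub,
                 load_from_db: Callable[[str], object]):
        self.l1: Dict[str, object] = {}  # local to this web server
        self.l2 = shared_l2              # shared by all servers
        self.bus = bus
        self.load_from_db = load_from_db
        bus.subscribe(self._on_invalidate)

    def get(self, key: str) -> object:
        if key in self.l1:               # L1 hit
            return self.l1[key]
        if key in self.l2:               # L2 hit: warm the local cache
            self.l1[key] = self.l2[key]
            return self.l1[key]
        value = self.load_from_db(key)   # both miss: query the source
        self.l1[key] = value
        self.l2[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Drop the shared copy, then tell every server to drop its L1 copy.
        self.l2.pop(key, None)
        self.bus.publish(key)

    def _on_invalidate(self, key: str) -> None:
        self.l1.pop(key, None)
```

Two TwoLevelCache instances that share the same dict and bus behave like two web servers. The second server's first read is served from L2 without touching the database. After `invalidate()`, both local caches are empty.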

Push

The open-source library NetGain pushes real-time updates over WebSockets, including notifications, vote counts, navigation counts, new answers, and comments. At peak, about 500,000 WebSocket connections are open concurrently, and some have stayed open for more than 18 months.
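On the server side, a push system comes down to fan-out: find every open connection that belongs to a user and hand it the event. The asyncio sketch below models each connection as a queue that a WebSocket handler (not shown) would drain and write to the socket. PushHub and its methods are hypothetical and do not reflect NetGain's actual API.

```python
import asyncio
from collections import defaultdict


class PushHub:
    """Minimal fan-out hub: one asyncio.Queue per open connection.

    Hypothetical sketch; a WebSocket handler would await queue.get()
    and send each event to the client as a frame.
    """

    def __init__(self):
        self._queues = defaultdict(set)  # user id -> set of connection queues

    def connect(self, user_id):
        queue = asyncio.Queue()
        self._queues[user_id].add(queue)
        return queue

    def disconnect(self, user_id, queue):
        self._queues[user_id].discard(queue)

    def push(self, user_id, event):
        # Deliver to every open connection (e.g. browser tab) of this user.
        for queue in self._queues.get(user_id, ()):
            queue.put_nowait(event)
```

Keeping one queue per connection means a user with two open tabs receives each vote-count update in both, while other users' queues stay untouched.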

Search

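Search runs on a three-node Elasticsearch cluster. Solr was not chosen because, when the decision was made, it could not search across many indexes at once, which network-wide search requires. The team had also held off on upgrading to Elasticsearch 2.x, because a major change in how 2.x handles types would force a full re-index.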
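Multi-index search means that one query runs against several indexes (here, one per site) and the hits are merged into a single ranked list. Elasticsearch supports this directly by naming several indices in one search request. The toy Python version below only illustrates the concept. `build_index`, `search_many`, the site names, and the summed term-frequency score are hypothetical simplifications and are not Elasticsearch's relevance model.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Tiny inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search_many(indexes, query, limit=10):
    """Run one query over several per-site indexes and merge the hits.

    Score is summed term frequency, a placeholder for real relevance.
    Returns (score, site, doc_id) tuples, best first.
    """
    hits = []
    for site, index in indexes.items():
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in index.get(term, {}).items():
                scores[doc_id] += tf
        hits.extend((score, site, doc_id) for doc_id, score in scores.items())
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return hits[:limit]
```

With a "stackoverflow" index containing doc 1 ("redis cache eviction") and a "serverfault" index containing doc 7 ("redis cache cache tuning"), `search_many(indexes, "redis cache")` returns `[(3, "serverfault", 7), (2, "stackoverflow", 1)]`. A single query ranks results from both sites together.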

Database

SQL Server holds the data, and almost no logic lives in the database. There is a single stored procedure, which the team intends to move into application code.
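When logic lives in application code instead of stored procedures, the application typically sends parameterized SQL directly. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The `posts` table, its columns, and `top_posts` are hypothetical.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for SQL Server.
SCHEMA = """
CREATE TABLE posts (
    id    INTEGER PRIMARY KEY,
    title TEXT    NOT NULL,
    score INTEGER NOT NULL
);
"""


def top_posts(conn, min_score, limit):
    """Query logic kept in application code as parameterized SQL,
    rather than inside a stored procedure."""
    rows = conn.execute(
        "SELECT id, title, score FROM posts "
        "WHERE score >= ? ORDER BY score DESC LIMIT ?",
        (min_score, limit),
    )
    return rows.fetchall()
```

Because the query text ships with the application, changing it is an ordinary code change that is reviewed and deployed with the rest of the site. No separate database migration is needed.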

Monitoring System

Opserver, a lightweight monitoring tool built on ASP.NET MVC, tracks:

Servers

SQL clusters/instances

Redis

Elasticsearch

Exception logs

HAProxy
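Here is one kind of check a dashboard like Opserver might run for HAProxy. The Python sketch below reads HAProxy's CSV statistics export and lists servers reported as DOWN. It assumes the CSV layout whose header line begins with `# pxname,svname`, and it uses only the pxname, svname, and status columns. The sample data in the usage note is made up and trimmed to those columns. This is not how Opserver is implemented.

```python
import csv
import io


def down_servers(stats_csv):
    """Return (proxy, server) pairs whose status is DOWN.

    Assumes HAProxy's CSV stats format, whose header line starts with
    '# pxname,svname,...'; only pxname, svname and status are read.
    """
    text = stats_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # strip the comment marker so the header parses
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["pxname"], row["svname"])
        for row in reader
        # Statuses such as "DOWN 1/2" (going up) also count as down here.
        if (row.get("status") or "").startswith("DOWN")
    ]
```

For example, a CSV with the header `# pxname,svname,status` and the rows `web,ny-web01,UP` and `web,ny-web02,DOWN` yields `[("web", "ny-web02")]`.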

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
