
WhatsApp Architecture: High‑Scalability Design and Engineering Insights

The article analyzes WhatsApp's high‑scalability backend architecture, detailing its Erlang‑based server stack, massive user statistics, hardware deployment, custom protocols, performance‑tuning tools, and lessons learned from scaling to billions of messages with a tiny engineering team.

Architect

Jun 26, 2016

WhatsApp was acquired by Facebook for $19 billion, yet its service was run by only about 32 engineers, who served roughly 450 million active users on a highly optimized backend built primarily on Erlang and FreeBSD.

Statistics

About 450 million active users and 50 billion messages per day across seven platforms.

32 engineers supporting ~14 million users each.

Hundreds of servers, >11 000 CPU cores, several hundred terabytes of RAM.

Peak concurrent connections reaching 147 million, with message peaks of 342 k inbound and 712 k outbound per second.
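The per-engineer ratio quoted above implies a total user count, and a short calculation makes that explicit. This is only a sanity check on the article's own rounded figures, not an official statistic.

```python
# Cross-check of the headline ratio, using only figures quoted in this article.
ENGINEERS = 32                    # engineering headcount quoted above
USERS_PER_ENGINEER = 14_000_000   # "~14 million users each" (a rounded figure)

implied_users = ENGINEERS * USERS_PER_ENGINEER
print(f"Implied active users: {implied_users:,}")  # Implied active users: 448,000,000
```

The result lands in the hundreds of millions, which is the scale a team of this size was supporting.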

Platform

Backend: Erlang (custom‑patched BEAM), FreeBSD 9.2, Yaws/lighttpd, PHP, Mnesia, ejabberd (heavily modified), custom XMPP protocol.

Frontend: iPhone, Android, BlackBerry, Nokia Symbian, Nokia S40, Windows Phone, plus a seventh client that the source does not identify; SQLite for on-device storage.

Hardware

~550 servers total: ~150 chat servers, ~250 MMS servers.

Nodes carry 64 GB to 512 GB of RAM each and use SSDs for most data; the fleet as a whole has more than 11 000 CPU cores.

General Design

Core messaging implemented entirely in Erlang; ejabberd used as the initial XMPP server and later heavily rewritten.

Messages are queued on the server until the client retrieves them; SSL sockets protect traffic.

Registration relies on phone numbers and a PIN‑based verification flow.

Multimedia is uploaded via HTTP to SSD (images) or SATA (audio/video) and referenced by URLs.
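The store-and-forward behavior described above (messages wait on the server until the client retrieves them) can be sketched in a few lines. This is a minimal in-memory illustration in Python, not WhatsApp's Erlang implementation: the class name, the acknowledgement step, the phone number, and the media URL are all assumptions made for the example.

```python
from collections import defaultdict, deque


class OfflineQueue:
    """Hold messages per recipient until the client fetches and acknowledges them."""

    def __init__(self):
        self._pending = defaultdict(deque)  # recipient -> deque of (msg_id, payload)
        self._next_id = 0

    def enqueue(self, recipient, payload):
        """Queue a message for a recipient and return its id."""
        self._next_id += 1
        self._pending[recipient].append((self._next_id, payload))
        return self._next_id

    def fetch(self, recipient):
        """Return queued messages in arrival order without removing them."""
        return list(self._pending[recipient])

    def ack(self, recipient, msg_id):
        """Drop every message up to and including msg_id once receipt is confirmed."""
        queue = self._pending[recipient]
        while queue and queue[0][0] <= msg_id:
            queue.popleft()


# Hypothetical usage: a text message, then a media message carried as a URL reference.
q = OfflineQueue()
q.enqueue("+15550100", "hello")
last_id = q.enqueue("+15550100", {"media_url": "https://media.example.invalid/abc.jpg"})
delivered = q.fetch("+15550100")   # client connects and pulls both messages
q.ack("+15550100", last_id)        # the server can now forget them
```

Removing messages only after an acknowledgement means a connection dropped mid-delivery causes a redelivery rather than a loss. Carrying media as a URL keeps the queued payload small regardless of file size.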

Scaling to 2 million Connections per Server

Initial load: 200 k concurrent connections per server, later increased to 1 million and finally 2 million.

Dynamic capacity planning, load‑spike handling (e.g., major sports events), and fault isolation were essential.
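To see why the per-server connection limit matters for fleet size, the following sketch estimates how many servers a given peak would need at each limit mentioned above. The 100 million peak, the 25% headroom for load spikes, and the single spare for fault isolation are illustrative assumptions, not figures from the source.

```python
import math


def servers_needed(peak_connections, per_server_limit, headroom=0.25, spares=1):
    """Servers required so the peak fits within (1 - headroom) of each server's limit."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_server_limit * (1 - headroom)  # capacity kept free for spikes
    return math.ceil(peak_connections / usable) + spares


HYPOTHETICAL_PEAK = 100_000_000  # assumed peak, chosen only for illustration

for limit in (200_000, 1_000_000, 2_000_000):
    print(f"{limit:>9,} connections/server -> {servers_needed(HYPOTHETICAL_PEAK, limit)} servers")
# Under these assumptions: 668, 135, and 68 servers respectively.
```

Under these assumptions, raising the limit from 200 k to 2 million connections per server cuts the required fleet by roughly a factor of ten, which is consistent with the stated goal of keeping the server count low.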

Tools & Techniques for Scalability

Custom system activity reporter (wsar) collects OS, hardware, and BEAM metrics at sub‑second intervals.

Hardware performance counters (pmcstat) to measure emulator CPU usage.

DTrace, kernel lock counters, fprof for debugging.

Patch‑level BEAM modifications, custom schedulers, memory allocators (mseg), and real‑time priority for the VM.

Extensive instrumentation of message queues, lock contention, and network stack.
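The idea behind a sub-second activity reporter can be illustrated with a small sampler that snapshots cumulative counters and turns them into per-second rates. This is a sketch in Python under stated assumptions and does not reproduce wsar itself: the function names, the counter name, and the injected clock and sleep parameters are all hypothetical.

```python
import time


def collect(read_counters, interval=0.1, samples=5,
            clock=time.monotonic, sleep=time.sleep):
    """Take timestamped snapshots of cumulative counters at a sub-second interval."""
    snapshots = []
    for i in range(samples):
        snapshots.append((clock(), dict(read_counters())))
        if i < samples - 1:
            sleep(interval)
    return snapshots


def rates(snapshots):
    """Convert consecutive snapshots of cumulative counters into per-second rates."""
    result = []
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip unusable intervals rather than divide by zero
        result.append({name: (c1[name] - c0.get(name, 0)) / dt for name in c1})
    return result


# Hypothetical snapshots taken 0.5 s apart for a cumulative inbound-message counter.
example = [(0.0, {"msgs_in": 0}), (0.5, {"msgs_in": 100}), (1.0, {"msgs_in": 400})]
example_rates = rates(example)
print(example_rates)  # [{'msgs_in': 200.0}, {'msgs_in': 600.0}]
```

Sampling at short intervals exposes brief bursts, such as the jump from 200 to 600 messages per second here, that a one-minute average would smooth away.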

Experience Summary

Continuous measurement, bottleneck elimination, and iterative testing are vital for scaling.

Erlang proved to be a powerful platform for high‑availability, high‑throughput services, despite the need for extensive tuning.

Keeping server count low while providing redundancy, and focusing on user‑centric simplicity, contributed to WhatsApp’s success.

References: HighScalability blog (2014), Zhihu answer, Quora discussion.
