How to Design a Scalable Twitter‑Like Backend from Scratch

This article outlines the functional and non‑functional requirements, traffic calculations, service decomposition, and detailed microservice designs needed to build a highly available, scalable Twitter‑style system using distributed caches, databases, and asynchronous processing.

21CTO
Twitter is one of the largest online social networks. Designing its system architecture from zero requires identifying essential services and early‑stage considerations.

Functional Requirements

Users can post or share new tweets.

Each tweet is limited to 140 characters.

Users can delete tweets (no edit operation).

Users can like tweets.

Users can follow or unfollow other users; following a user makes their tweets appear on the follower's timeline.

Two types of timelines are generated: a user timeline (last N tweets) and a home timeline (popular tweets from followed users, ordered by time).

Users can search tweets by keyword.

Users must have an account to post or read tweets (an external identity service handles authentication).

Users can register and delete accounts.

The system supports text‑only tweets in this design.

Analytics and monitoring services track system load, health, and functionality.

Analytics also provide recommendations such as who to follow, tweet notifications, trending topics, push notifications, and sharing suggestions.

Non‑Functional Requirements

High availability is critical; users should read timelines without noticeable delay.

Timeline generation must not exceed half a second.

Strong consistency is not required; eventual consistency is sufficient. Search is served from a separate keyword database.

The system must be scalable as users and tweets grow.

User data must be persisted.

Now we perform some traffic calculations.

Average load = daily active users × average requests per user per day / seconds per day = 150M × 60 / 86,400 ≈ 100k req/s.

Peak load = average load × 3 ≈ 300k req/s.

Maximum peak load within three months (allowing for growth) = peak load × 2 ≈ 600k req/s.

Read QPS = 300k.

Write QPS = 5k.
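The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function and key names are made up for illustration, and the factors 3 and 2 are simply the multipliers quoted in the list. The exact results are slightly higher than the rounded figures the article uses (100k, 300k, 600k).

```python
SECONDS_PER_DAY = 86_400


def traffic_estimate(daily_active_users: int, requests_per_user_per_day: int) -> dict:
    """Back-of-the-envelope load estimate using the article's multipliers."""
    average = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
    peak = average * 3   # peak is taken as 3x the average
    max_peak = peak * 2  # headroom for growth over three months
    return {"average": average, "peak": peak, "max_peak_3_months": max_peak}


estimate = traffic_estimate(150_000_000, 60)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f} req/s")
```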

Overview of Twitter Services

Due to system complexity, it can be divided into several microservices:

Tweet Service

User Timeline Service

Fanout Service

Home Timeline Service

Social Graph Service

Search Service

1. Tweet Service

Receives user tweets and forwards them to followers' timelines and the search service.

Stores user information, tweet data, tweet counts, and like status.

Composed of application servers, distributed in‑memory caches, and a backend distributed database (or a single store such as Redis that combines an in‑memory cache with persistence).

The Tweet Service database includes Users, Tweet, and Favorite_tweet tables. Users stores all user information, Tweet stores all tweets, and Favorite_tweet records each like action.
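One possible shape for these three tables is sketched below using SQLite from the Python standard library. Only the table names come from the text; every column name, the key choices, and the use of SQLite itself are assumptions for illustration, since a production system would use a distributed database. The CHECK constraint encodes the 140-character limit from the functional requirements.

```python
import sqlite3

# Hypothetical columns; only the three table names come from the article.
SCHEMA = """
CREATE TABLE users (
    user_id    TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE TABLE tweet (
    tweet_id   TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL CHECK (length(content) <= 140),
    created_at INTEGER NOT NULL
);
CREATE TABLE favorite_tweet (
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    tweet_id   TEXT NOT NULL REFERENCES tweet(tweet_id),
    created_at INTEGER NOT NULL,
    PRIMARY KEY (user_id, tweet_id)  -- one like per user per tweet
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES ('u1', 'alice', 1)")
conn.execute("INSERT INTO tweet VALUES ('t1', 'u1', 'hello world', 2)")
conn.execute("INSERT INTO favorite_tweet VALUES ('u1', 't1', 3)")

# The like count for a tweet is derived from the favorite_tweet rows.
like_count = conn.execute(
    "SELECT COUNT(*) FROM favorite_tweet WHERE tweet_id = 't1'"
).fetchone()[0]
```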

2. Generating Unique Tweet IDs

When a user calls postTweet(), the request reaches an application server, which generates a globally unique ID (or a short URL) for the tweet, possibly using a UUID. The tweet is then written under that ID to both the distributed cache and the Tweet table in the database using a write‑through strategy, so the cache and the database are updated as part of the same write.
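The flow can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the class name, the record fields, and the two dictionaries standing in for the distributed cache and the Tweet table are all hypothetical. The ID comes from `uuid.uuid4()`, which produces a random 128-bit value without coordination between servers.

```python
import time
import uuid


class WriteThroughTweetStore:
    """Toy write-through store: each write updates the database and the cache together."""

    def __init__(self):
        self.database = {}  # stand-in for the Tweet table
        self.cache = {}     # stand-in for the distributed cache

    def post_tweet(self, user_id: str, content: str) -> str:
        if len(content) > 140:
            raise ValueError("tweet exceeds 140 characters")
        tweet_id = uuid.uuid4().hex  # 32 hex characters
        record = {
            "tweet_id": tweet_id,
            "user_id": user_id,
            "content": content,
            "created_at": time.time(),
        }
        # Write-through: the write completes only after both stores are updated.
        self.database[tweet_id] = record
        self.cache[tweet_id] = record
        return tweet_id

    def get_tweet(self, tweet_id: str):
        record = self.cache.get(tweet_id)
        if record is None:  # cache miss: fall back to the database and refill
            record = self.database.get(tweet_id)
            if record is not None:
                self.cache[tweet_id] = record
        return record


store = WriteThroughTweetStore()
new_id = store.post_tweet("u1", "hello world")
```

Random UUIDs are not ordered by creation time, which is why the sketch keeps a separate `created_at` field for sorting.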

3. Scalability Design

Distributed cache and database can be sharded and replicated.
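The article does not say how keys are assigned to shards. One common approach, shown here purely as an assumption, is to hash the key and take the result modulo the number of shards. The function name `shard_for` and the shard count of 8 are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would route the same key differently
    # on different application servers.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


placements = {user_id: shard_for(user_id, 8) for user_id in ("u1", "u2", "u3")}
```

A plain modulo scheme remaps most keys whenever the shard count changes; consistent hashing is a frequently used alternative when shards are added or removed often.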

4. Social Graph Service

Implements the Following API to track follow relationships.

Consists of application servers, distributed cache, and a database.

Database schema for storing user relationships is shown below.

Following API actions:

Asynchronously merge the followed user's timeline into the follower's event stream.

When unfollowing, asynchronously remove the unfollowed user's tweets from the follower's stream.

Asynchronously select tweets from the event stream.

Asynchronous processing provides fast user feedback despite potentially long operations.
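A minimal sketch of this pattern follows, assuming tweets are represented as `(created_at, author, tweet_id)` tuples and that a background worker drains the task queue later. The class, function, and task names are hypothetical. The follow and unfollow calls only record the relationship and enqueue a task, which is what lets them return quickly.

```python
from collections import defaultdict, deque


class SocialGraph:
    """Records follow edges and queues the slow timeline work for later."""

    def __init__(self):
        self.following = defaultdict(set)
        self.pending_tasks = deque()

    def follow(self, follower: str, followee: str) -> None:
        self.following[follower].add(followee)
        self.pending_tasks.append(("merge", follower, followee))

    def unfollow(self, follower: str, followee: str) -> None:
        self.following[follower].discard(followee)
        self.pending_tasks.append(("remove", follower, followee))


def apply_task(task, user_timelines: dict, home_timelines: dict) -> None:
    """Run one queued task; a real system would do this in a background worker."""
    action, follower, followee = task
    home = home_timelines.setdefault(follower, [])
    if action == "merge":
        home.extend(t for t in user_timelines.get(followee, []) if t not in home)
        home.sort(key=lambda t: t[0], reverse=True)  # newest first
    else:
        home[:] = [t for t in home if t[1] != followee]


user_timelines = {
    "alice": [(2, "alice", "t2"), (1, "alice", "t1")],
    "dave": [(3, "dave", "t3")],
}
home_timelines = {}
graph = SocialGraph()
graph.follow("bob", "alice")
graph.follow("bob", "dave")
graph.unfollow("bob", "alice")
while graph.pending_tasks:
    apply_task(graph.pending_tasks.popleft(), user_timelines, home_timelines)
```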

5. User Timeline Service

Returns a user's timeline ordered by creation time in descending order; used for both personal and home timelines.

Composed of application servers and a distributed cache; no direct database access.

Timeline is stored as a list of tweet IDs.

When a tweet is posted, the Tweet Service calls the User Timeline Service to prepend the tweet to the user's timeline (O(1) operation).

A configurable parameter K (default 1000) limits the number of stored tweets per timeline.

When the list exceeds K, the oldest entry is evicted.
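The prepend-and-evict behaviour maps naturally onto a bounded double-ended queue. The sketch below uses `collections.deque` with `maxlen`, where `appendleft` on a full deque drops the item at the opposite end, so both the insert and the eviction are O(1). The class and method names are hypothetical, and an in-process dictionary stands in for the distributed cache.

```python
from collections import deque
from itertools import islice


class UserTimelineCache:
    """Per-user list of tweet IDs, newest first, capped at K entries."""

    def __init__(self, k: int = 1000):
        self.k = k
        self._timelines = {}

    def prepend(self, user_id: str, tweet_id: str) -> None:
        timeline = self._timelines.setdefault(user_id, deque(maxlen=self.k))
        # On a full deque, appendleft discards the rightmost (oldest) entry.
        timeline.appendleft(tweet_id)

    def latest(self, user_id: str, n: int) -> list:
        return list(islice(self._timelines.get(user_id, ()), n))


timelines = UserTimelineCache(k=3)
for tweet_id in ("t1", "t2", "t3", "t4"):
    timelines.prepend("alice", tweet_id)
recent = timelines.latest("alice", 10)
```

With K set to 3, the fourth prepend evicts `t1`, so `recent` holds `t4`, `t3`, `t2` in that order.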

6. Fanout Service

Forwards new tweets to the Search Service, Home Timeline Service, and other components such as Trending or Notification services.

Implemented as a set of distributed queues.

Handles high‑fan‑out scenarios (e.g., celebrity users) by processing follower lists asynchronously.

Provides eventual consistency through asynchronous task processing.
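The queue-based fan-out can be illustrated with the standard `queue` and `threading` modules. This is a single-process sketch under assumptions: the follower map, the `search_inbox` list standing in for the Search Service, and all function names are hypothetical, and a real deployment would use a distributed message queue with many workers. Posting only enqueues a task, so the caller returns immediately while followers' home timelines catch up shortly afterwards.

```python
import queue
import threading
from collections import defaultdict

followers = {"alice": ["bob", "carol"]}  # hypothetical follower lists
home_timelines = defaultdict(list)       # follower -> tweet IDs, newest first
search_inbox = []                        # stand-in for the Search Service
fanout_queue = queue.Queue()


def post_tweet(author: str, tweet_id: str) -> None:
    """Enqueue the fan-out work and return without waiting for it."""
    fanout_queue.put((author, tweet_id))


def fanout_worker() -> None:
    while True:
        task = fanout_queue.get()
        if task is None:  # sentinel: stop the worker
            fanout_queue.task_done()
            break
        author, tweet_id = task
        search_inbox.append(tweet_id)
        for follower in followers.get(author, []):
            home_timelines[follower].insert(0, tweet_id)
        fanout_queue.task_done()


worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()
post_tweet("alice", "t1")
post_tweet("alice", "t2")
fanout_queue.join()     # wait until both tasks have been processed
fanout_queue.put(None)  # shut the worker down
worker.join()
```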

7. Home Timeline Service

Displays the home timeline, aggregating tweets from followed users in descending order of creation time.

Similar design to the User Timeline Service but includes weighting logic for tweets from many followed users.

Manages insertion of new tweets and eviction of oldest tweets when the size exceeds K.

8. Search Service

Provides keyword‑based search queries for users.

The Fanout Service forwards tweets to the Search Service.

Components:

Ingester: splits each tweet into terms and discards words that carry no search value (stop words).

Stemming: reduces words to their root forms and stores a lookup table.

Indexing: builds an inverted index that maps each term to the tweets containing it.

Blender: processes search queries, applies the same stemming, and runs them against the inverted index.
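The four components can be sketched end to end in a few dozen lines. Everything here is a simplified assumption: the stop-word list is a tiny sample, `stem` is a crude suffix stripper rather than a real stemming algorithm, and multi-word queries are treated as AND queries, which the article does not specify.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def stem(word: str) -> str:
    """Crude suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list:
    """Ingester plus stemming: tokenize, drop stop words, reduce to stems."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


class SearchIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of tweet IDs

    def index(self, tweet_id: str, text: str) -> None:
        for term in analyze(text):
            self._postings[term].add(tweet_id)

    def search(self, query: str) -> set:
        """Blender: analyze the query the same way, then intersect postings."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


search_index = SearchIndex()
search_index.index("t1", "Scaling the timelines")
search_index.index("t2", "Timeline caching is fun")
matches = search_index.search("timeline")
```

Because the query passes through the same `analyze` step as the tweets, a search for "timeline" matches both the tweet containing "timelines" and the one containing "Timeline".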

9. Photo and Video Storage

Media files are stored using a NoSQL database for metadata and a file system for the actual files.

Database tables capture media metadata.

Final Detailed Design

System design diagram:

Data architecture diagram:

References:

1. Twitter System Architecture: https://medium.com/interviewnoodle/twitter-system-architecture-8dafce16aec4