Inside Toutiao’s Massive Scale: How the News App Handles Billions of Requests

This article provides an in‑depth technical overview of Toutiao’s rapid growth, data collection pipelines, user modeling, cold‑start strategies, recommendation engine architecture, storage solutions, push notification system, microservice design, and its three‑layer PaaS platform, illustrating how the news app serves hundreds of millions of users daily.

IT Architects Alliance

Product Background

Toutiao launched in March 2012 and grew from a handful of engineers to more than 200 staff. Its products include Jinri Toutiao, Jinri Teshou, and Jinri Dianying. It serves 500 million registered users and 48 million daily active users, and generates 5 billion page views per day.

Data Collection & Article Processing

Each day the platform crawls roughly 10,000 original news articles from various sites, plus novels, blogs, and other content. A manual review filters out sensitive material. Text analysis then extracts each article's categories, tags, topics, regional relevance, popularity, and weight.
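
The text-analysis step can be illustrated with a minimal sketch. It assumes a plain keyword lookup, which is almost certainly simpler than the classifiers a production system would use; `CATEGORY_KEYWORDS`, `ArticleFeatures`, and `extract_features` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; a real pipeline would likely use trained models.
CATEGORY_KEYWORDS = {
    "sports": {"match", "league", "goal"},
    "finance": {"stock", "market", "bank"},
}


@dataclass
class ArticleFeatures:
    category: str
    tags: list = field(default_factory=list)


def extract_features(text: str) -> ArticleFeatures:
    """Pick the category whose keywords overlap most with the article text."""
    words = set(text.lower().split())
    best_category, best_hits = "other", set()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = words & keywords
        if len(hits) > len(best_hits):
            best_category, best_hits = category, hits
    # The matched keywords double as the article's tags.
    return ArticleFeatures(category=best_category, tags=sorted(best_hits))
```

Articles with no keyword match fall into an "other" bucket with no tags, which is where a manual reviewer or a richer model would take over.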

User Modeling

Real‑time logs are ingested with Scribe, Flume, and Kafka. User interests are learned using Hadoop and Storm, and the resulting models are stored in MySQL/MongoDB (with read‑write splitting) and cached in Memcached/Redis. By 2015 the cluster comprised about 7,000 machines.

User subscriptions

Interest tags

Partial article push
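
One way to picture how interest tags might be maintained from click logs is an exponentially decayed counter per tag. The article does not describe the actual learning method, so the decay scheme, the one-week half-life, and the `InterestProfile` class below are all assumptions made for illustration.

```python
from collections import defaultdict

HALF_LIFE_SECONDS = 7 * 24 * 3600  # assumed: an interest halves after one week


class InterestProfile:
    """Hypothetical per-user tag weights with exponential time decay."""

    def __init__(self):
        self.weights = defaultdict(float)
        self.last_update = {}

    def _decayed(self, tag, now):
        # Weight of the tag at time `now`, after decay since its last update.
        elapsed = now - self.last_update.get(tag, now)
        return self.weights[tag] * 0.5 ** (elapsed / HALF_LIFE_SECONDS)

    def record_click(self, tags, now):
        # Each click adds 1.0 to every tag of the clicked article.
        for tag in tags:
            self.weights[tag] = self._decayed(tag, now) + 1.0
            self.last_update[tag] = now

    def top_tags(self, now, n=3):
        scored = {tag: self._decayed(tag, now) for tag in list(self.weights)}
        return sorted(scored, key=scored.get, reverse=True)[:n]
```

Decay keeps the profile responsive: a tag the user stopped clicking fades on its own, without a separate cleanup job.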

Cold‑Start for New Users

When a new user registers, Toutiao identifies the device, OS, version, and social accounts (e.g., Weibo). It builds an initial profile from friends, followers, and their activity, as well as installed apps, device model, and browser bookmarks.
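
The cold-start idea can be sketched as a weighted vote over whatever signals are available at registration. The prior tables, the signal weights (1.0, 0.5, 0.25), and the `initial_profile` function are hypothetical; the article does not say how the signals are actually combined.

```python
from collections import Counter

# Hypothetical priors that map observed signals to interest tags.
APP_PRIORS = {"stock_trader": ["finance"], "football_live": ["sports"]}
DEVICE_PRIORS = {"flagship_phone": ["technology"]}


def initial_profile(installed_apps, device_model, friend_tags):
    """Build a normalized starting profile for a user with no reading history."""
    scores = Counter()
    for app in installed_apps:
        for tag in APP_PRIORS.get(app, []):
            scores[tag] += 1.0       # installed apps: strongest signal (assumed)
    for tag in DEVICE_PRIORS.get(device_model, []):
        scores[tag] += 0.5           # device model: weaker signal (assumed)
    for tags in friend_tags:
        for tag in tags:
            scores[tag] += 0.25      # each friend's interest: weakest (assumed)
    total = sum(scores.values())
    if not total:
        return {}
    return {tag: weight / total for tag, weight in scores.items()}
```

An empty result means no signal matched, in which case a system would presumably fall back to popular or regional content until real behavior accumulates.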

Recommendation Engine

The core of Toutiao’s architecture is a recommendation system with two layers:

Automatic Recommendation

Candidate generation

User matching (location, extracted attributes)

Push task creation

This layer requires ultra‑high‑throughput push to hundreds of millions of users.
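
The three steps of automatic recommendation can be sketched end to end on toy data. The `Article` and `User` types, the "national" region value, and the matching rule (same region plus at least one shared tag) are assumptions for illustration, not the production logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    article_id: int
    region: str
    tags: frozenset


@dataclass(frozen=True)
class User:
    user_id: int
    region: str
    interests: frozenset


def generate_candidates(articles, min_tags=1):
    # Step 1: keep only articles that text analysis managed to tag.
    return [a for a in articles if len(a.tags) >= min_tags]


def matches(user, article):
    # Step 2: match on location and on extracted attributes (shared tags).
    same_region = article.region in ("national", user.region)
    return same_region and bool(user.interests & article.tags)


def create_push_tasks(users, articles):
    # Step 3: emit one (user_id, article_id) task per match.
    candidates = generate_candidates(articles)
    return [(u.user_id, a.article_id)
            for u in users for a in candidates if matches(u, a)]
```

At real scale the nested loop would be replaced by an inverted index from tags and regions to users, but the data flow stays the same.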

Semi‑Automatic Recommendation

Candidate selection based on in‑app and out‑of‑app actions

Data Storage

Persistent storage uses MySQL or MongoDB together with Memcached/Redis, often on large in‑memory instances and SSDs. Images are stored in the database and served via CDN.
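
A common way to pair a database with a cache is the cache-aside pattern, sketched below with plain dictionaries standing in for MySQL/MongoDB and Memcached/Redis. The article does not state which caching pattern Toutiao uses, so treat this as one plausible arrangement; `CachedStore` is a hypothetical name.

```python
class CachedStore:
    """Cache-aside reads with invalidate-on-write (illustrative only)."""

    def __init__(self, database):
        self.database = database  # dict standing in for MySQL/MongoDB
        self.cache = {}           # dict standing in for Memcached/Redis
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self.database.get(key)    # cache miss: read from the database
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)         # invalidate so the next read is fresh
```

Invalidating on write rather than updating the cache keeps the write path simple, at the cost of one extra database read after each update.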

Message Push

Push notifications increase DAU by ~20%; without push, DAU drops ~10%. Metrics tracked include click‑through rate, click volume, app uninstalls, and push‑disable counts. Pushes are personalized by frequency, content, region, and interests.
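
Two of the ideas above, click-through rate as a metric and frequency as a personalization knob, can be sketched as follows. The per-day cap and the `FrequencyCap` class are assumptions; the article only says that frequency is personalized, not how.

```python
def click_through_rate(clicks, deliveries):
    """Fraction of delivered pushes that were clicked."""
    return clicks / deliveries if deliveries else 0.0


class FrequencyCap:
    """Hypothetical limit on pushes per user per day."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        self.sent = {}  # (user_id, day) -> number of pushes already sent

    def allow(self, user_id, day):
        key = (user_id, day)
        if self.sent.get(key, 0) >= self.max_per_day:
            return False              # over the cap: suppress this push
        self.sent[key] = self.sent.get(key, 0) + 1
        return True
```

A cap like this trades reach for retention: fewer pushes per user, but presumably fewer uninstalls and push-disable events, which are the other metrics tracked.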

System Architecture Overview

Key components include Kafka as the message bus, ETL pipelines, and data warehouses supporting batch, MPP, and cube query engines.

Microservice Architecture

Toutiao decomposes monolithic applications into smaller services, sharing a common infrastructure layer for rapid iteration, fault tolerance, and resource abstraction. The platform runs on a three‑layer PaaS model: an IaaS layer at the bottom, a unified SaaS layer, and an app execution engine.

Conclusion

The platform’s success hinges on massive data generation and collection, real‑time user modeling, a hybrid recommendation engine, scalable storage, and a flexible microservice‑based infrastructure.

