Inside Tencent Analytics: How TA Handles TB‑Scale Real‑Time Web Data
Tencent Analytics (TA) is a free web analytics platform that processes terabytes of data per day in real time. Its custom architecture combines JavaScript‑based collection, event streaming, in‑memory computation, and NoSQL storage on Redis and LevelDB, giving site owners near‑instant insights with high availability.
Basic Principles and System Architecture
TA collects user behavior data through a JavaScript snippet embedded in the site and sends it to a collection cluster, where it is filtered, encoded, and formatted. The collection cluster forwards the data to a processing cluster, which computes business metrics and writes the results to a storage cluster for presentation.
TA Backend Components
Http Access : parses incoming HTTP requests, then cleans and formats the data.
ESC (Event Streaming Coder) : encodes non‑enumerable data types into integers and persists the mapping.
ESP (Event Streaming Processor) : reorganizes data by site and UID, calculates PV, UV, dwell time, bounce rate, etc.
ESA (Event Streaming Aggregator) : aggregates ESP results per site and writes them to Redis.
Center : central node for configuration, data routing, and disaster‑recovery switching.
Logserver : writes raw Access data to files and uploads them to TDCP.
TDCP (Tencent Distributed Computing Platform) : performs offline calculations and writes results to MySQL.
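To make the ESP metrics concrete, here is a minimal sketch of computing PV, UV, bounce rate, and dwell time from events grouped by site and UID. It is not TA's code. The event tuple layout and the function name `site_metrics` are hypothetical, each UID is treated as a single visit, a bounce is taken to be a visit with exactly one page view, and dwell time is taken to be the gap between a visitor's first and last page view. The source does not state which definitions TA uses.

```python
from collections import defaultdict

def site_metrics(events):
    """events: iterable of (site_id, uid, timestamp_seconds) page views.

    Assumption: one visit per (site_id, uid); real systems split visits
    by an inactivity timeout, which is omitted here.
    """
    visits = defaultdict(list)  # (site_id, uid) -> list of timestamps
    for site_id, uid, ts in events:
        visits[(site_id, uid)].append(ts)

    totals = defaultdict(lambda: {"pv": 0, "uv": 0, "bounces": 0, "dwell": 0})
    for (site_id, _uid), stamps in visits.items():
        m = totals[site_id]
        m["pv"] += len(stamps)                   # every page view counts
        m["uv"] += 1                             # one distinct visitor
        if len(stamps) == 1:
            m["bounces"] += 1                    # single-page visit
        m["dwell"] += max(stamps) - min(stamps)  # first to last view

    return {
        site: {
            "pv": m["pv"],
            "uv": m["uv"],
            "bounce_rate": m["bounces"] / m["uv"],
            "avg_dwell": m["dwell"] / m["uv"],
        }
        for site, m in totals.items()
    }

# Three visitors on site 1; visitor "b" views a single page and bounces.
sample = [(1, "a", 0), (1, "a", 30), (1, "b", 5), (1, "c", 10), (1, "c", 70)]
result = site_metrics(sample)
```

This batch version recomputes everything from a list of events; a streaming system would instead fold each event into running state as it arrives.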
Real‑Time Solution
TA must handle TB‑scale daily data from hundreds of thousands of sites, with billions of unique URLs and over a billion stored keys. The solution relies on full binary data, in‑memory computation, and NoSQL storage.
Real‑Time Computation
The computation subsystem draws ideas from Hadoop, S4, and Storm to build a generic, highly extensible, fully in‑memory event‑processing system.
Key design points:
Data Organization : all non‑int types are converted to ints. Enumerated types use a configured mapping; non‑enumerated types are hashed with MD5 to derive an int that serves as a unique ID.
Protocol : a flexible Event structure with semi‑automatic serialization/deserialization (inspired by msgpack) and a compact binary encoding (ZigZag, similar to Protobuf).
Incremental Computation Model : consists of Processor (business logic), Data Holder (stores intermediate results), and Emitter (periodically outputs and clears results).
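The first two design points can be illustrated with a short sketch. It shows one plausible way to map values to ints and to pack ints compactly; it is not TA's implementation. The `BROWSER_IDS` table, the function names, the choice to keep only the first 64 bits of the MD5 digest, and the pairing of ZigZag with a base‑128 varint (as Protobuf does) are all assumptions for illustration.

```python
import hashlib

# Hypothetical configured mapping for one enumerated field.
BROWSER_IDS = {"chrome": 1, "firefox": 2, "safari": 3}

def encode_enum(value, mapping):
    """Look up an enumerated value; 0 is reserved here for 'unknown'."""
    return mapping.get(value.lower(), 0)

def encode_free_text(value):
    """Derive an int from the MD5 digest of a non-enumerated value (e.g. a URL).

    Assumption: keep the first 8 bytes (64 bits) of the digest.
    Collisions are unlikely at this width but not impossible.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def zigzag_encode(n):
    """Map a signed 64-bit int to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

def varint_encode(u):
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)

url_id = encode_free_text("http://example.com/index.html")
packed = varint_encode(zigzag_encode(-1))  # a small negative packs into one byte
```

Working on ints throughout lets later stages compare and hash fixed‑width keys instead of variable‑length strings, which is presumably why the encoding happens early in the pipeline.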
Processing flow:
Receive Event and compute in Processor.
Save results and intermediate data in Data Holder.
Emitter triggers periodic output of the time‑slice results and clears them.
This model reduces per‑machine transaction state, simplifying distributed implementation and boosting performance.
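The three roles in this model can be sketched as follows. This is a simplified illustration rather than TA's code: the class and method names are hypothetical, the flush is called explicitly instead of by a timer, and UV is counted only within a single time slice.

```python
from collections import defaultdict

class DataHolder:
    """Keeps intermediate results for the current time slice only."""
    def __init__(self):
        self.pv = defaultdict(int)    # site_id -> page views in this slice
        self.uids = defaultdict(set)  # site_id -> distinct UIDs in this slice

    def clear(self):
        self.pv.clear()
        self.uids.clear()

class Processor:
    """Business logic: folds one event into the holder."""
    def __init__(self, holder):
        self.holder = holder

    def on_event(self, site_id, uid):
        self.holder.pv[site_id] += 1
        self.holder.uids[site_id].add(uid)

class Emitter:
    """Outputs the results of a time slice, then clears them."""
    def __init__(self, holder, sink):
        self.holder = holder
        self.sink = sink  # any callable; stands in for the downstream stage

    def flush(self, slice_id):
        for site_id, pv in self.holder.pv.items():
            self.sink({
                "slice": slice_id,
                "site": site_id,
                "pv": pv,
                "uv": len(self.holder.uids[site_id]),
            })
        self.holder.clear()

emitted = []
holder = DataHolder()
processor = Processor(holder)
emitter = Emitter(holder, emitted.append)

for site_id, uid in [(1, 10), (1, 10), (1, 11), (2, 10)]:
    processor.on_event(site_id, uid)
emitter.flush(slice_id=0)  # in a real system a timer would trigger this
```

Because the holder is emptied after every flush, a node only ever carries one slice of state, which matches the point above about keeping per‑machine state small.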
Real‑Time Storage
Real‑time storage serves statistics displayed on the web front‑end. It has two typical characteristics:
Frequent updates: each statistic can be refreshed as fast as once per second.
Read pattern: read volume is small relative to the massive write volume. The data divides into fixed (e.g., URLs, keywords) and dynamic (e.g., per‑site PV/UV) sets.
To meet these needs, TA uses a combination of Redis and LevelDB.
Redis
Redis is the primary real‑time storage component. It provides rich data structures (hashes, sets, etc.) suitable for both static and dynamic metrics. TA extended existing commands such as sort and hmget and added a new command, hmincrby, to support arithmetic operations and batch updates. These changes cut CPU usage by nearly 50% and doubled throughput for the ESA module.
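The source does not give the signature of the custom hmincrby command, so the sketch below only illustrates the idea of a batched hash increment. `FakeHashStore` is a hypothetical in‑memory stand‑in, not a Redis client, and the `site:42` key is made up. With stock Redis, a comparable effect is usually obtained by sending several HINCRBY commands in one pipeline.

```python
class FakeHashStore:
    """In-memory stand-in for Redis hashes, used only to show the semantics."""
    def __init__(self):
        self._hashes = {}

    def hincrby(self, key, field, amount):
        """Mirrors stock HINCRBY: add to one field, starting from 0 if missing."""
        fields = self._hashes.setdefault(key, {})
        fields[field] = fields.get(field, 0) + amount
        return fields[field]

    def hmincrby(self, key, increments):
        """Hypothetical batch form: several field increments in one call."""
        return [self.hincrby(key, f, a) for f, a in increments.items()]

    def hmget(self, key, *fields):
        """Mirrors stock HMGET: missing fields come back as None."""
        stored = self._hashes.get(key, {})
        return [stored.get(f) for f in fields]

store = FakeHashStore()
store.hmincrby("site:42", {"pv": 3, "uv": 2})       # first time slice
store.hmincrby("site:42", {"pv": 5, "bounces": 1})  # next time slice
totals = store.hmget("site:42", "pv", "uv", "bounces")
```

Batching matters here because an aggregator that updates many counters per site would otherwise pay one round trip and one command dispatch per counter.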
LevelDB
LevelDB complements Redis by persisting immutable data (e.g., URLs, keywords) on disk. Its high write performance and sufficient read speed make it ideal for the “fixed” data set. TA employs dual‑write replication and sharding by domain, with dynamic load‑based rebalancing without moving data.
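The source does not explain how rebalancing avoids moving data. One plausible reading is that a domain stays on its shard once assigned, and only domains seen for the first time are steered to the least loaded shard. The sketch below illustrates that reading. `DomainRouter`, its fields, and the plain dicts standing in for LevelDB instances are all hypothetical.

```python
class DomainRouter:
    """Routes each domain to one shard and writes every record twice."""
    def __init__(self, shard_count):
        self.load = [0] * shard_count  # writes observed per shard
        self.route = {}                # domain -> shard index, never reassigned
        # Plain dicts stand in for the primary and replica stores of each shard.
        self.primary = [dict() for _ in range(shard_count)]
        self.replica = [dict() for _ in range(shard_count)]

    def shard_for(self, domain):
        if domain not in self.route:
            # First sighting: choose the least loaded shard at this moment.
            self.route[domain] = min(range(len(self.load)),
                                     key=self.load.__getitem__)
        return self.route[domain]

    def put(self, domain, key, value):
        shard = self.shard_for(domain)
        self.primary[shard][(domain, key)] = value  # dual write: primary
        self.replica[shard][(domain, key)] = value  # dual write: replica
        self.load[shard] += 1

    def get(self, domain, key):
        shard = self.route.get(domain)
        if shard is None:
            return None
        record = (domain, key)
        if record in self.primary[shard]:
            return self.primary[shard][record]
        return self.replica[shard].get(record)      # fall back to the replica

router = DomainRouter(shard_count=2)
for i in range(3):
    router.put("big.example", i, f"/page/{i}")
router.put("small.example", 1, "/home")  # lands on the emptier shard
```

Existing records never change shards in this scheme, so balance improves only as new domains arrive; that trade‑off is a property of this sketch, not a claim about TA.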
Summary
TA achieves second‑level data updates with a hybrid architecture that combines in‑memory real‑time computation, event‑driven pipelines, and a NoSQL storage layer built on Redis and LevelDB, providing site owners with immediate, high‑availability analytics.
Author: Chu Dapeng
Source: Tencent Big Data (https://www.qcloud.com/community/article/262)
