
Design and Challenges of Tencent iData Analysis Center Backend: Bitmap Storage and MapReduce Architecture

Tencent's iData Analysis Center is rebuilding its backend as TGMars. The legacy system was fast because it sharded user-behavior bitmaps and ran a shuffle-free MapReduce pass next to the data, but its row-oriented bitmap store and single-node Reduce step limited scalability and flexibility. The new design builds on Spark and aims to add columnar storage, iterative processing, and SQL-like capabilities.

Tencent Cloud Developer

Tencent iData Analysis Center provides number-package extraction, profiling, and work analysis. After years of operation, the team identified several shortcomings in the legacy system: poor customizability, complex architecture, unstable modules, and difficult operations. These prompted the development of a new backend called TGMars.

The redesign starts from two questions about the legacy system: why it was fast, and why it was hard to extend.

1. Bitmap Storage

The system uses bitmap structures to record basic user behaviors (registration, login, recharge, consumption). Each user is represented by a fixed-length 396-bit array: the first 12 bits capture monthly features, the next 368 bits capture daily behaviors, and the remaining 16 bits are reserved for alignment. Each bit is a simple 0/1 flag for whether a behavior occurred, and because the bits are ordered, their positions encode the time dimension directly.
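The layout above can be sketched in a few lines of Python. This is only an illustration: the bit numbering (bit 0 is the first monthly feature, daily bits follow immediately after) and the use of a Python integer as the bit container are assumptions, and the helper names are hypothetical rather than taken from iData.

```python
MONTH_BITS = 12    # monthly features, per the layout described above
DAY_BITS = 368     # daily behaviors
TOTAL_BITS = 396   # fixed record length; the leftover bits stay reserved


def set_month(bitmap: int, month_index: int) -> int:
    """Return the bitmap with the given 0-based monthly feature bit set."""
    if not 0 <= month_index < MONTH_BITS:
        raise ValueError("month_index out of range")
    return bitmap | (1 << month_index)


def set_day(bitmap: int, day_index: int) -> int:
    """Return the bitmap with the behavior marked on the given 0-based day."""
    if not 0 <= day_index < DAY_BITS:
        raise ValueError("day_index out of range")
    return bitmap | (1 << (MONTH_BITS + day_index))


def has_day(bitmap: int, day_index: int) -> bool:
    """True if the behavior occurred on the given day."""
    return bool(bitmap >> (MONTH_BITS + day_index) & 1)


def active_days(bitmap: int) -> int:
    """Count how many daily bits are set."""
    daily = (bitmap >> MONTH_BITS) & ((1 << DAY_BITS) - 1)
    return bin(daily).count("1")
```

Because the time dimension is just a bit position, questions such as "was the user active on day N" or "how many days was the user active" reduce to shifts, masks, and a population count.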

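Users are sharded by a deterministic rule: numeric IDs are taken modulo the shard count, and string IDs are hashed with CityHash first. This spreads users roughly evenly across shards. Because the same rule can be applied to any incoming list of users, each shard only has to look up the users it owns, which enables fast, locality-aware joins without full-table scans.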
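A minimal sketch of such a routing rule follows. CityHash is not in the Python standard library, so this example substitutes `hashlib.blake2b` as a stand-in hash for string IDs; that substitution, and the function names `shard_for` and `partition_package`, are assumptions made for illustration only.

```python
import hashlib


def shard_for(user_id, shard_count: int) -> int:
    """Map a user ID to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if isinstance(user_id, int):
        # Numeric IDs: plain modulo.
        return user_id % shard_count
    # String IDs: hash first. blake2b is a stand-in for CityHash here.
    digest = hashlib.blake2b(str(user_id).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def partition_package(user_ids, shard_count: int) -> dict:
    """Split a list of user IDs into per-shard lists using the same rule."""
    slices = {}
    for uid in user_ids:
        slices.setdefault(shard_for(uid, shard_count), []).append(uid)
    return slices
```

Splitting an incoming list with the same function that placed the bitmaps means every lookup stays inside one shard, which is where the locality comes from.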

2. Row‑Based Storage Limitations

While bitmap files are efficient, their row‑oriented storage is inflexible: file size is fixed, shard layout cannot be changed after creation, and extending dimensions requires redesign. Row storage suits OLTP workloads but is suboptimal for OLAP scenarios like iData, which demand large‑scale analytical queries with modest real‑time requirements.
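The cost of extending a fixed-length row layout can be illustrated with a small sketch. The record sizes and function names below are hypothetical and do not describe iData's actual file format; the point is only that widening a row rewrites every record, while a columnar layout adds one new array.

```python
def record_offset(row_index: int, record_bytes: int) -> int:
    """Fixed-length rows: a record's position depends on the record width."""
    return row_index * record_bytes


def widen_rows(data: bytes, old_bytes: int, new_bytes: int) -> bytes:
    """Adding a dimension to fixed-length rows rewrites every record."""
    if new_bytes < old_bytes:
        raise ValueError("new_bytes must not be smaller than old_bytes")
    out = bytearray()
    for start in range(0, len(data), old_bytes):
        # Copy the old record, then pad it with zeroed space for the new field.
        out += data[start:start + old_bytes] + bytes(new_bytes - old_bytes)
    return bytes(out)


def add_column(columns: dict, name: str, row_count: int) -> None:
    """Columnar layout: a new dimension is one more array; others are untouched."""
    columns[name] = bytearray(row_count)
```

In the row layout every stored offset changes when the width changes, so readers and writers must be updated together. In the columnar layout existing columns keep their positions.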

3. MapReduce Architecture

The computation follows a MapReduce pattern. The Map phase shards number packages and aggregates bitmap data locally, avoiding any Shuffle step and thus minimizing network overhead. The Reduce phase merges results back into number packages or writes them to a database. This tight coupling of storage and computation explains the system’s speed.
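The pattern can be sketched as follows, with in-memory dictionaries standing in for shard files. All names here are hypothetical, numeric IDs are routed by modulo as described earlier, and the "query" is reduced to a single bit mask for brevity.

```python
from collections import defaultdict


def map_shard(local_bitmaps: dict, package_slice: list, mask: int) -> list:
    """Runs next to one shard: keep users whose bitmap has any masked bit set."""
    return [uid for uid in package_slice if local_bitmaps.get(uid, 0) & mask]


def reduce_merge(partials: list) -> list:
    """Single merge step: concatenate per-shard results into one package."""
    merged = []
    for part in partials:
        merged.extend(part)
    return sorted(merged)


def run_query(shards: list, package: list, mask: int) -> list:
    """Split the package by shard, filter locally, then merge once."""
    slices = defaultdict(list)
    for uid in package:
        slices[uid % len(shards)].append(uid)
    # Each map task touches only its own shard, so no data moves between shards.
    partials = [map_shard(shards[s], ids, mask) for s, ids in slices.items()]
    return reduce_merge(partials)
```

Each map task reads only the bitmaps stored on its own shard, so the only data crossing the network is the already-filtered result sent to the merge step.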

However, the Reduce step becomes a bottleneck because it runs on a single node, leading to failures and limited scalability. Moreover, the architecture only supports a single MapReduce pass, restricting complex SQL‑like joins or iterative processing.

4. Rethinking the Platform

The team proposes a next‑generation platform with two main goals: extensible storage structures and customizable computation logic. Desired improvements include columnar storage, definable dimensions, support for iterative models, and native SQL capabilities.

Existing big-data platforms (TiDB, Spark, ClickHouse, Palo) already meet many of these requirements. The team builds on Spark and adds custom features to cover its remaining, specific needs.

