
How Joydata Scaled to 150 Billion Daily Events with StarRocks: A Data Architecture Journey

As daily data volume grew from a few million to 150 billion records, Joydata‑U moved its analytics platform through three architectural stages (Hadoop, Hadoop + Trino, and finally StarRocks), introducing resource isolation, Flat JSON acceleration, and Bitmap indexing to speed up queries by up to 7× and achieve sub‑2‑minute data freshness across BI, ad‑tech, game analytics, and CRM workloads.

StarRocks

Aug 19, 2025

Joydata‑U, a global game development and publishing company, expanded its services across MMORPG, ACT, and other genres, reaching markets in Southeast Asia, Japan/Korea, the Americas, and Greater China. As the business grew, daily data volume surged from a few million rows to a peak of 150 billion records, demanding higher performance, real‑time capabilities, and stability from the underlying data platform.

Core Business Scenarios

The platform supports four critical modules:

Joydata‑BI reporting: Handles massive internal reports (finance, game design, operations) with P99 query latency under 5 seconds for KPIs such as DAU, new users, retention, and revenue.

Advertising analysis: Processes multi‑platform ad spend data (e.g., Giant Engine, Tencent Ads, Xiaohongshu) amounting to millions of dollars daily, requiring near‑real‑time feedback.

Game user behavior analysis: In MMO titles, item and currency flows constitute 70% of data; the system must support ad‑hoc queries and deep behavioral analytics.

Customer service (CRM): Enables detailed queries on paid‑player issues within a 2‑minute ingestion window to ensure prompt support.

Architecture Evolution

Stage 1 – Traditional Hadoop: Data was processed in Hive through offline batch jobs, which could not keep up with the growing scale.

Stage 2 – Hadoop + Trino: The team integrated third‑party analytics, but as daily volume reached 150 billion rows, performance degraded:

Query times stretched to tens of minutes or hours for annual reports.

Real‑time requirements (5‑10 minute refresh) could not be met.

Resource contention caused critical queries (e.g., customer‑service) to be blocked.

Stage 3 – StarRocks adoption (2020): After evaluating Doris and ClickHouse, the team selected StarRocks 1.19 as the core query engine. The new pipeline routes business events to Kafka, uses Flink for real‑time cleansing and transformation, and writes the result into StarRocks, achieving end‑to‑end latency under 2 minutes.
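To make the cleansing and transformation step more concrete, here is a minimal sketch in plain Python. It is not Flink code and not the team's actual job: the field names (event_id, user_id, event_time, event_type) and the rules (drop malformed or incomplete events, de-duplicate by event_id) are assumptions chosen only for illustration.

```python
import json
from typing import Optional

# Hypothetical required fields; the article does not describe the real schema.
REQUIRED_FIELDS = ("event_id", "user_id", "event_time")


def cleanse(raw: str) -> Optional[dict]:
    """Parse one raw event; return a normalized row, or None to drop it."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed payload
    if not isinstance(event, dict):
        return None
    if any(event.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # incomplete event
    try:
        event_time = int(event["event_time"])
    except (TypeError, ValueError):
        return None  # timestamp is not numeric
    return {
        "event_id": str(event["event_id"]),
        "user_id": str(event["user_id"]),
        "event_time": event_time,
        "event_type": str(event.get("event_type", "unknown")).lower(),
    }


def transform(raw_events):
    """Cleanse a batch and drop duplicates by event_id, preserving order."""
    seen = set()
    rows = []
    for raw in raw_events:
        row = cleanse(raw)
        if row is None or row["event_id"] in seen:
            continue
        seen.add(row["event_id"])
        rows.append(row)
    return rows
```

In a real deployment this logic would live inside a streaming operator between the Kafka source and the StarRocks sink; the sketch only shows the per-event decisions such a stage typically makes.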

Resource Isolation & Scheduling

Using StarRocks Enterprise’s Multi‑Warehouse feature, the team created hard‑isolated compute pools:

Dedicated BI pool for heavy analytical queries.

Separate Flink write pool to guarantee ingestion stability.

Default pool for routine workloads such as ad analysis and customer‑service queries.

Although overall resource utilization decreased, this isolation eliminated query interference and ensured stable performance for critical services.
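The effect of hard isolation can be modeled with a small Python sketch. This is a toy admission model, not the Multi‑Warehouse API: the pool names, slot counts, and workload labels are hypothetical, and the only point is that a saturated pool never borrows capacity from another one.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A compute pool with a fixed number of query slots (hypothetical model)."""
    name: str
    slots: int
    in_use: int = 0

    def try_acquire(self) -> bool:
        """Admit one query if a slot is free; never borrow from other pools."""
        if self.in_use >= self.slots:
            return False
        self.in_use += 1
        return True

    def release(self) -> None:
        """Free one slot when a query finishes."""
        self.in_use = max(0, self.in_use - 1)


# Hypothetical names and sizes, mirroring the three pools described above.
POOLS = {
    "bi": Pool("bi", slots=2),
    "ingest": Pool("ingest", slots=2),
    "default": Pool("default", slots=2),
}
WORKLOAD_TO_POOL = {"bi_report": "bi", "flink_write": "ingest"}


def admit(workload: str) -> bool:
    """Route a workload to its pool; unknown workloads use the default pool."""
    pool = POOLS[WORKLOAD_TO_POOL.get(workload, "default")]
    return pool.try_acquire()
```

Once the BI pool is full, further BI queries are rejected or queued even if the ingest pool is idle. That idle capacity is the utilization cost mentioned above, and it is what keeps a burst of heavy reports from stalling Flink writes or customer‑service lookups.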

Flat JSON Optimization

MMO user‑behavior data contains more than 300 dynamic attributes. The original flexible JSON schema forced every query to scan and parse whole JSON values, dramatically slowing path‑analysis and item‑usage queries. StarRocks 3.3 introduced Flat JSON, which automatically extracts high‑frequency fields (e.g., operation time, role level) into native columns while preserving low‑frequency or newly added fields in JSON. This change boosted JSON parsing speed roughly tenfold without code changes.
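The idea behind this kind of extraction can be illustrated with a short Python sketch. It is not StarRocks's internal algorithm: the frequency threshold, the function names, and the sample keys are assumptions made for illustration.

```python
from collections import Counter


def choose_hot_keys(rows, min_ratio=0.9):
    """Return keys present in at least min_ratio of the sampled rows."""
    counts = Counter(key for row in rows for key in row)
    return sorted(k for k, c in counts.items() if c / len(rows) >= min_ratio)


def flatten(rows, hot_keys):
    """Split rows into per-key columns plus a residual JSON-like dict per row."""
    columns = {key: [row.get(key) for row in rows] for key in hot_keys}
    residual = [
        {k: v for k, v in row.items() if k not in hot_keys} for row in rows
    ]
    return columns, residual


# Hypothetical behavior events: two common fields, a few rare ones.
events = [
    {"op_time": 1, "role_level": 10, "pet": "a"},
    {"op_time": 2, "role_level": 11},
    {"op_time": 3, "role_level": 12, "mount": "x"},
]
hot = choose_hot_keys(events)
columns, residual = flatten(events, hot)
```

A query that filters on role_level can then read a single dense column instead of parsing every JSON value, while rare keys such as pet or mount remain available in the residual part.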

Bitmap Index for User Tagging

The team built a hybrid tagging system: static tags stored in a daily‑updated wide table, and dynamic tags created on‑the‑fly for ad‑hoc analysis. Bitmap indexes were created for each tag, enabling fast set operations via functions such as bitmap_and() and bitmap_or(). Frequently used tag combinations were materialized as views, delivering multi‑fold query speedups compared with traditional join‑based approaches.
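Why bitmap set operations beat joins for tag combinations can be seen in a small Python sketch that uses integers as bitmaps, one bit per user ID. The helper functions are plain Python stand‑ins named after the SQL functions mentioned above, and the tags and user IDs are made up.

```python
def to_bitmap(user_ids):
    """Build a bitmap (Python int) with bit i set for each user ID i."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def bitmap_and(a, b):
    """Users carrying both tags."""
    return a & b


def bitmap_or(a, b):
    """Users carrying at least one of the tags."""
    return a | b


def bitmap_count(bitmap):
    """Number of users in the bitmap."""
    return bin(bitmap).count("1")


def bitmap_to_ids(bitmap):
    """Expand a bitmap back into a sorted list of user IDs."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Hypothetical tags: paying users and users active in the last 7 days.
paying = to_bitmap([1, 3, 5, 8])
active_7d = to_bitmap([3, 4, 5, 9])

paying_and_active = bitmap_and(paying, active_7d)
paying_or_active = bitmap_or(paying, active_7d)
```

Each combination is a single bitwise pass over the two bitmaps, with no row‑level matching, which is the property that makes precomputing popular tag combinations cheap.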

Performance Gains

Query latency improved 5‑7× across all modules.

BI report generation that previously took tens of minutes now finishes in a few minutes, while KPI queries keep P99 latency under 5 seconds.

Data freshness achieved sub‑2‑minute latency, supporting real‑time monitoring for customer‑service and ad‑tech.
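For readers unfamiliar with the P99 figure, the following Python sketch shows one common way to compute it, the nearest‑rank method. The article does not say how the team measures latency, so the method, the function names, and the sample values are assumptions.

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = (len(ordered) * p + 99) // 100  # integer ceil(n * p / 100)
    return ordered[rank - 1]


def meets_slo(samples_ms, limit_ms=5000, p=99):
    """Check a latency target such as 'P99 under 5 seconds'."""
    return percentile(samples_ms, p) < limit_ms


# Hypothetical query latencies in milliseconds: 1 ms, 2 ms, ..., 100 ms.
latencies = list(range(1, 101))
p99 = percentile(latencies, 99)
```

A P99 target tolerates a small tail of slow queries, so it can hold even when a handful of heavy reports still run for minutes.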

Future Directions

The roadmap includes exploring lake‑house integration by combining Paimon with StarRocks for hot‑cold data separation, and leveraging AI assistants to interpret BI results, provide conclusions, and enable multi‑turn conversational analysis.

Overall, the migration to StarRocks’s compute‑storage separation architecture, combined with resource isolation, Flat JSON, and Bitmap indexing, delivered a scalable, low‑latency analytics platform that meets Joydata‑U’s demanding global gaming workloads.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
