How FunData Scaled DOTA2 Esports Data with a Cloud‑Native Big Data Architecture
This article details the evolution of the FunData esports data platform from a simple master‑slave ETL system to a cloud‑native, distributed architecture that leverages Google Cloud Pub/Sub, Dataflow, Bigtable, and a redesigned API layer to handle petabyte‑scale, real‑time DOTA2 match data.
1.0 Architecture
The initial FunData system followed an MVP approach with a two‑module master‑slave design. The Master periodically called the Steam API for match IDs, dispatched analysis tasks via an in‑memory message queue, and tracked progress. The Slave listened to the queue, performed replay analysis using the open‑source projects Clarity and Manta, and stored results.
While stable at launch, the system soon faced scalability and maintainability problems: rebuilding DB indexes for new fields took hours, the tightly coupled master‑slave relationship required full restarts, there was no message persistence, scaling slaves required manual VM image creation, and the master‑slave DB schema caused lock contention.
2.0 Architecture
Learning from the 1.0 shortcomings, the 2.0 redesign focuses on three goals: fine‑grained, high‑concurrency task processing; distributed storage; and system decoupling.
Task granularity: DOTA2 produces up to 1.2 million matches per day; analysis tasks are split across multiple Pub/Sub topics and processed by independent workers.
Distributed storage: Google Cloud Bigtable stores raw and processed data, while MongoDB holds aggregated statistics.
Decoupling: Pub/Sub (Kafka‑like) replaces the in‑memory queue, allowing independent restarts and horizontal scaling.
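The task split described above can be sketched roughly as follows. This is a minimal illustration, not FunData's code: the topic names, the payload fields, and the InMemoryBus class are all assumptions, and the bus is an in-process stand-in for a real Pub/Sub client.

```python
import json
from collections import defaultdict

# Hypothetical topic names; the article does not list the real ones.
TOPIC_BY_TASK = {
    "match_detail": "match-detail-tasks",
    "replay_analysis": "replay-analysis-tasks",
}


class InMemoryBus:
    """In-process stand-in for a message bus such as Pub/Sub (illustration only)."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, data):
        self.topics[topic].append(data)


def dispatch(bus, task_type, match_id):
    """Serialize one task and publish it to the topic for its task type."""
    topic = TOPIC_BY_TASK[task_type]
    payload = json.dumps({"task": task_type, "match_id": match_id}).encode("utf-8")
    bus.publish(topic, payload)
    return topic


bus = InMemoryBus()
for mid in (1001, 1002):
    dispatch(bus, "match_detail", mid)
    dispatch(bus, "replay_analysis", mid)
```

Because each task type has its own topic, the workers for one kind of task can be scaled or restarted without touching the others.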
Data Processing Flow
Basic data (match details, KDA, damage, creep score, etc.) and replay analysis results are fetched by a Supervisor, cleaned by workers, and written to Bigtable. High‑level statistics (hero usage, item builds, team fights) are produced by Dataflow pipelines and stored in both MongoDB and Bigtable.
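To make the high-level statistics step concrete, here is a small sketch of a hero-usage aggregation in plain Python. The article says these statistics come from Dataflow pipelines; the record layout (a "players" list with "hero_id" fields) and both function names are assumptions made for illustration, and a real pipeline would express the same counting as a distributed job.

```python
from collections import Counter


def hero_usage(matches):
    """Count how often each hero appears across cleaned match records."""
    counts = Counter()
    for match in matches:
        counts.update(player["hero_id"] for player in match["players"])
    return counts


def pick_rate(counts, total_matches):
    """Convert raw appearance counts into a per-match pick rate."""
    return {hero: n / total_matches for hero, n in counts.items()}


# Hypothetical cleaned records, trimmed to the fields used here.
sample = [
    {"match_id": 1, "players": [{"hero_id": 14}, {"hero_id": 74}]},
    {"match_id": 2, "players": [{"hero_id": 14}, {"hero_id": 8}]},
]
usage = hero_usage(sample)
rates = pick_rate(usage, len(sample))
```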
The original single Slave node is split into four sub‑modules: league data analysis, league replay analysis, a DB proxy for analysis and mining data, and monitoring.
Distributed Storage Choice
MySQL proved inadequate as the primary store given the growing data volume and frequent schema changes. The team adopted Google Cloud Bigtable, drawing on HBase design concepts, for its scalable, low‑latency random reads and writes. The RowKey combines a consistent hash prefix with the match_id, which spreads sequential match IDs across nodes to avoid hotspotting and enables effective sharding.
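A RowKey of this shape might be built as in the sketch below. The article does not give the hash function, prefix length, or separator, so MD5, a four-character hex prefix, and "#" are assumptions; this is also a plain hash-derived prefix, which may be simpler than the scheme the team actually used.

```python
import hashlib


def row_key(match_id, prefix_len=4):
    """Build a key of the form '<hash prefix>#<match_id>'.

    The prefix is derived from the match_id itself, so the same match
    always maps to the same key, while consecutive match IDs land in
    different parts of the key space instead of one hot range.
    """
    digest = hashlib.md5(str(match_id).encode("ascii")).hexdigest()
    return f"{digest[:prefix_len]}#{match_id}"


key = row_key(5000000001)
```

Since the prefix is recomputable from the match_id, a point lookup by match needs no extra index: the reader rebuilds the key and fetches the row directly.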
Secondary indexing is kept in MySQL: workers write a timestamp‑to‑RowKey index entry to MySQL, which later serves time‑range queries that the hashed RowKey cannot answer directly.
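The index can be pictured as a narrow table mapping a match timestamp to its RowKey. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs without a server; the table name, column names, and sample keys are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match_index (start_time INTEGER NOT NULL, row_key TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_start_time ON match_index (start_time)")


def index_match(conn, start_time, row_key):
    """Record which RowKey holds the match that started at start_time."""
    conn.execute(
        "INSERT INTO match_index (start_time, row_key) VALUES (?, ?)",
        (start_time, row_key),
    )


def row_keys_between(conn, start, end):
    """Return RowKeys for matches with start <= start_time < end, oldest first."""
    cur = conn.execute(
        "SELECT row_key FROM match_index "
        "WHERE start_time >= ? AND start_time < ? ORDER BY start_time",
        (start, end),
    )
    return [row[0] for row in cur]


index_match(conn, 1_500_000_300, "b2d1#102")
index_match(conn, 1_500_000_100, "a91f#101")
index_match(conn, 1_500_009_000, "07c3#103")
recent = row_keys_between(conn, 1_500_000_000, 1_500_001_000)
```

A range query therefore takes two steps: look up the RowKeys for the time window in the index, then fetch those rows from the wide-column store.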
System Decoupling
Replacing the in‑memory queue with Pub/Sub eliminates data loss on Master failures and enables independent version upgrades. The message bus also provides visual monitoring of backlog and supports multi‑cloud resilience.
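The reason a persistent bus avoids data loss can be shown with a toy acknowledgement queue: a message is only discarded after the worker acknowledges it, so a crash mid-task leaves it available for redelivery. The AckQueue class below is a simplified, in-process illustration written for this article, not the Pub/Sub API; real services add acknowledgement deadlines and durable storage.

```python
from collections import deque


class AckQueue:
    """Toy at-least-once queue: messages stay tracked until acknowledged."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}
        self._next_id = 0

    def publish(self, data):
        self._pending.append((self._next_id, data))
        self._next_id += 1

    def pull(self):
        """Hand out the next message, keeping it as in-flight until acked."""
        if not self._pending:
            return None
        msg_id, data = self._pending.popleft()
        self._in_flight[msg_id] = data
        return msg_id, data

    def ack(self, msg_id):
        """Mark a message as fully processed so it is never redelivered."""
        self._in_flight.pop(msg_id, None)

    def redeliver_unacked(self):
        """Return unacknowledged messages to the queue, e.g. after a worker crash."""
        for msg_id, data in sorted(self._in_flight.items()):
            self._pending.append((msg_id, data))
        self._in_flight.clear()

    def backlog(self):
        """Messages not yet acknowledged; the figure a backlog monitor would plot."""
        return len(self._pending) + len(self._in_flight)


q = AckQueue()
q.publish("match:1001")
q.publish("match:1002")

first_id, _ = q.pull()
q.ack(first_id)          # worker finished match:1001
q.pull()                 # worker takes match:1002, then crashes before acking
q.redeliver_unacked()    # the message goes back on the queue
retry = q.pull()
```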
API Layer Redesign
The original API layer used DreamFactory on Alibaba Cloud, exposing full‑table REST endpoints without caching, which led to latency spikes and slow cross‑region access. The new design splits APIs by data domain (matches, league schedules, heroes, items) and introduces CDN acceleration, multi‑cloud failover, and an internal cache that refreshes on data updates.
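One way to read "a cache that refreshes on data updates" is sketched below: reads are served from memory, and a write notification reloads only the affected entry. The DomainCache class, its method names, and the dictionary standing in for the backing store are assumptions for illustration; the article does not describe the actual cache implementation.

```python
class DomainCache:
    """Per-domain read cache whose entries are reloaded when the data changes."""

    def __init__(self, loader):
        self._loader = loader    # callable (domain, key) -> value
        self._entries = {}
        self.loads = 0           # number of trips to the backing store

    def _load(self, domain, key):
        self.loads += 1
        self._entries[(domain, key)] = self._loader(domain, key)

    def get(self, domain, key):
        """Serve from memory, loading from the backing store only on a miss."""
        if (domain, key) not in self._entries:
            self._load(domain, key)
        return self._entries[(domain, key)]

    def on_update(self, domain, key):
        """Called after a write: refresh the entry if it is currently cached."""
        if (domain, key) in self._entries:
            self._load(domain, key)


# Hypothetical backing store keyed by (domain, id).
store = {("heroes", 14): {"name": "Pudge", "picks": 10}}
cache = DomainCache(lambda domain, key: dict(store[(domain, key)]))

before = cache.get("heroes", 14)["picks"]
cache.get("heroes", 14)                 # second read is a cache hit
store[("heroes", 14)]["picks"] = 11     # data pipeline writes new stats
cache.on_update("heroes", 14)           # cache refreshes that entry
after = cache.get("heroes", 14)["picks"]
```

Keying the cache by data domain mirrors the API split, so an update to hero statistics does not evict cached match or schedule responses.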
Conclusion
The FunData platform evolved from a monolithic master‑slave ETL pipeline to a cloud‑native, distributed system that can ingest, process, and serve petabyte‑scale esports data with low latency and high availability. Since its public launch on April 10, over 300 developers have obtained API keys, and the team continues to add new data points such as league statistics and real‑time match feeds.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
