Presto + Alluxio Architecture for Interactive Ad‑hoc Queries in NetEase Game Data Warehouse
This article describes how NetEase Games built a Presto‑based interactive ad‑hoc query platform, backed by Alluxio caching, to bring average query latency down to about 10 seconds. It covers the architectural design, performance comparisons with other Hadoop‑based solutions, the issues encountered, and planned improvements.
NetEase Games, a leading global game developer and publisher, generates roughly 30 TB of raw data daily, which expands to several times that size after ETL processing. While routine business metrics are served by traditional reporting systems, there is a strong demand for low‑latency, ad‑hoc analytical queries that can handle flexible conditions, provide progress feedback, and return results within 2‑15 seconds.
Existing Hadoop‑based ad‑hoc solutions (Hive on MapReduce, Hive on Tez, SparkSQL) suffer from slow query times, resource‑allocation delays, and unstable performance due to heavy I/O and cluster contention. To meet the business requirement of ~10 second average query latency, the team evaluated several options and selected a Presto + Alluxio combined approach.
Presto is an open‑source, MPP‑style distributed SQL engine designed for fast interactive analytics, while Alluxio provides a memory‑level distributed storage layer that caches data from HDFS, reducing data‑read latency and decoupling compute from storage.
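The caching idea can be illustrated with a toy read‑through cache. This is only a minimal sketch of the concept, not Alluxio's implementation: the class `ReadThroughCache`, the function `fake_hdfs_read`, and the file path are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy stand-in for a caching layer placed in front of a slower store."""

    def __init__(self, fetch_from_under_store: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_under_store
        self._blocks: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        # Serve from memory when the data is already cached.
        if path in self._blocks:
            self.hits += 1
            return self._blocks[path]
        # Otherwise fetch once from the under store and keep a copy.
        self.misses += 1
        data = self._fetch(path)
        self._blocks[path] = data
        return data


def fake_hdfs_read(path: str) -> bytes:
    # Hypothetical under-store read; a real one would go over the network to HDFS.
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_hdfs_read)
first = cache.read("/warehouse/events/part-0")   # miss: fetched from the under store
second = cache.read("/warehouse/events/part-0")  # hit: served from the cache
```

Only the first read pays the cost of the slower store. Repeated reads of the same data are served from the cache, which is the kind of workload where a caching layer would be expected to help most.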
The deployed architecture places Presto and Alluxio on an isolated satellite cluster, separate from the main HDFS datanode cluster. Presto coordinators and Alluxio masters share nodes, and Presto workers are co‑located with Alluxio workers; the cluster scales up to 100 nodes. Alluxio uses a tiered storage model (MEM + HDD) with 10 GB of memory and 800 GB of HDD per worker, and relies on its built‑in eviction policy.
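A back‑of‑the‑envelope calculation gives a sense of the cache size these figures imply. The sketch below assumes every one of the 100 nodes runs an Alluxio worker with the stated tier sizes; the source gives only the per‑worker sizes and the node ceiling, so the totals are an upper bound under that assumption. The names `WorkerTiers` and `cluster_capacity_gb` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorkerTiers:
    mem_gb: int  # top tier (MEM)
    hdd_gb: int  # second tier (HDD)


def cluster_capacity_gb(worker: WorkerTiers, workers: int) -> Dict[str, int]:
    """Aggregate cache capacity per tier, assuming identical workers."""
    return {
        "MEM": worker.mem_gb * workers,
        "HDD": worker.hdd_gb * workers,
        "total": (worker.mem_gb + worker.hdd_gb) * workers,
    }


# Assumption: 100 workers, one per node, each with 10 GB MEM + 800 GB HDD.
capacity = cluster_capacity_gb(WorkerTiers(mem_gb=10, hdd_gb=800), workers=100)
print(capacity)  # {'MEM': 1000, 'HDD': 80000, 'total': 81000}
```

Under these assumptions the cache tops out at about 1 TB of memory and 80 TB of HDD. With roughly 30 TB of raw data arriving daily, the cache can hold only part of the warehouse at any time, so eviction is needed to make room for new data.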
Key operational details include a dedicated Hive metastore instance for Presto queries to avoid metastore contention, one‑way data sync from HDFS to Alluxio (no writes back), and a metadata sync tool that automatically mirrors whitelisted Hive tables to the ad‑hoc metastore within seconds.
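The whitelist‑driven metadata sync can be sketched as a diff between two metastores. This is a hypothetical illustration, not the team's tool: the function `plan_sync`, the table names, and the version strings are invented, and dropping tables that leave the whitelist is an assumption the source does not confirm.

```python
from typing import Dict, List, Set


def plan_sync(
    source: Dict[str, str],
    target: Dict[str, str],
    whitelist: Set[str],
) -> Dict[str, List[str]]:
    """Decide what to change in the ad-hoc metastore.

    `source` and `target` map table name -> schema fingerprint (any string
    that changes when the table definition changes).
    """
    # Only whitelisted tables are ever mirrored.
    wanted = {name: fp for name, fp in source.items() if name in whitelist}
    create = sorted(name for name in wanted if name not in target)
    update = sorted(
        name for name in wanted if name in target and target[name] != wanted[name]
    )
    # Assumption: tables no longer wanted are removed from the target.
    drop = sorted(name for name in target if name not in wanted)
    return {"create": create, "update": update, "drop": drop}


# Hypothetical table names and fingerprints.
source = {"game.login": "v2", "game.payment": "v1", "tmp.scratch": "v1"}
target = {"game.login": "v1", "game.old": "v1"}
whitelist = {"game.login", "game.payment"}

plan = plan_sync(source, target, whitelist)
print(plan)  # {'create': ['game.payment'], 'update': ['game.login'], 'drop': ['game.old']}
```

Computing a plan first and applying it afterwards keeps the sync one‑way: the source metastore is only read, which matches the read‑only direction described above.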
Performance tests show that Presto with the Alluxio cache outperforms Hive on MapReduce, Hive on Tez, and SparkSQL, with query latencies that meet the 2‑15 second target. Compared with Presto alone, adding Alluxio further reduces latency and stabilizes response times across repeated runs.
During production use, several issues were identified: missing performance‑related Alluxio metrics, occasional RPC timeouts under high concurrency, and metadata scalability limits in the Alluxio master. Planned improvements involve upgrading to Alluxio 2.0, integrating Presto + Alluxio with YARN to improve resource utilization, and expanding Alluxio as a unified file entry point for data ingestion.
Overall, the Presto + Alluxio framework has successfully enabled interactive, low‑latency analytics for NetEase game operations, and the team intends to broaden its deployment across more business scenarios while contributing to the open‑source community.
NetEase Game Operations Platform
The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.
