Huya Data Platform: Cost Reduction and SLA Strategies
This article presents Huya's big data platform evolution, detailing cost‑saving measures, SLA practices, multi‑datacenter architecture, containerized resources, metadata‑driven intelligence, and future directions such as hybrid‑engine materialized views to improve efficiency and service reliability.
Introduction
The talk covers two technical directions of Huya's data platform, reducing cost and improving user-facing service SLA, topics less commonly shared in the industry.
1. Huya Data Platform Development Trajectory
Huya's data platform has progressed through three stages:
2017‑2018: Building the platform to support end‑to‑end data development with emphasis on simplicity and efficiency.
2019‑2021: Scaling data and compute volumes while containing rising resource costs: raising YARN cluster utilization above 90%, containerizing all workloads, and adopting the storage‑efficient Hadoop erasure‑coding (EC, 6+3) layout, which together delivered significant cost reductions.
2021‑present: After cost‑optimization, focusing on leveraging extensive metadata for intelligent services, launching a “smart” initiative.
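The storage savings from the EC adoption above follow directly from the encoding arithmetic. A minimal sketch (function names are illustrative, not Huya's tooling) comparing classic 3x HDFS replication against a Reed‑Solomon 6+3 policy:

```python
# Sketch: raw-storage overhead of 3x replication vs. Hadoop erasure coding
# with a 6+3 policy (6 data blocks + 3 parity blocks). Illustrative only.

def replication_overhead(replicas: int = 3) -> float:
    """Physical bytes stored per logical byte under block replication."""
    return float(replicas)

def ec_overhead(data_units: int = 6, parity_units: int = 3) -> float:
    """Physical bytes stored per logical byte under Reed-Solomon EC."""
    return (data_units + parity_units) / data_units

rep = replication_overhead()   # 3.0x raw capacity per logical byte
ec = ec_overhead()             # 1.5x raw capacity per logical byte
savings = 1 - ec / rep         # 50% less raw capacity for cold data
print(f"replication: {rep:.1f}x, EC(6+3): {ec:.1f}x, savings: {savings:.0%}")
```

Both layouts tolerate the loss of any three block copies (or parity units), so the durability story is comparable while the raw footprint halves, which is why EC is typically applied to cold or warm data.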
2. Positioning of Huya Big Data Platform
The platform positions itself as a low‑cost, high‑reliability foundation (the "water pipe"), on top of which the data middle platform and business front ends add value. Key considerations include:
Defining core value for users and the company.
Understanding the relationship with public cloud and future development under cloud migration.
Cost is attacked on two fronts: lowering unit price (of storage and compute) and controlling usage volume through usage‑based billing and resource optimization. SLA is defined around the issues users care about most, such as on‑time completion of offline tasks and availability of online compute.
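The two cost levers compose multiplicatively, since total cost is unit price times volume. A toy chargeback sketch (the rates and function are hypothetical, not Huya's actual billing model) makes the decomposition concrete:

```python
# Sketch of usage-based chargeback: cost = unit_price * usage.
# The platform lowers unit price (EC storage, >90% YARN utilization),
# while billing pressure pushes teams to lower usage volume.
# All prices below are assumed for illustration only.

STORAGE_PRICE_PER_TB_MONTH = 20.0    # assumed rate, not Huya's
COMPUTE_PRICE_PER_VCORE_HOUR = 0.05  # assumed rate, not Huya's

def monthly_bill(storage_tb: float, vcore_hours: float) -> float:
    """Charge a team for its monthly storage and compute footprint."""
    return (storage_tb * STORAGE_PRICE_PER_TB_MONTH
            + vcore_hours * COMPUTE_PRICE_PER_VCORE_HOUR)

print(monthly_bill(storage_tb=500, vcore_hours=100_000))  # 15000.0
```

Because each team sees its own line item, cleanup of unused tables and jobs becomes the team's incentive rather than a central mandate.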
Technical achievements include multi‑datacenter deployment, offline‑online co‑location, full containerization of Hadoop/YARN, elastic compute, and a storage‑compute separated architecture.
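The core idea behind offline‑online co‑location is that batch work borrows whatever capacity latency‑sensitive online services are not using, and is squeezed out first when online load rises. A minimal per‑node sketch, with names and thresholds that are illustrative assumptions rather than Huya's actual scheduler:

```python
# Sketch of offline-online co-location on a single node: offline (YARN)
# containers may only consume cores left over after online services and
# a fixed safety headroom. Constants are illustrative, not Huya's.

NODE_CORES = 64
ONLINE_RESERVE = 8  # cores always held back for online latency spikes

def offline_quota(online_usage_cores: int) -> int:
    """Cores offline batch containers may use on this node right now."""
    return max(0, NODE_CORES - online_usage_cores - ONLINE_RESERVE)

print(offline_quota(16))  # 40: online is quiet, offline soaks up the slack
print(offline_quota(60))  # 0: online is busy, offline is fully evicted
```

In practice the quota would be recomputed continuously and enforced via container limits (e.g. cgroups), with running offline tasks preempted when the quota shrinks below their current usage.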
3. Future Exploration Directions
Huya aims to advance intelligent platform capabilities by exploiting its rich metadata for task operations, compute optimization, storage lifecycle management, and data‑warehouse automation. The vision is a shift from table‑centric to view‑centric data models, where the platform decides how each view is materialized and stored.
Current work focuses on hybrid‑engine materialized views that automatically route and optimize queries across engines (e.g., ClickHouse for low‑latency analytics, HBase for high‑concurrency lookups, Doris for joins), delivering performance gains transparently to users.
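The routing decision can be pictured as matching a query's shape to engine strengths. The engine traits below follow the article; the `QueryProfile` fields, thresholds, and routing rules are illustrative assumptions, not Huya's production logic:

```python
# Sketch of hybrid-engine routing over materialized views: the platform,
# not the user, picks the engine whose strengths fit the query shape.

from dataclasses import dataclass

@dataclass
class QueryProfile:
    is_point_lookup: bool  # key-based fetch of a single row or range
    expected_qps: int      # concurrency the caller needs to sustain
    has_join: bool         # whether the query joins multiple tables

def route(q: QueryProfile) -> str:
    """Pick a backing engine for a materialized-view query (heuristic)."""
    if q.has_join:
        return "doris"       # Doris handles distributed joins well
    if q.is_point_lookup and q.expected_qps > 1_000:
        return "hbase"       # HBase serves high-concurrency key lookups
    return "clickhouse"      # ClickHouse for low-latency analytic scans

print(route(QueryProfile(is_point_lookup=True, expected_qps=5_000, has_join=False)))  # hbase
```

Because users query the logical view, the platform can re‑materialize it in a different engine later without breaking callers, which is what makes the optimization transparent.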
Overall, these measures have cut Huya's offline compute cost by up to 75% and its storage cost by 40%.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.