Hive MetaStore Challenges and Optimizations at Kuaishou

At Kuaishou, the Hive MetaStore service, which stores Hive's metadata, ran into scalability and performance limits caused by massive numbers of dynamic partitions and high query volume. The team responded with a series of architectural optimizations, including read-write separation, API enhancements, traffic control, and federation, to improve stability and efficiency.

DataFunTalk

Mar 10, 2021

Kuaishou builds its data warehouse on Apache Hive, storing Hive metadata in MySQL. Rapid business growth and exploding data volumes created four major challenges for the Hive MetaStore service: high performance demands, usability across multiple engines, extensibility for future engines, and low‑cost operation.

To address these, Kuaishou designed an intelligent SQL-on-Hadoop architecture centered on a BeaconServer hook. The hook routes queries to the appropriate engine and provides auditing, SQL rewriting, error analysis, and optimization suggestions, while remaining stateless and horizontally scalable. The MetaStore itself was optimized in four areas:

1. MetaStore Read‑Write Separation – Read‑only services are directed to replica databases, reducing primary QPS by over 70%. Consistency is ensured by comparing GTIDs before routing reads to replicas.

2. MetaStore API Optimizations – Redundant API calls (e.g., get_functions) were eliminated by upgrading Spark’s Hive client; DESC TABLE now skips exhaustive partition scans, cutting execution time from >200 s to 0.2 s; large‑batch queries are broken into smaller batches; partition‑key filtering is accelerated with indexes and type‑casting fixes, yielding up to 50× speedups.

3. MetaStore Traffic Control – A BeaconServer‑based control layer dynamically applies rate‑limiting policies based on request priority, protecting the service during peak loads and ensuring high‑priority queries remain responsive.

4. MetaStore Federation – To overcome MySQL single‑node limits, a federation layer routes metadata requests to multiple RawStore back‑ends based on Hive DB names, providing horizontal scalability without invasive changes to Hive core code.
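
The routing idea behind federation can be sketched in a few lines of Python. This is an illustration only, not Kuaishou's implementation (Hive's RawStore is a Java interface). The names InMemoryStore and FederatedStore, the route table, and the back-end names are all hypothetical, and lowercasing the database name before lookup is an assumption about how names are normalized.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for one RawStore back-end (e.g. one MySQL instance)."""
    name: str
    tables: dict = field(default_factory=dict)

    def create_table(self, db: str, table: str, schema: dict) -> None:
        self.tables[(db, table)] = schema

    def get_table(self, db: str, table: str) -> dict:
        return self.tables[(db, table)]


class FederatedStore:
    """Exposes the same methods as a single store, but picks the
    back-end from the Hive database name on every call."""

    def __init__(self, routes: dict, default: InMemoryStore):
        self._routes = routes      # db name -> back-end
        self._default = default    # used for databases with no explicit route

    def _backend(self, db: str) -> InMemoryStore:
        return self._routes.get(db, self._default)

    def create_table(self, db: str, table: str, schema: dict) -> None:
        db = db.lower()  # assumption: database names are matched case-insensitively
        self._backend(db).create_table(db, table, schema)

    def get_table(self, db: str, table: str) -> dict:
        db = db.lower()
        return self._backend(db).get_table(db, table)


# Usage: the "ads" database lives on a second back-end, everything else on the first.
store_a = InMemoryStore("mysql-a")
store_b = InMemoryStore("mysql-b")
federated = FederatedStore({"ads": store_b}, default=store_a)
federated.create_table("ads", "clicks", {"cols": ["id", "ts"]})
federated.create_table("dw", "orders", {"cols": ["order_id"]})
```

Because callers only see the FederatedStore interface, adding a back-end in this sketch means adding a route entry rather than changing the calling code, which mirrors the non-invasive goal described above.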

The combined optimizations dramatically improved query efficiency, reduced latency, and enhanced the stability of Kuaishou’s Hive‑based data platform.
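
The GTID comparison described under read-write separation can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the production logic: it assumes a single primary, so that every GTID set has the form `uuid:1-N` (or `uuid:N`) and comparing the last transaction number is enough. The function names last_txn and choose_endpoint are hypothetical. For general GTID sets, MySQL's own GTID_SUBSET() function is the appropriate comparison.

```python
def last_txn(gtid_set: str) -> int:
    """Return N from a single-source GTID set such as 'uuid:1-N' or 'uuid:N'.

    An empty set (nothing executed yet) is treated as 0.
    """
    _, _, interval = gtid_set.rpartition(":")
    if not interval:
        return 0
    return int(interval.split("-")[-1])


def choose_endpoint(required_gtid: str, replica_executed: str) -> str:
    """Route a read to the replica only if it has already applied the
    write the reader depends on; otherwise fall back to the primary."""
    if last_txn(replica_executed) >= last_txn(required_gtid):
        return "replica"
    return "primary"


# Usage: the client last wrote transaction 120 on the primary.
uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
lagging = choose_endpoint(f"{uuid}:1-120", f"{uuid}:1-118")
caught_up = choose_endpoint(f"{uuid}:1-120", f"{uuid}:1-125")
```

Here `lagging` resolves to the primary because the replica has not yet applied transaction 120, while `caught_up` resolves to the replica.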
