
Challenges and Optimizations of Hive MetaStore at Kuaishou

This article details how Kuaishou tackled performance, scalability, and stability challenges of Hive MetaStore by introducing a BeaconServer hook architecture, read‑write separation, API refinements, traffic control, and federation designs, resulting in significant query efficiency and service reliability improvements.

Big Data Technology Architecture

Kuaishou built a data warehouse on Hive, storing Hive metadata in MySQL; rapid business growth created performance and stability challenges for Hive and its MetaStore.

The MetaStore has to deliver high query performance, usability, and extensibility at low cost. The main pressures are massive read traffic (more than 500k queries per day), extensive use of dynamic partitions (millions of partitions), and rapid partition growth.

The solution architecture introduces BeaconServer as a stateless Hook service integrated with HiveServer2 to route SQL to efficient engines (Presto, Spark) and provide auditing, rewriting, and error analysis.
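To make the routing idea concrete, here is a minimal sketch of an engine-selection rule such a hook might apply. The rules, the size threshold, and the names QueryInfo and choose_engine are all assumptions for illustration; the source does not describe BeaconServer's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class QueryInfo:
    sql: str
    estimated_input_bytes: int  # hypothetical size estimate supplied by the hook


def choose_engine(query, presto_limit_bytes=10 * 1024**3):
    """Pick an execution engine for a query (illustrative rules only)."""
    statement = query.sql.lstrip().lower()
    # In this sketch only read-only SELECT statements are candidates for rerouting.
    if not statement.startswith("select"):
        return "hive"
    # Assumed rule: small scans go to Presto, larger ones to Spark.
    if query.estimated_input_bytes <= presto_limit_bytes:
        return "presto"
    return "spark"
```

Because the function keeps no state between calls, it fits the stateless service model described above: any instance can make the same decision for the same query.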

Optimization 1 – Read/Write Separation: Read requests are routed to replicas only after GTID synchronization confirms that the replica has caught up, which reduced master QPS by over 70% and improved stability.
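The consistency check behind this routing can be sketched as follows. This is a simplified model, not Kuaishou's implementation: GTIDs are reduced to a single increasing sequence number, and falling back to the master when the replica lags is an assumption (a real system could instead wait on the replica, for example with MySQL's WAIT_FOR_EXECUTED_GTID_SET function). The class and method names are hypothetical.

```python
class ReadWriteRouter:
    """Route reads to a replica only when it has applied the latest write."""

    def __init__(self):
        self.last_write_seq = 0        # sequence of the latest write committed on the master
        self.replica_applied_seq = 0   # highest sequence the replica has applied

    def record_write(self):
        """Call after a write commits on the master; returns its sequence number."""
        self.last_write_seq += 1
        return self.last_write_seq

    def replica_caught_up_to(self, seq):
        """Call when the replica reports that it has applied transactions up to seq."""
        self.replica_applied_seq = max(self.replica_applied_seq, seq)

    def route(self, is_write):
        if is_write:
            return "master"
        # Serve the read from the replica only if it has seen every committed write.
        if self.replica_applied_seq >= self.last_write_seq:
            return "replica"
        # Assumed fallback: read from the master rather than return stale metadata.
        return "master"
```

With this rule a client never reads metadata older than its own most recent write, while the bulk of read traffic still lands on the replica whenever replication keeps up.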

Optimization 2 – MetaStore API Improvements: DESC now skips unnecessary partition scans, the Spark Hive client was upgraded to eliminate redundant get_functions calls, large data scans are batched, an index was added on partition names, and partition filter values are forced to strings. Together these changes significantly reduced latency.
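Of these changes, batching is the easiest to illustrate. The sketch below splits a large list of partition names into fixed-size chunks so that no single request has to scan everything at once. The helper names, the fetch_batch callback, and the default batch size of 1000 are assumptions; the source does not give the actual batch size or API.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_partitions_in_batches(partition_names, fetch_batch, batch_size=1000):
    """Fetch partition metadata chunk by chunk instead of in one large call.

    fetch_batch is a caller-supplied function (hypothetical here) that takes a
    list of partition names and returns the matching partition objects.
    """
    results = []
    for chunk in batched(partition_names, batch_size):
        results.extend(fetch_batch(chunk))
    return results
```

Smaller requests bound the memory and query time of each call, at the cost of more round trips, so the batch size is a tuning parameter rather than a fixed constant.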

Optimization 3 – Traffic Control: BeaconServer enforces priority‑based throttling during peaks, protecting MetaStore from overload and ensuring high‑priority requests are served.
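A minimal sketch of priority-based admission control is shown below. It assumes that "peak" means the number of in-flight requests has reached a threshold, and that priorities are integers where larger means more important. Both conventions, along with the class name PriorityThrottle, are illustrative assumptions rather than details from the source.

```python
import threading


class PriorityThrottle:
    """Admit requests by priority once in-flight load reaches a peak threshold."""

    def __init__(self, capacity, peak_threshold, min_peak_priority):
        self.capacity = capacity                    # hard limit on concurrent requests
        self.peak_threshold = peak_threshold        # load at which throttling begins
        self.min_peak_priority = min_peak_priority  # lowest priority admitted at peak
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self, priority):
        """Return True if the request may proceed, False if it is rejected."""
        with self._lock:
            if self.in_flight >= self.capacity:
                return False  # completely full: reject everything
            at_peak = self.in_flight >= self.peak_threshold
            if at_peak and priority < self.min_peak_priority:
                return False  # shed low-priority work during peaks
            self.in_flight += 1
            return True

    def release(self):
        """Call when an admitted request finishes."""
        with self._lock:
            if self.in_flight > 0:
                self.in_flight -= 1
```

The gap between peak_threshold and capacity is headroom reserved for high-priority requests, which is what keeps them served while lower-priority traffic is being shed.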

Optimization 4 – Federation: Routing is implemented at the HMSHandler level, mapping Hive DBs to different RawStore implementations without modifying the persistence layer, enabling horizontal scaling.
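The routing layer can be pictured as a lookup from database name to backing store. In the sketch below, InMemoryStore is a stand-in for a RawStore implementation and FederatedRouter for the routing logic in the handler; both names, the get_table method, and the lowercasing of database names are assumptions for illustration only.

```python
class InMemoryStore:
    """Stand-in for one RawStore backend (for example, one MySQL cluster)."""

    def __init__(self, name):
        self.name = name
        self.tables = {}  # maps (db_name, table_name) to table metadata

    def get_table(self, db_name, table_name):
        return self.tables.get((db_name, table_name))


class FederatedRouter:
    """Choose a backend per Hive database, falling back to a default store."""

    def __init__(self, default_store, db_to_store=None):
        self.default_store = default_store
        # Assumption: database names are compared case-insensitively.
        self.db_to_store = {db.lower(): store for db, store in (db_to_store or {}).items()}

    def store_for(self, db_name):
        return self.db_to_store.get(db_name.lower(), self.default_store)

    def get_table(self, db_name, table_name):
        # The caller sees one API; only the routing decision differs per database.
        return self.store_for(db_name).get_table(db_name, table_name)
```

Because the mapping sits above the stores, moving a database to a new backend is a change to the routing table rather than to the persistence code, which is what allows capacity to grow horizontally.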

These optimizations collectively enhanced query efficiency, stability, and scalability of Hive MetaStore services at Kuaishou.
