
Scaling Hive Metadata Storage with Federation Architecture

Didi relieved Hive's MySQL metadata bottleneck by building a federation architecture in which waggle_dance routes each request to one of several MySQL-backed metastores according to the database name. The design scales horizontally, supports both reads and writes, and stays compatible with existing Hive clients while improving stability and performance.

Didi Tech

This article describes how Didi addressed the MySQL query pressure that built up as its Hive metadata grew. The team implemented a federation architecture using waggle_dance to distribute metadata across multiple MySQL environments, improving Hive's stability and scalability.

The solution involves routing Hive metadata requests to appropriate MySQL instances based on database names, allowing horizontal scaling without modifying Hive Metastore interfaces. Key components include a router service, configuration management, and monitoring systems. The architecture supports read/write operations across multiple metastores while maintaining compatibility with existing Hive clients.
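The routing idea can be sketched in a few lines of Python. This is only an illustration of name-based routing, not Didi's or waggle_dance's actual code: the `MetastoreRouter` class, its method names, and the `thrift://` URIs are all hypothetical, and the lookup assumes an exact match on a lowercased database name with a fallback to a default metastore.

```python
from dataclasses import dataclass, field


@dataclass
class MetastoreRouter:
    """Hypothetical router: maps a Hive database name to a metastore URI."""

    default_uri: str
    routes: dict[str, str] = field(default_factory=dict)

    def assign(self, database: str, uri: str) -> None:
        # Hive treats database names case-insensitively, so normalize the key.
        self.routes[database.lower()] = uri

    def resolve(self, database: str) -> str:
        # Databases without an explicit route stay on the default metastore.
        return self.routes.get(database.lower(), self.default_uri)


# Example URIs are placeholders, not real endpoints.
router = MetastoreRouter(default_uri="thrift://metastore-primary:9083")
router.assign("ods_orders", "thrift://metastore-b:9083")

print(router.resolve("ODS_Orders"))  # routed to the second metastore
print(router.resolve("tmp"))         # falls back to the default metastore
```

Because clients only ever talk to the router, adding another MySQL-backed metastore in this sketch amounts to adding entries to the route table, which is what keeps the Hive Metastore interface unchanged from the client's point of view.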

The deployment consists of a four-instance waggle_dance cluster behind LVS, with metadata migrated gradually to the new MySQL environments. Table-level routing is planned as a future enhancement. The implementation has been stable for several months and has effectively resolved the bottleneck of relying on a single MySQL instance.

Tags: distributed systems, Data Warehouse, big data architecture, Hive Federation, Metadata Scalability, MySQL Optimization, Waggle Dance