Scaling Hive Metadata Storage with Federation Architecture
Didi relieved the MySQL bottleneck behind Hive metadata by building a federation architecture in which waggle_dance routes each request to one of several MySQL instances based on the database name. The design enables horizontal scaling and read/write support, stays compatible with existing Hive clients, and improves stability and performance.
This article describes how Didi relieved the MySQL query pressure caused by the growth of Hive metadata storage. The team implemented a federation architecture that uses waggle_dance to distribute metadata across multiple MySQL environments, improving Hive's stability and scalability.
The solution involves routing Hive metadata requests to appropriate MySQL instances based on database names, allowing horizontal scaling without modifying Hive Metastore interfaces. Key components include a router service, configuration management, and monitoring systems. The architecture supports read/write operations across multiple metastores while maintaining compatibility with existing Hive clients.
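The routing idea above can be sketched in a few lines of Python. This is only an illustration of database-name routing, not waggle_dance's actual code or Didi's implementation: the `Metastore` and `DatabaseRouter` classes, the database names, and the thrift URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Metastore:
    """One Hive Metastore backed by its own MySQL instance (hypothetical model)."""
    name: str
    thrift_uri: str  # assumed endpoint, for illustration only


@dataclass
class DatabaseRouter:
    """Maps a Hive database name to the metastore that owns its metadata."""
    default: Metastore
    routes: Dict[str, Metastore] = field(default_factory=dict)

    def assign(self, database: str, metastore: Metastore) -> None:
        # Hive database names are case-insensitive, so normalise the key.
        self.routes[database.lower()] = metastore

    def resolve(self, database: str) -> Metastore:
        # Databases without an explicit route stay on the default metastore.
        return self.routes.get(database.lower(), self.default)


# Hypothetical endpoints and database names.
primary = Metastore("primary", "thrift://metastore-primary.example:9083")
secondary = Metastore("secondary", "thrift://metastore-secondary.example:9083")

router = DatabaseRouter(default=primary)
router.assign("ads_logs", secondary)

print(router.resolve("ADS_LOGS").name)  # routed by explicit assignment
print(router.resolve("finance").name)   # falls back to the default
```

Because clients only ever talk to the router, adding another MySQL-backed metastore in this sketch amounts to one more `assign` call, which is the property that lets the Hive Metastore interface stay unchanged.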
Deployment includes an LVS-based waggle_dance cluster with 4 instances, gradual migration of metadata to new MySQL environments, and plans for table-level routing enhancements. The implementation has been stable for several months and has effectively resolved the bottleneck of relying on a single MySQL instance.
Didi Tech
Official Didi technology account