
Design and Optimization of 58.com’s HBase Platform: Multi‑Tenant Support, Data Access Interfaces, Import/Export Tools, and Performance Tuning

This article details the architecture and operational enhancements of 58.com’s HBase platform, covering multi‑tenant resource isolation, various data access APIs, bulk import/export mechanisms, and a series of performance optimizations that improve stability and scalability for massive data workloads.

58 Tech

HBase is a distributed, column‑oriented key‑value store built on Hadoop that provides high‑reliability and high‑performance services for real‑time read/write and random access to massive datasets, and it is widely used in the big‑data domain.

At 58.com, HBase underpins core business data such as posts, user profiles, search, recommendation, time‑series, and graph data. The platform architecture includes multi‑tenant support, data read/write interfaces, import/export tools, OLAP integration, time‑series and graph databases, SQL‑on‑HBase, and a monitoring system.

1. HBase Multi‑Tenant Support

Resource limits: SCF Quota and HBase Quota.

Resource isolation: RPC read/write separation, ACL permission isolation, RSGroup physical isolation.

SCF Quota limits request rates per application, while HBase Quota can restrict request size and request count at the user, table, or namespace level. Quota metadata is stored in the system table hbase:quota. When excessive read/write traffic saturated RegionServer (RS) bandwidth, HBase Quota was used to throttle data volume at the table level.
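To make the table-level throttling idea concrete, here is a minimal token-bucket sketch in plain Java. It is an illustration only, not HBase's quota implementation: the class name TableThrottle, its parameters, and the injected clock value are all assumptions introduced for this example. In HBase itself such limits are set through the shell's set_quota command rather than in application code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-table token-bucket throttle (illustration only, not HBase code).
class TableThrottle {
    private static final class Bucket {
        double tokens;          // bytes currently available
        long lastRefillNanos;   // time of the last refill

        Bucket(double tokens, long lastRefillNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = lastRefillNanos;
        }
    }

    private final long burstBytes;      // maximum bytes a table may spend at once
    private final long bytesPerSecond;  // sustained refill rate
    private final Map<String, Bucket> buckets = new HashMap<>();

    TableThrottle(long burstBytes, long bytesPerSecond) {
        this.burstBytes = burstBytes;
        this.bytesPerSecond = bytesPerSecond;
    }

    // Returns true if the request is admitted. The clock is passed in so the
    // behaviour is deterministic; a caller would normally pass System.nanoTime().
    synchronized boolean tryAcquire(String table, long requestBytes, long nowNanos) {
        Bucket b = buckets.computeIfAbsent(table, t -> new Bucket(burstBytes, nowNanos));
        double elapsedSeconds = (nowNanos - b.lastRefillNanos) / 1e9;
        b.tokens = Math.min(burstBytes, b.tokens + elapsedSeconds * bytesPerSecond);
        b.lastRefillNanos = nowNanos;
        if (b.tokens < requestBytes) {
            return false; // over quota: the caller should back off and retry
        }
        b.tokens -= requestBytes;
        return true;
    }
}
```

Each table gets its own bucket, so one table exhausting its budget does not affect another, which is the property that matters when a single hot table is saturating RS bandwidth.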

ACLs are configured in hbase-site.xml by adding the org.apache.hadoop.hbase.security.access.AccessController coprocessor. Permissions (R, W, X, C, A) can be granted at various scopes. Example for a new user:

create_namespace 'zhangsan'
grant 'zhangsan','RWCA','@zhangsan'

RSGroup provides physical isolation by grouping RegionServers and tables into distinct groups, preventing interference between critical and resource‑heavy workloads.
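The RPC read/write separation listed under resource isolation can be pictured as two independent handler pools, so that stalled writes cannot starve reads. The sketch below is a plain-Java analogy under that assumption, not RegionServer code; RpcHandlerPools and its methods are hypothetical names. In HBase the split is configured rather than coded (for example via hbase.ipc.server.callqueue.handler.factor and hbase.ipc.server.callqueue.read.ratio), and the source does not say which values 58.com uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical model of RPC handlers: either one shared pool or separate read/write pools.
class RpcHandlerPools {
    private final ExecutorService readHandlers;
    private final ExecutorService writeHandlers;

    RpcHandlerPools(int readThreads, int writeThreads, boolean separated) {
        if (separated) {
            readHandlers = Executors.newFixedThreadPool(readThreads);
            writeHandlers = Executors.newFixedThreadPool(writeThreads);
        } else {
            // Same total number of handlers, but reads and writes compete for them.
            ExecutorService shared = Executors.newFixedThreadPool(readThreads + writeThreads);
            readHandlers = shared;
            writeHandlers = shared;
        }
    }

    <T> Future<T> submit(boolean isWrite, Callable<T> call) {
        return (isWrite ? writeHandlers : readHandlers).submit(call);
    }

    void shutdown() {
        readHandlers.shutdown();
        writeHandlers.shutdown();
    }

    // Stalls four writes (standing in for WAL writes stuck on a slow disk) and
    // reports whether a read still completes within half a second.
    static boolean readCompletesWhileWritesStall(boolean separated) {
        RpcHandlerPools rpc = new RpcHandlerPools(2, 2, separated);
        CountDownLatch slowDisk = new CountDownLatch(1);
        try {
            for (int i = 0; i < 4; i++) {
                rpc.submit(true, () -> { slowDisk.await(); return "written"; });
            }
            Future<String> read = rpc.submit(false, () -> "row-value");
            return "row-value".equals(read.get(500, TimeUnit.MILLISECONDS));
        } catch (Exception e) {
            return false; // timeout: the read was queued behind stalled writes
        } finally {
            slowDisk.countDown(); // let the stalled writes finish
            rpc.shutdown();
        }
    }
}
```

With separate pools the read is served by an idle read handler; with a shared pool every handler is occupied by a stalled write and the read times out.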

2. Data Read/Write Interfaces

SCF Proxy: a custom RPC framework that wraps the native Java API, exposing ~30 methods and supporting Java, Python, PHP, etc.

Java native API: used by legacy services.

Thrift Server: enables non‑Java language access, though most users now prefer SCF Proxy.

The SCF Proxy architecture routes client RPC calls through a service‑management platform, which performs service discovery, load balancing, and application‑level rate limiting. Because clients no longer connect to ZooKeeper directly, ZooKeeper pressure drops and stability improves.

3. Data Import/Export

BulkLoad: fast ingestion of pre‑generated HFiles (hundreds of GB to multiple TB) for historical or batch data.

SnapshotScanMR: leverages HBase snapshots to scan data directly from HDFS, bypassing RegionServers, offering higher throughput and less impact on other workloads compared to TableScanMR.

SnapshotScanMR is preferred for full‑table scans and bulk export because it avoids RS contention and reduces network serialization overhead.

4. Platform Optimizations

CLOSE_WAIT reduction: after a DataNode closed its side of a connection, the RS did not close its own socket, leaving it in CLOSE_WAIT; the fix (HBASE‑9393) required integrating HDFS‑7694.

Disk I/O bottleneck: a failing DataNode disk caused RS handler threads to block on WAL writes; resolved by RPC read/write separation and pipeline recovery improvements.

Compaction lock contention: compactions held the Region read lock, blocking BulkLoad writes; addressed by relaxing the Region read lock held during compaction (HBASE‑14575).

HTablePool replacement: the deprecated HTablePool caused thread‑pool explosion and GC pressure; switched to the recommended approach of obtaining Table instances from a shared Connection.

Additional tweaks: BucketCache for BlockCache, compaction rate limiting, etc.
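The compaction lock contention described above can be reproduced in miniature with a standard read/write lock. This is a hedged sketch, not HBase code: RegionLockDemo and its method are hypothetical, and it assumes the BulkLoad path needs the exclusive (write) side of the same lock that a compaction holds in shared (read) mode. The actual change in HBASE‑14575 is more nuanced than simply skipping the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a Region lock shared by a compaction and a bulk load.
class RegionLockDemo {
    // Returns true if the "bulk load" obtains the write lock within waitMillis
    // while a long-running "compaction" is in progress on another thread.
    static boolean bulkLoadCanProceed(boolean compactionHoldsReadLock, long waitMillis) {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        CountDownLatch compactionStarted = new CountDownLatch(1);
        CountDownLatch compactionMayFinish = new CountDownLatch(1);

        Thread compaction = new Thread(() -> {
            if (compactionHoldsReadLock) {
                regionLock.readLock().lock();
            }
            compactionStarted.countDown();
            try {
                compactionMayFinish.await(); // stands in for a long compaction
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (compactionHoldsReadLock) {
                    regionLock.readLock().unlock();
                }
            }
        });
        compaction.setDaemon(true);
        compaction.start();

        try {
            compactionStarted.await();
            // The bulk load gives up after waitMillis instead of blocking forever.
            boolean acquired = regionLock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                regionLock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            compactionMayFinish.countDown(); // end the simulated compaction
        }
    }
}
```

While the simulated compaction holds the read lock, the bulk load's attempt to take the write lock times out; once the compaction no longer holds it, the same attempt succeeds immediately.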

Conclusion

This article presented four key aspects of 58.com’s HBase platform: multi‑tenant support, data access interfaces, import/export capabilities, and performance optimizations. Together they show how a large‑scale production environment can evolve an open‑source storage system into a robust, feature‑rich data service.

Finally, 58 Group’s Data Platform Department is recruiting big‑data development engineers for storage, compute, OLAP, messaging, and resource‑management tracks, requiring expertise in technologies such as HDFS, HBase, Spark, Flink, Druid, Kafka, YARN, and related source‑code level optimization.
