Design and Evolution of a Custom Storage Engine for IoT Device Metadata

This article is a case study of an IoT device metadata management platform. It describes the business scenario and traces the storage layer's evolution from single-node MySQL through sharded MySQL, HBase, and Elasticsearch to a self-developed distributed storage engine. That engine separates compute from storage and supports LSM storage, multi-dimensional indexing, routing keys, and parallel scans to meet high read/write throughput and complex query requirements.

DataFunSummit

The talk begins by introducing an IoT device metadata platform that must ingest diverse device data (e.g., automotive terminals, vending machines) and store basic information such as device ID, type, status, and optional attributes like geographic coordinates. The core requirement is a unified, scalable storage layer for massive, frequently updated metadata.

The initial architecture used a single MySQL instance, which sufficed for tens of thousands of active devices and a few hundred QPS. As device count grew toward billions and peak QPS reached 10k+, MySQL could no longer meet performance and operational needs, prompting a move to MySQL sharding.

Sharding improved capacity but added operational complexity, and it still struggled with complex queries and high-frequency updates. The next generation paired MySQL with Elasticsearch (ES) for advanced search, and later replaced sharded MySQL with HBase, a distributed NoSQL store built on LSM-tree storage. HBase offered better write scalability, while ES provided powerful full-text and multi-field queries.
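To illustrate why LSM-based storage helps with write-heavy metadata, here is a minimal sketch of the idea. It is a toy model, not HBase's implementation: the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory "runs" standing in for on-disk files are all assumptions made for illustration.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # newest writes, held in memory
        self.runs = []       # flushed sorted runs, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        # A write only touches the memtable; nothing is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then runs newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("dev-1", "online")
db.put("dev-2", "offline")   # reaches the limit and flushes a run
db.put("dev-1", "offline")   # newer value shadows the flushed one
```

Because updates are appended rather than rewritten in place, frequent status changes for the same device stay cheap; the cost moves to reads and to background compaction, which this sketch omits.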

Although HBase + ES covered many requirements, challenges remained: high operational overhead, no mature mechanism for synchronizing data between HBase and ES, and the need for a unified query interface. To address these, the team built a custom storage engine. It relies on a distributed file system (DFS) to separate compute from storage, and it supports both range and hash partitioning, primary storage nodes plus index nodes, read-only replicas, and native SQL/API access.
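The difference between the two partitioning schemes can be sketched as follows. This is a hedged illustration only: the function names, the MD5-based hash, and the split points are assumptions, and the talk does not describe how the engine actually computes partitions.

```python
import hashlib
from bisect import bisect_right


def hash_partition(key: str, num_partitions: int) -> int:
    """Spread keys evenly; suits point lookups by device ID."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key: str, split_points: list[str]) -> int:
    """Keep adjacent keys together; suits ordered scans.

    split_points must be sorted. Partition i holds keys in
    [split_points[i - 1], split_points[i]).
    """
    return bisect_right(split_points, key)


# Hypothetical split points dividing the key space into three partitions.
splits = ["dev-0300", "dev-0600"]
partition_of_scan_start = range_partition("dev-0500", splits)
partition_of_lookup = hash_partition("dev-0500", 8)
```

Hash partitioning avoids hot spots when device IDs arrive in sequence, while range partitioning lets a scan over a contiguous ID range touch only a few partitions; supporting both lets each table pick the trade-off that fits its access pattern.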

The custom engine incorporates LSM storage for high write concurrency, CDC for data subscription, automatic partition expansion, and multi‑type indexes (secondary, inverted, spatial). It introduces a routing‑key concept to limit query fan‑out, and a ParallelScan interface that creates a query context ID, allowing the system to maintain state across file accesses and dramatically reduce CPU and I/O overhead.
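A rough sketch of how a routing key and a scan context ID might fit together is shown below. Everything here is hypothetical: `ScanServer`, `open_scan`, `next_batch`, and the CRC32 routing are illustrative names and choices, not the engine's real ParallelScan API, and the sketch scans a single partition rather than several in parallel.

```python
import uuid
import zlib


class ScanServer:
    """Toy server that keeps per-scan state keyed by a context ID."""

    def __init__(self, partitions):
        # partitions: one list of rows per partition.
        self.partitions = partitions
        self.contexts = {}  # context_id -> [partition index, next offset]

    def partition_for(self, routing_key):
        # Rows sharing a routing key live in one partition, so a query
        # that supplies the key touches one partition instead of all.
        return zlib.crc32(routing_key.encode("utf-8")) % len(self.partitions)

    def open_scan(self, routing_key):
        # Create the context once; later calls only pass the ID.
        context_id = uuid.uuid4().hex
        self.contexts[context_id] = [self.partition_for(routing_key), 0]
        return context_id

    def next_batch(self, context_id, batch_size):
        # Resume from the saved offset instead of re-locating the position.
        state = self.contexts[context_id]
        part, offset = state
        rows = self.partitions[part][offset:offset + batch_size]
        state[1] = offset + len(rows)
        if not rows:
            del self.contexts[context_id]  # scan finished, release state
        return rows


server = ScanServer([[f"p{i}-row{j}" for j in range(5)] for i in range(4)])
ctx = server.open_scan("tenant-a")
first_rows = server.next_batch(ctx, 2)
```

The point of the context ID is that the server remembers where the scan stopped, so each follow-up request avoids repeating the lookup work; a real engine would also need to expire abandoned contexts.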

Performance tests show that the optimized design reaches close to one million operations per second while keeping latency low. The final comparison highlights that the self-developed engine simplifies the architecture, supports both simple and complex queries, and meets the high-concurrency demands of IoT metadata workloads.

Future work includes multi‑tier storage (SSD, HDD, OSS), distributed OLAP extensions on top of existing SQL capabilities, and stronger compute support within the storage engine.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
