Deep Dive into OpenMLDB Architecture: Millisecond‑Level Real‑Time Feature Computation Engine
This article provides a comprehensive technical overview of OpenMLDB, covering its overall architecture, online real‑time SQL execution and storage engines, core data structures, pre‑aggregation techniques, and performance test results that demonstrate millisecond‑level latency for feature computation.
OpenMLDB is a production‑grade feature platform that ensures consistency between offline batch processing and online real‑time inference, allowing users to write feature logic in SQL and execute it seamlessly in both environments.
The system consists of three main components: a ZooKeeper service for metadata and service discovery, a Nameserver that manages Tablet nodes and handles failover, and Tablets that host both the SQL Engine and the Storage Engine. Tablets are the smallest deployable units and are responsible for query execution and data storage.
Online query processing follows a clear flow: the SDK connects to ZooKeeper to obtain Nameserver and Tablet metadata and routes each SQL request to the appropriate Tablet; that Tablet parses the SQL, generates a distributed execution plan, dispatches sub‑tasks to other Tablets where needed, reads from the storage engine, aggregates the partial results, and returns the computed feature values to the client.
The SQL Engine parses, validates, and optimizes SQL statements, generates LLVM IR, compiles it to native code, and executes the physical plan. It provides hundreds of built‑in functions and supports user‑defined functions (UDFs) written in C++. Optimizations such as loop binding and window merging reduce data scans and improve throughput.
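One of these optimizations, window merging, can be illustrated with a small sketch: several window aggregates that share the same partition key and ordering are answered from a single newest‑first scan instead of one scan per window. The function below is a hypothetical Python illustration of the idea, not OpenMLDB's generated code.

```python
# Illustrative sketch of "window merging": compute the sums of several
# window sizes over one key's rows in a single backward (newest-first)
# scan, rather than re-scanning the data once per window.

def merged_window_sums(rows, window_sizes):
    """rows: newest-first list of numeric values for one key.
    Returns {size: sum of the newest `size` rows} using a single pass."""
    results = {}
    targets = sorted(window_sizes)
    running = 0.0
    idx = 0
    for i, v in enumerate(rows, start=1):
        running += v
        # Emit a result each time the scan reaches a requested window edge.
        while idx < len(targets) and targets[idx] == i:
            results[targets[idx]] = running
            idx += 1
        if idx == len(targets):
            break
    # Windows larger than the available history see the full sum.
    for size in targets[idx:]:
        results[size] = running
    return results

rows = list(range(1, 101))  # newest-first values 1..100
print(merged_window_sums(rows, [10, 100]))  # {10: 55.0, 100: 5050.0}
```

The same single-pass structure generalizes to any aggregate that can be updated incrementally while scanning, which is why sharing one scan across windows cuts data access roughly in proportion to the number of merged windows.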
OpenMLDB offers two storage back‑ends: an in‑memory engine with ultra‑low latency and a RocksDB‑based on‑disk engine that reduces cost. The in‑memory engine uses a double‑layer skip list: the first layer indexes rows by key, and the second orders each key's rows by timestamp, enabling fast time‑range queries.
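The double‑layer structure can be sketched as follows, with a dict and a sorted list standing in for the two skip lists; `MemTableSketch` and its methods are illustrative names, not OpenMLDB's API.

```python
# Sketch of the double-layer index: the first layer maps key -> per-key
# series, the second keeps entries sorted by timestamp (newest first) so
# that a "last N rows before ts" query is a short ordered walk.
import bisect

class MemTableSketch:
    def __init__(self):
        self.index = {}  # key -> list of (-timestamp, value), kept sorted

    def put(self, key, ts, value):
        series = self.index.setdefault(key, [])
        bisect.insort(series, (-ts, value))  # newest (largest ts) first

    def scan(self, key, start_ts, limit):
        """Return up to `limit` values with timestamp <= start_ts, newest first."""
        series = self.index.get(key, [])
        pos = bisect.bisect_left(series, (-start_ts,))
        return [v for _, v in series[pos:pos + limit]]

t = MemTableSketch()
for ts, v in [(100, "a"), (200, "b"), (300, "c")]:
    t.put("user1", ts, v)
print(t.scan("user1", 250, 2))  # ['b', 'a']
```

In the real engine both layers are skip lists so that lookups, inserts, and ordered range scans all stay O(log n) without the rebalancing costs of a tree.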
The on‑disk engine stores data in RocksDB column families, each representing an index. Keys are composed of the primary key and timestamp, guaranteeing sorted order for efficient time‑range scans.
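The sorted-order guarantee comes from how the key bytes are laid out; a common scheme, sketched below under assumed field widths, is to append a big‑endian inverted timestamp to the primary key so that RocksDB's ascending byte order yields newest‑first rows.

```python
# Sketch of composing an on-disk key as primary key + inverted big-endian
# timestamp, so an ascending bytewise iterator returns one key's rows
# newest-first. The exact field layout is an assumption for illustration.
import struct

def compose_key(pk: bytes, ts: int) -> bytes:
    # Big-endian preserves numeric order under bytewise comparison;
    # inverting the timestamp makes larger (newer) timestamps sort first.
    return pk + struct.pack(">Q", 0xFFFFFFFFFFFFFFFF - ts)

keys = sorted(compose_key(b"user1", ts) for ts in (100, 300, 200))
# Decoding the suffix back shows timestamps in descending order.
decoded = [0xFFFFFFFFFFFFFFFF - struct.unpack(">Q", k[5:])[0] for k in keys]
print(decoded)  # [300, 200, 100]
```

With this layout, a time-range scan is a single seek to `compose_key(pk, end_ts)` followed by a forward iteration until the timestamp falls below the range start.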
Data is encoded with a compact binary format that includes version fields, a bitmap for NULL detection, and separate sections for primitive types, string addresses, and string payloads, allowing fast serialization and deserialization.
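The bitmap-plus-offset idea can be made concrete with a toy round trip. The sketch below uses a simplified header; OpenMLDB's actual format (version fields, variable-width offsets) differs, so the layout here is only illustrative.

```python
# Sketch of a compact row format: a NULL bitmap, fixed-width primitive
# slots, then (offset, length) headers for strings with payloads at the
# end. Simplified stand-in for the encoding described in the article.
import struct

def encode_row(int_vals, str_vals):
    fields = list(int_vals) + list(str_vals)
    bitmap = 0
    for i, v in enumerate(fields):
        if v is None:
            bitmap |= 1 << i  # one NULL bit per field
    buf = struct.pack("<B", bitmap)
    for v in int_vals:  # fixed-width primitive section
        buf += struct.pack("<q", v or 0)
    payload = b""
    for s in str_vals:  # (offset, length) headers; payloads appended last
        data = (s or "").encode()
        buf += struct.pack("<HH", len(payload), len(data))
        payload += data
    return buf + payload

def decode_row(buf, n_ints, n_strs):
    bitmap = buf[0]
    ints, off = [], 1
    for i in range(n_ints):
        v = struct.unpack_from("<q", buf, off)[0]
        ints.append(None if bitmap >> i & 1 else v)
        off += 8
    strs, payload_start = [], off + 4 * n_strs
    for j in range(n_strs):
        o, ln = struct.unpack_from("<HH", buf, off + 4 * j)
        v = buf[payload_start + o:payload_start + o + ln].decode()
        strs.append(None if bitmap >> (n_ints + j) & 1 else v)
    return ints, strs

print(decode_row(encode_row([42, None], ["hello", None]), 2, 2))
```

Because field positions are computed from fixed widths and offsets rather than parsed, decoding a single column never requires walking the whole row.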
Data is sharded across Tablets with multiple replicas for high availability. Each shard can be migrated between Tablets to balance load, and leader/follower roles ensure consistent reads and writes.
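A toy router below shows the shard-and-replica idea; the tablet names, shard count, and hash/placement scheme are made up for illustration and are not OpenMLDB's actual policy.

```python
# Sketch of shard routing with replicas: a row's shard is derived from its
# key hash, and each shard has one leader plus followers on other tablets.
# All constants and placement rules here are illustrative assumptions.
import hashlib

TABLETS = ["tablet-0", "tablet-1", "tablet-2"]
NUM_SHARDS = 8

def shard_of(key: str) -> int:
    # Stable hash so the same key always routes to the same shard.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % NUM_SHARDS

def replicas_of(shard: int, n_replicas: int = 2):
    """Leader first, then followers, spread round-robin over tablets."""
    return [TABLETS[(shard + i) % len(TABLETS)] for i in range(n_replicas)]

s = shard_of("user1")
print(s, replicas_of(s))  # this key's shard, with its leader and follower
```

Writes go to the leader and are replicated to followers; migrating a shard for load balancing only changes the placement table, not the key-to-shard mapping.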
OpenMLDB implements a pre‑aggregation mechanism for long‑window queries: data is aggregated in advance into pre‑aggregated tables, dramatically reducing latency for large windows. When a query arrives, the engine can combine pre‑aggregated results with recent data to produce the final answer.
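A minimal sketch of that combine step, assuming fixed-size time buckets (the bucket size and helper names are illustrative, not OpenMLDB's schema):

```python
# Sketch of pre-aggregation for long windows: older data is folded into
# fixed-size bucket sums ahead of time, so a query adds whole buckets for
# the interior of the window and scans raw rows only at the two edges.

BUCKET = 100  # timestamps per pre-aggregated bucket (illustrative)

def preaggregate(rows):
    """rows: list of (ts, value). Returns {bucket_id: sum} per bucket."""
    buckets = {}
    for ts, v in rows:
        buckets[ts // BUCKET] = buckets.get(ts // BUCKET, 0) + v
    return buckets

def window_sum(rows, buckets, start_ts, end_ts):
    """SUM(value) for start_ts <= ts <= end_ts: whole buckets inside the
    window, raw rows only for the partial buckets at each edge."""
    first_full = start_ts // BUCKET + (0 if start_ts % BUCKET == 0 else 1)
    last_full = (end_ts + 1) // BUCKET - 1
    total = sum(s for b, s in buckets.items() if first_full <= b <= last_full)
    for ts, v in rows:  # edge rows not covered by a full bucket
        if start_ts <= ts <= end_ts and not (first_full <= ts // BUCKET <= last_full):
            total += v
    return total

rows = [(ts, 1) for ts in range(0, 1000)]  # one row per timestamp, value 1
buckets = preaggregate(rows)
print(window_sum(rows, buckets, 50, 949))  # 900 rows fall in the window
```

The cost of a long-window query thus scales with the number of buckets plus the two edge fragments, rather than with the number of raw rows in the window.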
Performance tests show that latency stays under 10 ms for windows of up to 10,000 rows, and that increasing the number of windows or LAST JOIN tables degrades latency and throughput only modestly. Pre‑aggregation further improves long‑window query latency by roughly two orders of magnitude.
The article concludes with a summary of the system’s capabilities and provides resources for further exploration, including the OpenMLDB GitHub repository, official website, contact email, and a WeChat community group.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.