Applying OpenMLDB for Efficient AI Toolchain and Data‑Driven Architecture at Akulaku
This article presents Akulaku’s practical experience with OpenMLDB, describing the company’s data‑driven requirements, the design of a unified stream‑batch architecture, implementation details across offline, online and RocksDB modes, and future recommendations for high‑performance, scenario‑agnostic big‑data processing.
Akulaku, a leading Southeast Asian e‑commerce and digital‑banking platform, faces massive real‑time and offline data processing demands that require high accuracy, low latency, and strong performance. To meet these needs, the company adopted OpenMLDB as the core of its data‑driven AI and BI toolchain.
The architecture consists of a feature‑calculation layer (using third‑party and custom tools) and a model‑calculation layer built on an open, extensible framework that supports diverse intelligent applications such as behavior analysis, geo‑location, device fingerprinting, anti‑money‑laundering, risk control, and AI‑driven customer service.
Key design goals were: (1) a unified stream‑batch solution where the same code and logic serve both OLAP and OLTP workloads, eliminating divergent data pipelines; (2) high performance to handle both online concurrency and offline throughput; (3) scenario‑independence allowing a single dataset to be reused across use‑cases; (4) semantic support for evolving streaming operators; and (5) tool efficiency for easy iteration and pipeline automation.
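The first goal, one definition of feature logic serving both the batch (OLAP) and streaming (OLTP) paths, can be illustrated with a minimal Python sketch. This is not OpenMLDB code; the names (`spend_ratio`, `batch_features`, `stream_features`) and the toy feature are our own, standing in for the idea that the same logic is written once and executed in both modes:

```python
from typing import Iterable, Iterator, List

def spend_ratio(amount: float, avg_7d: float) -> float:
    """One feature definition shared by the batch and streaming paths."""
    return amount / avg_7d if avg_7d else 0.0

def batch_features(rows: Iterable[dict]) -> List[float]:
    # Offline/OLAP path: compute the feature over a historical dataset at once.
    return [spend_ratio(r["amount"], r["avg_7d"]) for r in rows]

def stream_features(rows: Iterator[dict]) -> Iterator[float]:
    # Online/OLTP path: compute the same feature per event as it arrives.
    for r in rows:
        yield spend_ratio(r["amount"], r["avg_7d"])

rows = [{"amount": 50.0, "avg_7d": 100.0}, {"amount": 30.0, "avg_7d": 60.0}]
print(batch_features(rows))               # [0.5, 0.5]
print(list(stream_features(iter(rows))))  # [0.5, 0.5]
```

In OpenMLDB the shared definition is SQL rather than Python, but the payoff is the same: one pipeline to write, test, and maintain instead of two divergent ones.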
Implementation leverages multiple data sources (HDFS, Kafka, SDKs, Nebula), integrating them through a stream‑batch layer that selects the appropriate OpenMLDB mode: RocksDB for workloads with relaxed real‑time requirements, the SparkFE‑based offline mode for batch processing (several times faster than vanilla Spark), and the OpenMLDB online mode for hard real‑time tasks, achieving sub‑200 ms latency.
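The mode selection described above can be sketched as a simple routing rule. This is a hypothetical helper (the `Mode` enum, `pick_mode` function, and the 200 ms threshold as a cutoff are our illustration of the article's decision logic, not an OpenMLDB API):

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    ROCKSDB = "rocksdb"   # relaxed real-time lookups
    OFFLINE = "offline"   # SparkFE-based batch processing
    ONLINE  = "online"    # hard real-time serving, sub-200 ms

def pick_mode(latency_budget_ms: Optional[float], batch: bool) -> Mode:
    """Route a workload to an OpenMLDB mode (illustrative rule only)."""
    if batch:
        return Mode.OFFLINE
    if latency_budget_ms is not None and latency_budget_ms <= 200:
        return Mode.ONLINE
    return Mode.ROCKSDB

print(pick_mode(None, batch=True))    # Mode.OFFLINE
print(pick_mode(150, batch=False))    # Mode.ONLINE
print(pick_mode(None, batch=False))   # Mode.ROCKSDB
```

In practice the routing would also weigh data volume, freshness requirements, and cost, but a single explicit dispatch point like this keeps the stream‑batch layer scenario‑agnostic.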
Fusion computing is performed on Ray (previously Flink), enabling seamless MLOps pipelines where the same SQL logic is used for continuous delivery and deployment. OpenMLDB also facilitates AutoML, low‑code analytics, and fine‑grained BI applications.
Practical usage tips include defining indexes with the KEY and TS keywords for efficient time‑window queries, using placeholder values to simplify queries, and excluding the current row from financial time‑window calculations so that, for example, a transaction's window feature is not contaminated by the transaction itself.
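The exclude-current-row tip can be made concrete with a small simulation in plain Python. The function below mirrors the semantics of a time‑range window that sums prior events only; the SQL shape hinted at in the comment is our hedged reconstruction of the pattern, not a verified query, and the Python does not call OpenMLDB:

```python
# In OpenMLDB SQL this pattern looks roughly like:
#   WINDOW w AS (PARTITION BY user_id ORDER BY ts
#                ROWS_RANGE BETWEEN 1000 PRECEDING AND CURRENT ROW
#                EXCLUDE CURRENT_ROW)
# The code below only simulates those semantics.

from typing import List, Tuple

def window_sum_excl_current(events: List[Tuple[int, float]],
                            window_ms: int) -> List[float]:
    """For each (ts, amount) event in ts order, sum amounts of earlier
    events whose ts falls within [ts - window_ms, ts), excluding the
    current row itself."""
    out = []
    for i, (ts, _amt) in enumerate(events):
        lo = ts - window_ms
        out.append(sum(a for t, a in events[:i] if t >= lo))
    return out

events = [(1000, 10.0), (1500, 20.0), (2600, 5.0)]
print(window_sum_excl_current(events, 1000))  # [0, 10.0, 0]
```

Excluding the current row matters in risk scenarios: a transaction's "spend in the last N seconds" feature should describe the state *before* the transaction, otherwise the feature leaks the very event being scored.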
Recommendations emphasize using OpenMLDB for complex logic verification, scenarios with clear time or index slicing, and teams that prefer to avoid deep big‑data performance tuning. Future outlook covers heterogeneous resource support, richer I/O connectors, improved Java/Python SDKs, community‑driven documentation, and enhanced SRE features such as asynchronous data expiration and finer‑grained logging.
Overall, Akulaku’s experience demonstrates that OpenMLDB can provide a high‑performance, unified platform for both batch and real‑time analytics, supporting AI‑driven services while simplifying development and operations.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.