Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan
This article presents Jushuitan's cloud‑native OLAP architecture: its evolution, the current big‑data stack (DataWorks, MaxCompute, Flink, Hologres, and Aerospike), the logistics warning workflow and its rule‑matching mechanism, real‑time processing challenges, and future scalability plans.
Jushuitan, a SaaS ERP provider for e‑commerce, offers data‑driven products that embed analytics into business processes to reduce loss, improve compliance, and enable multi‑role collaboration across the entire order‑to‑delivery lifecycle.
The data‑warehouse architecture has evolved through five stages: early online databases, migration to Greenplum, large‑scale cluster management, integration with Alibaba Cloud services (ADB for Postgres/MySQL), and finally a cloud‑native stack based on DataWorks + MaxCompute for offline processing and Flink + Hologres for real‑time analytics.
The current technical stack includes Kafka for data ingestion, a self‑developed synchronization middleware, Flink for rule matching and stateful stream processing, Aerospike for external state storage, and Hologres for both high‑QPS point queries and OLAP workloads. Together these components support a logistics warning system that monitors order timeliness, triggers alerts, and stores results in Hologres tables for downstream analysis.
Key components of the logistics warning pipeline are: rule tables in Hologres, real‑time rule evaluation in Flink, timer registration in Aerospike, and result persistence via Binlog. The system processes roughly 100 billion events daily, with timers and external state reaching tens of billions.
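The timeliness check described above can be illustrated with a small, framework‑free sketch. It is not Jushuitan's implementation and uses no Flink, Aerospike, or Hologres API: the `Rule` fields, the status names, and the `WarningEngine` class are hypothetical, and an in‑memory dictionary stands in for the external timer store. The idea shown is only the general pattern: a matching event registers a deadline, a completing event cancels it, and a deadline that passes uncancelled produces an alert.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical timeliness rule: end_status must follow start_status in time."""
    rule_id: str
    start_status: str
    end_status: str
    timeout_s: int


class WarningEngine:
    """Toy rule matcher; `timers` stands in for an external state store."""

    def __init__(self, rules):
        self.rules = rules
        self.timers = {}   # (order_id, rule_id) -> deadline (seconds)
        self.heap = []     # min-heap of (deadline, order_id, rule_id)
        self.alerts = []   # fired warnings: (order_id, rule_id, deadline)

    def on_event(self, order_id, status, ts):
        # Fire every timer that is already due before handling this event.
        self.advance(ts)
        for rule in self.rules:
            key = (order_id, rule.rule_id)
            if status == rule.start_status:
                deadline = ts + rule.timeout_s
                self.timers[key] = deadline
                heapq.heappush(self.heap, (deadline, order_id, rule.rule_id))
            elif status == rule.end_status:
                # Completed in time: cancel the pending timer, if any.
                self.timers.pop(key, None)

    def advance(self, now):
        while self.heap and self.heap[0][0] <= now:
            deadline, order_id, rule_id = heapq.heappop(self.heap)
            # Skip heap entries whose timer was cancelled or replaced.
            if self.timers.get((order_id, rule_id)) == deadline:
                del self.timers[(order_id, rule_id)]
                self.alerts.append((order_id, rule_id, deadline))


if __name__ == "__main__":
    engine = WarningEngine([Rule("pickup_24h", "SHIPPED", "PICKED_UP", 24 * 3600)])
    engine.on_event("o1", "SHIPPED", 0)
    engine.on_event("o2", "SHIPPED", 100)
    engine.on_event("o1", "PICKED_UP", 3600)   # o1 is picked up in time
    engine.advance(24 * 3600 + 200)            # o2 misses its deadline
    print(engine.alerts)
```

Cancelled timers are left in the heap and filtered out when popped, which keeps cancellation cheap; whether the production system uses a similar lazy scheme is not stated in the source.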
Future directions focus on elastic resource scaling, stronger multi‑tenant isolation, intelligent operations, longer‑cycle replay capabilities, and tighter integration with cloud services such as Lindorm, aiming to achieve a seamless stream‑batch unified computation model.
The Q&A section clarifies that rule matching is implemented with custom Flink functions rather than CEP, discusses handling of late data via external state, and explains challenges of long‑cycle replay under massive data volumes.
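The source does not spell out how late data is reconciled, so the following is only one plausible reading, sketched under stated assumptions: every status seen for an order is recorded with its event time in a keyed store (a plain dictionary here, standing in for external state), and the rule is re‑evaluated from that record each time an event arrives, regardless of arrival order. The function names, status names, and return labels are hypothetical.

```python
def record(store, order_id, status, event_time):
    """Save the earliest event time per status so duplicates are idempotent."""
    statuses = store.setdefault(order_id, {})
    if status not in statuses or event_time < statuses[status]:
        statuses[status] = event_time
    return statuses


def evaluate(statuses, start_status, end_status, timeout_s):
    """Decide what the rule needs, based on event times rather than arrival order."""
    start = statuses.get(start_status)
    if start is None:
        # The end event may have arrived first; keep it and wait for the start.
        return ("waiting_for_start", None)
    deadline = start + timeout_s
    end = statuses.get(end_status)
    if end is None:
        return ("timer_needed", deadline)
    if end <= deadline:
        return ("on_time", None)
    return ("breached", deadline)


if __name__ == "__main__":
    store = {}
    # The pickup event overtakes the shipment event in transit.
    s = record(store, "o1", "PICKED_UP", 3600)
    print(evaluate(s, "SHIPPED", "PICKED_UP", 86400))
    s = record(store, "o1", "SHIPPED", 0)
    print(evaluate(s, "SHIPPED", "PICKED_UP", 86400))
```

Because the decision depends only on stored event times, a late start event for an order that has already completed does not register a timer and does not raise a false alert.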
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.