Scaling Real‑Time Data Warehousing for Double‑11: Flink + Hologres in Action

During the 2021 Double‑11 shopping festival, logistics provider DiSiFang upgraded its real‑time data warehouse with Flink and Hologres, enabling multi‑billion‑row joins, cutting costs by 50%, and delivering stable, low‑latency analytics that powered high‑frequency dashboards and improved overall delivery speed.

Alibaba Cloud Developer

1 Business Introduction

DiSiFang, founded in 2004 in Shenzhen, is one of China’s earliest providers of international logistics and global warehousing services. It serves cross‑border e‑commerce merchants, platforms, and consumers through its GPN (direct shipping) and GFN (overseas warehousing) networks, with over 100 branches worldwide and more than 2 billion end users.

2 Business Challenges

To handle Double‑11 order peaks reaching tens of millions per day, DiSiFang leveraged big‑data‑driven resource optimization, expanding over 40 warehouses and sorting centers covering 500,000 m². It deployed proprietary sorting systems, barcode recognition, and AI‑enhanced verification to reduce mis‑picks to 0.03% and pursued automation, digitalization, and cloud‑based solutions.

The existing real‑time data warehouse could no longer meet the demand; evaluations of HBase, ClickHouse, and Druid revealed bottlenecks for trillion‑level multi‑table joins.

3 DiSiFang Real‑Time Data Warehouse Journey

Real‑Time Data Warehouse 1.0

Version 1.0 was built on ADB (AnalyticDB), chosen for its high throughput and easy data synchronization via DTS and OTTER. Under heavy dashboard query load, however, it suffered from limited concurrency and high latency.

Real‑Time Data Warehouse 2.0

Learning from version 1.0, DiSiFang adopted a Flink + Hologres architecture with two data paths: (1) Binlog → DataHub → Flink → Hologres, for high‑frequency, large‑volume metrics; (2) direct Binlog synchronization into Hologres, where ODS, DWD, and DWS layers hold raw, cleaned, and aggregated data respectively. This hybrid batch‑stream model combines Flink’s stream processing with Hologres’s join capabilities, and it outperformed the traditional real‑time databases DiSiFang had evaluated.
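As a rough illustration of path (1), the sketch below uses plain Python to imitate what a streaming job does between the Binlog and the sink table: filter change events, group them into tumbling windows, and upsert the per‑window counts by primary key. It is not Flink or Hologres code. The event fields (op, warehouse, ts), the 60‑second window, and the dict standing in for the sink table are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # assumed window size; the article does not state one

Key = Tuple[str, int]  # (warehouse code, window start in epoch seconds)


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align an epoch-second timestamp to the start of its tumbling window."""
    return ts - ts % size


def aggregate_orders(events: Iterable[dict]) -> Dict[Key, int]:
    """Count INSERT events per (warehouse, window); other operations are skipped."""
    counts: Dict[Key, int] = defaultdict(int)
    for event in events:
        if event["op"] != "INSERT":
            continue
        counts[(event["warehouse"], window_start(event["ts"]))] += 1
    return dict(counts)


def upsert(sink: Dict[Key, int], batch: Dict[Key, int]) -> None:
    """Overwrite rows by key, so replaying the same batch leaves the sink unchanged."""
    sink.update(batch)


# Hypothetical Binlog-style change events.
events = [
    {"op": "INSERT", "warehouse": "SZX", "ts": 1000},
    {"op": "INSERT", "warehouse": "SZX", "ts": 1019},
    {"op": "UPDATE", "warehouse": "SZX", "ts": 1015},
    {"op": "INSERT", "warehouse": "SZX", "ts": 1021},
    {"op": "INSERT", "warehouse": "HKG", "ts": 1030},
]

sink: Dict[Key, int] = {}  # stand-in for a metrics table keyed by (warehouse, window)
upsert(sink, aggregate_orders(events))
upsert(sink, aggregate_orders(events))  # replay: same keys, same values
```

Keying the sink by (warehouse, window start) is one way to make a replayed batch harmless. Whether DiSiFang's jobs are keyed this way is not stated in the article.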

4 Hologres in DiSiFang’s Real‑Time Data Warehouse

Why Hologres?

Real‑time capability with sub‑second query response for hundred‑billion‑row tables and massive concurrent writes.

Storage‑compute separation on Alibaba Cloud Pangu, enabling rapid scaling of compute or storage as needed.

Low operational cost, approximately one‑third that of ADB, with high stability.

Hologres Application Scenarios

In OLAP analysis, Hologres supports both real‑time and offline queries, handling high‑concurrency writes and complex multi‑table joins efficiently.

Scenario 1: In‑warehouse operations. Binlog data is parsed into the ODS layer, minute‑level micro‑batches build DWS wide tables, and DataWorks schedules a refresh every five minutes.
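The ODS to DWS refresh in Scenario 1 can be sketched as a single idempotent SQL statement run on a schedule. The example below uses Python's built‑in sqlite3 module purely as a local stand‑in for Hologres. The table names (ods_inbound, ods_outbound, dws_order_wide) and their columns are hypothetical, and the scheduler is replaced by two manual calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in; not a Hologres connection
conn.executescript("""
CREATE TABLE ods_inbound  (order_id TEXT PRIMARY KEY, warehouse TEXT, received_at TEXT);
CREATE TABLE ods_outbound (order_id TEXT PRIMARY KEY, shipped_at TEXT);
CREATE TABLE dws_order_wide (
    order_id TEXT PRIMARY KEY, warehouse TEXT, received_at TEXT, shipped_at TEXT
);
""")


def refresh_dws(conn: sqlite3.Connection) -> int:
    """Rebuild wide rows from the ODS tables and return the wide-table row count.

    INSERT OR REPLACE overwrites rows by primary key, so rerunning the
    refresh updates existing orders instead of duplicating them.
    """
    conn.execute("""
        INSERT OR REPLACE INTO dws_order_wide
        SELECT i.order_id, i.warehouse, i.received_at, o.shipped_at
        FROM ods_inbound AS i
        LEFT JOIN ods_outbound AS o ON o.order_id = i.order_id
    """)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dws_order_wide").fetchone()[0]


# First scheduled run: one order received, not yet shipped.
conn.execute("INSERT INTO ods_inbound VALUES ('A1', 'SZX', '2021-11-11 00:01')")
first = refresh_dws(conn)

# Next run: A1 has shipped and a new order A2 has arrived.
conn.execute("INSERT INTO ods_outbound VALUES ('A1', '2021-11-11 00:04')")
conn.execute("INSERT INTO ods_inbound VALUES ('A2', 'SZX', '2021-11-11 00:05')")
second = refresh_dws(conn)

a1_shipped = conn.execute(
    "SELECT shipped_at FROM dws_order_wide WHERE order_id = 'A1'"
).fetchone()[0]
```

In a real deployment the refresh statement would be written in Hologres's PostgreSQL‑compatible dialect rather than SQLite's INSERT OR REPLACE, and it would likely limit the join to recently changed rows instead of rescanning the whole ODS layer.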

Scenario 2: Inter‑warehouse allocation. Small tables are joined in Hologres through views, which delivers millisecond‑level query performance and reduces scheduling overhead.
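A view-based join like the one in Scenario 2 might look roughly like the following. Again, sqlite3 is only a local stand‑in for Hologres, and the tables (transfer_order, warehouse) and the view name v_transfer are hypothetical. The point of the sketch is that the view is evaluated at query time, so newly written rows appear without any scheduled job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in; not a Hologres connection
conn.executescript("""
CREATE TABLE warehouse (wh_code TEXT PRIMARY KEY, city TEXT);
CREATE TABLE transfer_order (
    transfer_id TEXT PRIMARY KEY, from_wh TEXT, to_wh TEXT, qty INTEGER
);

-- The view joins the small dimension table twice: once for each end of the transfer.
CREATE VIEW v_transfer AS
SELECT t.transfer_id, src.city AS from_city, dst.city AS to_city, t.qty
FROM transfer_order AS t
JOIN warehouse AS src ON src.wh_code = t.from_wh
JOIN warehouse AS dst ON dst.wh_code = t.to_wh;

INSERT INTO warehouse VALUES ('SZX', 'Shenzhen'), ('LAX', 'Los Angeles');
INSERT INTO transfer_order VALUES ('T1', 'SZX', 'LAX', 120);
""")

rows = conn.execute("SELECT * FROM v_transfer").fetchall()

# A newly written transfer is visible through the view immediately,
# with no refresh step in between.
conn.execute("INSERT INTO transfer_order VALUES ('T2', 'LAX', 'SZX', 30)")
rows_after = conn.execute(
    "SELECT transfer_id, from_city, to_city, qty FROM v_transfer ORDER BY transfer_id"
).fetchall()
```

The trade‑off is that every query pays for the join, which is why the article limits this pattern to small tables.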

Current Limitations

Hologres lacks indexing on non‑null columns, which can slow joins on massive tables, and its PostgreSQL compatibility offers a limited function set, posing some development challenges.

5 Business Value

During Double‑11, the Flink + Hologres real‑time data warehouse powered high‑frequency dashboards with zero failures and helped improve delivery timeliness. Dynamic scaling absorbed traffic spikes thousands of times higher than normal, which also reduced operational costs.
