How a FinTech Firm Boosted Real‑Time Decision Making with StarRocks Data Warehouse
This case study describes how Shuhe Technology, a leading fintech company, addressed data redundancy, low resource utilization, and slow reporting by adopting Alibaba Cloud EMR Serverless StarRocks as a unified, real‑time data warehouse. The result was standardized data pipelines, lower costs, and minute‑level decision latency.
Client Background and Business Challenges
Shuhe Technology is a well‑known fintech enterprise that provides intelligent retail financial solutions (such as smart marketing, customer service, and operations) to banks, trusts, consumer finance firms, insurers, and micro‑loan companies across consumer credit, SME credit, and scenario‑based installment services.
Rapid business growth left its offline data processing architecture unable to meet real‑time decision needs, and the platform also suffered from data redundancy and low resource utilization. The core pain points were a complex architecture with multiple OLAP engines (ClickHouse, Doris) that drove up maintenance costs, delayed response caused by T+1 offline reporting, and wasted resources, with cluster utilization below 40%.
Technology Selection Decision
Balancing compatibility and performance, Shuhe chose Alibaba Cloud EMR Serverless StarRocks as the real‑time data warehouse solution.
Superior real‑time write and query: The primary‑key model uses a Delete+Insert strategy, which avoids both Merge‑on‑Read query latency and Merge‑on‑Write bottlenecks; single‑table query performance surpasses that of Doris.
Compute‑storage separation: A hybrid architecture of object storage plus cache disks reduces storage cost by 50% and supports elastic scaling for massive daily data growth.
Strong ecosystem compatibility: Seamless integration with Hive, Kafka, MySQL, and other mainstream sources enables a unified lake‑warehouse architecture that requires no data migration.
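The Delete+Insert idea behind the primary‑key model can be illustrated with a small in‑memory sketch. This is a conceptual toy, not StarRocks' actual storage engine: the class name, the fields, and the loan keys are all hypothetical. It shows only the core trade‑off: the old row version is marked deleted at write time, so a read filters out marked rows instead of merging versions.

```python
"""Toy model of a Delete+Insert primary-key table (illustrative only)."""


class PrimaryKeyTable:
    def __init__(self):
        self.rows = []        # append-only row storage: (key, value)
        self.deleted = set()  # positions of rows that have been marked deleted
        self.index = {}       # primary key -> position of the live row

    def upsert(self, key, value):
        # Delete step: mark the previous version (if any) at write time.
        old_pos = self.index.get(key)
        if old_pos is not None:
            self.deleted.add(old_pos)
        # Insert step: append the new version and point the index at it.
        self.rows.append((key, value))
        self.index[key] = len(self.rows) - 1

    def delete(self, key):
        old_pos = self.index.pop(key, None)
        if old_pos is not None:
            self.deleted.add(old_pos)

    def scan(self):
        # Reads only skip marked rows; no version merging happens here.
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deleted]


table = PrimaryKeyTable()
table.upsert("loan-1", "pending")
table.upsert("loan-2", "pending")
table.upsert("loan-1", "approved")  # replaces the earlier loan-1 row
table.delete("loan-2")
live_rows = table.scan()
```

In this sketch the write path does slightly more work (an index lookup plus a delete mark), and in exchange the scan stays a simple filter, which is the property the selection criteria above emphasize.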
Technical Solution Implementation
Architecture Design
StarRocks is used to build a three‑layer data architecture:
ODS layer: Real‑time sync of RDS binlog via Flink CDC, direct Kafka streams, and Hive catalog external tables to ensure data freshness.
CDM layer: Standardized processing raises data reuse by 60% and supports over 80% of business analysis needs through a common metric library.
ADS layer: Business‑department data marts with materialized views accelerate key metric queries, reducing average response time to seconds.
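As a rough illustration of how Hive data might be exposed to the ODS layer without copying it, the sketch below generates a StarRocks CREATE EXTERNAL CATALOG statement for a Hive metastore. The catalog name, metastore URI, database, and table are hypothetical placeholders, not values from Shuhe's deployment, and the helper functions exist only for this example.

```python
def hive_catalog_ddl(catalog, metastore_uri):
    """Build a CREATE EXTERNAL CATALOG statement for a Hive metastore."""
    return (
        f"CREATE EXTERNAL CATALOG {catalog}\n"
        "PROPERTIES (\n"
        '    "type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}"\n'
        ");"
    )


def qualified_name(catalog, database, table):
    """External tables are addressed as catalog.database.table."""
    return f"{catalog}.{database}.{table}"


# Hypothetical names, for illustration only.
catalog_sql = hive_catalog_ddl(
    "hive_lake", "thrift://metastore.example.internal:9083"
)
ods_table = qualified_name("hive_lake", "ods", "loan_events")
print(catalog_sql)
print(f"SELECT COUNT(*) FROM {ods_table};")
```

Once such a catalog exists, Hive tables can be queried in place through the three‑part name, which is what makes the Hive part of the ODS layer possible without a separate ingestion job.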
Full‑Lifecycle Management
Intelligent scheduling system: Uses StarRocks materialized views for ETL, refreshed periodically or by manual trigger at micro‑batch intervals of 5–60 minutes, with custom scheduling that raises resource utilization to 75%.
High‑availability assurance: FE/CN node self‑healing and monitoring of 20+ core metrics (CPU, memory, I/O, compaction score), with alarm response in under 5 minutes.
Cost‑optimization practice: The cache‑disk strategy keeps 80% of data in low‑cost object storage, saving over one million CNY per cluster annually.
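The materialized‑view scheduling described above could be sketched as follows. This is a minimal example under stated assumptions: the view name and query are hypothetical, the 5–60 minute bounds simply mirror the micro‑batch range quoted in the text, and the generated DDL omits a DISTRIBUTED BY clause, which some StarRocks versions may require.

```python
def refresh_clause(interval_minutes=None):
    """Return the refresh clause for an asynchronous materialized view.

    None means the view is refreshed only by manual trigger.
    """
    if interval_minutes is None:
        return "REFRESH MANUAL"
    if not 5 <= interval_minutes <= 60:
        raise ValueError("micro-batch interval must be 5-60 minutes")
    return f"REFRESH ASYNC EVERY (INTERVAL {interval_minutes} MINUTE)"


def materialized_view_ddl(name, query, interval_minutes=None):
    """Build a CREATE MATERIALIZED VIEW statement for one ETL step."""
    return (
        f"CREATE MATERIALIZED VIEW {name}\n"
        f"{refresh_clause(interval_minutes)}\n"
        f"AS\n{query};"
    )


def manual_refresh_sql(name):
    """Statement that triggers an on-demand refresh."""
    return f"REFRESH MATERIALIZED VIEW {name};"


# Hypothetical view and query, for illustration only.
mv_sql = materialized_view_ddl(
    "ads_daily_approvals",
    "SELECT apply_date, COUNT(*) AS approvals\n"
    "FROM cdm_loan_applications\n"
    "GROUP BY apply_date",
    interval_minutes=15,
)
print(mv_sql)
print(manual_refresh_sql("ads_daily_approvals"))
```

Validating the interval in one place keeps every ETL view inside the agreed micro‑batch window, while the manual path remains available for backfills.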
Business Scenario Deployment
Real‑time monitoring consumes Flink‑Kafka or Flink CDC binlog streams to guarantee data freshness, delivering near‑real‑time reports. Massive real‑time event data is handled via lake‑warehouse integration to alleviate pressure on StarRocks.
StarRocks serves as the compute engine: it pulls real‑time data from the lake through an External Catalog, processes it, and stores the results in internal tables for fast BI, AI, and ad hoc queries.
Offline reporting (T+1) leverages the same lake‑warehouse pipeline, reducing data movement and accelerating queries.
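The lake‑to‑internal‑table flow described above can be approximated with an INSERT INTO ... SELECT statement that reads from an external catalog and writes aggregated results to an internal table. The sketch below only generates the SQL text; every table, column, and catalog name is hypothetical, and it assumes the target table already exists with columns in the same order as the select list.

```python
def lake_to_internal_sql(target, source, dimensions, metrics):
    """Build an INSERT ... SELECT that aggregates lake data into an internal table.

    dimensions: list of grouping columns (at least one).
    metrics:    mapping of output alias -> aggregate expression.
    """
    select_items = list(dimensions) + [
        f"{expr} AS {alias}" for alias, expr in metrics.items()
    ]
    return (
        f"INSERT INTO {target}\n"
        f"SELECT {', '.join(select_items)}\n"
        f"FROM {source}\n"
        f"GROUP BY {', '.join(dimensions)};"
    )


# Hypothetical names, for illustration only.
load_sql = lake_to_internal_sql(
    target="ads.risk_summary",
    source="hive_lake.ods.loan_events",
    dimensions=["event_date", "product"],
    metrics={"event_count": "COUNT(*)"},
)
print(load_sql)
```

Because the heavy event data stays in the lake and only the aggregated result lands in StarRocks, the same statement shape can serve both the near‑real‑time and the T+1 paths, differing only in how often it runs.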
Practical Results
Real‑time decisions: Latency for critical business reports improved from hour level to minute level, dramatically speeding up risk‑warning response.
Robust architecture: The distributed design with automatic fault tolerance ensures high availability of FE/CN nodes.
Cost reduction and efficiency: Shuhe consolidated three legacy systems, lowered resource costs, and used EMR Serverless monitoring to cut operational expenses.
Future Outlook
Shuhe plans to integrate Paimon for a more complete real‑time lake‑warehouse architecture, leverage StarRocks’ multi‑warehouse and dynamic scaling features for resource isolation and read‑write separation, and further enhance cluster stability.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
