
Building a Scalable Big Data Service Platform: Architecture & Low‑Code Orchestration

This article explains the end‑to‑end big data processing pipeline, outlines the diverse challenges of data interfaces, storage and performance, introduces the unified "Three Ones" approach, details a three‑layer service architecture, and shows how low‑code orchestration can streamline API creation and composition.

dbaplus Community

Oct 30, 2021


Introduction

The speaker first walks through the typical big-data processing flow: data collection → ETL (extraction, cleaning, and transformation) → data service → data visualization. The flow covers both OLAP-oriented analytical workloads and OLTP-oriented online business data (e.g., e-commerce orders, video-platform logs).

Background

Key problems identified include:

Diverse scenarios such as recommendation, marketing, reporting, dashboards, and data products.

Various interface types (API, RPC, real‑time streams, files) with different QPS and latency requirements.

Performance demands ranging from very high-QPS online serving with millisecond latency to low-throughput reporting.

Multiple storage options (HBase, Redis, MySQL, Doris, Hive, etc.).

Different execution and access methods (Java, C++, or Go code, client libraries, SQL).

Metric definitions that vary across business contexts.

Solution – The “Three Ones”

To address the above, the platform adopts three unification principles:

OneAPI: a unified data-service interface that hides differences in QPS, latency, and transport protocol (HTTP, RPC, file transfer).

OneSQL: a single SQL layer that the platform parses and translates to access multiple storage back-ends.

OneModel: a unified data model that spans heterogeneous data sources.
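To make OneModel and OneSQL more concrete, here is a minimal Python sketch, assuming a hypothetical logical query (metric, dimension, day) and an invented field mapping per back-end. The platform itself performs this translation with Apache Calcite; the hand-written translators below only illustrate how one logical definition can compile either to a relational query or to a key-value lookup. `LogicalQuery`, `FIELD_MAPPING`, `compile_query`, and the table and key names are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:
    """Back-end-neutral request: a metric, grouped by a dimension, for one day."""
    metric: str
    dimension: str
    dt: str


# OneModel (hypothetical): logical field -> physical name for each back-end.
FIELD_MAPPING = {
    "mysql": {"table": "shop_sales", "total_gmv": "gmv", "shop_id": "shop_id"},
    "redis": {"total_gmv": "gmv", "shop_id": "shop"},
}


def to_mysql(q: LogicalQuery) -> tuple[str, tuple]:
    """Aggregate query with the day bound as a driver parameter (%s)."""
    m = FIELD_MAPPING["mysql"]
    sql = (f"SELECT {m[q.dimension]}, SUM({m[q.metric]}) AS {q.metric} "
           f"FROM {m['table']} WHERE dt = %s GROUP BY {m[q.dimension]}")
    return sql, (q.dt,)


def to_redis(q: LogicalQuery) -> str:
    """Assumes a precomputed hash per metric and day, keyed by dimension value."""
    m = FIELD_MAPPING["redis"]
    return f"HGETALL {m[q.metric]}:{q.dt}:by_{m[q.dimension]}"


TRANSLATORS = {"mysql": to_mysql, "redis": to_redis}


def compile_query(q: LogicalQuery, backend: str):
    """OneSQL (hypothetical): one logical query, translated per back-end."""
    if backend not in TRANSLATORS:
        raise ValueError(f"unsupported back-end: {backend}")
    return TRANSLATORS[backend](q)
```

In this sketch, adding a back-end means adding one mapping entry and one translator, while callers keep issuing the same logical query.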

Evolution Roadmap

The platform evolves through three stages (illustrated in the accompanying diagram), moving from isolated data pipelines to a fully service‑oriented architecture.

Core Architecture Design

The architecture is divided into three layers:

Data Application Access Layer: external applications connect via HTTP, RPC, client libraries, streams, or file services.

Data Service Parsing Layer: built on Apache Calcite, it handles SQL parsing, validation, routing, optimization, execution, diagnostics, and rate limiting.

Data Storage Layer: abstracts storage engines such as MySQL, Redis, Hive, and HBase and exposes them through unified APIs.
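The request path through the three layers can be sketched as follows. This is a hypothetical Python illustration, not the platform's Spring Boot/Calcite implementation: `AccessLayer`, `ParsingLayer`, `InMemoryAdapter`, and `ServiceRequest` are invented names, and the in-memory adapter stands in for a real MySQL, Redis, or HBase connector.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ServiceRequest:
    """Protocol-neutral request produced by the access layer."""
    api_name: str
    params: dict = field(default_factory=dict)


class StorageAdapter(Protocol):
    """Storage-layer contract: every engine adapter exposes execute()."""
    def execute(self, query: str, params: dict) -> list[dict]: ...


class InMemoryAdapter:
    """Stand-in for a MySQL/Redis/HBase adapter; holds rows in memory."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def execute(self, query: str, params: dict) -> list[dict]:
        # A real adapter would run `query` on its engine; this one only
        # filters the stored rows on the `dt` parameter.
        return [r for r in self.rows if r["dt"] == params["dt"]]


class ParsingLayer:
    """Validates requests and routes them to the back-end registered per API."""

    def __init__(self) -> None:
        self.registry: dict[str, tuple[str, StorageAdapter, set[str]]] = {}

    def register(self, api_name: str, query: str, adapter: StorageAdapter,
                 required_params: list[str]) -> None:
        self.registry[api_name] = (query, adapter, set(required_params))

    def handle(self, req: ServiceRequest) -> list[dict]:
        if req.api_name not in self.registry:
            raise KeyError(f"unknown API: {req.api_name}")
        query, adapter, required = self.registry[req.api_name]
        missing = required - set(req.params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        return adapter.execute(query, req.params)


class AccessLayer:
    """Turns HTTP-style or RPC-style calls into ServiceRequest objects."""

    def __init__(self, parser: ParsingLayer) -> None:
        self.parser = parser

    def from_http(self, path: str, body: str) -> list[dict]:
        # e.g. POST /shop_gmv with a JSON body holding the parameters
        return self.parser.handle(ServiceRequest(path.strip("/"), json.loads(body)))

    def from_rpc(self, method: str, **kwargs) -> list[dict]:
        return self.parser.handle(ServiceRequest(method, dict(kwargs)))
```

Both entry points end in the same `ParsingLayer.handle` call, which mirrors the article's point that protocol differences stop at the access layer.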

Platform governance features (permission management, monitoring, rate limiting, metadata management, and service orchestration) run alongside these layers as cross-cutting capabilities.
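Rate limiting is one of the governance features listed above. The article does not say which algorithm the platform uses; the sketch below assumes a token bucket per API, a common choice, with the bucket size derived from each API's configured QPS. `TokenBucket` and `ApiRateLimiter` are hypothetical names, and the clock is injectable so the behavior can be checked deterministically.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec        # tokens added per second (sustained QPS)
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ApiRateLimiter:
    """Keeps one bucket per API name, built from per-API QPS limits."""

    def __init__(self, qps_limits: dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self.buckets = {
            name: TokenBucket(qps, max(1, int(qps)), clock)
            for name, qps in qps_limits.items()
        }

    def allow(self, api_name: str) -> bool:
        bucket = self.buckets.get(api_name)
        # APIs without a configured limit are not throttled in this sketch.
        return True if bucket is None else bucket.try_acquire()
```

Letting unconfigured APIs through is a design choice of this sketch; a stricter deployment might reject calls to APIs that have no registered limit.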

Data Service Production Workflow

Select a storage engine (e.g., HBase, MySQL).

Configure the query SQL and parameters.

Convert the SQL into an executable form for the chosen engine.

Generate an atomic API service.

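```sql
-- Total sales (GMV) per shop for a given day.
-- shop_order_detail is a placeholder table name; #{dt} is a parameter
-- bound at call time.
select shop_id, sum(gmv) as total_gmv
from (
  select shop_id, gmv from shop_order_detail where dt = #{dt}
) t
group by shop_id;
```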
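The configure, convert, and generate steps of the workflow can be sketched in Python with `sqlite3` standing in for the real storage engine. The `#{dt}` placeholder is rewritten into a driver-bound named parameter rather than concatenated into the SQL string. `compile_template` and `make_atomic_api` are hypothetical helpers, the table name `shop_order_detail` is a stand-in for the template's placeholder table, and the sample rows are invented purely to exercise the code.

```python
from __future__ import annotations

import re
import sqlite3
from typing import Callable

# The configured template, adapted from the example above.
TEMPLATE = """
select shop_id, sum(gmv) as total_gmv
from (
  select shop_id, gmv from shop_order_detail where dt = #{dt}
) t
group by shop_id
"""

PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def compile_template(template: str) -> tuple[str, list[str]]:
    """Rewrite #{name} placeholders as sqlite3 named parameters (:name).

    Values are bound by the driver instead of being pasted into the SQL
    text, so parameter values cannot inject SQL."""
    names = PLACEHOLDER.findall(template)
    return PLACEHOLDER.sub(r":\1", template), names


def make_atomic_api(conn: sqlite3.Connection,
                    template: str) -> Callable[..., list[tuple]]:
    """Return a callable "atomic API" that runs the compiled template."""
    sql, required = compile_template(template)

    def api(**params) -> list[tuple]:
        missing = [n for n in required if n not in params]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        return conn.execute(sql, params).fetchall()

    return api


if __name__ == "__main__":
    # Invented sample rows, only to exercise the API.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table shop_order_detail (shop_id integer, gmv real, dt text)")
    conn.executemany("insert into shop_order_detail values (?, ?, ?)",
                     [(1, 10.0, "2021-10-30"), (1, 5.0, "2021-10-30"),
                      (2, 7.5, "2021-10-30"), (2, 99.0, "2021-10-29")])
    shop_gmv = make_atomic_api(conn, TEMPLATE)
    print(sorted(shop_gmv(dt="2021-10-30")))  # [(1, 15.0), (2, 7.5)]
```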

Once generated, the API can be invoked directly or composed with other atomic services, but manual composition quickly becomes costly, motivating a low‑code approach.

Low‑Code Service Orchestration

Three orchestration patterns are demonstrated:

Serial composition: the output of an order service is transformed and fed into a product service (illustrated in the diagram).

Parallel composition: multiple independent services are called concurrently before a downstream service proceeds.

Conditional logic: runtime decisions determine which service branch to execute.
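The three patterns can be expressed as small combinators that each return a new service, roughly what a low-code orchestrator assembles from visual nodes. This is a hypothetical Python sketch: `order_service` and `product_service` are stand-ins with invented payloads, and `serial`, `parallel`, and `conditional` are illustrative names, not the platform's API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# A "service" here is any callable taking a request dict and returning a result.
Service = Callable[[dict], Any]


def order_service(req: dict) -> dict:
    """Stand-in atomic API: returns the product IDs in a user's orders."""
    return {"user_id": req["user_id"], "product_ids": [101, 102]}


def product_service(req: dict) -> list[dict]:
    """Stand-in atomic API: looks up product details by ID."""
    return [{"product_id": p, "name": f"product-{p}"} for p in req["product_ids"]]


def serial(first: Service, transform: Callable[[Any], dict], second: Service) -> Service:
    """Run `first`, reshape its output with `transform`, and feed it to `second`."""
    return lambda req: second(transform(first(req)))


def parallel(services: dict[str, Service], combine: Callable[[dict], Any]) -> Service:
    """Call independent services concurrently; `combine` runs once all have returned."""
    def run(req: dict) -> Any:
        with ThreadPoolExecutor(max_workers=len(services)) as pool:
            futures = {name: pool.submit(svc, req) for name, svc in services.items()}
            results = {name: fut.result() for name, fut in futures.items()}
        return combine(results)
    return run


def conditional(predicate: Callable[[dict], bool],
                if_true: Service, if_false: Service) -> Service:
    """Pick a branch at request time based on `predicate`."""
    return lambda req: if_true(req) if predicate(req) else if_false(req)


# Serial example: orders -> extract product IDs -> product details.
order_products = serial(order_service,
                        lambda out: {"product_ids": out["product_ids"]},
                        product_service)
```

Because every combinator returns the same `Service` shape as an atomic API, composed services can themselves be composed, for example a conditional whose branches are serial chains.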

Outlook and Summary

Future directions include integrating emerging storage engines (e.g., IoT-oriented time-series databases such as TDengine), expanding the platform's audience to analysts and algorithm engineers, supporting algorithm service orchestration, and strengthening data quality and security mechanisms.

Q&A Highlights

Low‑code platforms can satisfy most data‑service requirements, though highly complex business logic may still need custom code.

How reporting services are delivered depends on the chosen transport protocol (HTTP, RPC, etc.).

The technical stack consists of three layers: storage, Apache Calcite-based parsing, and Spring Boot-based API services.

Stability is ensured through pre‑release testing, comprehensive monitoring, and gradual (gray) rollouts.

SQL diagnostics currently rely on execution time and QPS thresholds.

Supported storage engines include HBase, MySQL, Redis, and ClickHouse.

API lifecycle management includes permission control and periodic cleanup of unused APIs.
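Two of the governance answers above, threshold-based SQL diagnostics and periodic cleanup of unused APIs, reduce to simple checks over per-API statistics. The sketch below is hypothetical: `ApiStats`, the 200 ms and 5,000 QPS defaults, and the 90-day idle window are assumptions chosen for illustration, not values from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class ApiStats:
    name: str
    exec_time_ms: float               # observed SQL execution time
    peak_qps: float                   # highest observed queries per second
    last_called: Optional[datetime]   # None if the API was never called


def diagnose(stats: ApiStats, max_exec_ms: float = 200.0,
             max_qps: float = 5000.0) -> list[str]:
    """Flag an API whose execution time or QPS crosses a threshold."""
    findings = []
    if stats.exec_time_ms > max_exec_ms:
        findings.append(f"{stats.name}: slow ({stats.exec_time_ms:.0f} ms > {max_exec_ms:.0f} ms)")
    if stats.peak_qps > max_qps:
        findings.append(f"{stats.name}: hot ({stats.peak_qps:.0f} QPS > {max_qps:.0f} QPS)")
    return findings


def unused_apis(all_stats: Iterable[ApiStats], now: datetime,
                idle_days: int = 90) -> list[str]:
    """Names of APIs not called within `idle_days`: candidates for cleanup review."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(s.name for s in all_stats
                  if s.last_called is None or s.last_called < cutoff)
```

The cleanup check only lists candidates; whether they are deleted or first reviewed by their owners is a policy decision outside this sketch.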


