
How Alibaba’s ADC Project Automates Real‑Time SQL Generation with Design Patterns and Priority Queues

This article explains how the Alibaba DChain Data Converger (ADC) automatically creates wide‑table SQL for real‑time cross‑database analytics by using a pipeline architecture, priority‑queue‑driven task scheduling, and specific design patterns to handle metadata, joins, and resource management.

Alibaba Cloud Developer

Overview

ADC (Alibaba DChain Data Converger) lets users configure metrics in a front-end interface. The system then automatically generates a wide table that serves real‑time queries. Source data can span multiple databases, and results can be written to different target storage media, giving a global, real‑time view of supply‑chain data.

Architecture

The overall architecture includes metadata ingestion, an adaptation layer, a scheduler, a planning center, a SQL generator, an alarm center, and a reconciliation center. Resources are managed by a resource management center that integrates Alibaba Cloud services such as MaxCompute, Flink, and AnalyticDB.

System Flow

The process is divided into two main phases: a synchronous SQL‑generation phase that validates data and builds SQL, and an asynchronous publishing phase that deploys the generated SQL to Flink.
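The split between the two phases can be sketched as follows. This is a minimal illustration, assuming a thread pool for the background work; `validate`, `build_sql`, `deploy_to_flink`, and `handle_request` are hypothetical stand-ins, not ADC's actual API. The caller waits only for validation and SQL generation, while deployment continues in the background.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from itertools import count

_job_ids = count(1)
_publisher = ThreadPoolExecutor(max_workers=2)


# Hypothetical stand-ins for the real validation, generation, and Flink calls.
def validate(config: dict) -> None:
    if not config.get("fact_tables"):
        raise ValueError("at least one fact table is required")


def build_sql(config: dict) -> str:
    return "SELECT * FROM " + config["fact_tables"][0]


def deploy_to_flink(sql: str) -> str:
    # Placeholder for submitting a Flink job; returns a fake job id.
    return f"job-{next(_job_ids)}"


def handle_request(config: dict) -> tuple[str, Future]:
    # Phase 1 (synchronous): errors in the configuration surface immediately.
    validate(config)
    sql = build_sql(config)
    # Phase 2 (asynchronous): publishing runs in the background; the caller
    # receives a Future it can poll or attach a callback to.
    return sql, _publisher.submit(deploy_to_flink, sql)
```

For example, `sql, fut = handle_request({"fact_tables": ["orders"]})` returns the SQL at once, and `fut.result()` later yields the job id.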

Requirement Analysis

Support multiple fact tables (streams) and dimension tables, with one fact table as the primary table.

Dimension‑table changes must trigger database updates.

Primary‑to‑auxiliary table cardinality can be 1:1 or N:1.

Fact tables that share the same key are combined with full joins; all other tables are attached with left joins.

Only joins and UDFs are needed; GROUP BY aggregation is not required.

Low synchronization latency with support for various source and target media.
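The join rules in these requirements can be expressed as a small decision function. This is a sketch under the assumption that each table is described by a name, a fact/dimension flag, and its join-key columns; `Table`, `choose_join`, and `is_at_most_one_per_key` are illustrative names, not ADC code. The cardinality helper shows why 1:1 or N:1 matters: if each key value appears at most once on the auxiliary side, a left join never multiplies primary rows.

```python
from dataclasses import dataclass
from enum import Enum


class JoinType(Enum):
    FULL = "FULL JOIN"
    LEFT = "LEFT JOIN"


@dataclass(frozen=True)
class Table:
    name: str
    is_fact: bool           # fact table (stream) vs. dimension table
    key: tuple[str, ...]    # columns used to join this table


def choose_join(primary: Table, other: Table) -> JoinType:
    """Pick the join used to attach `other` to the primary fact table."""
    # Fact tables keyed exactly like the primary are merged with a FULL JOIN,
    # so a row arriving on either stream shows up in the wide table.
    if other.is_fact and other.key == primary.key:
        return JoinType.FULL
    # Dimension tables and differently keyed fact tables use a LEFT JOIN.
    return JoinType.LEFT


def is_at_most_one_per_key(aux_keys: list) -> bool:
    """True if every join-key value occurs at most once in the auxiliary table.

    That makes primary:auxiliary cardinality 1:1 or N:1, so a LEFT JOIN
    matches each primary row to at most one auxiliary row.
    """
    return len(aux_keys) == len(set(aux_keys))
```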

Technical Implementation

Check Stage

Parameter validation (e.g., presence of fact tables).

Table‑type support check.

Partition‑field verification.

Join‑constraint validation.

Primary‑table uniqueness and ETL information checks.

Metadata (e.g., HBase) verification.

Primary‑key correction for dimension tables.
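One way to organize the check stage is as an ordered list of independent check functions whose error messages are collected together. The sketch below covers a few of the listed checks and assumes a hypothetical dictionary-shaped configuration; the field names and `SUPPORTED_TYPES` are invented for illustration.

```python
from typing import Callable

Config = dict
SUPPORTED_TYPES = {"mysql", "hbase"}  # hypothetical set of supported table types


def check_fact_tables(cfg: Config) -> list[str]:
    return [] if cfg.get("fact_tables") else ["no fact table configured"]


def check_table_types(cfg: Config) -> list[str]:
    tables = cfg.get("fact_tables", []) + cfg.get("dim_tables", [])
    return [f"unsupported type {t['type']!r} for {t['name']}"
            for t in tables if t["type"] not in SUPPORTED_TYPES]


def check_partition_field(cfg: Config) -> list[str]:
    return [] if cfg.get("partition_field") else ["partition field missing"]


def check_single_primary(cfg: Config) -> list[str]:
    # Exactly one fact table must be marked as the primary table.
    n = sum(1 for t in cfg.get("fact_tables", []) if t.get("primary"))
    return [] if n == 1 else [f"expected exactly one primary table, found {n}"]


CHECKS: list[Callable[[Config], list[str]]] = [
    check_fact_tables, check_table_types, check_partition_field, check_single_primary,
]


def run_checks(cfg: Config) -> list[str]:
    """Run every check and return all error messages (empty list = valid)."""
    errors: list[str] = []
    for check in CHECKS:
        errors.extend(check(cfg))
    return errors
```

Collecting all errors, rather than stopping at the first, lets the front end report every configuration problem in one round trip.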

Data Synchronization

Generate a priority queue for task execution order.

Synchronize source tables into HBase, applying type conversion and ETL.

Prune duplicate columns.

Mark blank columns that need not appear in the wide table.

Populate ordering fields based on source timestamps or system receipt time.
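The column handling and ordering-field steps can be sketched as plain functions. The names and the `source_ts` record field are hypothetical. `should_apply` is an assumption about how an ordering field is typically used (discarding updates older than what is already stored); the source does not say how ADC consumes it.

```python
def prune_duplicate_columns(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep each column name only in the first source table that declares it."""
    seen: set[str] = set()
    pruned: dict[str, list[str]] = {}
    for table, columns in sources.items():
        kept = []
        for col in columns:
            if col not in seen:
                seen.add(col)
                kept.append(col)
        pruned[table] = kept
    return pruned


def blank_columns(columns: list[str], wide_table_columns: set[str]) -> set[str]:
    """Columns that are synced but need not appear in the wide table."""
    return {c for c in columns if c not in wide_table_columns}


def ordering_value(record: dict, receipt_ms: int) -> int:
    """Prefer the source timestamp; fall back to the system receipt time."""
    source_ts = record.get("source_ts")
    return source_ts if source_ts is not None else receipt_ms


def should_apply(stored_order: int, incoming_order: int) -> bool:
    # Assumption: an update older than the stored one is treated as stale.
    return incoming_order >= stored_order
```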

Compute Stage

Populate intermediate tables from full joins.

Upgrade join relationships.

Fill reverse‑index information.

Attach message queues for downstream triggers.

Fill the wide table with data, join keys, ETL metadata, and partition fields.

Store generated SQL and table definitions for reuse.
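To make the compute stage concrete, the sketch below emits generic SQL for the two kinds of join node: an intermediate table built by full-joining fact tables on a shared key, and a wide table built by left-joining dimensions onto it. It is illustrative only; the source does not show ADC's generated SQL, which targets Flink and HBase-backed tables, and all table and column names here are made up. Because a full join can leave the key NULL on either side, the merged key is taken with `COALESCE`, and each later full join matches against the key merged so far.

```python
def _coalesce(tables: list[str], key: str) -> str:
    # One table needs no COALESCE; after FULL JOINs, take the first non-NULL key.
    cols = [f"{t}.{key}" for t in tables]
    return cols[0] if len(cols) == 1 else f"COALESCE({', '.join(cols)})"


def full_join_sql(facts: dict[str, list[str]], key: str, target: str) -> str:
    """facts maps fact-table name -> non-key columns; all share join key `key`."""
    names = list(facts)
    from_clause = names[0]
    for i, t in enumerate(names[1:], start=1):
        # Match the new table against the key merged from all tables so far.
        from_clause += f" FULL JOIN {t} ON {_coalesce(names[:i], key)} = {t}.{key}"
    select = [f"{_coalesce(names, key)} AS {key}"]
    select += [f"{t}.{c}" for t, cols in facts.items() for c in cols]
    return f"INSERT INTO {target} SELECT {', '.join(select)} FROM {from_clause}"


def left_join_sql(base: str, base_cols: list[str],
                  dims: list[tuple[str, str, str, list[str]]], target: str) -> str:
    """dims entries: (dim_table, foreign_key_in_base, dim_key, dim_columns)."""
    select = [f"{base}.{c}" for c in base_cols]
    joins = ""
    for dim, fk, dim_key, cols in dims:
        select += [f"{dim}.{c}" for c in cols]
        joins += f" LEFT JOIN {dim} ON {base}.{fk} = {dim}.{dim_key}"
    return f"INSERT INTO {target} SELECT {', '.join(select)} FROM {base}{joins}"
```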

Design Patterns

ADC uses the pipeline (PipeLine) pattern: a global PipeLineContainer manages multiple pipelines and their contexts. Each pipeline is assembled from reusable valve components, which can be combined to implement tasks such as synchronous SQL generation and asynchronous publishing.
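A minimal sketch of this pattern, assuming each valve reads and writes a shared per-run context dictionary. Only the names PipeLineContainer and valve come from the article; `Valve.invoke`, `Pipeline`, the concrete valve classes, and the `register`/`execute` methods are hypothetical. The same `CheckValve` instance is reused in two pipelines, which is the point of making valves composable.

```python
class Valve:
    """A reusable processing step; subclasses override invoke()."""
    def invoke(self, ctx: dict) -> None:
        raise NotImplementedError


class CheckValve(Valve):
    def invoke(self, ctx: dict) -> None:
        if not ctx.get("fact_tables"):
            raise ValueError("no fact table configured")


class GenerateSqlValve(Valve):
    def invoke(self, ctx: dict) -> None:
        ctx["sql"] = "SELECT * FROM " + ctx["fact_tables"][0]


class PublishValve(Valve):
    def invoke(self, ctx: dict) -> None:
        ctx["published"] = True  # stand-in for deploying the SQL to Flink


class Pipeline:
    def __init__(self, *valves: Valve):
        self.valves = list(valves)

    def run(self, ctx: dict) -> dict:
        for valve in self.valves:  # valves run in order, sharing one context
            valve.invoke(ctx)
        return ctx


class PipeLineContainer:
    """Global registry of named pipelines; each execution gets a fresh context."""
    def __init__(self):
        self._pipelines: dict[str, Pipeline] = {}

    def register(self, name: str, pipeline: Pipeline) -> None:
        self._pipelines[name] = pipeline

    def execute(self, name: str, **params) -> dict:
        return self._pipelines[name].run(dict(params))


check = CheckValve()  # one valve instance shared by both pipelines
container = PipeLineContainer()
container.register("generate", Pipeline(check, GenerateSqlValve()))
container.register("publish", Pipeline(check, GenerateSqlValve(), PublishValve()))
```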

Data Structures & Algorithms

The core problem is to represent relationships among meta‑nodes (tables) as either full joins or left joins, then construct a tree where leaf nodes correspond to synchronized data sources and the root node is the final wide table.
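The tree can be modeled with a simple node type. This is a sketch, assuming each node records the operation that produces it; `MetaNode`, `post_order`, and the example table names are hypothetical. A post-order traversal visits every input before the node that consumes it, which gives a valid build order from the leaves up to the root.

```python
from dataclasses import dataclass, field


@dataclass
class MetaNode:
    name: str
    op: str = "SYNC"  # "SYNC" for leaves, "FULL" or "LEFT" for join nodes
    children: list["MetaNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def post_order(node: MetaNode):
    # Children first: a join node can only be built after all of its inputs exist.
    for child in node.children:
        yield from post_order(child)
    yield node


# Two fact streams full-joined into an intermediate table, then two dimension
# tables left-joined onto it to form the wide table (the root).
root = MetaNode("wide_order", "LEFT", [
    MetaNode("tmp_order", "FULL", [MetaNode("orders"), MetaNode("payments")]),
    MetaNode("dim_sku"),
    MetaNode("dim_shop"),
])
```

Here `[n.name for n in post_order(root) if n.is_leaf()]` lists the synchronized sources, and the last node yielded is the wide table.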

Algorithm steps:

Create priority queues for four task levels (sync tasks, full‑join tasks, left‑join tasks, publishing tasks).

Execute level‑1 tasks to sync the data sources (six in the article's example) into leaf nodes.

Execute level‑2 tasks to produce intermediate tables via full joins.

Execute level‑3 tasks to perform left joins and obtain the root node.

Execute level‑4 tasks to publish the root node.

This priority‑driven, tree‑building approach enables automatic construction of complex SQL statements.
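The level-by-level execution can be sketched with Python's `heapq`. This version uses a single heap keyed by the four priority values rather than four separate queues, which yields the same order; it also assumes every task depends only on tasks at lower levels. `LEVELS` and `run_by_priority` are hypothetical names.

```python
import heapq
from itertools import count

# Lower number = higher priority, matching the four task levels above.
LEVELS = {"SYNC": 1, "FULL_JOIN": 2, "LEFT_JOIN": 3, "PUBLISH": 4}


def run_by_priority(tasks: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Pop tasks level by level; returns (level, task_name) in execution order."""
    heap: list[tuple[int, int, str]] = []
    seq = count()  # tie-breaker: preserves submission order within a level
    for kind, name in tasks:
        heapq.heappush(heap, (LEVELS[kind], next(seq), name))
    executed = []
    while heap:
        level, _, name = heapq.heappop(heap)
        # A real scheduler would run the sync, join, or publish task here.
        executed.append((level, name))
    return executed
```

Even when tasks are submitted out of order, for example the publish task first, all sync tasks run before any join, and publishing runs last.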

Summary

The article focuses on the key data structures, design patterns, and algorithms used in ADC’s automatic SQL generation module, demonstrating how a pipeline architecture with priority queues and a tree‑based task model can achieve sub‑2‑second end‑to‑end latency for real‑time wide‑table creation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
