How CDC + Serverless Functions Enable Real‑Time ETL in Cloud Native Architectures
This article explains how combining Alibaba Cloud's Serverless Function Compute (FC) with database Change Data Capture (CDC) yields a complete, real‑time ETL pipeline. It covers the ETL model, integration with Data Transmission Service (DTS), the architecture's components, event‑driven processing, and practical use cases such as OLTP‑to‑OLAP data flow.
ETL Model
The classic ETL pipeline consists of:
Extract: Pull data from sources such as relational databases, file systems, message queues, or APIs.
Transform: Clean, merge, enrich, and reformat data to match the target schema.
Load: Write the transformed data into destinations such as data warehouses, data lakes, or BI systems.
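The three stages can be sketched as composable generator functions. This is a minimal illustration, not code from the article: an in-memory list stands in for the source database, a dict stands in for the warehouse table, and the cleaning and enrichment rules (drop rows without an id, normalize e-mail, convert amounts to cents) are hypothetical.

```python
from typing import Iterable, Iterator

def extract(rows: Iterable[dict]) -> Iterator[dict]:
    # Extract: stream records from a source (an in-memory list stands in for a database)
    for row in rows:
        yield row

def transform(records: Iterable[dict]) -> Iterator[dict]:
    # Transform: clean (drop rows without an id), normalize (trim/lowercase email), enrich (derive cents)
    for r in records:
        if r.get("id") is None:
            continue
        yield {
            "id": r["id"],
            "email": r["email"].strip().lower(),
            "amount_cents": round(r["amount"] * 100),
        }

def load(records: Iterable[dict], target: dict) -> int:
    # Load: write into the destination keyed by id (a dict stands in for a warehouse table)
    count = 0
    for r in records:
        target[r["id"]] = r
        count += 1
    return count

source = [
    {"id": 1, "email": " Alice@Example.com ", "amount": 12.5},
    {"id": None, "email": "bad@example.com", "amount": 1.0},
    {"id": 2, "email": "bob@example.com", "amount": 3.99},
]
warehouse: dict = {}
loaded = load(transform(extract(source)), warehouse)
print(loaded, warehouse[1]["email"], warehouse[2]["amount_cents"])
# → 2 alice@example.com 399
```

Because each stage consumes an iterator, records flow through one at a time, which is the same shape a streaming (CDC-driven) pipeline takes.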
CDC as the Extract Layer
Database Change Data Capture (CDC) captures row‑level changes (e.g., MySQL binlog, MongoDB oplog) in near real time. CDC provides the incremental Extract function, enabling continuous data flow without periodic batch pulls.
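To make "incremental Extract" concrete, the sketch below keeps a replica up to date by applying row-level change events instead of re-reading the whole table. The `ChangeEvent` shape is a simplified, hypothetical stand-in for a parsed binlog or oplog entry, and the log-position checkpoint is an assumption added to show how a consumer can resume or tolerate replays; it is not a DTS API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChangeEvent:
    # Simplified row-level change, roughly what a binlog/oplog entry carries (hypothetical shape)
    op: str                       # "INSERT", "UPDATE" or "DELETE"
    pk: int                       # primary key of the affected row
    after: Optional[dict] = None  # row image after the change (None for DELETE)
    position: int = 0             # log position, used as a resume checkpoint

@dataclass
class IncrementalReplica:
    rows: dict = field(default_factory=dict)
    checkpoint: int = 0           # last applied log position

    def apply(self, ev: ChangeEvent) -> None:
        # Skip events at or before the checkpoint so replayed events are harmless
        if ev.position <= self.checkpoint:
            return
        if ev.op in ("INSERT", "UPDATE"):
            self.rows[ev.pk] = dict(ev.after)
        elif ev.op == "DELETE":
            self.rows.pop(ev.pk, None)
        else:
            raise ValueError(f"unknown op {ev.op!r}")
        self.checkpoint = ev.position

log = [
    ChangeEvent("INSERT", 1, {"id": 1, "status": "new"}, position=101),
    ChangeEvent("INSERT", 2, {"id": 2, "status": "new"}, position=102),
    ChangeEvent("UPDATE", 1, {"id": 1, "status": "paid"}, position=103),
    ChangeEvent("DELETE", 2, position=104),
]
replica = IncrementalReplica()
for ev in log + log[:2]:  # the trailing replay of positions 101-102 is ignored
    replica.apply(ev)
print(replica.rows, replica.checkpoint)
# → {1: {'id': 1, 'status': 'paid'}} 104
```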
Alibaba Cloud DTS + Function Compute Architecture
Data Transmission Service (DTS) implements CDC and forwards events to Serverless Function Compute (FC). The architecture has three logical modules:
Poller: Retrieves full‑table snapshots for the initial load and streams incremental logs for real‑time changes.
Format Plugin: Normalizes events to Canal‑JSON format, providing a unified schema for downstream processing.
Sinker: Pushes the formatted events to FC.
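The three modules form a simple pipeline: snapshot rows first, then incremental changes, each normalized and then delivered. The sketch below models that flow locally. `poller`, `format_plugin`, `sinker`, and `fake_function` are hypothetical names, the envelope carries only a small Canal‑JSON‑like subset of fields, and `invoke` is a local callable standing in for the real DTS-to-FC delivery.

```python
import json
from typing import Callable, Iterable, Iterator

def poller(snapshot: Iterable[dict], changes: Iterable[dict]) -> Iterator[dict]:
    # Hypothetical Poller: emit the full snapshot first (initial load), then incremental changes
    for row in snapshot:
        yield {"type": "INSERT", "row": row}
    yield from changes

def format_plugin(raw: Iterable[dict], database: str, table: str) -> Iterator[str]:
    # Hypothetical Format Plugin: wrap each change in a minimal Canal-JSON-like envelope
    for ev in raw:
        yield json.dumps({
            "database": database,
            "table": table,
            "type": ev["type"],
            "isDdl": False,
            "data": [ev["row"]],
        })

def sinker(messages: Iterable[str], invoke: Callable[[bytes], dict]) -> int:
    # Hypothetical Sinker: deliver each message; `invoke` is a local stand-in for FC
    delivered = 0
    for msg in messages:
        result = invoke(msg.encode("utf-8"))
        if result.get("status") != "ok":
            raise RuntimeError(f"delivery failed: {result}")
        delivered += 1
    return delivered

received = []

def fake_function(event: bytes) -> dict:
    # Local stand-in for the user function running in FC
    received.append(json.loads(event))
    return {"status": "ok"}

snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [{"type": "UPDATE", "row": {"id": 1, "name": "a2"}}]
n = sinker(format_plugin(poller(snapshot, changes), "shop", "users"), fake_function)
print(n, received[-1]["type"], received[-1]["data"][0]["name"])
# → 3 UPDATE a2
```

Keeping the stages as separate generators mirrors the article's separation of concerns: the sink side never needs to know whether a message came from the snapshot or from the log.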
FC Processing Flow
Request routing: The FC gateway routes each DTS event to the appropriate function instance.
Automatic scheduling: FC scales compute nodes on demand and invokes the user‑defined function for each event.
Code execution: The function processes the event (e.g., enrichment, validation) and forwards results via SDK/API to external services such as AnalyticDB, ClickHouse, or messaging systems.
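The per-event invocation model can be imitated locally with a thread pool, where each worker plays the role of a function instance. This is only an analogy under stated assumptions: `dispatch` and `max_instances` are hypothetical, and FC's real gateway and scheduler are managed services whose internals the article does not describe.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def handler(event: bytes, context=None) -> dict:
    # User-defined function: validate and summarize one Canal-JSON-style event
    msg = json.loads(event)
    rows = msg.get("data") or []
    return {"table": msg["table"], "type": msg["type"], "rows": len(rows)}

def dispatch(events: list, max_instances: int = 4) -> list:
    # Local stand-in for routing and scaling: up to `max_instances` concurrent workers,
    # one handler call per event; pool.map preserves input order in the results.
    with ThreadPoolExecutor(max_workers=max_instances) as pool:
        return list(pool.map(handler, events))

events = [
    json.dumps({"table": "orders", "type": "INSERT", "data": [{"id": i}]}).encode()
    for i in range(10)
]
results = dispatch(events)
print(len(results), results[0])
# → 10 {'table': 'orders', 'type': 'INSERT', 'rows': 1}
```

Because handlers run concurrently, the function body should avoid shared mutable state; in FC, separate instances do not share memory at all.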
Key Implementation Details
Poller behavior: For bulk data, DTS issues concurrent full‑table scans and streams the results downstream. For incremental data, it reads the source's binary log (MySQL) or oplog (MongoDB), parses the change entries, and emits them as Canal‑JSON.
Format Plugin: Converts raw log entries into a consistent JSON structure, simplifying downstream parsing regardless of source type.
Sinker: Sends the JSON payload to the FC endpoint over internal RPC; no manual network configuration is required.
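The normalization step can be sketched as two source-specific adapters feeding one envelope builder. The field names follow the open-source Canal flat-message JSON layout (`database`, `table`, `type`, `isDdl`, `pkNames`, `data`, `old`, `es`, `ts`), with column values rendered as strings; the exact field set DTS emits may differ. `from_mysql_update` and `from_mongo_oplog` are hypothetical adapters, and the MongoDB one handles only insert and delete entries.

```python
import json
import time
from typing import Optional

def to_canal_json(database: str, table: str, op: str, rows: list,
                  old: Optional[list] = None, pk_names: Optional[list] = None,
                  executed_ms: int = 0) -> str:
    # Build a flat message modeled on Canal's JSON layout (subset of fields).
    # Column values are rendered as strings, as Canal does; None stays null.
    def stringify(row: dict) -> dict:
        return {k: (None if v is None else str(v)) for k, v in row.items()}
    return json.dumps({
        "database": database,
        "table": table,
        "type": op,                      # INSERT / UPDATE / DELETE
        "isDdl": False,
        "pkNames": pk_names or [],
        "data": [stringify(r) for r in rows],
        "old": [stringify(r) for r in old] if old else None,
        "es": executed_ms,               # when the change ran on the source (ms)
        "ts": int(time.time() * 1000),   # when this message was produced (ms)
    })

def from_mysql_update(schema: str, table: str, before: dict, after: dict,
                      pk: str = "id", executed_ms: int = 0) -> str:
    # Hypothetical adapter for an UPDATE before/after row-image pair from a binlog parser;
    # "old" keeps only the columns whose values changed.
    changed = {k: before[k] for k in after if before.get(k) != after[k]}
    return to_canal_json(schema, table, "UPDATE", [after], [changed], [pk], executed_ms)

_MONGO_OPS = {"i": "INSERT", "d": "DELETE"}

def from_mongo_oplog(entry: dict, executed_ms: int = 0) -> str:
    # Hypothetical adapter for insert/delete oplog entries ("ns" is "db.collection").
    # Updates are omitted: their "o" field holds an update spec, not a full document.
    database, _, collection = entry["ns"].partition(".")
    return to_canal_json(database, collection, _MONGO_OPS[entry["op"]], [entry["o"]],
                         pk_names=["_id"], executed_ms=executed_ms)

a = json.loads(from_mysql_update("shop", "orders",
                                 {"id": 7, "status": "new", "total": 10},
                                 {"id": 7, "status": "paid", "total": 10}))
b = json.loads(from_mongo_oplog({"op": "i", "ns": "shop.orders", "o": {"_id": 8, "status": "new"}}))
print(a["type"], a["old"], b["type"], b["table"], b["data"])
# → UPDATE [{'status': 'new'}] INSERT orders [{'_id': '8', 'status': 'new'}]
```

Downstream code can now read `type`, `table`, and `data` the same way whichever database produced the change.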
FC Function Skeleton (Python example)
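import json
from datetime import datetime, timezone

def forward_to_sink(record):
    # Hypothetical placeholder: replace with a real client for AnalyticDB, ClickHouse, a message queue, etc.
    print(json.dumps(record))

def handler(event, context):
    # event carries the Canal-JSON payload delivered by DTS (bytes or str; json.loads accepts both)
    data = json.loads(event)
    # Example transformation: add a processing timestamp (UTC, ISO 8601)
    data['processed_at'] = datetime.now(timezone.utc).isoformat()
    # Forward to the downstream service
    forward_to_sink(data)
    return {'status': 'ok'}
Typical Use Cases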
Synchronizing OLTP transaction tables to OLAP analytical stores (AnalyticDB, ClickHouse) for real‑time reporting.
Driving event‑driven microservices where CDC events trigger business logic in FC.
Custom data validation or enrichment before persisting to target systems.
Auditing and compliance by persisting change events to immutable storage.
Change notifications via email, DingTalk, SMS, etc., when critical rows are modified.
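For the OLTP‑to‑OLAP case, the core of the function is replaying each change against the analytical table. The sketch below assumes a Canal‑JSON‑style message with `table`, `type`, `isDdl`, `pkNames`, and `data` fields and uses Python's built-in `sqlite3` as a stand-in for AnalyticDB or ClickHouse. `REPLACE INTO` works as an upsert in SQLite and MySQL-compatible engines; other stores need their own upsert syntax. `apply_canal_event` and `_ident` are hypothetical names.

```python
import json
import sqlite3

def _ident(name: str) -> str:
    # Only simple identifiers are interpolated into SQL; values always use placeholders
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier {name!r}")
    return name

def apply_canal_event(conn: sqlite3.Connection, message: str) -> int:
    # Replay one Canal-JSON-style message against the target table; returns rows applied
    msg = json.loads(message)
    if msg.get("isDdl"):
        return 0  # DDL handling is out of scope for this sketch
    table = _ident(msg["table"])
    pk = _ident(msg["pkNames"][0])
    applied = 0
    for row in msg["data"]:
        if msg["type"] in ("INSERT", "UPDATE"):
            cols = [_ident(c) for c in row]
            placeholders = ", ".join("?" for _ in cols)
            sql = f"REPLACE INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
            conn.execute(sql, [row[c] for c in cols])
        elif msg["type"] == "DELETE":
            conn.execute(f"DELETE FROM {table} WHERE {pk} = ?", (row[pk],))
        else:
            raise ValueError(f"unsupported type {msg['type']!r}")
        applied += 1
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INT PRIMARY KEY, status TEXT)")
events = [
    {"table": "orders", "type": "INSERT", "isDdl": False, "pkNames": ["id"],
     "data": [{"id": "7", "status": "new"}, {"id": "8", "status": "new"}]},
    {"table": "orders", "type": "UPDATE", "isDdl": False, "pkNames": ["id"],
     "data": [{"id": "7", "status": "paid"}]},
    {"table": "orders", "type": "DELETE", "isDdl": False, "pkNames": ["id"],
     "data": [{"id": "8", "status": "new"}]},
]
for ev in events:
    apply_canal_event(conn, json.dumps(ev))
print(conn.execute("SELECT id, status FROM orders").fetchall())
# → [(7, 'paid')]
```

Using an upsert for both INSERT and UPDATE makes replays idempotent, which matters if the same change is delivered more than once.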
Advantages
Combining CDC with Serverless FC provides:
Real‑time incremental extraction without managing servers.
Automatic scaling and pay‑as‑you‑go billing.
Separation of concerns: developers focus on transformation logic; infrastructure, monitoring, and fault tolerance are handled by FC.
Future Extensions
Support for additional source databases such as Oracle, PolarDB PostgreSQL, and PolarDB MySQL is planned, expanding the range of CDC‑driven ETL scenarios.
Alibaba Cloud Native