
How CDC + Serverless Functions Enable Real‑Time ETL in Cloud Native Architectures

This article explains how Alibaba Cloud's Serverless Function Compute combined with Database Change Data Capture (CDC) creates a complete, real‑time ETL pipeline, detailing the ETL model, DTS integration, architecture components, event‑driven processing, and practical use cases such as OLTP‑to‑OLAP data flow.

Alibaba Cloud Native

ETL Model

The classic ETL pipeline consists of:

Extract : Pull data from sources such as relational databases, file systems, message queues, or APIs.

Transform : Clean, merge, enrich, and reformat data to match the target schema.

Load : Write the transformed data into destinations like data warehouses, data lakes, or BI systems.
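As a rough illustration of the three stages above, the sketch below wires them together as plain Python functions. The function names, the sample rows, and the dict used as a stand-in "warehouse" are all assumptions made for this example; a real pipeline would read from and write to actual systems.

```python
import json

def extract(raw_lines):
    """Extract: parse raw JSON lines pulled from a source."""
    return [json.loads(line) for line in raw_lines]

def transform(rows):
    """Transform: drop rows without an id and normalize the name field."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # cleaning step: skip incomplete rows
        cleaned.append({
            "id": int(row["id"]),
            "name": str(row.get("name", "")).strip().lower(),
        })
    return cleaned

def load(rows, target):
    """Load: write rows into the target (here a dict keyed by id)."""
    for row in rows:
        target[row["id"]] = row
    return target

# Hypothetical sample input: three JSON lines, one of them incomplete.
source = [
    '{"id": "1", "name": " Alice "}',
    '{"name": "no id"}',
    '{"id": 2, "name": "BOB"}',
]
warehouse = load(transform(extract(source)), {})
```

Keeping each stage a separate function mirrors the model: the Extract step can later be swapped for a change stream without touching Transform or Load.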

CDC as the Extract Layer

Database Change Data Capture (CDC) captures row‑level changes in near real time by reading the database's change log (e.g., the MySQL binlog or the MongoDB oplog). CDC thus serves as an incremental Extract step, keeping data flowing continuously without periodic batch pulls.
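To make the idea of incremental extraction concrete, the following sketch replays a short stream of row-level changes against an in-memory replica. The change shape used here (`op`, `key`, `row`) is a simplified, hypothetical format chosen for the example, not the wire format of any particular CDC tool.

```python
def apply_change(replica, change):
    """Apply one row-level change to an in-memory replica keyed by primary key."""
    op, key = change["op"], change["key"]
    if op in ("INSERT", "UPDATE"):
        # Merge the changed columns over whatever is already stored.
        replica[key] = {**replica.get(key, {}), **change["row"]}
    elif op == "DELETE":
        replica.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return replica

# Hypothetical change stream, in commit order.
changes = [
    {"op": "INSERT", "key": 1, "row": {"status": "new", "amount": 10}},
    {"op": "UPDATE", "key": 1, "row": {"status": "paid"}},
    {"op": "INSERT", "key": 2, "row": {"status": "new", "amount": 5}},
    {"op": "DELETE", "key": 2},
]

replica = {}
for change in changes:
    apply_change(replica, change)
```

Because only the changed rows travel downstream, the replica stays current without rescanning the source table, which is the point of using CDC for the Extract step.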

Alibaba Cloud DTS + Function Compute Architecture

Data Transmission Service (DTS) implements CDC and forwards events to Serverless Function Compute (FC). The architecture has three logical modules:

Poller : Retrieves full‑table snapshots for initial load and streams incremental logs for real‑time changes.

Format Plugin : Normalizes events to Canal‑JSON format, providing a unified schema for downstream processing.

Sinker : Pushes the formatted events to FC.
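The three modules can be pictured as a chain of generators. The sketch below is only a conceptual model: the function names, the raw event shape, and the `fake_function` callback standing in for FC are all invented for illustration and do not reflect how DTS is implemented internally.

```python
import json

def poller(snapshot_rows, log_entries):
    """Yield snapshot rows first (initial load), then incremental log entries."""
    for row in snapshot_rows:
        yield {"kind": "INSERT", "row": row}
    for entry in log_entries:
        yield entry

def format_plugin(raw_events, database, table):
    """Normalize each raw event into a Canal-JSON-style dict."""
    for ev in raw_events:
        yield {
            "database": database,
            "table": table,
            "type": ev["kind"],
            "data": [ev["row"]],
        }

def sinker(events, invoke):
    """Push each formatted event to the function as a JSON payload."""
    return [invoke(json.dumps(e).encode("utf-8")) for e in events]

received = []

def fake_function(payload):
    """Stand-in for the FC function: record what it was called with."""
    received.append(json.loads(payload))
    return "ok"

results = sinker(
    format_plugin(
        poller(
            [{"id": 1}],
            [{"kind": "UPDATE", "row": {"id": 1, "status": "paid"}}],
        ),
        "shop",
        "orders",
    ),
    fake_function,
)
```

The generator chain keeps the stages independent: the snapshot and the incremental log share one downstream path, so the function sees a single, uniform event shape.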

FC Processing Flow

Request routing : The FC gateway routes each DTS event to the appropriate function instance.

Automatic scheduling : FC scales compute nodes on demand and invokes the user‑defined function for each event.

Code execution : The function processes the event (e.g., enrichment, validation) and forwards results via SDK/API to external services such as AnalyticDB, ClickHouse, or messaging systems.

Key Implementation Details

Poller behavior : For bulk data, DTS issues concurrent full‑table scans and streams results downstream. For incremental data, it reads the source’s binary log (MySQL) or oplog (MongoDB), parses change entries, and emits them as Canal‑JSON.

Format Plugin : Converts raw log entries into a consistent JSON structure, simplifying downstream parsing regardless of source type.

Sinker : Sends the JSON payload to the FC endpoint using internal RPC; no manual network configuration is required.
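On the consuming side, a function typically needs to turn one Canal‑JSON record into per-row changes. The sketch below assumes the field names of the open-source Canal JSON layout (`database`, `table`, `type`, `isDdl`, `pkNames`, `data`, `old`); the exact payload DTS delivers may differ, so check a real event before relying on it. The sample record and the `row_changes` helper are made up for this example.

```python
import json

# Hypothetical sample record in Canal-JSON style: one UPDATE on shop.orders.
SAMPLE = json.dumps({
    "database": "shop",
    "table": "orders",
    "type": "UPDATE",
    "isDdl": False,
    "es": 1700000000000,
    "ts": 1700000000123,
    "pkNames": ["id"],
    "data": [{"id": "42", "status": "paid"}],
    "old": [{"status": "new"}],
})

def row_changes(canal_json):
    """Split one Canal-JSON record into a list of per-row change dicts."""
    record = json.loads(canal_json)
    if record.get("isDdl"):
        return []  # schema changes carry no row data
    olds = record.get("old") or []
    changes = []
    for i, row in enumerate(record.get("data") or []):
        changes.append({
            "table": f'{record["database"]}.{record["table"]}',
            "op": record["type"],
            "key": tuple(row[pk] for pk in record.get("pkNames") or []),
            "after": row,
            # "old" holds the previous values of the changed columns only
            "changed": olds[i] if i < len(olds) else {},
        })
    return changes

changes = row_changes(SAMPLE)
```

Pairing each entry of `data` with the entry of `old` at the same index gives the function both the new row and the previous values of the columns that changed.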

FC Function Skeleton (Python example)

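import json
from datetime import datetime, timezone

def send_to_analyticdb(records):
    # Hypothetical placeholder: replace with your own client code
    # (for example, a database driver that writes to AnalyticDB).
    print(json.dumps(records))

def handler(event, context):
    # The event is assumed to hold Canal-JSON: one record or a list of records.
    payload = json.loads(event)
    records = payload if isinstance(payload, list) else [payload]
    # Example transformation: add a processing timestamp (UTC, ISO 8601)
    now = datetime.now(timezone.utc).isoformat()
    for record in records:
        record['processed_at'] = now
    # Forward to the downstream service
    send_to_analyticdb(records)
    return json.dumps({'status': 'ok', 'count': len(records)})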

Typical Use Cases

Synchronizing OLTP transaction tables to OLAP analytical stores (AnalyticDB, ClickHouse) for real‑time reporting.

Driving event‑driven microservices where CDC events trigger business logic in FC.

Custom data validation or enrichment before persisting to target systems.

Auditing and compliance by persisting change events to immutable storage.

Change notifications via email, DingTalk, SMS, etc., when critical rows are modified.
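For the OLTP‑to‑OLAP case, the function usually has to turn each row-level change into a write against the analytical store. The sketch below builds parameterized SQL for that purpose. It assumes a MySQL-compatible target that accepts `REPLACE INTO` and `%s` placeholders; other targets such as ClickHouse need different statements. The `to_sql` helper is hypothetical, and table and column names are interpolated directly, so they should come only from trusted CDC metadata.

```python
def to_sql(table, op, row, pk_names):
    """Return (statement, params) for one row-level change."""
    if op in ("INSERT", "UPDATE"):
        cols = list(row)
        placeholders = ", ".join(["%s"] * len(cols))
        stmt = f"REPLACE INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return stmt, [row[c] for c in cols]
    if op == "DELETE":
        where = " AND ".join(f"{pk} = %s" for pk in pk_names)
        return f"DELETE FROM {table} WHERE {where}", [row[pk] for pk in pk_names]
    raise ValueError(f"unsupported operation: {op}")

# Hypothetical changes against an "orders" table keyed by "id".
upsert = to_sql("orders", "UPDATE", {"id": "42", "status": "paid"}, ["id"])
delete = to_sql("orders", "DELETE", {"id": "42", "status": "paid"}, ["id"])
```

Treating inserts and updates as the same upsert keeps the write idempotent, which helps if an event is delivered more than once.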

Advantages

Combining CDC with Serverless FC provides:

Real‑time incremental extraction without managing servers.

Automatic scaling and pay‑as‑you‑go billing.

Separation of concerns: developers focus on transformation logic; infrastructure, monitoring, and fault tolerance are handled by FC.

Future Extensions

Support for additional source databases such as Oracle, PolarDB PostgreSQL, and PolarDB MySQL is planned, expanding the range of CDC‑driven ETL scenarios.

