Why Use RocketMQ Connect for Scalable Data Pipelines?
This article explains the challenges of point‑to‑point data sync, introduces RocketMQ Connect as a cloud‑native solution that decouples upstream and downstream, details its architecture, connectors, REST API, metrics, deployment modes, and provides a step‑by‑step guide to building custom connectors for use cases such as CDC, data lakes, and system migration.
Overview
RocketMQ Connect is a distributed, fault‑tolerant component of the RocketMQ ecosystem that provides streaming data integration between RocketMQ and external systems. It decouples upstream and downstream services, offers low latency (seconds to milliseconds), high reliability (cluster deployment, failover, replay), and low‑code configuration.
Core Concepts
Connector : defines the data direction. SourceConnector reads from an external system into RocketMQ; SinkConnector writes from RocketMQ to a target system.
Task : the stateless worker that actually copies data. Multiple Tasks can run in parallel to increase throughput.
Worker : hosts Connectors and Tasks, provides a RESTful management API, stores configuration and offset information, and handles load balancing, high availability, and elastic scaling across a cluster.
Data Flow
For a SourceConnector the pipeline is:
External System → SourceConnector → (optional Transform) → Converter (optional Schema Registry) → RocketMQ TopicFor a SinkConnector the reverse flow applies: RocketMQ → Converter → (optional Transform) → SinkConnector → Target System.
Plugins
Converter : serializes/deserializes records and optionally registers schemas.
Transform : per‑message operations such as field filtering, key replacement, or case conversion.
RESTful Management API
POST /connectors/{connector_name}– create a connector GET /connectors/{connector_name}/config – retrieve configuration GET /connectors/{connector_name}/status – query status POST /connectors/{connector_name}/stop – stop the connector
Metrics and Observability
Workers expose metrics such as overall TPS, per‑Task TPS, success/failure counts, and offset lag. These metrics can be scraped by Prometheus via a custom exporter.
Deployment Modes
Cluster mode : multiple Workers form a high‑availability cluster. Configuration, offset, and status are synchronized through dedicated RocketMQ topics using broadcast consumption.
Standalone mode : a single Worker runs without HA; offsets are persisted locally. Suitable for Kubernetes where the platform provides pod‑level HA.
Plugin Loading
Connectors are packaged as JAR files and loaded by Workers with isolated class loaders, preventing dependency conflicts similar to Tomcat’s WAR loading mechanism.
Building a Custom Connector (Example: MySQL → RocketMQ → Hudi)
Implement a SourceConnector class (e.g., MySqlSourceConnector) that defines configuration properties and returns the associated Task class.
Implement the Task (e.g., MySqlTask) to connect to MySQL, read binlog events, convert each event to a ConnectRecord, and place records into a blocking queue.
Package the connector JAR and copy it to the Worker’s plugin directory.
Use the REST API to create the connector instance; the Worker will launch the configured Tasks, which poll the queue and produce records to RocketMQ.
Implement a corresponding SinkConnector (e.g., HudiSinkConnector) that consumes from RocketMQ, deserializes records, and writes them to a Hudi table.
Ecosystem and Future Directions
RocketMQ Connect already includes:
Debezium integration for CDC.
JDBC source/sink support.
OpenMLDB connector for machine‑learning feature stores.
Planned connectors target Doris, ClickHouse, Elasticsearch, and other popular stores via the OpenMessaging Connect API.
References
GitHub repository: https://github.com/apache/rocketmq-connect OpenMessaging Connect specification:
https://github.com/openmessaging/openconnectSigned-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
