
OpenMLDB Pulsar Connector: A Real‑time Data Integration Guide

This article presents a step‑by‑step tutorial on using the OpenMLDB Pulsar Connector to stream real‑time data from Apache Pulsar into OpenMLDB, covering connector architecture, key features, Docker‑based installation, sink configuration, schema registration, message production, verification queries, and future roadmap details.

DataFunTalk

The OpenMLDB Pulsar Connector enables stable, real‑time streaming integration between Apache Pulsar and the OpenMLDB online machine‑learning database, turning live data streams into AI‑ready features for faster model updates and batch predictions.

Pulsar Connector Overview – Pulsar, a cloud‑native distributed messaging platform, provides a connector framework. By implementing a JDBC sink connector, Pulsar messages can be written directly into OpenMLDB’s online storage, creating a seamless data pipeline.

Key Features – The connector is easy to use (it is driven by configuration, with no code required), extensible (it can be deployed on a single node or on a cluster), and simple to install and configure, which reduces the effort developers spend moving data into OpenMLDB.

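Installation (Docker) – Run the OpenMLDB and Pulsar containers with host networking and bind a shared files directory into each. Start the OpenMLDB container and open a shell in it:

    docker run -dit --network host -v `pwd`/files:/work/taxi-trip/files --name openmldb 4pdosc/openmldb:0.4.4 bash
    docker exec -it openmldb bash

Start the Pulsar container and open a shell in it:

    docker run -dit --network host -v `pwd`/files:/pulsar/files --name pulsar apachepulsar/pulsar:2.9.1 bash
    docker exec -it pulsar bash

Inside the Pulsar container, start Pulsar in standalone mode:

    bin/pulsar-daemon start standalone --zookeeper-port 5181

Both containers share the host network, so Pulsar's ZooKeeper is started on port 5181, which keeps it clear of port 2181, the ZooKeeper address the OpenMLDB client uses later in this guide.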
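Because pulsar-daemon starts the broker in the background, pulsar-admin commands issued immediately afterwards can fail if the broker is not up yet. A minimal sketch of a retry helper, assuming it is run from the Pulsar directory inside the pulsar container; wait_for and WAIT_INTERVAL are hypothetical names used only for illustration:

```shell
# wait_for ATTEMPTS COMMAND...: retry COMMAND until it succeeds or ATTEMPTS run out.
# Hypothetical helper for illustration; not shipped with Pulsar or OpenMLDB.
wait_for() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts; WAIT_INTERVAL (seconds) is an assumed knob, default 2.
    sleep "${WAIT_INTERVAL:-2}"
  done
  return 1
}

# Only probe the broker when the admin CLI is present (i.e. inside the pulsar container).
if [ -x bin/pulsar-admin ]; then
  if wait_for 30 bin/pulsar-admin brokers healthcheck; then
    echo "Pulsar standalone is ready"
  else
    echo "Pulsar did not become ready in time" >&2
  fi
fi
```

The helper is generic, so the same loop could wrap any other readiness probe.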

Create OpenMLDB Database and Table – Inside the OpenMLDB container, execute a SQL script (e.g., files/create.sql) that creates the pulsar_test database and the connector_test table. The column types must match the fields of the incoming messages; timestamp fields are declared as long (64-bit integer) columns.
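The column list itself is not given above and depends on the data being streamed. A sketch of what files/create.sql might contain, assuming a small hypothetical schema (id, vendor_id, pickup_datetime) and assuming that the long timestamp column maps to OpenMLDB's BIGINT type; the client invocation reuses the flags shown for the verification query in this guide:

```shell
mkdir -p files

# Write a sample create.sql. Only the database and table names come from this
# guide; the three columns are hypothetical. pickup_datetime is a timestamp
# stored as a 64-bit integer (assumed BIGINT).
cat > files/create.sql <<'EOF'
CREATE DATABASE pulsar_test;
USE pulsar_test;
CREATE TABLE connector_test (id STRING, vendor_id INT, pickup_datetime BIGINT);
EOF

# Run the script only where the OpenMLDB client exists (inside the openmldb container).
if [ -x ../openmldb/bin/openmldb ]; then
  ../openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < files/create.sql
fi
```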

Configure Pulsar Sink – Prepare a sink configuration YAML (files/pulsar-openmldb-jdbc-sink.yaml) specifying the tenant, namespace, sink name, archive path, input topic (test_openmldb), and JDBC connection details. Create the sink, then check its status:

    ./bin/pulsar-admin sinks create --sink-config-file files/pulsar-openmldb-jdbc-sink.yaml
    ./bin/pulsar-admin sinks status --name openmldb-test-sink
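The sink configuration file described above might look roughly like the following. This is a sketch under assumptions: the tenant and namespace are Pulsar's defaults rather than values taken from this guide, the archive path is a placeholder to be replaced with the actual connector package, and the jdbcUrl form (database in the path, ZooKeeper address and root path as parameters) is assumed from the client flags used elsewhere in this guide.

```shell
mkdir -p files

# Write a sample sink config. Sink name, topic, database, and table come from
# this guide; tenant, namespace, archive, and the jdbcUrl form are assumptions.
cat > files/pulsar-openmldb-jdbc-sink.yaml <<'EOF'
tenant: "public"
namespace: "default"
name: "openmldb-test-sink"
# Placeholder: replace with the path of the OpenMLDB JDBC connector .nar you installed.
archive: "connectors/REPLACE-WITH-YOUR-CONNECTOR.nar"
inputs: ["test_openmldb"]
configs:
  # Assumed URL form; adjust host, port, and root path to your OpenMLDB cluster.
  jdbcUrl: "jdbc:openmldb:///pulsar_test?zk=localhost:2181&zkPath=/openmldb"
  tableName: "connector_test"
EOF
```

Keeping the ZooKeeper address in one place (the jdbcUrl) makes it easier to spot a mismatch with the port the OpenMLDB cluster actually uses.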

Upload Schema – Define a JSON schema for the topic, upload it, and read it back to confirm the upload:

    ./bin/pulsar-admin schemas upload test_openmldb -f ./files/openmldb-table-schema
    ./bin/pulsar-admin schemas get test_openmldb
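A schema file for pulsar-admin schemas upload wraps the record definition as an escaped string inside a small JSON envelope. The sketch below assumes the same hypothetical three columns (id, vendor_id, pickup_datetime) as a stand-in for the real table; the record name OpenMLDBSchema is also made up for illustration.

```shell
mkdir -p files

# Write a sample schema file. The field list is hypothetical and must mirror
# the actual connector_test columns; pickup_datetime uses "long" for the timestamp.
cat > files/openmldb-table-schema <<'EOF'
{
  "type": "JSON",
  "schema": "{\"type\":\"record\",\"name\":\"OpenMLDBSchema\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"vendor_id\",\"type\":\"int\"},{\"name\":\"pickup_datetime\",\"type\":\"long\"}]}",
  "properties": {}
}
EOF
```

The field names and types in the schema should line up one-to-one with the table columns, since the sink maps message fields onto columns by name.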

Test Message Production – Use the provided Java client (files/pulsar-client-java-1.0-SNAPSHOT-jar-with-dependencies.jar) to send two sample JSON messages to test_openmldb. The sink writes these messages into OpenMLDB, which can be verified by running:

    ../openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < files/select.sql
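The contents of files/select.sql are not shown here. A minimal sketch of such a script, assuming the client has to be switched to online execution mode before it can read the online storage that the sink writes to:

```shell
mkdir -p files

# Write a sample select.sql. The SET statement is an assumption about how the
# OpenMLDB SQL client selects online storage; database and table names come
# from this guide.
cat > files/select.sql <<'EOF'
SET @@execute_mode='online';
USE pulsar_test;
SELECT * FROM connector_test;
EOF
```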

Verification – The Pulsar sink status shows numReadFromPulsar=2 and numWrittenToSink=2. Query results from OpenMLDB confirm that the two records are stored correctly.
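The two counters can also be compared in a script instead of by eye. The sketch below assumes the status output is JSON in which each counter appears as a "name" : number pair; sink_counter is a hypothetical helper, and it reads only the first instance's value.

```shell
# sink_counter NAME: print the first integer value of NAME found in JSON on stdin.
# Hypothetical helper; assumes counters appear as "NAME" : <digits>.
sink_counter() {
  grep -o "\"$1\" *: *[0-9][0-9]*" | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Query the real sink only where the admin CLI exists (inside the pulsar container).
if [ -x ./bin/pulsar-admin ]; then
  status=$(./bin/pulsar-admin sinks status --name openmldb-test-sink)
  read_count=$(printf '%s\n' "$status" | sink_counter numReadFromPulsar)
  written_count=$(printf '%s\n' "$status" | sink_counter numWrittenToSink)
  if [ -n "$read_count" ] && [ "$read_count" = "$written_count" ]; then
    echo "sink is caught up: $written_count messages written"
  else
    echo "sink is behind or failing: read=$read_count written=$written_count" >&2
  fi
fi
```

Equal counters only show that every message read was handed to the sink; the OpenMLDB query remains the check that the rows actually landed.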

Roadmap & Ecosystem – OpenMLDB plans to release v0.5.0 with windowed aggregation, enhanced monitoring, pluggable storage engines, UDF support, and a broader connector ecosystem (Kafka, Pulsar, HDFS, etc.). The community encourages contributions through the project's documentation and GitHub issues.
