Big Data 9 min read

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

IT Architects Alliance
IT Architects Alliance
IT Architects Alliance
Introduction to Confluent KSQL for Real-Time Stream Processing

Before reading this article, it is assumed that you are already familiar with Kafka concepts such as brokers, topics, partitions, and consumers.

Stream processing deals with continuous, high‑volume, ordered data that grows indefinitely over time, treating the data as a dynamic collection.

Confluent KSQL is a real‑time data‑stream processing engine built on Kafka that provides a powerful yet easy‑to‑use SQL interface, allowing you to manipulate Kafka streams without writing code. It offers high scalability, elasticity, fault tolerance, and supports operations like filtering, transformation, aggregation, joins, windowing, and sessionization.

The KSQL architecture consists of four main components: the KSQL engine (processes statements and queries), a REST interface (connector between clients and the engine), the KSQL CLI (command‑line client that communicates via the REST API), and the KSQL UI (graphical control center).

In KSQL, a stream represents an immutable, append‑only sequence of events (the full history), while a table represents a snapshot of the current state derived from a stream, similar to a relational database table.

The query lifecycle includes: (1) registering a stream or table with DDL, (2) defining an application with a KSQL statement, (3) parsing the DDL/DML into an AST, (4) generating a logical plan, (5) generating a physical execution plan, (6) executing the Kafka stream application, and (7) managing the application via stream/table operations.

The simplest way to try KSQL is to use Docker. The following commands clone the Confluent Docker images, check out version 5.2.1‑post, and start the compose stack containing Zookeeper, Kafka, and KSQL (minimum 8 GB RAM recommended):

git clone https://github.com/confluentinc/cp-docker-images
cd cp-docker-images
git checkout 5.2.1-post
cd examples/cp-all-in-one/
docker-compose up -d --build
# create topic:
docker-compose exec broker kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic users
# create topic:
docker-compose exec broker kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic pageviews

Sample data can be generated using Kafka Connect datagen connectors:

wget https://github.com/confluentinc/kafka-connect-datagen/raw/master/config/connector_pageviews_cos.config
curl -X POST -H "Content-Type: application/json" --data @connector_pageviews_cos.config http://localhost:8083/connectors
wget https://github.com/confluentinc/kafka-connect-datagen/raw/master/config/connector_users_cos.config
curl -X POST -H "Content-Type: application/json" --data @connector_users_cos.config http://localhost:8083/connectors

Start the KSQL CLI: docker-compose exec ksql-cli ksql http://ksql-server:8088 Typical DDL statements include creating streams and tables, dropping them, and creating streams/tables as SELECT queries:

CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='AVRO');

CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR) WITH (KAFKA_TOPIC='users', VALUE_FORMAT='AVRO', KEY='userid');

SHOW STREAMS;

Example query to read the first three records:

SET 'auto.offset.reset' = 'earliest';
SELECT pageid FROM pageviews LIMIT 3;

Creating a new stream by joining two existing streams (left join) and filtering by gender:

CREATE STREAM pageviews_female AS SELECT users.userid AS userid, pageid, regionid, gender FROM pageviews LEFT JOIN users ON pageviews.userid = users.userid WHERE gender = 'FEMALE';

Creating a filtered stream that writes to a custom Kafka topic:

CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9', value_format='AVRO') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';

Aggregating data with a tumbling window and a HAVING clause:

CREATE TABLE pageviews_regions AS SELECT gender, regionid, COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;

To view the definition of a stream or table, use a MySQL‑like DESCRIBE command: DESCRIBE EXTENDED pageviews_female_like_89; KSQL can connect to external systems (e.g., MySQL, S3, HDFS) via KSQL connectors.

Advantages: distributed, fault‑tolerant, elastic, scalable, real‑time processing; supports filtering, transformation, aggregation, joins, windowing, sessionization; SQL‑like syntax reduces learning curve; enables real‑time reporting, monitoring, alerts, session analytics, and ETL. Disadvantages: relatively heavyweight and resource‑intensive.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

DockerBig Datastream processingSQLReal-time analyticsKafkaKSQL
IT Architects Alliance
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.