KSQL Quick Start: Deploying and Querying Kafka Data with Streaming SQL
This article introduces KSQL as a lightweight streaming SQL engine for Apache Kafka, explains its architecture and core concepts of streams and tables, and provides step‑by‑step deployment instructions, command‑line examples for creating streams/tables, querying data, and managing persistent queries.
Background : Kafka began as a log messaging system popular with operations teams, later evolving into a streaming platform where developers and business systems increasingly needed to query specific records directly.
Requirement : Users sought a lightweight tool to run SQL queries against Kafka data without the overhead of full‑stack solutions like Presto, which impose format constraints.
Introduction to KSQL : KSQL is an interactive streaming SQL engine for Apache Kafka that lowers the barrier to stream processing. It offers a simple SQL interface to query, transform, and aggregate Kafka topics, supporting both STREAM and TABLE abstractions and built on the Kafka Streams API.
Architecture :
Deployment architecture: a KSQL server process (or a cluster of servers) handles queries, exposing a REST API and a CLI client.
Processing architecture: KSQL leverages Kafka Streams for fault‑tolerant, scalable state management and supports exactly‑once semantics.
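The REST API mentioned above can also be called without the CLI. The following is a minimal sketch, not an official client. It assumes a KSQL server is reachable at http://localhost:8088 and that statements are posted to the server's /ksql endpoint as JSON. The names KSQL_URL, ksql_payload, and ksql_run are hypothetical helpers introduced for illustration, and the payload builder does no JSON escaping, so it only suits statements without double quotes or backslashes.

```shell
# Assumption: the KSQL server listens here; override by exporting KSQL_URL.
KSQL_URL="${KSQL_URL:-http://localhost:8088}"

# Hypothetical helper: wrap one KSQL statement in the JSON body for /ksql.
# No escaping is performed, so avoid double quotes and backslashes in $1.
ksql_payload() {
  printf '{"ksql": "%s", "streamsProperties": {}}' "$1"
}

# Hypothetical helper: POST the statement to the server and print the reply.
ksql_run() {
  curl -s -X POST "$KSQL_URL/ksql" \
    -H 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
    -d "$(ksql_payload "$1")"
}

# Usage, once the server is running:
#   ksql_run 'SHOW STREAMS;'
```

Defining the call as a function keeps the sketch safe to source before the server is up; nothing touches the network until ksql_run is invoked.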
Core Concepts :
Stream : an unbounded, immutable sequence of records; new facts can be appended but existing ones are never updated.
Table : a mutable view derived from a stream or another table, analogous to a traditional database table but with streaming semantics.
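The difference between the two abstractions can be illustrated without Kafka at all. The sketch below is only an analogy built with standard shell tools, not how KSQL is implemented. The three sample records are invented for illustration: read as a stream, all three facts remain; read as a table keyed by user ID, a later record replaces the earlier one with the same key.

```shell
# Hypothetical changelog of (userid, regionid) records, oldest first.
stream='alice,Region_1
bob,Region_2
alice,Region_3'

# STREAM view: every record is an independent, immutable fact, so all are kept.
stream_count=$(printf '%s\n' "$stream" | wc -l | tr -d ' ')

# TABLE view: a later record with the same key overwrites the earlier value.
table=$(printf '%s\n' "$stream" \
  | awk -F, '{ latest[$1] = $2 } END { for (k in latest) print k "," latest[k] }' \
  | sort)

echo "stream records: $stream_count"
echo "table rows:"
echo "$table"
```

Here the stream holds three records while the table holds two rows, with alice mapped to Region_3, her most recent value.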
Deployment Steps (tested with Confluent Platform 5.0 and Kafka 0.11+):
wget https://packages.confluent.io/archive/5.0/confluent-oss-5.0.0-2.11.tar.gz
tar zxvf confluent-oss-5.0.0-2.11.tar.gz -C /opt/programs/confluent_5.0.0
cd /opt/programs/confluent_5.0.0
bin/zookeeper-server-start -daemon etc/kafka/zookeeper.properties
cd /opt/programs/confluent_5.0.0
bin/kafka-server-start -daemon etc/kafka/server.properties
Generate sample data with ksql-datagen:
cd /opt/programs/confluent_5.0.0/bin
./ksql-datagen quickstart=pageviews format=delimited topic=pageviews maxInterval=500
cd /opt/programs/confluent_5.0.0/bin
./ksql-datagen quickstart=users format=json topic=users maxInterval=100
Start the KSQL server and connect via the CLI:
cd /opt/programs/confluent_5.0.0
bin/ksql-server-start -daemon etc/ksql/ksql-server.properties
cd /opt/programs/confluent_5.0.0
bin/ksql http://<your-host>:8088
Creating Streams and Tables :
CREATE STREAM pageviews_original (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
WITH (kafka_topic='pageviews', value_format='DELIMITED');
CREATE TABLE users_original (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR)
WITH (kafka_topic='users', value_format='JSON', key='userid');
Querying Data (by default, queries read from the latest offset; run SET 'auto.offset.reset' = 'earliest'; first to read from the beginning of the topic):
SELECT * FROM users_original LIMIT 3;
SELECT * FROM pageviews_original LIMIT 3;
Persistent Queries : Create a continuous query that writes results to a new topic.
CREATE STREAM pageviews2 AS SELECT userid FROM pageviews_original;
Verify the new stream, its backing topic, and the persistent query that feeds it:
SHOW STREAMS;
SHOW QUERIES;
Consume the output topic to see the userid values the query emits:
cd /opt/programs/confluent_5.0.0/bin
./kafka-console-consumer --bootstrap-server <host>:9092 --from-beginning --topic PAGEVIEWS2
Terminate the persistent query when it is no longer needed, using the query ID reported by SHOW QUERIES:
TERMINATE CSAS_PAGEVIEWS2_0;
This guide demonstrates how KSQL provides a lightweight, SQL‑based approach to explore and process Kafka data without the complexity of larger query engines.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.