
Spring Cloud Data Flow: Building and Deploying Event Stream Pipelines with Apache Kafka (Part 3)

This article explains how Spring Cloud Data Flow, together with Spring Cloud Skipper, enables developers to design, deploy, and manage event‑stream pipelines on Apache Kafka, covering ecosystem overview, pipeline components, Docker‑based local setup, stream creation, debugging, monitoring, and integration of Kafka Streams applications.

Architects Research Society

Spring Cloud Data Flow Ecosystem

Spring Cloud Data Flow is a toolkit for designing, developing, and continuously delivering data pipelines. It centralises management of event‑stream (long‑running) and task/batch (short‑lived) applications, offering interaction via a dashboard GUI, command‑line shell, Java DSL, and RESTful API.

To deploy pipelines on platforms such as Cloud Foundry or Kubernetes, Data Flow delegates lifecycle operations to Spring Cloud Skipper, while short‑lived batch pipelines are handled directly by Data Flow.

Both servers support OAuth 2.0/OpenID Connect for authentication and provide Grafana dashboards for monitoring.

Developing Event‑Stream Applications

Event‑stream pipelines are typically composed of Spring Cloud Stream applications, though any custom app can be used. Out‑of‑the‑box stream apps are available as Maven artifacts or Docker images; they are built with either the RabbitMQ or Apache Kafka binder and include built‑in support for Prometheus and InfluxDB metrics.

Pipeline components include:

Source: the first step that ingests data from external systems (databases, files, IoT devices, etc.).

Processor: consumes data from upstream sources or processors, applies business logic, and emits results downstream.

Sink: the final step that writes data to external stores such as Cassandra, PostgreSQL, or Amazon S3.

By default pipelines are linear, with each app communicating via a single destination (e.g., a Kafka topic). Non‑linear topologies with multiple inputs/outputs are also supported, especially when using Kafka Streams.
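As a sketch, the Stream DSL expresses a linear pipeline with the pipe symbol, while a tap on a named destination enables fan-out; the stream and app names below are hypothetical:

```
primary=http | transform | log
:primary.http > counter
```

Here `:primary.http` refers to the topic produced by the http app in the stream named primary, so the counter sink receives a copy of the raw events without disturbing the main pipeline.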

Spring Cloud Data Flow Environment Setup

The official Data Flow site provides guides for local, Kubernetes, and Cloud Foundry deployments. This tutorial uses Docker for a local setup. First, clone the Docker compose files from the Spring Cloud Data Flow GitHub repository.

The Docker composition includes:

Apache Kafka

Spring Cloud Data Flow server

Spring Cloud Skipper server

Prometheus (metrics)

Grafana (visualisation)

Automatic registration of out‑of‑the‑box streaming apps

Allocate at least 6 GB of memory for Docker.

Install docker-compose and run:

export DATAFLOW_VERSION=2.1.0.RELEASE
export SKIPPER_VERSION=2.0.2.RELEASE
docker-compose up

After the containers start, access the dashboard at http://localhost:9393/dashboard and register the provided streaming apps.

Creating an Event‑Stream Pipeline

Using the out‑of‑the‑box http source, transform processor, and log sink, define a simple stream called http-events-transformer:

http | transform | log

In the dashboard, enter the following DSL:

http-events-transformer=http --server.port=9000 | transform --expression=payload.toUpperCase() | log
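Equivalently, the stream can be created and deployed from the Data Flow shell; this is a sketch, with the shell prompt shown as dataflow:>:

```
dataflow:>stream create --name http-events-transformer --definition "http --server.port=9000 | transform --expression=payload.toUpperCase() | log"
dataflow:>stream deploy --name http-events-transformer
```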

Deploy the stream on the local platform; enabling the local deployer's inheritLogging option for the log app makes the sink's output appear in the Skipper server logs.

During deployment, two Kafka topics are automatically created:

http-events-transformer.http (source → processor)

http-events-transformer.transform (processor → sink)

These topic names follow Data Flow’s naming convention but can be overridden via binding properties.
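Given that convention, the topic names can be derived mechanically from the stream name and the producing app's name; a minimal shell sketch:

```shell
# Data Flow names the topic between two stream apps as
# <streamName>.<producerAppName> (overridable via binding properties).
topic_name() {
  echo "$1.$2"
}

topic_name http-events-transformer http       # topic between source and processor
topic_name http-events-transformer transform  # topic between processor and sink
```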

Testing the Pipeline

Send test data to the HTTP source:

curl -X POST http://localhost:9000 -d "spring" -H "Content-Type: text/plain"

The log sink records the transformed payload, e.g.:

log-sink: SPRING

Debugging and Monitoring

Debugging configurations vary by target platform; for local development, set the debugPort deployment property.
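For example, a debug port can be passed as a deployment property for a single app in the stream; the exact property keys below assume the local deployer and should be verified against your Data Flow version:

```
stream deploy --name http-events-transformer --properties "deployer.transform.local.debugPort=5005,deployer.transform.local.debugSuspend=y"
```

With the JVM suspended on startup, a remote debugger can then be attached to port 5005 before the transform processor begins handling messages.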

Monitoring uses Prometheus metrics and a Grafana dashboard (default credentials admin/admin). Access the Grafana view from the Streams page.

Using Kafka Streams Applications

Kafka Streams apps can be registered as processors. The example kstreams-word-count counts words in time windows and is published to the Spring Maven repository.

Register it via the Apps → Add Application(s) UI, then create a stream that connects the http source to the word‑count processor and finally to the log sink.
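Registration can also be done from the Data Flow shell; the Maven coordinates below are illustrative placeholders, not the actual artifact:

```
dataflow:>app register --name kstreams-word-count --type processor --uri maven://com.example:kstreams-word-count:1.0.0
dataflow:>stream create --name word-count-stream --definition "http --server.port=9001 | kstreams-word-count | log" --deploy
```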

After deployment, posting text to http://localhost:9001 yields log entries such as:

{"word":"baby","count":1,"start":"2019-03-25T09:53:30.000+0000","end":"2019-03-25T09:54:00.000+0000"}
{"word":"shark","count":1,"start":"2019-03-25T09:53:30.000+0000","end":"2019-03-25T09:54:00.000+0000"}
{"word":"doo","count":6,"start":"2019-03-25T09:53:30.000+0000","end":"2019-03-25T09:54:00.000+0000"}

Conclusion

This tutorial demonstrates how Spring Cloud Data Flow simplifies the development, deployment, monitoring, and security of Apache Kafka‑based event‑stream applications, offering a cloud‑native, automated solution for building robust data pipelines.

Future parts will cover generic topologies and continuous‑deployment patterns for native event‑stream applications.

Tags: Docker, Kubernetes, Event Streaming, Apache Kafka, Spring Cloud Data Flow, Spring Cloud Stream
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
