Understanding Data Streams: From Node.js to Java, Kafka, and Kinesis

This article explains what data streams are and how they differ from arrays, describes the four stream types in Node.js, demonstrates Java Stream operations, and introduces the popular streaming platforms Apache Kafka and Amazon Kinesis, highlighting their core features and real-time processing capabilities.

21CTO

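In programming, a stream is a sequence of data elements that is processed on demand, one element at a time, rather than stored in memory all at once like an array. Because elements are produced only when they are needed, a stream can even be infinite. Every stream needs a source, such as a file, a list, or an I/O resource.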
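
To make the contrast with arrays concrete, here is a minimal Java sketch (the class and method names are hypothetical, chosen for illustration). Stream.iterate builds an infinite source, which is safe to use because elements are generated only as the pipeline requests them; limit then stops the pipeline after a fixed number of results. An array holding "every natural number" could never be allocated.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class showing that a stream source can be unbounded.
public class InfiniteStreamDemo {
    // Returns the first `count` even numbers drawn from an infinite stream.
    static List<Integer> firstEvens(int count) {
        return Stream.iterate(0, n -> n + 1)   // infinite source: 0, 1, 2, ...
                .filter(n -> n % 2 == 0)       // evaluated lazily, element by element
                .limit(count)                  // short-circuits the infinite source
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(5)); // prints [0, 2, 4, 6, 8]
    }
}
```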

Node.js defines four stream types:

Writable: streams you can write data to, e.g., file writes or HTTP responses.

Readable: streams you can read data from, e.g., file reads or incoming HTTP requests.

Duplex: streams that are both readable and writable, such as TCP sockets.

Transform: duplex streams that modify data as it passes through, e.g., zlib compression.
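
Node.js streams are JavaScript objects, but since this article's examples are in Java, the same roles can be sketched as a hedged analogy with Java's byte streams from java.io (a different API from the java.util.stream pipelines discussed next). In this sketch, whose class and method names are hypothetical, a ByteArrayInputStream plays the readable source, a ByteArrayOutputStream the writable sink, and a GZIPOutputStream the transform in between, much like piping data through zlib in Node.js. Duplex streams have no single counterpart here; a java.net.Socket, for example, exposes separate input and output streams instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: a Java analogy to Node.js readable, writable, and transform roles.
public class StreamRolesDemo {
    // Reads from a source, passes bytes through a compressing transform, and collects them in a sink.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();      // "writable" side
        try (InputStream source = new ByteArrayInputStream(input);     // "readable" side
             OutputStream transform = new GZIPOutputStream(sink)) {    // "transform", like zlib
            source.transferTo(transform); // Java 9+: copy every byte from source into transform
        } // closing the GZIPOutputStream writes the GZIP trailer into sink
        return sink.toByteArray();
    }

    // Reverses gzip() so the round trip can be checked.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, streams".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // prints hello, streams
    }
}
```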

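A filter is an operation that takes one stream and produces another containing only the elements that match a condition. Because each such operation returns a new stream, operations can be chained into a pipeline, as in the following Java example: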

import java.util.Arrays;

Arrays.asList(10,3,13,4,1,52)
    .stream()
    .filter(number -> number % 2 == 0) // 10,4,52
    .sorted() // 4,10,52
    .skip(1) // 10,52
    .forEach(System.out::println); // prints 10 and 52

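Java Streams divide operations into intermediate and terminal stages. Intermediate operations (e.g., filter, sorted) return a new stream, while terminal operations (e.g., forEach, reduce) produce a result or side effect and consume the stream. The pipeline is evaluated lazily: nothing runs until a terminal operation is invoked.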

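import java.util.Arrays;
import java.util.List;

List<Integer> numbers = Arrays.asList(10,3,13,4,1,52);
Integer result = numbers.stream()
    .filter(number -> number % 2 == 0) // 10, 4, 52
    .sorted()                          // 4, 10, 52
    .skip(1)                           // 10, 52
    .peek(System.out::println)         // runs only for elements the terminal operation pulls through
    .findFirst()                       // terminal: Optional<Integer> holding 10
    .get();                            // unwraps the Optional: result is 10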
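
The effect of laziness, and the difference between stateless and stateful intermediate operations, can be measured by counting how often peek is called. In the sketch below (class and method names are hypothetical; peek serves only as a counter, a use its Javadoc describes as a debugging aid), a pipeline with no terminal operation does no work at all, findFirst after a stateless filter stops as soon as one match arrives, and findFirst after the stateful sorted still consumes the entire source, because the smallest element is unknown until every element has been seen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class counting how many source elements each pipeline actually pulls.
public class LazinessDemo {
    static final List<Integer> NUMBERS = Arrays.asList(10, 3, 13, 4, 1, 52);

    // Builds a pipeline but never invokes a terminal operation.
    static int callsWithoutTerminal() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = NUMBERS.stream()
                .peek(n -> calls.incrementAndGet())
                .filter(n -> n % 2 == 0);  // intermediate operations only describe the work
        return calls.get();                // pipeline is never consumed, so no element is processed
    }

    // Stateless stages pass elements along one at a time, so findFirst can stop early.
    static int callsStatelessFindFirst() {
        AtomicInteger calls = new AtomicInteger();
        NUMBERS.stream()
                .peek(n -> calls.incrementAndGet())
                .filter(n -> n % 2 == 0)
                .findFirst();              // 10 is already even, so only one element is pulled
        return calls.get();
    }

    // sorted() is stateful: it must buffer every element before emitting the smallest.
    static int callsStatefulFindFirst() {
        AtomicInteger calls = new AtomicInteger();
        NUMBERS.stream()
                .peek(n -> calls.incrementAndGet())
                .sorted()
                .findFirst();              // still consumes the whole source
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsWithoutTerminal());    // prints 0
        System.out.println(callsStatelessFindFirst()); // prints 1
        System.out.println(callsStatefulFindFirst());  // prints 6
    }
}
```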

Parallel streams split the source data into chunks that are processed concurrently, by default on the common ForkJoinPool, as illustrated below:

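import java.util.Arrays;

Arrays.asList(10,3,13,4,1,52,2,6,8)
    .parallelStream()
    .filter(number -> number % 2 == 0)
    // print each element with the thread that handled it; the order varies between runs
    .forEach(number -> System.out.println(number + " -> " + Thread.currentThread().getName()));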
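
Because the printing order of forEach on a parallel stream changes from run to run, a more testable way to see what parallelism changes, and what it does not, is to compare results. In this sketch (class and method names are hypothetical), summing the even numbers gives the same answer sequentially and in parallel because addition is associative, and collect(Collectors.toList()) still returns the elements in encounter order even though the work may be split across threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class comparing sequential and parallel results.
public class ParallelDemo {
    static final List<Integer> NUMBERS = Arrays.asList(10, 3, 13, 4, 1, 52, 2, 6, 8);

    // Sums the even numbers; addition is associative, so splitting the work is safe.
    static int sumOfEvens(boolean parallel) {
        return (parallel ? NUMBERS.parallelStream() : NUMBERS.stream())
                .filter(n -> n % 2 == 0)
                .mapToInt(Integer::intValue)
                .sum();
    }

    // collect() keeps encounter order for an ordered source such as a List.
    static List<Integer> evensInOrder() {
        return NUMBERS.parallelStream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(false)); // prints 82
        System.out.println(sumOfEvens(true));  // prints 82
        System.out.println(evensInOrder());    // prints [10, 4, 52, 2, 6, 8]
    }
}
```

When a side effect must happen in encounter order, forEachOrdered can replace forEach, at some cost to parallelism.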

Apache Kafka is a distributed streaming platform with three core capabilities: publishing and subscribing to streams of records, storing streams of records durably and fault-tolerantly, and processing streams of records as they occur. It supports real-time use cases such as messaging, website activity tracking, log aggregation, and event sourcing.

Kafka architecture diagram
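
Kafka's storage model is what lets it offer publish/subscribe and durable storage at the same time. The sketch below is a hypothetical in-memory toy (ToyTopic and its methods are invented for illustration and are not the Kafka client API): a topic is split into partitions, each partition is an append-only log in which a record's position is its offset, records with the same key land in the same partition, and consumers can read from any offset because records are retained after being read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a Kafka-style topic; this is NOT the Kafka client API.
public class ToyTopic {
    // Each partition is an append-only log; a record's offset is its index in that log.
    private final List<List<String>> partitions = new ArrayList<>();

    public ToyTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key -> same partition, so records for one key stay in order.
    // (Kafka's default partitioner hashes key bytes with murmur2; floorMod over hashCode is a stand-in.)
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // "Publish": append to the key's partition and return the new record's offset.
    public long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(value);
        return log.size() - 1;
    }

    // "Subscribe": read from a given offset onward; reading does not delete records,
    // so several consumers can read the same data independently.
    public List<String> read(int partition, long fromOffset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList((int) fromOffset, log.size()));
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(3);
        topic.append("user-42", "login");
        topic.append("user-42", "view-page");
        int p = topic.partitionFor("user-42");
        System.out.println(topic.read(p, 0)); // prints [login, view-page]
        System.out.println(topic.read(p, 1)); // prints [view-page]
    }
}
```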

Amazon Kinesis is a fully managed AWS service for collecting, processing, and analyzing video and data streams in real time. It comprises four services: Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics (the last two have since been renamed Amazon Data Firehose and Amazon Managed Service for Apache Flink). Together they support scenarios such as video analytics, real-time applications, and IoT data processing.

Kinesis data flow diagram
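
In Kinesis Data Streams, a stream is made up of shards, and each record's partition key decides which shard receives it: Kinesis computes an MD5 hash of the partition key, treats it as a 128-bit integer, and routes the record to the shard whose hash key range contains that value. The sketch below reproduces only that mapping, under two stated assumptions: the key is encoded as UTF-8 before hashing, and the 2^128 hash key space is split into equal ranges across the shards (real streams record each shard's StartingHashKey and EndingHashKey, which need not be equal-sized after resharding). ShardMapper and its methods are hypothetical names, not part of the AWS SDK.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of the AWS SDK) mimicking how Kinesis Data Streams maps a
// partition key to a shard, assuming UTF-8 keys and equally sized hash key ranges.
public class ShardMapper {
    // Size of the 128-bit hash key space: 2^128.
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 digest of the partition key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest); // signum 1 keeps the value non-negative
    }

    // With equal ranges, shard i owns hash keys from i * 2^128 / n up to (i + 1) * 2^128 / n,
    // so the shard index is floor(hash * n / 2^128).
    static int shardFor(String partitionKey, int shardCount) throws NoSuchAlgorithmException {
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .divide(HASH_SPACE)
                .intValueExact();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String key : new String[] {"sensor-17", "sensor-18", "sensor-19"}) {
            // The same key always maps to the same shard, so all records for one key share a shard.
            System.out.println(key + " -> shard " + shardFor(key, 4));
        }
    }
}
```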

In summary, the article covered the fundamentals of data streams, described the four Node.js stream types, demonstrated the Java Stream API, and introduced two major streaming platforms, Apache Kafka and Amazon Kinesis, highlighting their roles in real-time data processing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by 21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
