Mastering Reactive Streaming with Spring WebFlux: Build Real‑Time APIs Like DeepSeek

This article explains how traditional synchronous JSON responses differ from modern streaming responses, introduces reactive programming concepts, and shows step‑by‑step how to implement non‑blocking, high‑throughput APIs using Spring WebFlux, Mono/Flux, SSE, WebSocket, and related protocols.

Lin is Dream

Most web APIs still follow the classic browser/server (B/S) model: the controller builds the response synchronously and returns a complete JSON payload only after all processing has finished.

Large-model chat services such as DeepSeek behave differently. They deliver a response token by token, with the server pushing each piece of data as soon as it becomes available. Reactive programming is a natural fit for this style.

Unlike the traditional MVC approach, a reactive stream does not wait for processing to finish. It keeps the connection open and sends each piece of data as soon as it is ready, so the server can push incremental updates.

Having worked with Spring MVC and Tomcat, I later encountered Netty for real‑time messaging and discovered Spring WebFlux, a framework designed for reactive, non‑blocking streams.

1. What Is a Streaming Response?

Streaming responses use HTTP chunked transfer encoding, SSE, or WebSocket so the server can send data progressively without waiting for the entire result.

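AI chat services favor this approach because output appears as it is generated, much like watching someone type. The first bytes reach the client sooner, the server does not have to buffer the whole result in memory, and, when combined with non-blocking I/O, a server can hold many open connections without dedicating a thread to each one. For SSE, the response content type is text/event-stream rather than application/json.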
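
To make the text/event-stream format concrete, here is a minimal sketch in plain Java of how one SSE event is framed on the wire: each line of the payload becomes a "data:" field, and a blank line ends the event. The class and method names (SseFrames, format) are made up for illustration; Spring WebFlux does this framing for you when a controller produces text/event-stream.

```java
class SseFrames {

    /**
     * Formats a payload as one SSE event.
     * Every payload line gets its own "data: " prefix; the trailing
     * empty line tells the client that the event is complete.
     */
    static String format(String data) {
        StringBuilder sb = new StringBuilder();
        // limit -1 keeps trailing empty lines of the payload
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }
}
```

A token-by-token reply is then just a sequence of such events written to the same open connection, for example format("Hello") followed by format(" world").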

2. How to Write a Streaming API in a Spring Project?

Traditional Spring MVC can stream with SseEmitter, but the events must be produced on a separate thread that you manage yourself, so the code is cumbersome and each open stream typically ties up a worker thread.

Spring WebFlux with Flux natively supports non‑blocking streaming, allowing the server to push data as it is processed.
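
The following is a rough, dependency-free illustration of why non-blocking code needs so few threads. It is not how WebFlux or Netty is implemented; it only assumes that a "slow request" can be modelled as a delay. Instead of sleeping on one thread per request, every request registers a callback on a single scheduler thread. The names FewThreadsDemo and distinctThreads are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FewThreadsDemo {

    /**
     * Simulates `requests` slow calls that each take `delayMs`,
     * and returns how many distinct threads completed them.
     */
    static int distinctThreads(int requests, long delayMs) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(requests);
        try {
            for (int i = 0; i < requests; i++) {
                // Register a callback instead of blocking with Thread.sleep
                loop.schedule(() -> {
                    threads.add(Thread.currentThread().getName());
                    done.countDown();
                }, delayMs, TimeUnit.MILLISECONDS);
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
            return threads.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }
}
```

Calling distinctThreads(10_000, 50) completes all ten thousand simulated requests on one thread, whereas a blocking design would need a thread (or a queue slot) per waiting request.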

Spring WebFlux, introduced in Spring Framework 5.0, is built on Project Reactor, implements the Reactive Streams specification, and supports asynchronous, non‑blocking I/O on servers like Netty. It natively handles Server‑Sent Events (SSE) and WebSocket, making it ideal for high‑concurrency, I/O‑intensive scenarios.
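
The Reactive Streams interfaces are mirrored in the JDK as java.util.concurrent.Flow, so the core idea can be sketched without Reactor. In this sketch the subscriber asks for one item at a time with request(1), which is the backpressure signal the specification defines. BackpressureDemo and OneAtATime are illustrative names, and SubmissionPublisher stands in for a real data source; with Reactor you would normally work with Flux and Mono rather than implement a subscriber by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class BackpressureDemo {

    /** Subscriber that pulls exactly one item at a time. */
    static class OneAtATime implements Flow.Subscriber<String> {
        final List<String> received = new ArrayList<>();
        final CountDownLatch completed = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // ask for the first item only
        }
        @Override public void onNext(String item) {
            received.add(item);
            subscription.request(1);      // ready for the next one
        }
        @Override public void onError(Throwable t) { completed.countDown(); }
        @Override public void onComplete() { completed.countDown(); }
    }

    /** Publishes the items and returns what the subscriber received. */
    static List<String> run(List<String> items) {
        OneAtATime subscriber = new OneAtATime();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            items.forEach(publisher::submit);
        } // close() signals onComplete once submitted items are delivered
        try {
            subscriber.completed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }
}
```

The publisher never pushes more than the subscriber has requested, which is what keeps a slow consumer from being flooded.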

Example controller method:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<Item> streamItems() {
    return service.streamItemsFromDb();
}

The front end can consume the stream with the browser's native EventSource API.
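
EventSource is a browser API, but the parsing it performs is simple enough to sketch in Java, which is also useful when the consumer is another backend service. The sketch below collects the data payload of each event from raw lines. It handles only "data:" fields and ignores "event:", "id:", "retry:" and comment lines; the name SseParser is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class SseParser {

    /** Returns the data payload of every complete event in the line stream. */
    static List<String> parse(Stream<String> lines) {
        List<String> events = new ArrayList<>();
        List<String> dataLines = new ArrayList<>();
        lines.forEach(line -> {
            if (line.isEmpty()) {
                // A blank line ends the current event
                if (!dataLines.isEmpty()) {
                    events.add(String.join("\n", dataLines));
                    dataLines.clear();
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                // One optional space after the colon is not part of the payload
                dataLines.add(value.startsWith(" ") ? value.substring(1) : value);
            }
            // Other fields and comment lines (starting with ':') are skipped here
        });
        return events;
    }
}
```

With the JDK's java.net.http.HttpClient, HttpResponse.BodyHandlers.ofLines() provides the response body as a Stream<String>, which could be passed to parse directly.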

3. Where Can This Pattern Be Applied?

| Scenario | Old Approach | New Approach |
| --- | --- | --- |
| AI text generation | Wait for full result (spinning) | Show partial output as it arrives |
| Data dashboard | Aggregate multiple calls then render | Load data incrementally, page by page |
| Background tasks | Polling | Push execution results in real time |
| Paginated product data | Calculate total count first | Load continuously using a cursor |
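
As an illustration of the last row, here is a small sketch of cursor-based loading in plain Java. Each page carries the cursor for the next page, so the caller never needs a total count. The Page record, the string cursor, and the in-memory "table" are all assumptions made for the example; in a real service the cursor would more likely be a last-seen id or an opaque token from the data source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

class CursorPaging {

    /** One page of results plus the cursor for the next page (null when done). */
    record Page<T>(List<T> items, String nextCursor) {}

    /** Lazily walks all pages; a page is fetched only when it is needed. */
    static <T> Iterator<T> iterate(Function<String, Page<T>> fetchPage) {
        return new Iterator<>() {
            private Iterator<T> current = Collections.emptyIterator();
            private String cursor = null;      // null means "start at the first page"
            private boolean exhausted = false;

            @Override public boolean hasNext() {
                while (!current.hasNext() && !exhausted) {
                    Page<T> page = fetchPage.apply(cursor);
                    current = page.items().iterator();
                    cursor = page.nextCursor();
                    exhausted = (cursor == null);
                }
                return current.hasNext();
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    /** Demo: pages through seven rows, three at a time. */
    static List<Integer> demo() {
        List<Integer> table = List.of(1, 2, 3, 4, 5, 6, 7);
        int pageSize = 3;
        Function<String, Page<Integer>> fetch = cursor -> {
            int from = (cursor == null) ? 0 : Integer.parseInt(cursor);
            int to = Math.min(from + pageSize, table.size());
            String next = (to < table.size()) ? String.valueOf(to) : null;
            return new Page<>(table.subList(from, to), next);
        };
        List<Integer> out = new ArrayList<>();
        iterate(fetch).forEachRemaining(out::add);
        return out;
    }
}
```

In Reactor the same shape is usually expressed with the expand operator, which repeatedly fetches the next page from the previous one and emits the results as a Flux.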

4. How to Call Third‑Party Streaming APIs from Java?

RestTemplate is synchronous and blocks the calling thread. WebClient (org.springframework.web.reactive.function.client.WebClient) is the reactive, non-blocking HTTP client in Spring WebFlux. It supports asynchronous and streaming calls and is the usual replacement for RestTemplate in reactive code.

Typical flow

Call a third‑party streaming endpoint (e.g., a direct‑charge API).

Receive the JSON response and check whether the call succeeded.

Transform the response into database update statements.

Invoke MyBatis updateDetail(...) to persist changes.

Return a Mono to the controller.

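The steps are chained with flatMap so that no step blocks the calling thread. One caveat: MyBatis runs on blocking JDBC, so the updateDetail(...) call should be wrapped (for example with Mono.fromCallable) and moved to a scheduler intended for blocking work, such as Schedulers.boundedElastic(). Otherwise it blocks an event-loop thread and undermines the non-blocking model and its concurrency benefits.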
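
The shape of this flow can be sketched without any Spring dependency by using the JDK's CompletableFuture, whose thenCompose plays the role that flatMap plays on a Mono. This is an analogy, not the WebClient API: callRemote and updateDetailBlocking are hypothetical stand-ins for the remote call and the MyBatis mapper, and ChargeResult and the "PAID" status are invented for the example. The one design point it is meant to show is that the blocking database call runs on a separate pool rather than on the thread that handles the response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ChargeFlow {

    /** Hypothetical response shape of the third-party API. */
    record ChargeResult(boolean success, String orderId) {}

    /** Hypothetical stand-in for the non-blocking remote call. */
    static CompletableFuture<ChargeResult> callRemote(String orderId) {
        return CompletableFuture.supplyAsync(
                () -> new ChargeResult(!orderId.isEmpty(), orderId));
    }

    /** Hypothetical stand-in for a blocking MyBatis updateDetail(...) call. */
    static int updateDetailBlocking(String orderId, String status) {
        return 1; // pretend one row was updated
    }

    /** Call, check, then persist; the blocking step runs on blockingPool. */
    static CompletableFuture<Integer> process(String orderId, ExecutorService blockingPool) {
        return callRemote(orderId).thenCompose(result -> {
            if (!result.success()) {
                return CompletableFuture.<Integer>failedFuture(
                        new IllegalStateException("charge failed: " + orderId));
            }
            // Offload the blocking database call to a dedicated pool
            return CompletableFuture.supplyAsync(
                    () -> updateDetailBlocking(result.orderId(), "PAID"), blockingPool);
        });
    }

    /** Demo helper: runs the flow and returns the number of updated rows. */
    static int demo(String orderId) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return process(orderId, pool).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

In a WebFlux service the equivalent chain would start from a WebClient call returning a Mono, continue with flatMap, and wrap the mapper call in Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()) before returning the resulting Mono to the controller.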

5. Other Protocols Besides SSE

For one‑way server‑to‑client pushes, SSE works well. For bidirectional communication (e.g., chat, collaborative editing), consider:

| Technology | Direction | Recommended Use | Features |
| --- | --- | --- | --- |
| SSE | Server → Client | Lightweight push, progress notifications | HTTP-compatible |
| WebSocket | Bidirectional | Instant messaging, collaborative editing | Real-time, heavy interaction |
| gRPC Stream | Bidirectional | Internal service communication | High performance, not browser-friendly |
| MQTT | Bidirectional | IoT, real-time messaging | Publish/subscribe, lightweight |

In most business scenarios, a combination of SSE and WebSocket provides sufficient real‑time capabilities.

Conclusion

Synchronous, blocking APIs are familiar, but reactive streaming offers higher concurrency and lower resource consumption: a handful of threads can serve tens of thousands of concurrent requests, much as Nginx does. Learning to program with streams changes how you design services and can noticeably improve system performance.
