Mastering Reactive Streaming with Spring WebFlux: Build Real‑Time APIs Like DeepSeek
This article explains how traditional synchronous JSON responses differ from modern streaming responses, introduces reactive programming concepts, and shows step‑by‑step how to implement non‑blocking, high‑throughput APIs using Spring WebFlux, Mono/Flux, SSE, WebSocket, and related protocols.
For years, most web interfaces have returned a complete JSON payload only after the server finishes processing. This follows the classic browser/server (B/S) model: the controller builds the response synchronously and returns it once all the data is ready.
However, large‑model chat services such as DeepSeek deliver responses token by token. The server pushes each piece of output as soon as it is generated, a pattern that streaming protocols and reactive programming make straightforward.
Unlike the traditional MVC approach, a reactive stream does not wait for processing to finish. It keeps the connection open and sends each piece of data as soon as it is ready, so the server can push incremental updates.
Having worked with Spring MVC and Tomcat, I later encountered Netty for real‑time messaging and discovered Spring WebFlux, a framework designed for reactive, non‑blocking streams.
1. What Is a Streaming Response?
Streaming responses use HTTP chunked transfer encoding, SSE, or WebSocket so the server can send data progressively without waiting for the entire result.
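This approach, favored by AI chat services, mimics human typing and cuts perceived latency, because users see the first tokens almost immediately. When the server is non‑blocking, it can also save resources and improve concurrency. For SSE, the response content type is text/event-stream rather than application/json.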
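The text/event-stream format can be made concrete with a minimal sketch of how one SSE event is framed on the wire. Each field is a `name: value` line, a multi-line payload becomes several `data:` lines, and a blank line ends the event. The class and method names (`SseFrames`, `frame`) are hypothetical. In WebFlux you would not write this yourself, because the framework does the framing when a controller returns a `Flux` with `text/event-stream`.

```java
import java.util.List;

public class SseFrames {

    // Builds one SSE event: optional "event:" and "id:" lines, one "data:" line
    // per payload line, then the blank line that terminates the event.
    static String frame(String event, String id, String data) {
        StringBuilder sb = new StringBuilder();
        if (event != null) sb.append("event: ").append(event).append('\n');
        if (id != null) sb.append("id: ").append(id).append('\n');
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Simulates a model emitting tokens one at a time; each token is written
        // as its own event instead of waiting for the whole answer.
        List<String> tokens = List.of("Hello", ", ", "world");
        for (int i = 0; i < tokens.size(); i++) {
            System.out.print(frame("token", String.valueOf(i), tokens.get(i)));
        }
    }
}
```

A client reads events one blank-line-terminated block at a time, which is why partial output can be rendered before the response is complete.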
2. How to Write a Streaming API in a Spring Project?
Traditional Spring MVC can stream with SseEmitter, but you must manage the asynchronous sending yourself. That code is cumbersome and typically ties up a worker thread for each open stream.
Spring WebFlux with Flux natively supports non‑blocking streaming, allowing the server to push data as it is processed.
Spring WebFlux, introduced in Spring Framework 5.0, is built on Project Reactor, implements the Reactive Streams specification, and supports asynchronous, non‑blocking I/O on servers like Netty. It natively handles Server‑Sent Events (SSE) and WebSocket, making it ideal for high‑concurrency, I/O‑intensive scenarios.
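Project Reactor's `Flux` and `Mono` implement the Reactive Streams `Publisher` interface. Since Java 9, the JDK has shipped the same four interfaces (`Publisher`, `Subscriber`, `Subscription`, `Processor`) as `java.util.concurrent.Flow`. The dependency-free sketch below uses the JDK's `SubmissionPublisher` rather than Reactor. It illustrates the contract WebFlux builds on: the subscriber signals demand with `request(n)` (backpressure), and the publisher pushes items as they become available. The class names are illustrative only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class FlowDemo {

    // Subscriber that asks for one item at a time: the publisher may not send
    // the next element until demand has been signalled (backpressure).
    static class OneAtATime implements Flow.Subscriber<String> {
        final List<String> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            this.subscription = s;
            s.request(1);               // initial demand: one item
        }
        @Override public void onNext(String item) {
            received.add(item);         // handle the chunk immediately
            subscription.request(1);    // then ask for the next one
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<String> run(List<String> chunks) {
        OneAtATime subscriber = new OneAtATime();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (String c : chunks) {
                publisher.submit(c);    // push each chunk as soon as it exists
            }
        } // close() signals onComplete once buffered items have been delivered
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Reactive", " streams", " push")));
    }
}
```

Reactor adds a much richer operator set (`map`, `flatMap`, schedulers) on top of this same request/push protocol.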
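Example controller method:

```java
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import reactor.core.publisher.Flux;

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE) // "value =" is required once produces is named
public Flux<Item> streamItems() {
    return service.streamItemsFromDb(); // each emitted Item is written as its own SSE event
}
```

The front end can consume the stream with the browser's native EventSource API.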
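Browsers use EventSource, but a Java service can consume the same endpoint with the JDK's `java.net.http.HttpClient` (Java 11+). The sketch below is deliberately simplified. It reads the body line by line with `BodyHandlers.ofLines()` and handles only single-line `data:` fields. It ignores `event:`, `id:`, reconnection, and multi-line events, all of which a full SSE client must support. The URL `http://localhost:8080/stream` and the class name `SseClient` are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SseClient {

    // Returns the payload of a "data:" line, dropping one optional leading space
    // as the SSE format specifies; all other lines are ignored in this sketch.
    static Optional<String> dataOf(String line) {
        if (!line.startsWith("data:")) return Optional.empty();
        String value = line.substring(5);
        return Optional.of(value.startsWith(" ") ? value.substring(1) : value);
    }

    // Collects all data payloads from a finite stream of lines.
    static List<String> payloads(Stream<String> lines) {
        return lines.map(SseClient::dataOf)
                    .flatMap(Optional::stream)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/stream"))
                .header("Accept", "text/event-stream")
                .build();
        // ofLines() exposes the body as a lazy Stream<String>, so each line is
        // handled as it arrives rather than after the whole body is buffered.
        HttpResponse<Stream<String>> response =
                client.send(request, HttpResponse.BodyHandlers.ofLines());
        response.body()
                .map(SseClient::dataOf)
                .flatMap(Optional::stream)
                .forEach(System.out::println);
    }
}
```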
3. Where Can This Pattern Be Applied?
| Scenario | Old Approach | New Approach |
|---|---|---|
| AI text generation | Wait for the full result (loading spinner) | Show partial output as it arrives |
| Data dashboard | Aggregate multiple calls, then render | Load data incrementally, page by page |
| Background tasks | Polling | Push execution results in real time |
| Paginated product data | Calculate the total count first | Load continuously using a cursor |
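The last row of the table replaces "count first, then page" with cursor-based loading. Each request returns a page plus a cursor for the next one, and the caller keeps going until no cursor comes back, so no total is ever computed. The minimal, dependency-free sketch below uses Java 9's three-argument `Stream.iterate`. `Page`, `fetchPage`, and the in-memory product list are hypothetical stand-ins for a real repository or API call. In a WebFlux codebase, the same loop can be expressed with Reactor's `Flux.expand`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CursorPaging {

    // Hypothetical page shape: one batch of items plus the cursor for the next batch.
    static final class Page {
        final List<Integer> items;
        final Integer nextCursor; // null means this is the last page
        Page(List<Integer> items, Integer nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    // Hypothetical data source: product IDs 1..7, served 3 per page.
    static final List<Integer> PRODUCTS =
            IntStream.rangeClosed(1, 7).boxed().collect(Collectors.toList());
    static final int PAGE_SIZE = 3;

    // Returns the page that starts at the cursor. The cursor is a plain offset here;
    // a real API would more likely use an opaque token or the last seen ID.
    static Page fetchPage(int cursor) {
        int end = Math.min(cursor + PAGE_SIZE, PRODUCTS.size());
        Integer next = end < PRODUCTS.size() ? Integer.valueOf(end) : null;
        return new Page(PRODUCTS.subList(cursor, end), next);
    }

    // Walks the pages one after another until no cursor comes back.
    static Stream<Integer> allProducts() {
        return Stream.iterate(fetchPage(0),
                        page -> page != null,
                        page -> page.nextCursor == null ? null : fetchPage(page.nextCursor))
                .flatMap(page -> page.items.stream());
    }

    public static void main(String[] args) {
        allProducts().forEach(id -> System.out.println("render product " + id));
    }
}
```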
4. How to Call Third‑Party Streaming APIs from Java?
Traditional RestTemplate is synchronous and blocks the calling thread. WebClient (org.springframework.web.reactive.function.client.WebClient) is the reactive, non‑blocking HTTP client in Spring WebFlux. It supports asynchronous streaming, and Spring recommends it over RestTemplate, which is now in maintenance mode.
Typical flow
Call a third‑party streaming endpoint (e.g., a direct‑charge API).
Receive the response (JSON), check success.
Transform the response into database update statements.
Invoke MyBatis updateDetail(...) to persist changes.
Return a Mono to the controller.
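Chaining the steps with flatMap keeps the pipeline non‑blocking only if each step is itself non‑blocking. MyBatis runs on blocking JDBC, so the updateDetail(...) call should be offloaded to keep it off the Netty event‑loop threads, for example with Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()). With that in place, the flow stays thread‑pool safe and can sustain high concurrency.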
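In Reactor, this flow would look roughly like `webClient.post().uri(...).bodyValue(...).retrieve().bodyToMono(...)`, followed by `flatMap` steps, with the blocking MyBatis call wrapped in `Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic())`. To keep the sketch runnable without Spring, it substitutes the JDK's `CompletableFuture`, where `thenCompose` plays the role of `flatMap` and a dedicated executor stands in for `boundedElastic`. Every name here (`ChargeResult`, `callThirdParty`, `toUpdate`, `updateDetail`, `DB_POOL`, the order ID) is hypothetical, and the simulated remote call replaces a real HTTP request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChargeFlow {

    // Hypothetical parsed response of the third-party charge API.
    static final class ChargeResult {
        final String orderId;
        final boolean success;
        ChargeResult(String orderId, boolean success) {
            this.orderId = orderId;
            this.success = success;
        }
    }

    // Small pool reserved for blocking work, playing the role of Reactor's
    // boundedElastic scheduler. Daemon threads let the JVM exit without shutdown().
    static final ExecutorService DB_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "db-pool");
        t.setDaemon(true);
        return t;
    });

    // Steps 1-2: call the endpoint and parse the JSON reply (simulated; always succeeds here).
    static CompletableFuture<ChargeResult> callThirdParty(String orderId) {
        return CompletableFuture.supplyAsync(() -> new ChargeResult(orderId, true));
    }

    // Step 3: turn the response into a description of the row update to apply.
    static String toUpdate(ChargeResult r) {
        return "status=" + (r.success ? "PAID" : "FAILED") + " where order_id=" + r.orderId;
    }

    // Step 4: stand-in for the blocking MyBatis updateDetail(...); returns affected rows.
    static int updateDetail(String update) {
        return 1;
    }

    // Step 5: compose everything into one async result; thenCompose is the flatMap analogue.
    static CompletableFuture<Integer> process(String orderId) {
        return callThirdParty(orderId).thenCompose(result -> {
            if (!result.success) { // step 2: stop the chain if the charge failed
                return CompletableFuture.<Integer>failedFuture(
                        new IllegalStateException("charge failed for " + orderId));
            }
            String update = toUpdate(result);
            // Run the blocking DB call on DB_POOL, not on the thread that completed the HTTP call.
            return CompletableFuture.supplyAsync(() -> updateDetail(update), DB_POOL);
        });
    }

    public static void main(String[] args) {
        System.out.println("rows updated: " + process("A1001").join());
    }
}
```

The design point carries over directly to WebFlux: composition keeps each step asynchronous, and the one unavoidably blocking call is isolated on its own threads.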
5. Other Protocols Besides SSE
For one‑way server‑to‑client pushes, SSE works well. For bidirectional communication (e.g., chat, collaborative editing), consider:
| Technology | Direction | Recommended Use | Features |
|---|---|---|---|
| SSE | Server → Client | Lightweight push, progress notifications | HTTP‑compatible |
| WebSocket | Bidirectional | Instant messaging, collaborative editing | Real‑time, heavy interaction |
| gRPC Stream | Bidirectional | Internal service communication | High performance, not browser‑friendly |
| MQTT | Bidirectional | IoT, real‑time messaging | Publish/subscribe, lightweight |
In most business scenarios, a combination of SSE and WebSocket provides sufficient real‑time capabilities.
Conclusion
Synchronous, blocking APIs are familiar, but reactive streaming offers higher concurrency and lower resource consumption. Much like Nginx's event‑driven model, a small number of event‑loop threads can serve tens of thousands of concurrent connections. Mastering streaming programming changes how you approach development and can noticeably improve system performance.
Lin is Dream
Sharing Java developer knowledge, practical articles, and continuous insights into computer engineering.
