Unlocking DeepSeek R1’s Chain‑of‑Thought: A Spring WebFlux Integration Guide
This article examines why mainstream AI frameworks like Spring AI and LangChain4j cannot fully support DeepSeek’s R1 model, explains its unique chain‑of‑thought response format and parameter constraints, and provides a complete Spring WebFlux‑based solution—including API calls, streaming handling, and response parsing—to preserve reasoning content.
DeepSeek recently released the R1 model, which is notable for its powerful chain‑of‑thought (CoT) capability. In practice, many developers find that mainstream AI frameworks such as Spring AI and LangChain4j do not fully support these features.
Why Existing Frameworks Fall Short
Official R1 Model Specifics
Although many tutorials show how to connect DeepSeek via the OpenAI adapters of Spring AI or LangChain4j, this approach has three critical problems:
Chain‑of‑thought content loss: The R1 model returns its detailed reasoning in the `reasoning_content` field, which existing frameworks completely ignore.
Changed response pattern: R1 first streams a detailed thinking process and only then the final answer, so responses take noticeably longer; streaming output plus a dedicated CoT UI is needed to avoid a poor user experience.
Parameter restrictions: Parameters such as `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` can be set but have no effect.
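For illustration, a streaming chunk from the R1 API looks roughly like the sketch below (field layout follows the OpenAI-compatible streaming schema; the text values are invented). While the model is still thinking, the tokens arrive in `reasoning_content` rather than `content`:

```json
{
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": null,
        "reasoning_content": "First, let me restate the problem..."
      },
      "finish_reason": null
    }
  ]
}
```

A deserializer built only for the standard OpenAI schema silently drops the `reasoning_content` field.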
Framework Adaptation Status
Currently, mainstream AI frameworks have not provided official support for DeepSeek R1:
LangChain4j: No plan to support DeepSeek’s unique CoT features.
Spring AI: Only supports the standard OpenAI protocol and cannot handle R1’s special response format.
Because this situation is unlikely to change in the short term, the most reliable approach for developers is to call the API directly.
Ollama Deployment Special Handling
When deploying R1 privately with Ollama, the situation differs slightly:
```shell
ollama run deepseek-r1:14b
```

To stay compatible with the OpenAI protocol, Ollama wraps the chain‑of‑thought content in a `<think>` tag inside the `content` field, which adds extra token overhead in multi‑turn conversations.
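If you call the Ollama endpoint directly, the reasoning therefore has to be separated from the answer yourself. A minimal server‑side sketch in Java (a hypothetical helper, assuming the complete reply has already been accumulated into a single string):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThinkTagParser {

    // DOTALL so the reasoning may span multiple lines
    private static final Pattern THINK =
            Pattern.compile("<think>(.*?)</think>", Pattern.DOTALL);

    /** Returns the reasoning text, or an empty string if no <think> block is present. */
    static String extractReasoning(String content) {
        Matcher m = THINK.matcher(content);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Returns the final answer with the <think> block removed. */
    static String stripReasoning(String content) {
        return THINK.matcher(content).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        String reply = "<think>The user greets me.</think>Hello!";
        System.out.println(extractReasoning(reply)); // The user greets me.
        System.out.println(stripReasoning(reply));   // Hello!
    }
}
```

The same idea applied on the client side looks like this: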
```typescript
export function withMessageThought(message: ChatMessage, startTime?: number) {
  const content = message.content;

  // the `s` flag lets `.` match newlines; `/` inside the literal must be escaped
  const thinkPattern = /<think>(.*?)<\/think>/s;
  const matches = content.match(thinkPattern);

  if (matches) {
    const reasoning_content = matches[1].trim();
    return reasoning_content;
  }

  return message;
}
```

Elegant Implementation Based on Spring WebFlux
Direct API calls are preferable for the R1 model. Using Spring WebFlux, we can retain the full CoT content and achieve high‑performance streaming.
Non‑blocking I/O
Netty provides asynchronous, non‑blocking network operations.
Threads are not blocked by long‑running API calls.
Efficient handling of many concurrent requests.
Reactive streams
Spring Boot WebClient simplifies the call flow.
Server‑Sent Events (SSE) enable real‑time data push.
Facilitates UI interaction for streaming output.
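Assuming a Maven-based Spring Boot project, the only dependency needed for `WebClient` and SSE support is the WebFlux starter (the version is managed by the Spring Boot parent):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
```

The starter pulls in Reactor Netty and Jackson transitively, so no extra HTTP client or JSON library is required.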
API Implementation
```java
@PostMapping(value = "/deepseek", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<Map<String, String>> chatStream(@RequestParam String inputPrompt) {
    Map<String, Object> message = new HashMap<>();
    message.put("role", "user");
    message.put("content", inputPrompt);

    Map<String, Object> requestBody = new HashMap<>();
    requestBody.put("messages", List.of(message));
    requestBody.put("stream", true);
    requestBody.put("model", "deepseek-reasoner");

    WebClient webClient = WebClient.builder()
            .baseUrl("https://api.deepseek.com/v1")
            .defaultHeader("Authorization", "Bearer " + System.getenv("DEEPSEEK_API_KEY"))
            .build();

    return webClient.post()
            .uri("/chat/completions")
            .bodyValue(requestBody)
            .retrieve()
            .bodyToFlux(JsonNode.class)
            .map(this::parseDeepseekResponse)
            // Cancel after the chunk carrying finish_reason, before the trailing [DONE] sentinel
            .takeUntil(response -> response.containsKey("finish_reason"))
            .onErrorResume(error -> Flux.just(Map.of("content", "API call error: " + error.getMessage())));
}
```

Response Parsing
```java
private Map<String, String> parseDeepseekResponse(JsonNode response) {
    JsonNode choices = response.get("choices");
    Map<String, String> result = new HashMap<>();
    if (choices != null && choices.isArray() && !choices.isEmpty()) {
        JsonNode choice = choices.get(0);
        JsonNode delta = choice.get("delta");
        if (delta != null) {
            // Filter NullNode: a JSON null would otherwise stringify as the text "null"
            result.put("content", Optional.ofNullable(delta.get("content"))
                    .filter(node -> !node.isNull())
                    .map(JsonNode::asText).orElse(""));
            result.put("reasoning_content", Optional.ofNullable(delta.get("reasoning_content"))
                    .filter(node -> !node.isNull())
                    .map(JsonNode::asText).orElse(""));
        }
        Optional.ofNullable(choice.get("finish_reason"))
                .filter(node -> !node.isNull())
                .ifPresent(node -> result.put("finish_reason", node.asText()));
    }
    return result;
}
```

Conclusion
By following the implementation above, developers can:
Fully preserve the R1 model’s chain‑of‑thought capability.
Leverage WebFlux for high‑performance streaming processing.
When DeepSeek’s official API is overloaded or unavailable, an alternative is the full‑strength 671B R1 model deployed on SiliconFlow: https://cloud.siliconflow.cn/i/YKcJJTYP
Java Architecture Diary
Committed to sharing original, high‑quality technical articles; no fluff or promotional content.