Unlocking DeepSeek R1’s Chain‑of‑Thought: A Spring WebFlux Integration Guide
This article examines why mainstream AI frameworks like Spring AI and LangChain4j cannot fully support DeepSeek’s R1 model, explains its unique chain‑of‑thought response format and parameter constraints, and provides a complete Spring WebFlux‑based solution—including API calls, streaming handling, and response parsing—to preserve reasoning content.
DeepSeek recently released the R1 model, which is notable for its powerful chain‑of‑thought (CoT) capability. In practice, many developers find that mainstream AI frameworks such as Spring AI and LangChain4j do not fully support these features.
Why Existing Frameworks Fall Short
Official R1 Model Specifics
Although many tutorials show how to connect DeepSeek via the OpenAI adapters of Spring AI or LangChain4j, this approach has three critical problems:
Chain‑of‑thought content loss: The R1 model returns its detailed reasoning in the `reasoning_content` field, which existing frameworks completely ignore.
Changed response pattern: R1 first streams a detailed thinking process and only then the final answer, so responses take noticeably longer; streaming output plus a dedicated CoT UI is needed to avoid a poor user experience.
Parameter restrictions: Parameters such as `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` can be set but have no effect.
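For illustration, a streaming chunk from the R1 API looks roughly like the sketch below (field layout follows the OpenAI-compatible streaming schema; the text values are invented). While the model is still thinking, the tokens arrive in `reasoning_content` rather than `content`:

```json
{
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": null,
        "reasoning_content": "First, let me restate the problem..."
      },
      "finish_reason": null
    }
  ]
}
```

A deserializer built only for the standard OpenAI schema silently drops the `reasoning_content` field.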
Framework Adaptation Status
Currently, mainstream AI frameworks have not provided official support for DeepSeek R1:
LangChain4j: No plan to support DeepSeek’s unique CoT features.
Spring AI: Only supports the standard OpenAI protocol and cannot handle R1’s special response format.
Because this situation is unlikely to change in the short term, the most reliable approach for developers is to call the API directly.
Ollama Deployment Special Handling
When deploying R1 privately with Ollama, the situation differs slightly:
```shell
ollama run deepseek-r1:14b
```

To stay compatible with the OpenAI protocol, Ollama wraps the chain‑of‑thought content in a `<think>` tag inside the `content` field, which adds extra token overhead in multi‑turn conversations.
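If you call the Ollama endpoint directly, the reasoning therefore has to be separated from the answer yourself. A minimal server‑side sketch in Java (a hypothetical helper, assuming the complete reply has already been accumulated into a single string):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThinkTagParser {

    // DOTALL so the reasoning may span multiple lines
    private static final Pattern THINK =
            Pattern.compile("<think>(.*?)</think>", Pattern.DOTALL);

    /** Returns the reasoning text, or an empty string if no <think> block is present. */
    static String extractReasoning(String content) {
        Matcher m = THINK.matcher(content);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Returns the final answer with the <think> block removed. */
    static String stripReasoning(String content) {
        return THINK.matcher(content).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        String reply = "<think>The user greets me.</think>Hello!";
        System.out.println(extractReasoning(reply)); // The user greets me.
        System.out.println(stripReasoning(reply));   // Hello!
    }
}
```

The same idea applied on the client side looks like this: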
```typescript
export function withMessageThought(message: ChatMessage, startTime?: number) {
  const content = message.content;

  // the `s` flag lets `.` match newlines; `/` inside the literal must be escaped
  const thinkPattern = /<think>(.*?)<\/think>/s;
  const matches = content.match(thinkPattern);

  if (matches) {
    const reasoning_content = matches[1].trim();
    return reasoning_content;
  }

  return message;
}
```

Elegant Implementation Based on Spring WebFlux
Direct API calls are preferable for the R1 model. Using Spring WebFlux, we can retain the full CoT content and achieve high‑performance streaming.
Non‑blocking I/O
Netty provides asynchronous, non‑blocking network operations.
Threads are not blocked by long‑running API calls.
Efficient handling of many concurrent requests.
Reactive streams
Spring Boot WebClient simplifies the call flow.
Server‑Sent Events (SSE) enable real‑time data push.
Facilitates UI interaction for streaming output.
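Assuming a Maven-based Spring Boot project, the only dependency needed for `WebClient` and SSE support is the WebFlux starter (the version is managed by the Spring Boot parent):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
```

The starter pulls in Reactor Netty and Jackson transitively, so no extra HTTP client or JSON library is required.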
API Implementation
```java
@PostMapping(value = "/deepseek", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<Map<String, String>> chatStream(@RequestParam String inputPrompt) {
    Map<String, Object> message = new HashMap<>();
    message.put("role", "user");
    message.put("content", inputPrompt);

    Map<String, Object> requestBody = new HashMap<>();
    requestBody.put("messages", List.of(message));
    requestBody.put("stream", true);
    requestBody.put("model", "deepseek-reasoner");

    WebClient webClient = WebClient.builder()
            .baseUrl("https://api.deepseek.com/v1")
            .defaultHeader("Authorization", "Bearer " + System.getenv("DEEPSEEK_API_KEY"))
            .build();

    return webClient.post()
            .uri("/chat/completions")
            .bodyValue(requestBody)
            .retrieve()
            .bodyToFlux(JsonNode.class)
            .map(this::parseDeepseekResponse)
            // Cancel after the chunk carrying finish_reason, before the trailing [DONE] sentinel
            .takeUntil(response -> response.containsKey("finish_reason"))
            .onErrorResume(error -> Flux.just(Map.of("content", "API call error: " + error.getMessage())));
}
```

Response Parsing
```java
private Map<String, String> parseDeepseekResponse(JsonNode response) {
    JsonNode choices = response.get("choices");
    Map<String, String> result = new HashMap<>();
    if (choices != null && choices.isArray() && !choices.isEmpty()) {
        JsonNode choice = choices.get(0);
        JsonNode delta = choice.get("delta");
        if (delta != null) {
            // Filter NullNode: a JSON null would otherwise stringify as the text "null"
            result.put("content", Optional.ofNullable(delta.get("content"))
                    .filter(node -> !node.isNull())
                    .map(JsonNode::asText).orElse(""));
            result.put("reasoning_content", Optional.ofNullable(delta.get("reasoning_content"))
                    .filter(node -> !node.isNull())
                    .map(JsonNode::asText).orElse(""));
        }
        Optional.ofNullable(choice.get("finish_reason"))
                .filter(node -> !node.isNull())
                .ifPresent(node -> result.put("finish_reason", node.asText()));
    }
    return result;
}
```

Conclusion
By following the implementation above, developers can:
Fully preserve the R1 model’s chain‑of‑thought capability.
Leverage WebFlux for high‑performance streaming processing.
When DeepSeek’s official API is overloaded or unavailable, an alternative is the full‑strength 671B R1 model deployed on SiliconFlow: https://cloud.siliconflow.cn/i/YKcJJTYP
Java Architecture Diary
Committed to sharing original, high‑quality technical articles; no fluff or promotional content.