Mastering Spring AI MCP: Bidirectional Communication, Four Providers, Sampling Callbacks, and Dual‑Mode Deployment

This article explains why traditional function‑calling is insufficient for production AI services and shows how Spring AI's Model Context Protocol (MCP) introduces bidirectional communication, addressable resources, parameterized prompts, tool orchestration, and server‑initiated sampling, providing a complete roadmap to build a production‑grade AI microservice architecture.


Why Function Calling Is No Longer Enough

Many teams start AI projects with a simple request‑gateway → model → tool → JSON response flow, which works for short‑lived, low‑context scenarios but quickly shows problems in production:

Context management loss: product knowledge, rules, FAQs, images, and policies are manually stitched into prompts, leading to high token consumption and inconsistent updates.

One-way link: tools can only be called; they cannot inform the client that they are processing, need more reasoning, or want the server to invoke another model capability.

No protocol-level resource governance: static text, dynamic config, binary files, and templates lack a unified addressing scheme, making caching, subscription, and version control difficult.

Service orchestration fragmentation: although microservices know the context, result, and exceptions, the client must still re-initiate a model call for every summarisation, polishing, or upgrade decision.

Model capabilities cannot flow back to the server: after a tool returns data, the server often wants to summarise, rewrite, translate, or extract structure, but the traditional chain does not allow the server to request the client to perform additional sampling.

These limitations are exactly where MCP adds value.

What MCP Actually Changes

2.1 From Single Call to Bidirectional Conversation

Traditional function calling resembles an "HTTP API + JSON Schema" model, while MCP follows a "stateful protocol + capability negotiation + bidirectional messaging" approach. In one MCP session the client and server first negotiate capabilities, then enter a continuous communication phase where the client can request resources, prompts, and tools, and the server can proactively send notifications or issue sampling/createMessage requests that make the client invoke its connected LLM for inference.

Server is no longer a passive function executor.

Client is no longer just a request initiator.

Model capabilities become infrastructure‑level services that can flow across boundaries.

2.2 From Prompt Concatenation to Context Governance

Resources in MCP are addressable context assets. Instead of embedding product descriptions, policies, or logistics templates directly into prompts, they become URIs such as:

product://sku/10001
policy://refund/v2
faq://delivery/delay
file://campaign/poster-2026-04

Once protocol‑ized, resources can be cached, versioned, permission‑controlled, and multi‑tenant isolated.
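To make the addressing scheme concrete, here is a small, hypothetical helper (the class and method names are illustrative and not part of the MCP API) that parses such URIs and derives a versioned cache key:

```java
import java.util.Optional;

// Illustrative sketch: parse a context-asset URI and derive a versioned cache key.
public final class ResourceUri {
    private final String scheme; // e.g. "product", "policy"
    private final String path;   // e.g. "sku/10001"

    private ResourceUri(String scheme, String path) {
        this.scheme = scheme;
        this.path = path;
    }

    /** Parses "product://sku/10001" into scheme "product" and path "sku/10001". */
    public static Optional<ResourceUri> parse(String uri) {
        int idx = uri.indexOf("://");
        if (idx <= 0 || idx + 3 >= uri.length()) {
            return Optional.empty();
        }
        return Optional.of(new ResourceUri(uri.substring(0, idx), uri.substring(idx + 3)));
    }

    public String scheme() { return scheme; }

    public String path() { return path; }

    /** A stable cache key derived from the URI, suitable for versioned lookups. */
    public String cacheKey(String version) { return scheme + ":" + path + "@" + version; }
}
```

Once every asset has a key like `product:sku/10001@v2`, cache invalidation, version pinning, and per-scheme ACLs all hang off the same identifier.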

2.3 From Tool Result to Business‑Closed Loop

Production systems care about the whole process, not just the final tool output. Important process signals include task start, current step, user‑visible wait time, downstream exception handling, and whether a second model call is needed to turn raw data into user‑readable content. MCP’s notification and sampling mechanisms expose these signals without forcing all logic back to the client.

Spring AI MCP in the Architecture

Spring AI embeds MCP capabilities into the Spring Boot model, making MCP behave like a regular Spring component that can be assembled, deployed, governed, observed, rate‑limited, and audited just like any other microservice.

Two implementation styles are supported:

Annotation-based exposure: easier for day-to-day business development and maintenance.

Explicit Provider/Spec registration: more flexible for framework extensions or deep customisation.

The article focuses on the four Provider capabilities (Resources, Prompts, Tools, Sampling) with concrete annotation‑based code examples.

Production‑Grade Goals: What We Need to Build

Using an e‑commerce intelligent‑customer‑service scenario, the MCP microservice must provide:

External exposure of product data, after‑sale policies, and SLA information as resources.

Prompt‑based capabilities for order nudging, refund judgement, and logistics queries.

Progress push during long‑running queries.

Server‑initiated sampling to let the client generate soothing replies or summaries.

Both Streamable HTTP and Stdio transport modes.

Stable operation in local development, containerised, and Kubernetes environments.

High‑concurrency support with caching, rate‑limiting, retries, idempotency, observability, and security controls.

Core Principles of the Four Capabilities

5.1 Resources – Context Asset Layer

Resources hold product details, rules, campaign descriptions, FAQs, and binary metadata. They should not be coupled directly to database tables; instead they are served through a "Context Asset Service" that aggregates upstream data sources (MySQL, CMS, object storage), applies caching, versioning, and permission checks, and finally exposes a URI‑based read API.

5.2 Prompts – Business Strategy Layer

Prompts become shared, versioned, and configurable templates. Benefits include cross‑team collaboration, tenant/brand/language‑specific variations, and integration with gray‑release, A/B testing, and rollback pipelines.

5.3 Tools – Business Capability Orchestration

A production‑grade Tool must handle parameter validation, idempotency, downstream timeouts, circuit‑breaker degradation, result standardisation, progress notification, audit logging, and failure recovery.

5.4 Sampling – Cross‑Boundary Inference Orchestration

Sampling lets the server request the client’s model to generate a response based on structured data (e.g., logistics delay risk). This moves the model execution responsibility to the client while the server retains business logic and context.

Project Skeleton: Dependencies, Modules, and Configuration

6.1 Maven Dependencies

<properties>
    <java.version>17</java.version>
    <spring-ai.version>1.1.4</spring-ai.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-mcp-server-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot3</artifactId>
    </dependency>
    <dependency>
        <groupId>com.github.ben-manes.caffeine</groupId>
        <artifactId>caffeine</artifactId>
    </dependency>
</dependencies>

6.2 Module Layout

com.example.mcp
├── config               # MCP, cache, rate‑limit, observability config
├── controller           # Optional management endpoints
├── mcp
│   ├── resource         # Resource Provider layer
│   ├── prompt           # Prompt Provider layer
│   ├── tool             # Tool Provider layer
│   └── sampling         # Sampling helper logic
├── application         # Application‑level orchestration
├── domain              # Core business models
├── infrastructure
│   ├── order           # Order, logistics downstream clients
│   ├── cache           # Cache implementations
│   ├── audit           # Auditing utilities
│   └── metrics         # Metrics & tracing
└── support              # DTOs, exceptions, utilities

The key idea is that MCP is an exposure layer, not the business layer itself.

Four Provider Implementations

7.1 Resource Provider – Addressable Product & Policy Resources

A domain service aggregates product and policy data, caches it with Caffeine, and formats a multi‑line string.

package com.example.mcp.application;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.stereotype.Service;
import java.time.Duration;

@Service
public class ProductContextService {
    private final ProductRepository productRepository;
    private final PolicyRepository policyRepository;
    private final Cache<String, String> resourceCache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofMinutes(10))
            .build();

    public ProductContextService(ProductRepository productRepository, PolicyRepository policyRepository) {
        this.productRepository = productRepository;
        this.policyRepository = policyRepository;
    }

    public String loadProductContext(String skuId) {
        return resourceCache.get("product:" + skuId, key -> {
            Product product = productRepository.findBySkuId(skuId)
                    .orElseThrow(() -> new IllegalArgumentException("Product not found: " + skuId));
            return """
                SKU: %s
                Name: %s
                Category: %s
                Highlights: %s
                Price: %s
                Stock: %s
                After-sale policy: %s
                """.formatted(
                    product.skuId(),
                    product.name(),
                    product.category(),
                    product.highlights(),
                    product.price(),
                    product.stock(),
                    product.afterSalePolicy()
                );
        });
    }

    public String loadRefundPolicy(String tenantId) {
        return resourceCache.get("policy:" + tenantId, key ->
                policyRepository.findRefundPolicyByTenant(tenantId)
                        .orElse("Default after-sale policy: 7-day no-reason returns; special categories follow their own rules"));
    }
}

Exposing the resources with MCP annotations:

package com.example.mcp.mcp.resource;

import com.example.mcp.application.ProductContextService;
import org.springframework.ai.mcp.server.annotation.McpResource;
import org.springframework.stereotype.Component;

@Component
public class CommerceResources {
    private final ProductContextService productContextService;

    public CommerceResources(ProductContextService productContextService) {
        this.productContextService = productContextService;
    }

    @McpResource(uri = "product://sku/{skuId}", name = "product-detail", description = "Reads product details, highlights, stock, and after-sale rules")
    public String productDetail(String skuId) {
        return productContextService.loadProductContext(skuId);
    }

    @McpResource(uri = "policy://refund/{tenantId}", name = "refund-policy", description = "Reads tenant-scoped refund and return policies")
    public String refundPolicy(String tenantId) {
        return productContextService.loadRefundPolicy(tenantId);
    }
}

Benefits include URI‑driven access semantics, removal of manual prompt stitching, on‑demand reads, and natural fit for caching, versioning, and permission control.

7.2 Prompt Provider – Governable Templates

Prompts are stored outside code (e.g., config centre or DB) and versioned. Example of an urge‑order reply template:

package com.example.mcp.mcp.prompt;

import org.springframework.ai.mcp.server.annotation.McpPrompt;
import org.springframework.stereotype.Component;

@Component
public class CustomerServicePrompts {
    @McpPrompt(name = "urge-order-reply", description = "Generates a reassuring urge-order reply based on order status")
    public String urgeOrderReply(String orderId, String orderStatus, String tenantTone) {
        return """
            You are a senior e-commerce customer-service agent. Based on the following information, generate a reply that can be sent to the user directly:

            Order ID: %s
            Current status: %s
            Tone requirement: %s

            Output requirements:
            1. Empathise first, then explain the current progress
            2. Do not promise delivery times that cannot be guaranteed
            3. If there is a fulfilment exception, suggest the next step
            4. Keep the reply within 120 characters
            """.formatted(orderId, orderStatus, tenantTone);
    }
}

Production recommendations: store prompts in a config centre or database, version them, include them in gray‑release pipelines, and keep an offline evaluation set.
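As a minimal sketch of that recommendation — in-memory here, where a real deployment would sit on a config centre or database, and all names are illustrative — a versioned template store with a rollback path might look like:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: externally stored, versioned prompt templates with rollback.
public final class PromptTemplateStore {
    // templateName -> (version -> template text)
    private final Map<String, Map<String, String>> templates = new ConcurrentHashMap<>();
    private final Map<String, String> activeVersion = new ConcurrentHashMap<>();

    /** Publishes a new version; the latest publish becomes the active one. */
    public void publish(String name, String version, String template) {
        templates.computeIfAbsent(name, k -> new ConcurrentHashMap<>()).put(version, template);
        activeVersion.put(name, version);
    }

    /** Pins an older version — the rollback path of a gray release. */
    public void rollback(String name, String version) {
        if (!templates.getOrDefault(name, Map.of()).containsKey(version)) {
            throw new IllegalArgumentException("Unknown version: " + version);
        }
        activeVersion.put(name, version);
    }

    /** Renders the currently active version with positional arguments. */
    public String render(String name, Object... args) {
        String version = activeVersion.get(name);
        return templates.get(name).get(version).formatted(args);
    }
}
```

An `@McpPrompt` method would then delegate to `render(...)` instead of embedding the template in code, so template changes ship without redeploying the service.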

7.3 Tool Provider – Production‑Ready Order Query

The tool aggregates order and logistics data, adds validation, timeout, progress reporting, circuit‑breaker, and fallback logic.

package com.example.mcp.mcp.tool;

import com.example.mcp.application.OrderQueryApplicationService;
import com.example.mcp.domain.OrderDetailView;
import jakarta.validation.constraints.NotBlank;
import org.springframework.ai.mcp.server.annotation.McpTool;
import org.springframework.ai.mcp.server.transport.McpAsyncRequestContext;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;
import java.time.Duration;
import java.util.Map;

@Component
public class OrderTools {
    private final OrderQueryApplicationService orderQueryApplicationService;

    public OrderTools(OrderQueryApplicationService orderQueryApplicationService) {
        this.orderQueryApplicationService = orderQueryApplicationService;
    }

    @McpTool(name = "query-order", description = "Queries order details, fulfilment status, logistics tracking nodes, and exception reasons")
    public Mono<Map<String, Object>> queryOrder(@NotBlank String orderId, McpAsyncRequestContext context) {
        return context.info("Starting order query: " + orderId)
                .then(context.reportProgress(10, 100, "Query request received"))
                .then(orderQueryApplicationService.query(orderId)
                        .timeout(Duration.ofSeconds(3))
                        .flatMap(detail -> buildResult(detail, context))
                        .onErrorResume(ex -> fallback(orderId, ex, context)));
    }

    private Mono<Map<String, Object>> buildResult(OrderDetailView detail, McpAsyncRequestContext context) {
        return context.reportProgress(70, 100, "Order and logistics aggregation completed")
                .then(context.info("Order query succeeded: " + detail.orderId()))
                .thenReturn(Map.of(
                        "success", true,
                        "orderId", detail.orderId(),
                        "status", detail.status(),
                        "shipmentStatus", detail.shipmentStatus(),
                        "promisedAt", detail.promisedAt(),
                        "latestTrackingNode", detail.latestTrackingNode(),
                        "delayRisk", detail.delayRisk(),
                        "customerVisibleMessage", detail.customerVisibleMessage()))
                .delayUntil(result -> context.reportProgress(100, 100, "Query completed"));
    }

    private Mono<Map<String, Object>> fallback(String orderId, Throwable ex, McpAsyncRequestContext context) {
        return context.warning("Order query failed: " + ex.getMessage())
                .thenReturn(Map.of(
                        "success", false,
                        "orderId", orderId,
                        "status", "UNKNOWN",
                        "errorCode", "ORDER_QUERY_DEGRADED",
                        "message", "The order system is busy; please try again later"));
    }
}

The accompanying application service adds resilience annotations (Retry, CircuitBreaker, TimeLimiter) and merges order and logistics data.
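The source does not show that service. A plausible shape — assuming Resilience4j's Spring Boot annotations, hypothetical `OrderClient`/`LogisticsClient` downstream clients, and an illustrative `OrderDetailView.merge` factory — might look like:

```java
package com.example.mcp.application;

import com.example.mcp.domain.OrderDetailView;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;

@Service
public class OrderQueryApplicationService {
    // Hypothetical downstream clients for the order and logistics systems.
    private final OrderClient orderClient;
    private final LogisticsClient logisticsClient;

    public OrderQueryApplicationService(OrderClient orderClient, LogisticsClient logisticsClient) {
        this.orderClient = orderClient;
        this.logisticsClient = logisticsClient;
    }

    // Resilience policies are configured under resilience4j.* in application.yml.
    @Retry(name = "orderQuery")
    @CircuitBreaker(name = "orderQuery")
    @TimeLimiter(name = "orderQuery")
    public Mono<OrderDetailView> query(String orderId) {
        // Fetch order and logistics data concurrently, then merge into one view.
        return Mono.zip(orderClient.findOrder(orderId), logisticsClient.findTracking(orderId))
                .map(tuple -> OrderDetailView.merge(tuple.getT1(), tuple.getT2()));
    }
}
```

This is a sketch of the described pattern, not the article's original implementation; the annotation names come from `resilience4j-spring-boot3`, which is already on the classpath per the Maven section.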

7.4 Sampling Provider – Server‑Side Reverse Inference

When structured data (e.g., logistics delay) is not user‑ready, the server triggers a sampling request so the client’s model can generate a natural‑language reply.

package com.example.mcp.application;

import org.springframework.ai.mcp.server.transport.McpAsyncRequestContext;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;
import java.util.Map;

@Service
public class OrderReplyOrchestrator {
    public Mono<Map<String, Object>> enrichCustomerReply(Map<String, Object> toolResult, McpAsyncRequestContext context) {
        Boolean success = (Boolean) toolResult.get("success");
        if (Boolean.FALSE.equals(success)) {
            return Mono.just(toolResult);
        }
        return context.sample("""
                You are an e-commerce customer-service expert. Based on the following structured information, generate a message to send to the user:
                %s

                Requirements:
                1. Use a sincere, professional tone
                2. State the current progress first, then explain any risk
                3. If there is a delay risk, reassure the user and describe the next action
                4. No more than 120 characters
                """.formatted(toolResult))
                .map(reply -> Map.of(
                        "success", true,
                        "toolResult", toolResult,
                        "customerReply", reply))
                .onErrorResume(ex -> Mono.just(Map.of(
                        "success", true,
                        "toolResult", toolResult,
                        "customerReply", toolResult.get("customerVisibleMessage"),
                        "samplingFallback", true)));
    }
}

This design makes the server own business context while the client owns model execution, achieving clean separation of concerns.

Client‑Side Sampling Handling and Model Routing

The client receives a sampling/createMessage request and decides which model to use based on risk, cost, or availability.

package com.example.client.mcp;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.mcp.client.annotation.McpSampling;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class CustomerServiceSamplingHandler {
    private final ChatClient premiumChatClient;
    private final ChatClient standardChatClient;
    private final ChatClient localChatClient;

    public CustomerServiceSamplingHandler(ChatClient premiumChatClient, ChatClient standardChatClient, ChatClient localChatClient) {
        this.premiumChatClient = premiumChatClient;
        this.standardChatClient = standardChatClient;
        this.localChatClient = localChatClient;
    }

    @McpSampling
    public Mono<String> handle(String prompt) {
        // ChatClient's call().content() is blocking and returns String, so wrap it
        // in a Callable to keep the handler reactive (consider adding
        // subscribeOn(Schedulers.boundedElastic()) for long-running calls).
        return Mono.fromCallable(() -> routeByPolicy(prompt)
                        .prompt()
                        .user(prompt)
                        .call()
                        .content())
                .onErrorResume(ex -> Mono.fromCallable(() ->
                        localChatClient.prompt().user(prompt).call().content()));
    }

    private ChatClient routeByPolicy(String prompt) {
        // High-risk topics (compensation claims, complaints) go to the strongest model.
        if (prompt.contains("compensation") || prompt.contains("complaint")) {
            return premiumChatClient;
        }
        if (prompt.length() > 500) {
            return standardChatClient;
        }
        return localChatClient;
    }
}

Three routing strategies are recommended: business‑risk routing, cost‑based routing, and availability‑based fallback.

Dual‑Mode Deployment: Streamable HTTP vs Stdio

Spring AI MCP can run over two transport layers:

Streamable HTTP: suited for microservices, containers, and Kubernetes clusters.

Stdio: ideal for local CLI tools, IDE plugins, or lightweight agents.

Configuration snippets (YAML for HTTP, command‑line flag for Stdio) illustrate how to enable each mode and what operational knobs (connection timeout, buffer size, request size, TLS termination) need attention.
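For orientation, here is a hypothetical pair of configuration fragments. The property names follow the `spring.ai.mcp.server.*` convention but should be verified against the Spring AI version in use.

```yaml
# application.yml — Streamable HTTP mode (assumed property names)
spring:
  ai:
    mcp:
      server:
        name: commerce-mcp-server
        version: 1.0.0
        type: ASYNC          # reactive server, matching the WebFlux starter
server:
  port: 8080
```

```yaml
# application-stdio.yml — Stdio mode: stdout must carry only protocol frames,
# so the banner and console logging are disabled (assumed property names).
spring:
  ai:
    mcp:
      server:
        stdio: true
  main:
    banner-mode: off
logging:
  pattern:
    console: ""
```

The critical operational detail in Stdio mode is keeping stdout clean: any stray log line or banner corrupts the message stream between client and server.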

Engineering Upgrades for High Concurrency and Operability

Key recommendations:

Use reactive WebFlux + Reactor to avoid thread‑pool exhaustion caused by long‑lived connections, tool aggregation, and sampling callbacks.

Separate connection state (handled by an ingress gateway) from business state (stored in Redis, DB, or message queues) to keep the core service stateless and horizontally scalable.

Implement seven governance points: resource caching, tool timeouts and isolation, throttled progress notifications, sampling budget control, idempotency keys, structured audit logs, and graceful degradation.
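The third governance point, throttled progress notifications, can be sketched in a few lines of plain Java (the class name is illustrative): drop any progress update that arrives within a minimum interval of the last one actually sent.

```java
// Illustrative throttle: suppress progress notifications that arrive too soon
// after the previous one, so the client is not flooded during long tasks.
public final class ProgressThrottle {
    private final long minIntervalMillis;
    private long lastSentAt;
    private boolean sentOnce = false;

    public ProgressThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if a notification may be sent at {@code nowMillis}. */
    public synchronized boolean trySend(long nowMillis) {
        if (!sentOnce || nowMillis - lastSentAt >= minIntervalMillis) {
            sentOnce = true;
            lastSentAt = nowMillis;
            return true;
        }
        return false;
    }
}
```

Inside a Tool, each `reportProgress` call would first consult `trySend(System.currentTimeMillis())`, with terminal updates (100%) typically exempt from throttling.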

Security, Governance, and Rate Limiting

Before production, enforce:

Tenant isolation, user‑identity mapping, resource ACLs, tool whitelists, and high‑risk tool confirmations.

Prompt injection protection, sensitive‑word filtering, PII redaction, and output safety checks.

Rate limits at session, tenant, and per‑Tool/Sampling levels to prevent cost explosions.
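As an illustration of per-tenant limiting, here is a minimal in-memory token bucket; a production system would more likely use Resilience4j's RateLimiter or a Redis-backed limiter so limits survive restarts and apply across replicas.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-tenant token bucket for Tool/Sampling rate limiting.
public final class TenantRateLimiter {
    private final int capacity;
    private final double refillPerMillis;
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    private static final class Bucket {
        double tokens;
        long lastRefillAt;
        Bucket(int capacity, long now) { this.tokens = capacity; this.lastRefillAt = now; }
    }

    public TenantRateLimiter(int capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = refillPerSecond / 1000.0;
    }

    /** Returns true if the tenant may perform one more call at {@code nowMillis}. */
    public boolean tryAcquire(String tenantId, long nowMillis) {
        Bucket b = buckets.computeIfAbsent(tenantId, k -> new Bucket(capacity, nowMillis));
        synchronized (b) {
            // Refill proportionally to elapsed time, capped at bucket capacity.
            b.tokens = Math.min(capacity, b.tokens + (nowMillis - b.lastRefillAt) * refillPerMillis);
            b.lastRefillAt = nowMillis;
            if (b.tokens >= 1.0) {
                b.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

Session-level and per-Tool limits follow the same shape with a different key (session ID, or `tenantId + ":" + toolName`).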

Deployment Recommendations

Dockerfile uses Eclipse Temurin 17 JRE, and a Kubernetes Deployment with three replicas, health probes, resource requests/limits, and profile‑driven transport selection (HTTP vs Stdio). Additional advice includes decoupling ingress from business logic, pulling configuration from a central config centre, and defining observability dashboards for session metrics, resource reads, tool latency, sampling latency, and error rates.
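A hypothetical Dockerfile matching that description — the jar name is an assumption and should match the actual build artifact:

```dockerfile
# Minimal image: Eclipse Temurin 17 JRE base, single fat jar.
FROM eclipse-temurin:17-jre
WORKDIR /app
# Jar name is an assumption; adjust to the actual Maven artifact.
COPY target/mcp-server.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```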

Common Pitfalls

Treating MCP as just another function‑calling wrapper and ignoring Resources, Prompts, and Sampling.

Reading every resource directly from the database without caching.

Missing timeout and isolation for Tools, causing whole sessions to hang.

Over‑frequent progress notifications that overload the client.

Running Sampling without budget control, leading to cost overruns.

Hard‑coding Prompts in code, preventing gray‑release and versioning.

Neglecting permission checks and audit trails for sensitive operations.

Focusing only on functionality and forgetting session topology and connection‑state management.

Final Architectural Takeaways

When MCP is viewed merely as a new way to call functions, it yields only a modest demo. When integrated into a microservice architecture, MCP fundamentally reshapes three aspects:

Context moves from ad‑hoc prompt stitching to protocol‑governed assets.

Tools evolve from passive functions to orchestrated, observable, and degradable business capabilities.

Model execution shifts from a client‑only concern to a collaborative, server‑initiated inference capability.

These changes enable truly bidirectional, governable, and scalable AI services where both server and client participate as first‑class collaborators.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java, microservices, AI, MCP, Spring Boot, Spring AI, bidirectional communication
Written by Ray's Galactic Tech

Practice together, never alone. We cover programming languages, development tools, learning methods, and pitfall notes. We simplify complex topics, guiding you from beginner to advanced. Weekly practical content—let's grow together!
