
Dissecting MCP Protocol: Scaling Java Microservices for AI‑Native Tooling

This article analyzes the Model Context Protocol (MCP), detailing its architecture, JSON‑RPC extensions, Streamable HTTP transport, and governance layers, and demonstrates how to transform high‑traffic Java microservices into a secure, observable AI‑native capability layer using an independent MCP gateway, tooling standards, and production‑grade implementations.

Ray's Galactic Tech

Apr 26, 2026


1. Why enterprises need an MCP capability layer instead of a simple AI plugin

Many companies try to "plug a large model" into existing systems by adding a chat entry, an AI‑assistant button, or wrapping a few APIs as function calls. These demo‑level solutions break in production: duplicate integrations, lack of unified governance, unsafe operations, thread‑pool exhaustion, and prompt bloat. The root problem is the absence of a unified AI capability exposure protocol that clearly defines boundaries, governance, and evolution.

2. MCP protocol versus REST, OpenAPI, and Function Calling

2.1 Role model

MCP defines three participants:

Host : the client application (IDE, enterprise Copilot, chat assistant) that hosts the model.

Client : an MCP connector inside the Host that communicates with one or more MCP servers.

Server : the provider that exposes tools, resources, and prompt templates.

This separation ensures that the model never calls a business service directly, tools are not hard‑coded to a specific AI platform, and capabilities can be reused across multiple Hosts.

2.2 Core objects

Tools : executable actions callable by the model.

Resources : read‑only contextual data injected by the client.

Prompts : template‑driven interaction units, primarily user‑controlled.

2.3 JSON‑RPC 2.0 foundation with AI‑specific semantics

MCP uses JSON‑RPC 2.0 as its message format and layers the following on top of it:

Initialization & capability negotiation.

Session management.

Dynamic tool discovery.

Unified access to resources and prompts.

Progress, cancellation, logging, and task handling.

Unlike REST, which models a system as resources to be read and modified, MCP models it as capabilities that a model can discover and invoke.

2.4 Lifecycle

An MCP session moves through four phases, in order:

initialize

notifications/initialized

Normal operation phase

Graceful shutdown or session termination

During initialization the client declares supported protocol version, capabilities, and implementation info; the server responds with its version and capabilities. Example request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-11-25",
    "capabilities": {},
    "clientInfo": {
      "name": "enterprise-copilot",
      "version": "1.0.0"
    }
  }
}
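The lifecycle order can be enforced per session with a small guard. The following is a minimal sketch, not part of any MCP SDK: the class `McpSessionLifecycle`, its `Phase` enum, and the choice to allow `ping` while initialization is in progress are assumptions made for illustration.

```java
/** Hypothetical per-session lifecycle guard; names are illustrative. */
class McpSessionLifecycle {
    enum Phase { NEW, INITIALIZING, READY, CLOSED }

    private Phase phase = Phase.NEW;

    /** Returns true if the JSON-RPC method is acceptable in the current phase, advancing the phase when needed. */
    synchronized boolean accept(String method) {
        switch (phase) {
            case NEW:
                // Only "initialize" may open a session.
                if ("initialize".equals(method)) {
                    phase = Phase.INITIALIZING;
                    return true;
                }
                return false;
            case INITIALIZING:
                // The client confirms with "notifications/initialized" before normal operation starts.
                if ("notifications/initialized".equals(method)) {
                    phase = Phase.READY;
                    return true;
                }
                return "ping".equals(method);
            case READY:
                // Normal operation: everything except a second "initialize".
                return !"initialize".equals(method);
            default:
                // CLOSED: reject everything.
                return false;
        }
    }

    synchronized void close() {
        phase = Phase.CLOSED;
    }

    synchronized Phase phase() {
        return phase;
    }
}
```

A gateway could keep one such guard per `Mcp-Session-Id` and reject a `tools/call` that arrives before the handshake has completed.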

2.5 Tool definition requirements

Each tool definition carries:

Name

Description

Input schema

Optional output schema

Additional metadata

The model needs a contract that it can understand, not just a Java method signature.
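One way to picture such a contract is a small value type that refuses vague definitions. This is a simplified sketch under stated assumptions: `ToolContract` is a hypothetical record, the input schema is reduced to a map of required parameter names and types rather than a full JSON Schema, and the 20-character minimum for descriptions is an arbitrary illustrative rule.

```java
import java.util.List;
import java.util.Map;

/** Illustrative tool contract; a simplified stand-in for name + description + input schema. */
record ToolContract(String name, String description, Map<String, String> requiredParams) {

    ToolContract {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is required");
        }
        // Assumed rule: a description must be long enough to say when the tool applies.
        if (description == null || description.length() < 20) {
            throw new IllegalArgumentException("description must explain when to use the tool");
        }
        requiredParams = Map.copyOf(requiredParams);
    }

    /** Returns the names of required parameters that are absent from a call's arguments, sorted. */
    List<String> missing(Map<String, Object> arguments) {
        return requiredParams.keySet().stream()
                .filter(p -> !arguments.containsKey(p))
                .sorted()
                .toList();
    }
}
```

Checking arguments against the declared contract before routing lets the gateway return a precise error that the model can act on, instead of an opaque downstream failure.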

2.6 Streamable HTTP (the production‑grade transport)

Single MCP endpoint.

POST for request submission.

GET for server‑side streaming.

Optional SSE for multiple messages.

Supports Mcp-Session-Id header for session correlation.

This design simplifies load‑balancing, ingress handling, and unified authentication compared with the older HTTP + SSE approach.
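From the client side, the transport boils down to two request shapes against one URL. The sketch below only builds the requests with the JDK's `java.net.http` API and does not send them; the class name `StreamableHttpRequests` and the endpoint used in the usage note are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class StreamableHttpRequests {

    /** Builds (does not send) a POST carrying one JSON-RPC message to the single MCP endpoint. */
    static HttpRequest post(URI endpoint, String sessionId, String jsonRpcBody) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                // The server may answer with a single JSON body or with an SSE stream.
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(jsonRpcBody));
        // The first initialize request has no session yet; later requests echo the id.
        if (sessionId != null) {
            builder.header("Mcp-Session-Id", sessionId);
        }
        return builder.build();
    }

    /** Builds a GET asking the server to open an SSE stream for server-initiated messages. */
    static HttpRequest listen(URI endpoint, String sessionId) {
        return HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/event-stream")
                .header("Mcp-Session-Id", sessionId)
                .GET()
                .build();
    }
}
```

For example, `StreamableHttpRequests.post(URI.create("https://gateway.example.com/mcp"), null, initializeJson)` would represent the opening handshake, with the session id from the response reused on every later call.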

3. Pitfalls of exposing microservices directly as tools

Embedding an MCP SDK into each Spring Boot service leads to:

Scattered authentication and inconsistent permission checks.

Ungoverned tool naming, parameter definitions, and output formats.

Fragmented observability across dozens of services.

Expanded attack surface when LLMs call core services directly.

Complex rollout because every new tool requires code changes in the business service.

Therefore a tool is not a regular controller; it requires stronger governance.

3.1 Recommended approach: an independent MCP Gateway

The gateway aggregates downstream capabilities, exposing a unified MCP endpoint while handling:

Protocol adaptation.

Tool registration.

Authentication and session management.

Permission evaluation.

Risk control and audit logging.

Rate‑limiting and circuit breaking.

Result sanitization.

Asynchronous task orchestration.

This mirrors an API gateway but adds two AI‑specific layers: a tool‑semantic layer for the model and a security‑risk layer for AI‑driven operations.

4. Target architecture: upgrading a million‑concurrency Java microservice stack to an MCP service network

4.1 Business background

Tech stack: Spring Boot 3.x + Spring Cloud.

Service discovery: Nacos.

Communication: HTTP, gRPC, Kafka.

Storage: MySQL, Redis, Elasticsearch.

Deployment: Kubernetes.

Security: OAuth2 / JWT / RBAC.

4.2 Recommended layering

AI Host (Copilot, chat assistant)

MCP Client (SDK)

MCP Gateway (Spring Boot + WebFlux)

Tool Governance Layer (Auth / RBAC / Risk / Audit / Rate‑limit)

Tool Routing Layer (HTTP / gRPC / Kafka)

Downstream business services (Order, Risk, Customer, Approval)

Redis for session & idempotency

Nacos for tool registry

Observability (Prometheus, OpenTelemetry, ELK)

4.3 Design principles

Protocol decoupling : microservices keep their native HTTP/gRPC/MQ interfaces; only the gateway speaks MCP.

Centralized governance : description, permission, audit, rate‑limit, and circuit‑break are handled at the gateway, not in each service.

Asynchronous first : the gateway must be non‑blocking; otherwise AI traffic can saturate the thread pool.

Explicit high‑risk handling : write operations require approval, idempotency, and audit.

Dynamic capability : tool whitelist, routing, and risk levels are driven by configuration, not hard‑coded.

5. Critical call chain example: "Unfreeze mistakenly blocked orders"

"Unfreeze all orders that were mistakenly blocked yesterday and have a complaint level above P2, and explain the reasons."

Host sends user request to the model.

Model calls tools/list and discovers searchRiskOrders and unfreezeOrderBatch.

Model invokes searchRiskOrders.

Gateway validates session, permission, risk level, and rate limits.

Gateway forwards the request to the risk service and receives candidate orders.

Gateway trims the result to only the fields the model needs.

Model builds a second tool call to unfreezeOrderBatch.

Gateway detects a high‑risk write, requires an approval ticket.

Gateway generates an idempotency key and writes an audit log.

Order service performs batch unfreeze and emits events.

Gateway aggregates structured results and returns them to the model.

Model translates the execution summary into natural language for the operator.

The focus is not whether the model can call a tool, but whether the tool is well‑governed, auditable, and safe.

6. Production‑grade implementation: Spring Boot + WebFlux + MCP Gateway

6.1 Project structure

mcp-gateway/
├── pom.xml
├── src/main/java/com/example/mcpgateway/
│   ├── McpGatewayApplication.java
│   ├── config/
│   │   ├── McpGatewayProperties.java
│   │   ├── WebClientConfig.java
│   │   └── ResilienceConfig.java
│   ├── auth/
│   │   ├── McpSecurityWebFilter.java
│   │   ├── SessionPrincipal.java
│   │   └── SessionContextHolder.java
│   ├── governance/
│   │   ├── ToolAccessEvaluator.java
│   │   ├── ToolAuditService.java
│   │   ├── ToolResultSanitizer.java
│   │   └── IdempotencyService.java
│   ├── registry/
│   │   ├── ToolRouteDefinition.java
│   │   └── ToolRouteRepository.java
│   ├── tools/
│   │   ├── OrderOperationTools.java
│   │   └── RiskOperationTools.java
│   ├── client/
│   │   ├── DownstreamGatewayClient.java
│   │   └── dto/
│   └── task/
│       ├── AsyncTaskFacade.java
│       └── TaskStatusResource.java
└── k8s/
    ├── deployment.yaml
    ├── service.yaml
    ├── hpa.yaml
    └── servicemonitor.yaml

6.2 Dependency selection

Use BOM management; core dependencies include:

spring-boot-starter-webflux (non‑blocking HTTP)

spring-ai-starter-mcp-server-webflux (MCP support)

spring-boot-starter-validation (parameter validation)

spring-boot-starter-actuator (observability)

spring-boot-starter-data-redis-reactive (session & idempotency)

resilience4j-spring-boot3 (rate limiting, bulkhead, circuit breaker)

micrometer-registry-prometheus & opentelemetry-spring-boot-starter (metrics and tracing)

6.3 Configuration – keep governance out of code

spring:
  ai:
    mcp:
      server:
        name: mcp-gateway
        version: 1.0.0
        instructions: >
          This is the enterprise tool gateway. All write operations must obey permission, approval, idempotency, and audit rules.
        type: ASYNC
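        # Note: sse-message-endpoint belongs to the legacy HTTP + SSE transport. For Streamable HTTP,
        # check the transport properties of the Spring AI version in use.
        sse-message-endpoint: /mcp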
  data:
    redis:
      host: ${REDIS_HOST:redis}
      port: ${REDIS_PORT:6379}
server:
  port: 8080
mcp-gateway:
  security:
    api-key-header: X-MCP-API-Key
    allowed-origins:
      - https://copilot.example.com
      - https://ops-assistant.example.com
    session-ttl: 30m
  result:
    max-text-chars: 6000
    max-list-size: 50
  tools:
    approval-required-levels:
      - HIGH_RISK_WRITE
    routes:
      - name: searchRiskOrders
        description: Retrieve orders hit by risk control in the last 24 h
        downstreamService: risk-service
        path: /internal/risk/orders/search
        method: POST
        riskLevel: READ_ONLY
        timeoutMs: 800
      - name: unfreezeOrderBatch
        description: Batch unfreeze orders; requires approval ticket
        downstreamService: order-service
        path: /internal/orders/unfreeze/batch
        method: POST
        riskLevel: HIGH_RISK_WRITE
        timeoutMs: 1500
resilience4j:
  ratelimiter:
    instances:
      mcp-global:
        limitForPeriod: 3000
        limitRefreshPeriod: 1s
        timeoutDuration: 0
  bulkhead:
    instances:
      tool-write:
        maxConcurrentCalls: 200
        maxWaitDuration: 0
  circuitbreaker:
    instances:
      order-service:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 100
        failureRateThreshold: 50
  timelimiter:
    instances:
      tool-default:
        timeoutDuration: 2s

Key points:

Tool metadata is configuration‑driven, so canary (gray) releases, limit adjustments, and risk‑level changes need no code change.

Risk level is a first‑class attribute, not just a description.
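The listings in this section refer to `ToolRouteDefinition` and `ToolRouteRepository` without showing them. A minimal sketch of what they might look like, mirroring the YAML keys under `mcp-gateway.tools.routes`, is given here; the in-memory repository is an assumption, and a real gateway would presumably bind and refresh these entries from configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.stream.Collectors;

/** One route entry; field names mirror the YAML keys shown in the configuration. */
record ToolRouteDefinition(String name, String description, String downstreamService,
                           String path, String method, RiskLevel riskLevel, int timeoutMs) {

    /** Risk level as a typed attribute rather than free text. */
    enum RiskLevel { READ_ONLY, WRITE, HIGH_RISK_WRITE }
}

/** In-memory lookup by tool name (illustrative only). */
class ToolRouteRepository {
    private final Map<String, ToolRouteDefinition> byName;

    ToolRouteRepository(List<ToolRouteDefinition> routes) {
        // toUnmodifiableMap throws IllegalStateException on duplicate tool names,
        // which surfaces naming collisions at startup.
        this.byName = routes.stream()
                .collect(Collectors.toUnmodifiableMap(ToolRouteDefinition::name, Function.identity()));
    }

    /** Returns the route or fails loudly; unknown tools are never routed. */
    ToolRouteDefinition getRequired(String name) {
        ToolRouteDefinition route = byName.get(name);
        if (route == null) {
            throw new NoSuchElementException("unknown tool: " + name);
        }
        return route;
    }
}
```

Because the risk level is an enum on the route, governance code can branch on it directly instead of parsing descriptions.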

6.4 Security filter – origin check, session restoration, API‑key auth

package com.example.mcpgateway.auth;

import java.time.Duration;
import java.util.List;
import org.springframework.data.redis.core.ReactiveStringRedisTemplate;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;
import reactor.util.context.Context;

@Component
public class McpSecurityWebFilter implements WebFilter {
    private static final String MCP_SESSION_ID = "Mcp-Session-Id";
    private final GatewaySecurityProperties properties;
    private final ApiKeyAuthService apiKeyAuthService;
    private final ReactiveStringRedisTemplate redisTemplate;
    private final SessionCodec sessionCodec;

    public McpSecurityWebFilter(GatewaySecurityProperties properties,
                               ApiKeyAuthService apiKeyAuthService,
                               ReactiveStringRedisTemplate redisTemplate,
                               SessionCodec sessionCodec) {
        this.properties = properties;
        this.apiKeyAuthService = apiKeyAuthService;
        this.redisTemplate = redisTemplate;
        this.sessionCodec = sessionCodec;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        if (!exchange.getRequest().getPath().value().startsWith("/mcp")) {
            return chain.filter(exchange);
        }
        String origin = exchange.getRequest().getHeaders().getOrigin();
        List<String> allowedOrigins = properties.allowedOrigins();
        if (origin != null && !allowedOrigins.contains(origin)) {
            exchange.getResponse().setStatusCode(HttpStatus.FORBIDDEN);
            return exchange.getResponse().setComplete();
        }
        String apiKey = exchange.getRequest().getHeaders().getFirst(properties.apiKeyHeader());
        if (apiKey == null || apiKey.isBlank()) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }
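        String sessionId = exchange.getRequest().getHeaders().getFirst(MCP_SESSION_ID);
        // Defer the API-key fallback so it is only built when no session is found in Redis.
        Mono<SessionPrincipal> principalMono = (sessionId == null)
                ? apiKeyAuthService.authenticate(apiKey)
                : loadPrincipal(sessionId).switchIfEmpty(Mono.defer(() -> apiKeyAuthService.authenticate(apiKey)));
        return principalMono
                .flatMap(principal -> chain.filter(exchange)
                        .contextWrite(Context.of(SessionContextHolder.PRINCIPAL_KEY, principal))
                        .thenReturn(true))
                // Assumes authenticate() completes empty for an unknown key: reject with 401
                // instead of completing the exchange silently.
                .switchIfEmpty(Mono.defer(() -> {
                    exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
                    return exchange.getResponse().setComplete().thenReturn(false);
                }))
                .then();
    }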

    private Mono<SessionPrincipal> loadPrincipal(String sessionId) {
        return redisTemplate.opsForValue()
                .get("mcp:session:" + sessionId)
                .map(sessionCodec::decode)
                .flatMap(principal -> redisTemplate.expire("mcp:session:" + sessionId, Duration.ofMinutes(30))
                        .thenReturn(principal));
    }
}

Design notes:

Origin validation protects against DNS‑rebinding attacks.

Session data lives in Redis to survive node failures.

Principal information is stored in Reactor context to survive thread switches.

6.5 Tool access evaluator – unified permission & approval checks

package com.example.mcpgateway.governance;

import com.example.mcpgateway.auth.SessionPrincipal;
import com.example.mcpgateway.registry.ToolRouteDefinition;
import org.springframework.stereotype.Component;

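// AccessDeniedException is assumed to be an unchecked exception defined in this package
// (java.nio.file.AccessDeniedException is checked and would not compile here).
@Component
public class ToolAccessEvaluator {
    public void check(SessionPrincipal principal, ToolRouteDefinition route) {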
        if (principal == null) {
            throw new AccessDeniedException("missing principal");
        }
        if (principal.readOnlyMode() && route.riskLevel() != ToolRouteDefinition.RiskLevel.READ_ONLY) {
            throw new AccessDeniedException("current session is read‑only");
        }
        if (route.riskLevel() == ToolRouteDefinition.RiskLevel.HIGH_RISK_WRITE) {
            if (!principal.hasRole("OPS_MANAGER") && !principal.hasRole("RISK_MANAGER")) {
                throw new AccessDeniedException("missing privileged role");
            }
            if (principal.approvalTicket() == null || principal.approvalTicket().isBlank()) {
                throw new AccessDeniedException("approval ticket required");
            }
        }
    }
}

6.6 Downstream client – unified timeout, resilience, audit, and sanitization

package com.example.mcpgateway.client;

import com.example.mcpgateway.auth.SessionPrincipal;
import com.example.mcpgateway.governance.IdempotencyService;
import com.example.mcpgateway.governance.ToolAuditService;
import com.example.mcpgateway.governance.ToolResultSanitizer;
import com.example.mcpgateway.registry.ToolRouteDefinition;
import io.github.resilience4j.reactor.bulkhead.operator.BulkheadOperator;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import io.github.resilience4j.reactor.ratelimiter.operator.RateLimiterOperator;
import java.time.Duration;
import java.util.Map;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Component
public class DownstreamGatewayClient {
    private final WebClient webClient;
    private final ResilienceFacade resilienceFacade;
    private final ToolAuditService toolAuditService;
    private final ToolResultSanitizer toolResultSanitizer;
    private final IdempotencyService idempotencyService;

    public DownstreamGatewayClient(WebClient webClient,
                                 ResilienceFacade resilienceFacade,
                                 ToolAuditService toolAuditService,
                                 ToolResultSanitizer toolResultSanitizer,
                                 IdempotencyService idempotencyService) {
        this.webClient = webClient;
        this.resilienceFacade = resilienceFacade;
        this.toolAuditService = toolAuditService;
        this.toolResultSanitizer = toolResultSanitizer;
        this.idempotencyService = idempotencyService;
    }

    public Mono<String> invoke(ToolRouteDefinition route,
                               Map<String, Object> arguments,
                               SessionPrincipal principal,
                               boolean writeOperation) {
        String requestId = idempotencyService.computeRequestId(route.name(), principal.userId(), arguments);
        return webClient.post()
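                // Assumes the WebClient comes from a @LoadBalanced WebClient.Builder, which resolves the
                // service name in the host position; "lb://" is a Spring Cloud Gateway route scheme.
                .uri("http://" + route.downstreamService() + route.path())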
                .contentType(MediaType.APPLICATION_JSON)
                .header("X-User-Id", principal.userId())
                .header("X-Tenant-Id", principal.tenantId())
                .header("X-Request-Id", requestId)
                .header("X-Approval-Ticket", principal.approvalTicket() == null ? "" : principal.approvalTicket())
                .bodyValue(arguments)
                .retrieve()
                .bodyToMono(String.class)
                .timeout(Duration.ofMillis(route.timeoutMs()))
                .transformDeferred(RateLimiterOperator.of(resilienceFacade.globalRateLimiter()))
                .transformDeferred(CircuitBreakerOperator.of(resilienceFacade.circuitBreaker(route.downstreamService())))
                .transformDeferred(BulkheadOperator.of(writeOperation ? resilienceFacade.writeBulkhead() : resilienceFacade.readBulkhead()))
                .map(raw -> toolResultSanitizer.sanitize(route, raw))
                .doOnSuccess(result -> toolAuditService.success(requestId, route.name(), principal.userId(), arguments, result))
                .doOnError(error -> toolAuditService.failed(requestId, route.name(), principal.userId(), arguments, error));
    }
}

6.7 Result sanitizer – truncate, mask sensitive fields, enforce size limits

package com.example.mcpgateway.governance;

import com.example.mcpgateway.registry.ToolRouteDefinition;
import org.springframework.stereotype.Component;

@Component
public class ToolResultSanitizer {
    private static final int MAX_TEXT_CHARS = 6000;

    public String sanitize(ToolRouteDefinition route, String raw) {
        String masked = maskSensitiveFields(raw);
        if (masked.length() <= MAX_TEXT_CHARS) {
            return masked;
        }
        return masked.substring(0, MAX_TEXT_CHARS) + "\n\n[Result truncated, please refine query]";
    }

    private String maskSensitiveFields(String raw) {
        return raw
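                // Keep the first 3 and last 4 digits of an 11-digit phone number.
                .replaceAll("(\"phone\"\\s*:\\s*\")([0-9]{3})[0-9]{4}([0-9]{4}\")", "$1$2****$3")
                // Keep the first 6 and last 4 digits of an 18-digit ID card number.
                .replaceAll("(\"idCard\"\\s*:\\s*\")([0-9]{6})[0-9]{8}([0-9]{4}\")", "$1$2********$3");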
    }
}

6.8 Business tool example – model‑oriented signature and validation

package com.example.mcpgateway.tools;

import com.example.mcpgateway.auth.SessionContextHolder;
import com.example.mcpgateway.client.DownstreamGatewayClient;
import com.example.mcpgateway.governance.ToolAccessEvaluator;
import com.example.mcpgateway.registry.ToolRouteRepository;
import java.util.Map;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class OrderOperationTools {
    private final ToolRouteRepository toolRouteRepository;
    private final ToolAccessEvaluator toolAccessEvaluator;
    private final DownstreamGatewayClient downstreamGatewayClient;

    public OrderOperationTools(ToolRouteRepository toolRouteRepository,
                               ToolAccessEvaluator toolAccessEvaluator,
                               DownstreamGatewayClient downstreamGatewayClient) {
        this.toolRouteRepository = toolRouteRepository;
        this.toolAccessEvaluator = toolAccessEvaluator;
        this.downstreamGatewayClient = downstreamGatewayClient;
    }

    @Tool(description = "Batch unfreeze orders. Only for orders confirmed as risk‑false‑positive; requires an approval ticket.")
    public Mono<String> unfreezeOrderBatch(
            @ToolParam(description = "List of order IDs, max 50") java.util.List<String> orderIds,
            @ToolParam(description = "Reason for unfreeze, must be business‑level") String reason,
            @ToolParam(description = "Approval ticket, e.g., APP‑20260426‑1001") String approvalTicket) {
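        // SessionContextHolder.required() is assumed to resolve the principal for the current call.
        // A value written to the Reactor Context (section 6.4) is only readable from inside the
        // reactive chain, for example via Mono.deferContextual.
        var ctx = SessionContextHolder.required();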
        var route = toolRouteRepository.getRequired("unfreezeOrderBatch");
        toolAccessEvaluator.check(ctx.principal(), route);
        Map<String, Object> payload = Map.of(
                "orderIds", orderIds,
                "reason", reason,
                "approvalTicket", approvalTicket,
                "operator", ctx.principal().userId());
        return downstreamGatewayClient.invoke(route, payload, ctx.principal(), true);
    }
}

Key design points:

Tool description tells the model when and why to use it.

Parameters are model‑friendly (e.g., approvalTicket instead of opaque flags).

7. Asynchronous long‑running tasks

For operations that cannot finish within a single request (batch billing, export, reconciliation), MCP defines an experimental Tasks pattern: submit a task, receive a taskId, and poll for status via a Resource or second tool.

@Tool(description = "Create a reconciliation task and return taskId. Suitable for minute‑level jobs.")
public Mono<String> createReconciliationTask(
        @ToolParam(description = "Accounting date, e.g., 2026-04-25") String accountDate,
        @ToolParam(description = "Business line, e.g., mall, finance") String bizLine) {
    var ctx = SessionContextHolder.required();
    return asyncTaskFacade.createTask("reconciliation",
            Map.of("accountDate", accountDate,
                   "bizLine", bizLine,
                   "operator", ctx.principal().userId()))
            .map(taskId -> """
                Task submitted successfully.
                taskId: %s
                Use getTaskStatus to query progress.
                """.formatted(taskId));
}
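The submit-then-poll pattern needs a store that a status tool such as `getTaskStatus` can read. The sketch below is an in-memory stand-in with hypothetical names (`InMemoryTaskStore`, `Status`, `Task`); a production gateway would presumably keep this state in shared storage so that any replica can answer a status query.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the store behind a task-status tool; illustrative only. */
class InMemoryTaskStore {
    enum Status { PENDING, RUNNING, SUCCEEDED, FAILED }

    record Task(String taskId, String type, Map<String, Object> params, Status status) {}

    private final Map<String, Task> tasks = new ConcurrentHashMap<>();

    /** Registers a task and returns its id immediately; the work itself runs elsewhere. */
    String create(String type, Map<String, Object> params) {
        String taskId = UUID.randomUUID().toString();
        tasks.put(taskId, new Task(taskId, type, Map.copyOf(params), Status.PENDING));
        return taskId;
    }

    /** Called by the worker as the job progresses; returns false for unknown ids. */
    boolean transition(String taskId, Status next) {
        return tasks.computeIfPresent(taskId,
                (id, task) -> new Task(id, task.type(), task.params(), next)) != null;
    }

    /** What a status tool would return; empty when the id is unknown. */
    Optional<Status> status(String taskId) {
        return Optional.ofNullable(tasks.get(taskId)).map(Task::status);
    }
}
```

Returning an empty result for unknown ids, rather than throwing, lets the status tool tell the model plainly that the task does not exist.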

8. High‑concurrency and scalability considerations

AI traffic differs from ordinary user traffic: bursts, amplified downstream calls, larger payloads, and complex retry chains. Five essential actions for the gateway:

Full async stack : WebFlux / Netty entry, WebClient downstream.

Isolation by risk level : separate read and write bulkheads, optionally per business domain.

Result trimming & pagination : default pagination, summary fields, automatic truncation, encourage the model to refine queries.

Idempotency for writes : deterministic key from tool name, operator, primary business key, and parameter hash.

Controlled caching : cache read‑heavy, write‑light tools with tenant and role dimensions.
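The idempotency rule in the list above can be sketched with the JDK alone. `IdempotencyKeys` is a hypothetical helper, and the canonical form used here (sorted top-level parameters, one `key=value` per line, SHA-256) is an assumption; nested maps inside the parameters would need their own canonicalization.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

class IdempotencyKeys {

    /** Derives a deterministic key; sorting parameters makes the hash independent of map order. */
    static String compute(String toolName, String operator, String businessKey, Map<String, Object> params) {
        StringBuilder canonical = new StringBuilder()
                .append(toolName).append('\n')
                .append(operator).append('\n')
                .append(businessKey).append('\n');
        // TreeMap iterates keys in natural order, so insertion order does not matter.
        new TreeMap<>(params).forEach((k, v) -> canonical.append(k).append('=').append(v).append('\n'));
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The same call retried by the model then maps to the same key, so the gateway can return the stored result instead of executing the write twice.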

Capacity example: 3,000 requests per second from external hosts × 2.4 tool calls per request ≈ 7,200 tool calls per second. At 80 % reads and 20 % writes, that is roughly 5,760 reads and 1,440 writes per second, so the gateway must provision separate read/write pools, pre‑warm connections, and assume any downstream may fail.
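The arithmetic in the capacity example can be extended into a rough sizing helper. This sketch applies Little's law (in-flight calls = arrival rate × average latency); the `CapacityModel` class is hypothetical and any latency figures fed into it are assumptions, not measurements from the article.

```java
class CapacityModel {

    /** Tool calls per second when each host request fans out into several tool calls. */
    static double toolCallsPerSecond(double hostRps, double callsPerRequest) {
        return hostRps * callsPerRequest;
    }

    /** Little's law: average number of in-flight calls = arrival rate x average latency. */
    static double inFlight(double callsPerSecond, double avgLatencySeconds) {
        return callsPerSecond * avgLatencySeconds;
    }
}
```

With the article's figures, `toolCallsPerSecond(3000, 2.4)` gives about 7,200 calls per second. If writes (20 % of that) averaged an assumed 0.5 s each, `inFlight(1440, 0.5)` suggests about 720 concurrent writes across the fleet, which is the number to compare against the sum of the per-instance write bulkheads.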

9. Kubernetes deployment checklist

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-gateway
spec:
  replicas: 4
  selector:
    matchLabels:
      app: mcp-gateway
  template:
    metadata:
      labels:
        app: mcp-gateway
    spec:
      containers:
        - name: mcp-gateway
          image: registry.example.com/ai/mcp-gateway:1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-XX:MaxRAMPercentage=75 -XX:+UseZGC"
          resources:
            requests:
              cpu: "1000m"
              memory: "1Gi"
            limits:
              cpu: "4000m"
              memory: "4Gi"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-gateway
  minReplicas: 4
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: mcp_active_requests
        target:
          type: AverageValue
          averageValue: "400"

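In addition to CPU, HPA should watch custom MCP metrics such as active request count, average tool latency, and circuit‑breaker open ratio. The Pods metric above only takes effect if a custom‑metrics adapter (for example, Prometheus Adapter) exposes mcp_active_requests through the Kubernetes custom metrics API.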
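To reason about how the autoscaler reacts to these metrics, it helps to reproduce its core rule: desired replicas = ceil(current replicas × current value / target value), taking the largest proposal across metrics and clamping to the configured bounds. The sketch below is a simplified model with a hypothetical `HpaMath` class; it ignores the tolerance band and stabilization windows that the real controller applies.

```java
class HpaMath {

    /** Core scaling rule: ceil(currentReplicas * currentValue / targetValue). */
    static int desiredReplicas(int currentReplicas, double currentValue, double targetValue) {
        return (int) Math.ceil(currentReplicas * (currentValue / targetValue));
    }

    /**
     * Each row of currentAndTarget is {currentValue, targetValue} for one metric.
     * The largest proposal wins and is then clamped to [min, max].
     */
    static int decide(int currentReplicas, int min, int max, double[][] currentAndTarget) {
        int desired = min;
        for (double[] metric : currentAndTarget) {
            desired = Math.max(desired, desiredReplicas(currentReplicas, metric[0], metric[1]));
        }
        return Math.min(max, desired);
    }
}
```

For instance, with 4 replicas, CPU at 90 % against a 70 % target and 900 active requests per pod against a 400 target, the request metric dominates and the model proposes 9 replicas.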

10. Security layers

The gateway enforces four layers:

Transport : TLS, Origin validation, session‑header checks.

Identity : API‑Key / OAuth / JWT, tenant and client identifiers.

Permission : RBAC / ABAC, approval tickets, read‑only mode, field‑level restrictions.

Semantic : lightweight prompt‑injection firewall that blocks high‑risk templates, abnormal bulk parameters, and role‑tool mismatches.

10.1 Prompt injection danger in MCP

Beyond data leakage, a crafted prompt can force the model to execute unauthorized writes, bypass approvals, or expand query scope. Therefore a semantic firewall must check for:

High‑risk action patterns.

Abnormally large parameter batches.

Role‑tool mismatch.
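Two of these checks, oversized batches and role-tool mismatch, are mechanical enough to sketch directly. The `SemanticFirewall` class below is hypothetical, and detecting high-risk action patterns is deliberately left out because it depends on policy that the article does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SemanticFirewall {
    private final int maxBatchSize;
    private final Map<String, Set<String>> rolesAllowedPerTool;

    SemanticFirewall(int maxBatchSize, Map<String, Set<String>> rolesAllowedPerTool) {
        this.maxBatchSize = maxBatchSize;
        this.rolesAllowedPerTool = rolesAllowedPerTool;
    }

    /** Returns the violations found; an empty list means the call may proceed. */
    List<String> inspect(String toolName, Set<String> callerRoles, Map<String, Object> arguments) {
        List<String> violations = new ArrayList<>();
        // Tools without an entry allow no role at all (deny by default).
        Set<String> allowed = rolesAllowedPerTool.getOrDefault(toolName, Set.of());
        if (callerRoles.stream().noneMatch(allowed::contains)) {
            violations.add("role-tool mismatch");
        }
        // Any list-valued argument larger than the limit counts as an abnormal batch.
        for (Object value : arguments.values()) {
            if (value instanceof List<?> list && list.size() > maxBatchSize) {
                violations.add("batch too large: " + list.size());
            }
        }
        return violations;
    }
}
```

Running this before the permission evaluator keeps obviously abnormal calls away from downstream services regardless of what the prompt asked for.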

10.2 High‑risk tool confirmation

Risk levels dictate enforcement:

READ_ONLY : callable directly.

WRITE : requires role check and idempotency key.

HIGH_RISK_WRITE : requires approval ticket or manual confirmation.

11. Observability and audit

Every tool call must record a rich audit record (traceId, sessionId, toolName, callerUserId, tenantId, clientName, riskLevel, requestSummary, downstreamService, latencyMs, resultSummary, errorCode, approvalTicket). Metrics to expose:

mcp_tool_calls_total

mcp_tool_latency_ms

mcp_tool_error_total

mcp_tool_denied_total

mcp_tool_truncated_total

mcp_tool_idempotent_hit_total

mcp_session_active

mcp_downstream_circuit_open_total

Replay capability is essential: store the full request chain (model‑selected tools, parameters, gateway sanitization, downstream responses, permission decisions) to enable root‑cause analysis.
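A structured audit row can be modeled as an immutable record. The sketch below covers only a subset of the fields listed above, and both the `ToolAuditRecord` type and the `searchKey` format (tool name, business id, operator, trace id joined with `|`) are assumptions for illustration.

```java
import java.time.Instant;

/** Illustrative audit row; a subset of the fields a gateway would persist per tool call. */
record ToolAuditRecord(String traceId, String sessionId, String toolName, String callerUserId,
                       String tenantId, String riskLevel, String downstreamService,
                       long latencyMs, String errorCode, String approvalTicket, Instant at) {

    /** Composite lookup key, e.g. "every unfreeze of this order by this operator in this trace". */
    String searchKey(String businessId) {
        return String.join("|", toolName, businessId, callerUserId, traceId);
    }

    /** A call counts as failed when an error code was recorded. */
    boolean failed() {
        return errorCode != null && !errorCode.isBlank();
    }
}
```

Persisting such rows with an index on the composite key makes replay and root-cause queries a lookup rather than a log search.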

12. Real‑world tooling pattern

Never expose a monolithic "fixWrongRiskOrders" tool. Split it into three focused tools:

searchRiskOrders : read‑only risk query.

previewUnfreezePlan : returns a structured summary and a next‑action hint.

unfreezeOrderBatch : high‑risk write that requires approval.

Example JSON returned by previewUnfreezePlan (truncated for brevity):

{
  "summary": {"candidateCount":12,"eligibleCount":9,"rejectedCount":3},
  "eligibleOrders":[{"orderId":"O202604260001","reason":"Risk rule rolled back","riskScore":0.12}],
  "rejectedOrders":[{"orderId":"O202604260010","reason":"Missing approval ticket"}],
  "nextAction":"If you wish to proceed, call unfreezeOrderBatch with an approval ticket."
}

This design lets the model present a concise plan, obtain human confirmation, and then execute the write tool safely.
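The preview step is a pure function over the search results: it classifies candidates and writes nothing. A minimal sketch follows, assuming a hypothetical eligibility rule (the risk rule was rolled back and the risk score is at or below a threshold); `UnfreezePlanner`, `Candidate`, and `Plan` are illustrative names.

```java
import java.util.List;
import java.util.function.Predicate;

class UnfreezePlanner {
    record Candidate(String orderId, double riskScore, boolean ruleRolledBack) {}

    record Plan(int candidateCount, List<String> eligible, List<String> rejected, String nextAction) {}

    /** Splits candidates into eligible and rejected without performing any write. */
    static Plan preview(List<Candidate> candidates, double maxRiskScore) {
        // Assumed policy: only rolled-back rules with a low enough score qualify.
        Predicate<Candidate> isEligible = c -> c.ruleRolledBack() && c.riskScore() <= maxRiskScore;
        List<String> eligible = candidates.stream().filter(isEligible).map(Candidate::orderId).toList();
        List<String> rejected = candidates.stream().filter(isEligible.negate()).map(Candidate::orderId).toList();
        String next = eligible.isEmpty()
                ? "No eligible orders; refine the search."
                : "If you wish to proceed, call unfreezeOrderBatch with an approval ticket.";
        return new Plan(candidates.size(), eligible, rejected, next);
    }
}
```

Because the preview has no side effects, it can be exposed at the READ_ONLY risk level while the write stays behind approval.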

13. Common pitfalls and fixes

Tool description written for developers – rewrite to be model‑readable (e.g., "Batch unfreeze orders…").

Returning raw downstream JSON – summarize, mask, paginate, and suggest next steps.

Mixing read and write limits – separate bulkheads and quotas per risk level.

Letting the model decide approval – enforce approval as a system rule, not a model choice.

Only logging, no structured audit – store audit rows with searchable keys (toolName + orderId + operator + traceId).

14. Pre‑launch checklist

Protocol layer : initialize, tools/list, tools/call, session‑id renewal work as expected.

Security layer : Origin validation, API‑Key auth, RBAC, approval ticket enforcement, output sanitization.

Performance layer : full async flow, downstream timeouts, circuit breakers, result trimming.

Operations layer : tool‑level metrics, replay capability, the ability to take individual tools offline gracefully, load‑test baseline and capacity model.

15. Conclusion

MCP is more than a plug‑in; it is a systematic rewrite of how services expose capabilities to AI. By inserting an independent MCP gateway, Java teams can keep existing microservices untouched while adding a governed, observable, and secure AI‑native capability surface. The result is a production‑ready architecture where AI agents can safely discover, invoke, and audit business functions at massive scale.

