Why Hand‑Crafted HTTP Calls to LLMs Are a Pitfall and How Spring AI Solves It
The article analyzes the hidden dangers of writing raw HTTP calls for large language models in Java projects: hard-coded keys, fragile request bodies, missing retries, and no observability. It then demonstrates how Spring AI's unified abstractions, built-in resilience, streaming, function calling, and Spring integration address these issues while making it easier to switch models and run production-grade AI services.
Introduction
A colleague asked why many tutorials use RestTemplate to call OpenAI or DeepSeek APIs directly, and the author points out that such "hand‑written HTTP" approaches become a nightmare when the model provider changes or the application scales.
1. Problems with Hand‑Written HTTP
Typical code for calling DeepSeek with RestTemplate looks simple but hides six critical issues:
Hard‑coded API keys – committing the key to source control exposes secrets.
Hard-coded request parameters – each model uses different fields (messages vs input), requiring code changes for every switch.
Poor exception handling – any error crashes the service; no retry or fallback.
No connection‑pool management – a new RestTemplate per request creates excessive TCP connections.
Fragile response parsing – a slight JSON change throws exceptions.
Lack of observability – no logging, metrics, or tracing for troubleshooting.
Attempting to wrap this logic in a custom LLMClient quickly balloons to hundreds of lines, covering thread‑safe HTTP clients, retry policies, circuit breakers, rate limiters, connection pools, metrics, and logging.
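To give a sense of scale, here is a minimal sketch of just one item on that list, a retry policy with exponential backoff, in plain Java. The class and method names (RetrySketch, callWithRetry) are hypothetical and not taken from any library, and the "flaky endpoint" is simulated rather than a real LLM call.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper: the kind of retry plumbing a hand-rolled LLM client ends up needing.
final class RetrySketch {

    /** Runs the call, retrying on any exception up to maxAttempts with doubling delays. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, Duration initialDelay)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMillis = initialDelay.toMillis();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                    delayMillis *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated flaky endpoint: fails twice, then succeeds.
        String reply = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated timeout");
            }
            return "ok";
        }, 5, Duration.ofMillis(10));
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

Even this small piece leaves out jitter, retry only on retryable status codes, and metrics, which is where the line count starts to grow.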
2. Spring AI – A Unified "Swiss‑Army Knife" for LLMs
Spring AI, whose 1.0 milestone releases began in 2024 and which reached general availability in 2025, provides a standard, modular AI toolchain for Java developers. It offers:
Model abstraction layer – interfaces like ChatModel and EmbeddingModel hide vendor‑specific details.
Adapter layer – implementations for OpenAI, Azure OpenAI, Anthropic, Alibaba Tongyi Qianwen, Ollama, etc., allow zero‑code model swaps.
Service orchestration layer – ChatClient fluent API, PromptTemplate, and conversation management simplify calls.
Application integration layer – Spring Boot starters auto-configure beans such as ChatModel and ChatClient.Builder, so AI components sit alongside the rest of the Spring stack, including Spring Security.
The four‑layer architecture is illustrated in the diagram below:
2.1 Unified Configuration
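Switching models is mostly a matter of editing application.yml, with the matching provider starter on the classpath; application code written against ChatClient stays the same. For example, changing from OpenAI to DeepSeek or a local Ollama model means updating the spring.ai section. The three blocks below are alternatives, not one file.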
# OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4-turbo

# DeepSeek
spring:
  ai:
    deepseek:
      api-key: ${DEEPSEEK_API_KEY}
      chat:
        options:
          model: deepseek-chat

# Ollama (local Llama 3)
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3:8b

API keys are injected from environment variables, eliminating hard-coding.
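The idea behind configuration-driven switching can be sketched in plain Java: callers depend on one interface, and a configuration value decides which adapter is created. Everything here is hypothetical (ChatBackend, the two adapters, and the llm.provider property name are made up for illustration and are not Spring AI types); real adapters would translate to each vendor's wire format instead of echoing.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a vendor-neutral chat interface.
interface ChatBackend {
    String chat(String userMessage);
}

// Illustrative adapters that only echo; real ones would call the vendor's API.
final class OpenAiBackend implements ChatBackend {
    public String chat(String userMessage) {
        return "[openai] " + userMessage;
    }
}

final class OllamaBackend implements ChatBackend {
    public String chat(String userMessage) {
        return "[ollama] " + userMessage;
    }
}

final class BackendSelector {
    private static final Map<String, Supplier<ChatBackend>> PROVIDERS = Map.of(
            "openai", OpenAiBackend::new,
            "ollama", OllamaBackend::new);

    /** Picks an adapter from a configuration value such as a property or env var. */
    static ChatBackend fromConfig(String provider) {
        Supplier<ChatBackend> factory = PROVIDERS.get(provider);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // "llm.provider" is a made-up property name for this sketch.
        String provider = System.getProperty("llm.provider", "ollama");
        ChatBackend backend = fromConfig(provider);
        System.out.println(backend.chat("hello"));
    }
}
```

Running with -Dllm.provider=openai changes the backend without touching the calling code, which is the property the framework's abstraction layer aims for.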
2.2 ChatClient – Fluent API
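import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
@Service
public class AIChatService {
    private final ChatClient chatClient;
    // Spring Boot auto-configures a ChatClient.Builder for the active model
    public AIChatService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }
    public String chat(String question) {
        return chatClient.prompt(question).call().content();
    }
}
A single fluent call replaces the dozens of lines required by raw RestTemplate.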
2.3 Function Calling
Spring AI lets large models invoke business code by exposing a java.util.function.Function bean annotated with @Description. An example that queries an order service:
@Component
@Description("Query order details by orderId")
public class OrderQueryFunction implements Function<OrderQueryFunction.Request, OrderQueryFunction.Response> {
    @Autowired
    private OrderService orderService;

    public record Request(@JsonProperty(required = true) String orderId) {}
    public record Response(String status, BigDecimal amount, String deliveryTime) {}

    @Override
    public Response apply(Request request) {
        Order order = orderService.getOrder(request.orderId());
        return new Response(order.getStatus(), order.getAmount(), order.getDeliveryTime());
    }
}
Registering the function with ChatClient enables the model to call it automatically.
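What "call it automatically" involves can be sketched without the framework: the model replies with a tool name plus arguments instead of text, and something must route that request to Java code. The ToolRegistry class below is a hypothetical illustration, not Spring AI's implementation; the arguments are assumed to be already parsed from the model's JSON, and the order status is invented sample data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry showing how a tool call from the model is routed to Java code.
final class ToolRegistry {
    private final Map<String, Function<Map<String, String>, String>> tools = new HashMap<>();

    void register(String name, Function<Map<String, String>, String> tool) {
        tools.put(name, tool);
    }

    /** Invokes the tool the model asked for, or reports that it does not exist. */
    String dispatch(String toolName, Map<String, String> arguments) {
        Function<Map<String, String>, String> tool = tools.get(toolName);
        if (tool == null) {
            return "error: unknown tool " + toolName;
        }
        return tool.apply(arguments);
    }

    public static void main(String[] args) {
        ToolRegistry registry = new ToolRegistry();
        // Stand-in for OrderService; the status value is invented sample data.
        registry.register("orderQuery", a -> "order " + a.get("orderId") + ": SHIPPED");

        // Pretend the model replied with a tool call instead of plain text.
        String result = registry.dispatch("orderQuery", Map.of("orderId", "A-1001"));
        System.out.println(result); // this result would be sent back to the model
    }
}
```

A framework additionally generates the JSON schema for each tool from its request type and loops until the model produces a final answer, which is the routing logic a hand-written client would have to maintain itself.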
2.4 Streaming Responses
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String prompt) {
    // stream().content() emits the text of each chunk as it arrives
    return chatClient.prompt(prompt).stream().content();
}
The stream() call handles Server-Sent Events (SSE) in a single line of code.
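For comparison, this is roughly what the hand-written side of SSE looks like: reading the response line by line and pulling out the "data:" payloads. It is a simplified sketch with a hypothetical class name (SseSketch). It assumes an OpenAI-style stream that ends with a "[DONE]" sentinel, ignores multi-line events and other SSE fields, and reads from a string instead of a live connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified SSE reader of the kind a hand-rolled client would need.
final class SseSketch {

    /** Collects the payloads of "data:" lines until the "[DONE]" sentinel or end of stream. */
    static List<String> readDataEvents(BufferedReader reader) throws IOException {
        List<String> events = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other fields
            }
            // Simplification: strip() drops all surrounding whitespace.
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break; // assumed OpenAI-style end-of-stream marker
            }
            events.add(payload);
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Canned stream standing in for an HTTP response body.
        String body = "data: Hel\n\n: keep-alive\ndata: lo\n\ndata: [DONE]\n";
        List<String> chunks = readDataEvents(new BufferedReader(new StringReader(body)));
        System.out.println(String.join("", chunks));
    }
}
```

A production version would also need to parse the JSON inside each payload, handle disconnects, and propagate backpressure, all of which the reactive Flux returned by the framework already covers.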
2.5 Structured Output
Using BeanOutputConverter and Java record types, JSON responses are automatically mapped to strongly typed objects:
public record SentimentResult(String label, double score) {}

BeanOutputConverter<SentimentResult> converter = new BeanOutputConverter<>(SentimentResult.class);
// getFormat() returns instructions that tell the model which JSON shape to produce
String json = chatClient.prompt("User comment: " + review + ". " + converter.getFormat()).call().content();
SentimentResult result = converter.convert(json);
2.6 Retrieval-Augmented Generation (RAG)
Spring AI provides a VectorStore abstraction that works with Elasticsearch, Milvus, PgVector, etc., plus DocumentReader implementations that parse PDF, Word, and Excel files. After configuring a vector store, attaching an advisor such as QuestionAnswerAdvisor to ChatClient performs the semantic search and passes the retrieved documents into generation.
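The retrieval half of RAG can be illustrated with a toy in-memory version: documents are stored with embedding vectors, the query is embedded the same way, and the closest documents by cosine similarity become the prompt context. The sketch below is hypothetical (RetrievalSketch and Doc are made-up names) and uses hand-written two-dimensional vectors in place of real embeddings and a real vector store.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy retriever; a real setup would use an embedding model and a vector store.
final class RetrievalSketch {
    record Doc(String text, double[] embedding) {}

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k documents most similar to the query, best match first. */
    static List<Doc> topK(List<Doc> docs, double[] query, int k) {
        return docs.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(d.embedding(), query)).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        // Invented sample documents with hand-picked 2-D "embeddings".
        List<Doc> docs = List.of(
                new Doc("refund policy", new double[]{1, 0}),
                new Doc("shipping times", new double[]{0, 1}),
                new Doc("returns and refunds", new double[]{0.9, 0.1}));
        List<Doc> top = topK(docs, new double[]{1, 0}, 2);

        // The retrieved text is placed in the prompt ahead of the user's question.
        String context = top.stream().map(Doc::text).collect(Collectors.joining("\n"));
        System.out.println("Answer using only this context:\n" + context
                + "\nQuestion: How do refunds work?");
    }
}
```

The framework's contribution is replacing each toy piece with a production one: document parsing and chunking, an embedding model, an indexed vector store, and the prompt assembly step.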
3. Engineering Benefits vs Hand‑Written HTTP
Multi‑model support – change the provider by editing configuration only.
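Built-in resilience – retry with backoff is configurable out of the box, and fallback, circuit breaking, and rate limiting can be layered on with standard Spring ecosystem tools such as Resilience4j.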
Streaming – one‑line stream() replaces complex SSE handling.
Structured output – automatic JSON‑to‑Java conversion.
Function calling – declarative registration eliminates custom routing logic.
RAG – vector store integration and document ingestion are handled by the framework.
In contrast, a hand‑rolled solution would require hundreds of lines for each of these capabilities, and each piece would need thorough production testing.
4. Limitations and When to Use Spring AI
Current limitations include a learning curve for the new abstractions, rapid version changes, and a smaller ecosystem of ready‑made agents compared with projects like LangChain4j.
Ideal scenarios for Spring AI:
Existing Spring Boot microservices where developers want to add AI without rewriting the architecture.
Enterprises that need to switch between multiple LLM providers seamlessly.
Production environments requiring retries, circuit breaking, rate limiting, and observability.
Use cases where AI must call existing business services (e.g., order lookup, inventory check).
Regulated industries needing audit logs, security, and compliance.
Scenarios to avoid:
Quick one‑off demos where a few HTTP calls suffice.
Complex multi‑tool agent workflows where LangChain4j offers richer plugins.
Edge‑device or ultra‑lightweight deployments where the Spring footprint is too large.
Conclusion
For simple proofs of concept, hand‑written HTTP is fast and lightweight. However, for any production‑grade Java application that will grow, Spring AI provides a unified abstraction, deep Spring ecosystem integration, and out‑of‑the‑box engineering features that turn AI from a fragile add‑on into a robust, maintainable service.
IT Services Circle
