Engineering Large Language Models with Spring AI: From Basics to RAG and Function Calls
This article covers the fundamentals of large language models, including their stateless nature and structured output; explains how Spring‑AI provides a Java‑friendly API for model integration; introduces the RAG architecture and the MCP protocol; and walks through end‑to‑end code examples for building intelligent agents.
Overview
With the performance improvements and cost reductions of large models, they are increasingly integrated into traditional business scenarios beyond web chat.
What Is a Large Model?
A large model is, in effect, an enormous mathematical function trained on a vast text corpus. It behaves like a pure function with no internal state, so the caller must supply all necessary context on every request.
Characteristics of Large Models
Stateless: each request is independent.
Structured Output: can return formats such as JSON, not just free‑form text.
Function Calling: can suggest and invoke external functions based on the conversation.
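Because the model is stateless, every call must carry the entire conversation so far. A minimal sketch in plain Java of what a request payload looks like conceptually (the Message record and buildRequest helper are illustrative, not a Spring‑AI API):

```java
import java.util.ArrayList;
import java.util.List;

public class StatelessDemo {
    // Illustrative message type; real APIs use JSON objects with the same fields.
    public record Message(String role, String content) {}

    // Builds the full payload for one call: system prompt + entire history + new turn.
    public static List<Message> buildRequest(String systemPrompt,
                                             List<Message> history,
                                             String userInput) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("system", systemPrompt));
        messages.addAll(history);                       // prior turns must be resent
        messages.add(new Message("user", userInput));   // the new turn
        return messages;
    }

    public static void main(String[] args) {
        List<Message> history = List.of(
            new Message("user", "Hi"),
            new Message("assistant", "Hello! How can I help?"));
        System.out.println(buildRequest("You are a helpful assistant", history, "What is RAG?"));
    }
}
```

Note that the history grows with every turn, which is why context-window limits matter in long conversations.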
Large Model Interface
Because of their heavy hardware requirements, models are typically offered as SaaS services, much as databases are deployed on dedicated machines for resource reasons.
RAG Architecture
When private data is needed, a knowledge base (e.g., MySQL, Elasticsearch, vector DB) is queried first, and the retrieved documents are combined with the user query before sending to the model.
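The retrieve-then-augment step can be sketched in plain Java. This is a toy: keyword-overlap scoring stands in for a real vector search, and the in-memory list stands in for MySQL, Elasticsearch, or a vector DB:

```java
import java.util.Comparator;
import java.util.List;

public class RagDemo {
    // Stand-in knowledge base; in practice this is a database or vector store.
    static final List<String> DOCS = List.of(
        "Spring AI provides a Java-friendly API for large models.",
        "RAG retrieves private documents and adds them to the prompt.",
        "MCP is a protocol for system-to-system function calls.");

    // Naive retrieval: pick the document sharing the most words with the query.
    public static String retrieve(String query) {
        return DOCS.stream()
            .max(Comparator.comparingInt(doc -> score(doc, query)))
            .orElse("");
    }

    static int score(String doc, String query) {
        int s = 0;
        for (String word : query.toLowerCase().split("\\s+")) {
            if (doc.toLowerCase().contains(word)) s++;
        }
        return s;
    }

    // Augmentation: retrieved context is prepended to the user question,
    // and the combined text is what actually gets sent to the model.
    public static String augment(String query) {
        return "Context:\n" + retrieve(query) + "\n\nQuestion: " + query;
    }

    public static void main(String[] args) {
        System.out.println(augment("What does RAG retrieve?"));
    }
}
```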
MCP Protocol
The MCP protocol defines a simple, community‑agreed format for system‑to‑system calls, enabling large models to act as planning engines that invoke functions as “hands” and “feet”.
Spring‑AI
Spring‑AI offers a set of Java APIs compatible with the Spring ecosystem, providing model abstraction, chat sessions, and built‑in RAG support.
Model Abstraction
The core API is Model, a generic pure function that takes a request and returns a response. Sub‑interfaces such as ChatModel specialize for conversational use cases.
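The pattern can be shown with a self-contained sketch. This is a simplified rendering of the idea, not Spring‑AI's actual interface, whose request and response types are richer:

```java
public class ModelAbstractionDemo {
    // Conceptual sketch: a model is a pure function from request to response.
    @FunctionalInterface
    public interface Model<Req, Res> {
        Res call(Req request);
    }

    // A "chat model" is just a specialization: String prompt in, String reply out.
    public static Model<String, String> echoChatModel() {
        return prompt -> "echo: " + prompt;   // stand-in for a remote model call
    }

    public static void main(String[] args) {
        Model<String, String> chat = echoChatModel();
        System.out.println(chat.call("Hi"));  // a pure function: no hidden state
    }
}
```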
Chat Session
Spring‑AI provides ChatClient for managing conversations, handling history, and integrating with RAG advisors.
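The core idea behind such a session object can be mimicked in a few lines: the session, not the model, owns the history and replays it on every call. A self-contained sketch (not the ChatClient API itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ChatSessionDemo {
    public static class ChatSession {
        private final Function<List<String>, String> model; // stateless model function
        private final List<String> history = new ArrayList<>();

        public ChatSession(Function<List<String>, String> model) {
            this.model = model;
        }

        public String send(String userMessage) {
            history.add("user: " + userMessage);
            String reply = model.apply(List.copyOf(history)); // full history each call
            history.add("assistant: " + reply);
            return reply;
        }

        public int turnCount() { return history.size(); }
    }

    public static void main(String[] args) {
        // Stand-in model that reports how many messages it received.
        ChatSession session = new ChatSession(msgs -> "seen " + msgs.size() + " messages");
        System.out.println(session.send("Hi"));       // seen 1 messages
        System.out.println(session.send("And now?")); // seen 3 messages
    }
}
```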
RAG Extension
Using RetrievalAugmentationAdvisor, developers can plug in knowledge‑base retrieval into the chat flow.
Code Example
Request body (a chat‑completions call to the deepseek-chat model):

{
  "stream": false,
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi"}
  ],
  "tools": null,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 1,
  "top_p": 1,
  "logprobs": false,
  "top_logprobs": null
}

Response body:

{
  "id": "<string>",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<string>",
        "reasoning_content": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "type": "function",
            "function": {"name": "<string>", "arguments": "<string>"}
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 123, "completion_tokens": 123, "total_tokens": 123},
  "created": 123,
  "model": "<string>",
  "object": "chat.completion"
}

Intelligent Agent Example
By defining external functions (e.g., via HTTP, gRPC, or MCP) and registering them as tools, the model can plan calls, receive results, and continue the conversation.
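The plan/execute/continue loop can be simulated end to end in plain Java. The tool name, the CALL: reply convention, and the stand-in model below are all illustrative; in a real system the model emits a structured tool_calls array like the response body shown above:

```java
import java.util.Map;
import java.util.function.Function;

public class ToolCallDemo {
    // Registered tools the model may ask for; name and behavior are illustrative.
    static final Map<String, Function<String, String>> TOOLS = Map.of(
        "get_weather", city -> "Sunny in " + city);

    // Stand-in "model": with no tool result yet it emits a call request,
    // otherwise it produces the final answer from the tool result.
    static String model(String prompt, String toolResult) {
        if (toolResult == null) return "CALL:get_weather:Shanghai";
        return "It is " + toolResult.toLowerCase();
    }

    // The application drives the loop: model plans, app executes, model finishes.
    public static String run(String prompt) {
        String reply = model(prompt, null);
        if (reply.startsWith("CALL:")) {
            String[] parts = reply.split(":", 3);      // CALL:<tool>:<argument>
            String result = TOOLS.get(parts[1]).apply(parts[2]);
            reply = model(prompt, result);             // feed the result back
        }
        return reply;
    }

    public static void main(String[] args) {
        System.out.println(run("What's the weather in Shanghai?"));
    }
}
```

The key point is that the model never executes anything itself; the application runs the tool and sends the result back as another stateless request.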
Interface Skeleton
A POST endpoint returning text/event-stream streams the model’s responses.
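The wire format behind text/event-stream is simple: each chunk is a data: line terminated by a blank line. A sketch of the framing an SSE endpoint writes out (the controller plumbing itself is omitted):

```java
public class SseDemo {
    // One server-sent event frame: "data: <payload>\n\n".
    public static String toSseFrame(String chunk) {
        return "data: " + chunk + "\n\n";
    }

    public static void main(String[] args) {
        // Streaming a reply piece by piece, as the endpoint would write it.
        for (String token : new String[]{"Hello", " world"}) {
            System.out.print(toSseFrame(token));
        }
    }
}
```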
Tool Definition
Local methods are wrapped as ToolCallback objects so the model can invoke them.
System Prompt
System prompts provide contextual constraints to guide the model’s behavior.
Invocation Flow
Each request includes the conversation history because the model is stateless; history can be collected on the front‑end or stored by the back‑end.
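The back-end option can be sketched as a per-conversation store keyed by a conversation id (the class and method names here are illustrative, not a Spring‑AI API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HistoryStoreDemo {
    // Per-conversation history, keyed by a conversation id supplied by the client.
    static final Map<String, List<String>> STORE = new ConcurrentHashMap<>();

    // Appends the new message and returns the full history to send to the model.
    public static List<String> appendAndGet(String conversationId, String message) {
        List<String> history =
            STORE.computeIfAbsent(conversationId, id -> new ArrayList<>());
        synchronized (history) {
            history.add(message);
            return List.copyOf(history);
        }
    }

    public static void main(String[] args) {
        appendAndGet("conv-1", "user: Hi");
        System.out.println(appendAndGet("conv-1", "user: And now?"));
    }
}
```

Front-end collection avoids this server-side state but resends more data; the trade-off is bandwidth versus back-end storage and session management.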
Conclusion
From an engineering perspective, large models can be treated like any other service (e.g., a database); Java developers familiar with Spring can leverage Spring‑AI to integrate, extend with RAG, and build sophisticated AI‑driven applications.
DeWu Technology
A platform for sharing and discussing technical knowledge.