
Getting Started with Spring Cloud Alibaba AI: Integrating Text, Image, and Audio Models in a Spring Boot Application

This tutorial introduces Spring AI and Spring Cloud Alibaba AI, explains their core features, shows how to set up a Maven project with the required dependencies, and provides step‑by‑step code examples for invoking text, image, and audio generation models using Spring Boot.

Code Ape Tech Column

Spring AI draws inspiration from Python projects like LangChain and LlamaIndex, aiming to bring generative AI application development to ecosystems beyond Python, Java in particular. Its core capabilities include portable abstractions over AI providers, simplified AI development, support for chat, image, and embedding models as well as vector stores, and Spring Boot auto-configuration.

Spring Cloud Alibaba AI extends Spring AI with native support for Chinese large models such as Alibaba's Tongyi series, offering adapters for chat, text‑to‑image, and text‑to‑speech use cases, along with example projects.

Setup

Create a Maven project using JDK 17 and add the following dependencies:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-alibaba-dependencies</artifactId>
            <version>2023.0.1.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-alibaba-ai</artifactId>
    </dependency>
</dependencies>
Configure the Tongyi API key in application.yml:

server:
  port: 8080
spring:
  application:
    name: alibaba-spring-ai-demo
  cloud:
    ai:
      tongyi:
        api-key: your-api-key

Create the main Spring Boot class:

@SpringBootApplication
public class MyAiApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyAiApplication.class, args);
    }
}

Text Model Integration

Define a REST controller at /ai/simple that delegates to a TongYiService implementation:

@RestController
@RequestMapping("/ai")
@CrossOrigin
public class TongYiController {
    @Autowired
    @Qualifier("tongYiSimpleServiceImpl")
    private TongYiService tongYiSimpleService;

    @GetMapping("/simple")
    public String completion(@RequestParam(value = "message", defaultValue = "AI时代下Java开发者该何去何从?") String message) {
        return tongYiSimpleService.completion(message);
    }
}

The service interface declares methods for text completion, image generation, and audio synthesis:

public interface TongYiService {
    /** Basic Q&A */
    String completion(String message);
    /** Text‑to‑Image */
    ImageResponse genImg(String imgPrompt);
    /** Speech synthesis */
    String genAudio(String text);
}
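The controller shown earlier exposes only the text endpoint, but the image and audio services can be exposed the same way. A minimal sketch (the endpoint paths, qualifier names, and the use of getUrl() to pull the image link out of the ImageResponse are our assumptions, not shown in the original):

```java
@RestController
@RequestMapping("/ai")
@CrossOrigin
public class TongYiMediaController {

    // Qualifier names assumed to follow the service class names below.
    @Autowired
    @Qualifier("tongYiImagesServiceImpl")
    private TongYiService tongYiImagesService;

    @Autowired
    @Qualifier("tongYiAudioSimpleServiceImpl")
    private TongYiService tongYiAudioService;

    // Returns the generated image's URL instead of the raw ImageResponse.
    @GetMapping("/img")
    public String genImg(@RequestParam(value = "prompt",
            defaultValue = "A boy coding in front of a desk, with his dog") String prompt) {
        ImageResponse response = tongYiImagesService.genImg(prompt);
        return response.getResult().getOutput().getUrl();
    }

    // Returns the local path of the synthesized WAV file.
    @GetMapping("/audio")
    public String genAudio(@RequestParam(value = "text",
            defaultValue = "Hello, Spring Cloud Alibaba AI") String text) {
        return tongYiAudioService.genAudio(text);
    }
}
```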

The implementation uses Spring AI's ChatClient and StreamingChatClient, both auto-configured by the starter and injected through the constructor:

@Service
@Slf4j
public class TongYiSimpleServiceImpl extends AbstractTongYiServiceImpl {
    private final ChatClient chatClient;
    private final StreamingChatClient streamingChatClient;

    @Autowired
    public TongYiSimpleServiceImpl(ChatClient chatClient, StreamingChatClient streamingChatClient) {
        this.chatClient = chatClient;
        this.streamingChatClient = streamingChatClient;
    }

    @Override
    public String completion(String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return chatClient.call(prompt).getResult().getOutput().getContent();
    }
}
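The constructor also injects a StreamingChatClient, which the snippet above never uses. A sketch of a token-streaming variant, assuming Spring AI's StreamingChatClient.stream(Prompt) returns a Reactor Flux<ChatResponse> (the method name streamCompletion is ours, to be added to the same service class):

```java
// Streams the answer token by token instead of blocking for the full reply;
// requires reactor.core.publisher.Flux on the classpath (pulled in by the starter).
public Flux<String> streamCompletion(String message) {
    Prompt prompt = new Prompt(new UserMessage(message));
    return streamingChatClient.stream(prompt)
            .map(chatResponse -> chatResponse.getResult().getOutput().getContent());
}
```

A controller can return this Flux directly with produces = MediaType.TEXT_EVENT_STREAM_VALUE to push partial results to the browser as they arrive.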

Sending the default prompt "AI时代下Java开发者该何去何从?" ("Where should Java developers go in the AI era?") returns a generated answer in roughly 10 seconds.

Image Generation Model

The image service creates an ImagePrompt and calls the ImageClient:

@Service
@Slf4j
public class TongYiImagesServiceImpl extends AbstractTongYiServiceImpl {
    private final ImageClient imageClient;

    @Autowired
    public TongYiImagesServiceImpl(ImageClient client) {
        this.imageClient = client;
    }

    @Override
    public ImageResponse genImg(String imgPrompt) {
        var prompt = new ImagePrompt(imgPrompt);
        return imageClient.call(prompt);
    }
}

Testing with the prompt "Painting a boy coding in front of the desk, with his dog." produces a high-quality image (shown as screenshots in the original post).

Audio Synthesis Model

The audio service uses SpeechClient to synthesize WAV audio from text:

@Service
@Slf4j
public class TongYiAudioSimpleServiceImpl extends AbstractTongYiServiceImpl {
    private final SpeechClient speechClient;

    @Autowired
    public TongYiAudioSimpleServiceImpl(SpeechClient client) {
        this.speechClient = client;
    }

    @Override
    public String genAudio(String text) {
        log.info("gen audio prompt is: {}", text);
        var resWAV = speechClient.call(text);
        // save the WAV file locally (code omitted)
        return save(resWAV, SpeechSynthesisAudioFormat.WAV.getValue());
    }
}

The generated audio file plays correctly, confirming successful integration.
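The save helper is omitted in the article. A minimal sketch, assuming speechClient.call returns a java.nio.ByteBuffer holding the raw audio bytes (the class name AudioFileSaver is ours):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

public class AudioFileSaver {

    /**
     * Writes the synthesized audio buffer to a uniquely named file in the
     * current working directory and returns its absolute path.
     */
    public static String save(ByteBuffer audio, String suffix) {
        Path target = Path.of(UUID.randomUUID() + "." + suffix);
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(audio);
            return target.toAbsolutePath().toString();
        } catch (IOException e) {
            throw new RuntimeException("failed to save audio file", e);
        }
    }
}
```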

Experience Summary

Simplified Development: Spring AI abstracts away low‑level SDK calls, making complex AI features easier to maintain.

Response Time: Basic text Q&A takes around 10 seconds; performance can vary with model size and workload.

Model Selection: Current Spring AI integration defaults to Tongyi models; selecting alternative providers requires additional configuration.

Future work for Spring Cloud Alibaba AI includes support for VectorStore, Embedding, and ETL pipelines to enable richer RAG applications.

Tags: Java, Spring AI, image generation, spring-cloud-alibaba, text generation, AI integration, audio synthesis
Written by Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full-stack Java, job interview and career advice through a column. Site: java-family.cn