
How to Seamlessly Integrate MiniMax & CosyVoice TTS into Spring Boot with UnifiedTTS

This guide walks you through building a Spring Boot application, registering a UnifiedTTS API key, configuring MiniMax or CosyVoice models, implementing the service layer, running unit tests, and handling production concerns to achieve high‑quality text‑to‑speech synthesis without changing client code.

Programmer DD

Oct 24, 2025


For use cases that demand high-quality text-to-speech (TTS), such as audiobooks or podcasts, the EdgeTTS solution introduced previously may fall short. MiniMax and CosyVoice produce more natural, human-like voices. UnifiedTTS exposes both engines through a single API, so you can switch between them without changing client code. This article walks through integrating MiniMax and CosyVoice synthesis into a Spring Boot application from scratch, and it includes a complete, runnable example.

Practical Steps

1. Build a Spring Boot Application

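Create a Spring Boot project (version 3.2 or later, because RestClient was introduced in Spring Framework 6.1) using start.spring.io or any other method. Then add the required dependencies: spring-boot-starter-web (which also brings in Jackson), lombok, and spring-boot-starter-test for the test in step 3.4: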

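<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
  </dependency>

  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>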

2. Register UnifiedTTS and Obtain an API Key

Visit the UnifiedTTS console, create an API key, and record it for later configuration.

3. Integrate UnifiedTTS API (MiniMax / CosyVoice)

3.1 Configuration File (application.properties)

unified-tts.host=https://unifiedtts.com
unified-tts.api-key=${UNIFIEDTTS_API_KEY}

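The ${UNIFIEDTTS_API_KEY} placeholder is resolved from the environment variable of the same name. Set UNIFIEDTTS_API_KEY to the key you created instead of hardcoding the key in the file.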
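Because the key now comes from the environment, a missing or mistyped variable may only show up later as an authentication error on the first request. Below is a minimal sketch of a fail-fast check. It is plain Java, so it runs without Spring. ApiKeyCheck and requireApiKey are hypothetical names. One way to wire it in, which is an assumption and not part of the original project, is to call it from a @PostConstruct method on UnifiedTtsProperties.

```java
// Hypothetical helper (not part of UnifiedTTS or Spring): fail fast when the API key is missing.
final class ApiKeyCheck {
    private ApiKeyCheck() {}

    /**
     * Returns the trimmed key, or throws if it is null, blank, or still an
     * unresolved "${...}" placeholder.
     */
    static String requireApiKey(String value) {
        String key = value == null ? "" : value.strip();
        if (key.isEmpty()) {
            throw new IllegalStateException("unified-tts.api-key is empty; set UNIFIEDTTS_API_KEY");
        }
        if (key.startsWith("${")) {
            throw new IllegalStateException("unified-tts.api-key was not resolved: " + key);
        }
        return key;
    }
}
```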

3.2 Configuration Class and DTOs

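// src/main/java/com/example/tts/config/UnifiedTtsProperties.java
package com.example.tts.config;

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@Component // registers the bean; alternatively add @ConfigurationPropertiesScan to the application class
@ConfigurationProperties(prefix = "unified-tts")
public class UnifiedTtsProperties {
    private String host;    // bound from unified-tts.host
    private String apiKey;  // bound from unified-tts.api-key (relaxed binding)
}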

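// src/main/java/com/example/tts/dto/UnifiedTtsRequest.java
package com.example.tts.dto;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class UnifiedTtsRequest {
    private String model;   // model ID, e.g., speech-02-turbo (see the UnifiedTTS docs for MiniMax/CosyVoice IDs)
    private String voice;   // voice ID, e.g., Chinese (Mandarin)_Gentle_Youth
    private String text;    // text to synthesize
    private Double speed;   // optional
    private Double pitch;   // optional
    private Double volume;  // optional
    private String format;  // mp3, wav, or ogg
}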
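If you prefer not to depend on Lombok for the request DTO, a Java record (Java 16+) is a compact alternative. The sketch below is a variant built on several assumptions and is not part of the original project. TtsRequest and of(...) are hypothetical names. The mp3 default mirrors the parameter list in section 5. The approach relies on Jackson 2.12 or later, which serializes record components by name.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical Lombok-free variant of UnifiedTtsRequest as an immutable record.
record TtsRequest(String model, String voice, String text,
                  Double speed, Double pitch, Double volume, String format) {

    private static final Set<String> FORMATS = Set.of("mp3", "wav", "ogg");

    // Compact constructor: validates required fields and defaults the format to mp3.
    TtsRequest {
        Objects.requireNonNull(model, "model");
        Objects.requireNonNull(voice, "voice");
        if (text == null || text.isBlank()) {
            throw new IllegalArgumentException("text must not be blank");
        }
        if (format == null) {
            format = "mp3";
        } else if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("unsupported format: " + format);
        }
    }

    // Convenience factory: only model, voice and text; optional tuning fields stay null.
    static TtsRequest of(String model, String voice, String text) {
        return new TtsRequest(model, voice, text, null, null, null, "mp3");
    }
}
```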

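// src/main/java/com/example/tts/dto/UnifiedTtsResponse.java
package com.example.tts.dto;

import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// Models a JSON response that carries an audio_url; the synchronous call in 3.3 reads raw bytes instead
@Data
@AllArgsConstructor
@NoArgsConstructor
public class UnifiedTtsResponse {
    private boolean success;
    private String message;
    private long timestamp;
    private UnifiedTtsResponseData data;

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class UnifiedTtsResponseData {
        @JsonProperty("request_id")
        private String requestId;
        @JsonProperty("audio_url")
        private String audioUrl;
        @JsonProperty("file_size")
        private long fileSize;
    }
}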

3.3 Service Implementation (RestClient Synchronous Synthesis)

// src/main/java/com/example/tts/service/UnifiedTtsService.java
package com.example.tts.service;

import com.example.tts.dto.UnifiedTtsRequest;
import com.example.tts.config.UnifiedTtsProperties;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

@Service
public class UnifiedTtsService {
    private final RestClient restClient;
    private final UnifiedTtsProperties properties;

    public UnifiedTtsService(UnifiedTtsProperties properties) {
        this.properties = properties;
        this.restClient = RestClient.builder()
                .baseUrl(properties.getHost())
                .build();
    }

    public byte[] synthesize(UnifiedTtsRequest request) {
        ResponseEntity<byte[]> response = restClient
                .post()
                .uri("/api/v1/common/tts-sync")
                .contentType(MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_OCTET_STREAM, MediaType.valueOf("audio/mpeg"), MediaType.valueOf("audio/mp3"))
                .header("X-API-Key", properties.getApiKey())
                .body(request)
                .retrieve()
                .toEntity(byte[].class);
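        // retrieve() already throws RestClientResponseException for 4xx/5xx responses,
        // so this check mainly guards against a 2xx response with an empty body.
        if (response.getStatusCode().is2xxSuccessful() && response.getBody() != null) {
            return response.getBody();
        }
        throw new IllegalStateException("UnifiedTTS synthesize failed: " + response.getStatusCode());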
    }

    public Path synthesizeToFile(UnifiedTtsRequest request, Path outputPath) {
        byte[] data = synthesize(request);
        try {
            if (outputPath.getParent() != null) {
                Files.createDirectories(outputPath.getParent());
            }
            Files.write(outputPath, data);
            return outputPath;
        } catch (IOException e) {
            throw new RuntimeException("Failed to write TTS output to file: " + outputPath, e);
        }
    }
}
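The following framework-free sketch makes the same call with the JDK's java.net.http.HttpClient (Java 11+; the arrow-style switch needs Java 14+), which shows what the service sends over the wire. The endpoint path, the X-API-Key header, and the accepted media types are copied from UnifiedTtsService. Everything else is an assumption: PlainTtsClient is a hypothetical name, only four request fields are sent, and the hand-rolled JSON escaping is a simplification that Jackson would normally handle.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical plain-JDK client mirroring UnifiedTtsService (not an official SDK).
final class PlainTtsClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    private final String host;
    private final String apiKey;

    PlainTtsClient(String host, String apiKey) {
        this.host = host;
        this.apiKey = apiKey;
    }

    // Builds the POST request; kept separate so it can be inspected without network access.
    HttpRequest buildRequest(String model, String voice, String text, String format) {
        String json = "{\"model\":" + quote(model) + ",\"voice\":" + quote(voice)
                + ",\"text\":" + quote(text) + ",\"format\":" + quote(format) + "}";
        return HttpRequest.newBuilder(URI.create(host + "/api/v1/common/tts-sync"))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .header("Accept", "application/octet-stream, audio/mpeg")
                .header("X-API-Key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    // Streams the response body to a file (parent directory must exist); deletes it on non-2xx.
    Path synthesizeToFile(String model, String voice, String text, String format, Path out)
            throws IOException, InterruptedException {
        HttpResponse<Path> resp = http.send(buildRequest(model, voice, text, format),
                HttpResponse.BodyHandlers.ofFile(out));
        if (resp.statusCode() / 100 != 2) {
            Files.deleteIfExists(out); // the file would contain the error payload, not audio
            throw new IllegalStateException("UnifiedTTS synthesize failed: HTTP " + resp.statusCode());
        }
        return resp.body();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

Keeping buildRequest separate from synthesizeToFile lets a test check the URI and headers without calling the real API.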

3.4 Unit Test (MiniMax / CosyVoice)

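// src/test/java/com/example/tts/UnifiedTtsServiceTest.java
package com.example.tts;

import com.example.tts.dto.UnifiedTtsRequest;
import com.example.tts.service.UnifiedTtsService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Calls the live UnifiedTTS API, so UNIFIEDTTS_API_KEY must be set when the test runs
@SpringBootTest
class UnifiedTtsServiceTest {
    @Autowired
    private UnifiedTtsService unifiedTtsService;

    @Test
    void testSynthesizeToFileWithMiniMax() throws Exception {
        UnifiedTtsRequest req = new UnifiedTtsRequest(
                "speech-02-turbo",                    // model (MiniMax)
                "Chinese (Mandarin)_Gentle_Youth",    // voice
                // "Hello, welcome to voice-over with the MiniMax model on UnifiedTTS."
                "你好，欢迎使用 UnifiedTTS 的 MiniMax 模型配音。",
                1.0,    // speed
                0.0,    // pitch
                1.0,    // volume
                "mp3"   // format
        );
        // Write to <project root>/test-result/<timestamp>.mp3
        Path projectDir = Paths.get(System.getProperty("user.dir"));
        Path resultDir = projectDir.resolve("test-result");
        Files.createDirectories(resultDir);
        Path out = resultDir.resolve(System.currentTimeMillis() + ".mp3");
        Path written = unifiedTtsService.synthesizeToFile(req, out);
        assertTrue(Files.exists(written), "Output file should exist");
        assertTrue(Files.size(written) > 0, "Output file size should be > 0");
    }
}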

4. Run and Verify

Run the test, for example with mvn test. It writes an MP3 file named after the current timestamp to the test-result directory in the project root. Play the file to check the voice quality.
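The test only asserts that the file exists and is non-empty. That check would also pass if the server answered 200 with a JSON body instead of audio. A stricter option is to sniff the first bytes of the file. The sketch below is a heuristic that uses a hypothetical AudioSniffer class. The signatures are standard container markers: an ID3 tag or MPEG frame sync for MP3, RIFF/WAVE for WAV, and OggS for Ogg. Note that the frame-sync match can also accept other MPEG-family streams such as ADTS AAC.

```java
// Hypothetical helper: guess the audio container from its leading "magic" bytes.
final class AudioSniffer {
    private AudioSniffer() {}

    static String detect(byte[] b) {
        if (b.length >= 3 && b[0] == 'I' && b[1] == 'D' && b[2] == '3') {
            return "mp3"; // MP3 starting with an ID3v2 tag
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xE0) == 0xE0) {
            return "mp3"; // bare MPEG audio frame sync (11 set bits)
        }
        if (b.length >= 12 && b[0] == 'R' && b[1] == 'I' && b[2] == 'F' && b[3] == 'F'
                && b[8] == 'W' && b[9] == 'A' && b[10] == 'V' && b[11] == 'E') {
            return "wav"; // RIFF container with WAVE form type
        }
        if (b.length >= 4 && b[0] == 'O' && b[1] == 'g' && b[2] == 'g' && b[3] == 'S') {
            return "ogg"; // Ogg page header
        }
        return "unknown"; // e.g. a JSON error body starting with '{'
    }
}
```

In the test you could then add assertEquals("mp3", AudioSniffer.detect(Files.readAllBytes(written))) with a static import of Assertions.assertEquals.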

5. Common Parameters and Voice Choices

model: the model ID, e.g., speech-02-turbo (refer to the official docs for supported models).

voice: the voice ID, e.g., Chinese (Mandarin)_Gentle_Youth.

speed: speech rate (recommended 0.8–1.2).

pitch: pitch adjustment (recommended -3 to +3).

volume: volume level (recommended 0.8–1.2).

format: mp3 (default), wav (lossless), or ogg.
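When values come from user input, it is easy to exceed the recommended ranges above. Below is a minimal sketch of a pre-flight check. TtsParamCheck is a hypothetical name. The ranges are recommendations and not documented API limits, so the check returns warnings instead of rejecting the request. Null fields are treated as unset and skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check against the recommended parameter ranges.
final class TtsParamCheck {
    private TtsParamCheck() {}

    static List<String> warnings(Double speed, Double pitch, Double volume, String format) {
        List<String> out = new ArrayList<>();
        check(out, "speed", speed, 0.8, 1.2);
        check(out, "pitch", pitch, -3.0, 3.0);
        check(out, "volume", volume, 0.8, 1.2);
        if (format != null && !List.of("mp3", "wav", "ogg").contains(format)) {
            out.add("format '" + format + "' is not one of mp3, wav, ogg");
        }
        return out;
    }

    // Adds a warning only for non-null values outside [min, max].
    private static void check(List<String> out, String name, Double v, double min, double max) {
        if (v != null && (v < min || v > max)) {
            out.add(name + "=" + v + " is outside the recommended range [" + min + ", " + max + "]");
        }
    }
}
```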

MiniMax and CosyVoice models are recommended for the best quality.

6. Error Handling and Retry Recommendations

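Timeout and network errors: configure connect and read timeouts on the HTTP client behind RestClient, and log the failure cause. (onErrorResume applies only if you switch to the reactive WebClient.)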

4xx/5xx responses: distinguish authentication failures (401/403), rate limiting (429), and server errors (5xx), and handle or report each one accordingly.

Retry strategy: use exponential back‑off with jitter for transient errors.

Concurrency and throttling: implement a queue or token‑bucket for high‑throughput scenarios.

Caching: cache results keyed by text+voice+params to reduce cost and latency.
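To illustrate the retry advice, the sketch below implements exponential back-off with "full jitter" (a random delay between zero and an exponentially growing ceiling) and classifies statuses as described above. It is plain Java and rests on several assumptions. Retry and HttpStatusException are hypothetical names. 429, 5xx, and IOException are treated as transient. With RestClient, you would need to translate a RestClientResponseException into HttpStatusException using getStatusCode().value(). Spring Retry or Resilience4j offer comparable behavior if you prefer not to write this yourself.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical retry helper: exponential back-off with full jitter.
final class Retry {
    private Retry() {}

    // Carries the HTTP status of a failed call (hypothetical type, not a Spring class).
    static final class HttpStatusException extends RuntimeException {
        final int status;
        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    // 429 (rate-limited) and 5xx are transient; other 4xx such as 401/403 are not retried.
    static boolean isTransient(int status) {
        return status == 429 || status >= 500;
    }

    // Full jitter: a uniformly random delay in [0, min(cap, base * 2^attempt)].
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long exp = baseMillis << Math.min(attempt, 20); // bounded shift to avoid overflow
        long ceiling = Math.min(capMillis, exp);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    // Runs action up to maxAttempts times, sleeping between transient failures.
    static <T> T call(Callable<T> action, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (HttpStatusException e) {
                if (!isTransient(e.status) || attempt + 1 >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
            } catch (IOException e) {
                // Network-level failures (timeouts, resets) are also treated as transient.
                if (attempt + 1 >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
            }
        }
    }
}
```

A call site might look like Retry.call(() -> callAndTranslate(req), 4, 200, 5_000). Here callAndTranslate is a hypothetical wrapper around UnifiedTtsService.synthesize that rethrows HTTP failures as HttpStatusException.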

7. Production Recommendations

Security: inject API keys via environment variables or secret‑management services.

Monitoring: record synthesis latency, failure reasons, and retry ratios.

Storage: persist audio to local disk or object storage (e.g., S3) with lifecycle policies.

Standardization: unify DTOs and response structures to simplify adding new models.

Extensibility: enable configuration-driven switching among Azure, Edge, ElevenLabs, MiniMax, CosyVoice, and other engines.
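The caching advice in section 6 and the storage advice above can share one key. Below is a minimal sketch that assumes every request field affects the audio. TtsCacheKey is a hypothetical name. Each field is length-prefixed before hashing, so inputs such as ("ab", "c") and ("a", "bc") cannot produce the same hash input. The result (SHA-256 hex plus the format extension) can serve as a file name or an object-storage key. HexFormat requires Java 17. Also note that an omitted speed and an explicit 1.0 yield different keys, even if the server treats them the same.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical cache/storage key: SHA-256 over every field that affects the audio.
final class TtsCacheKey {
    private TtsCacheKey() {}

    static String of(String model, String voice, String text,
                     Double speed, Double pitch, Double volume, String format) {
        // Length-prefix each field ("len:value|") so the concatenation is unambiguous.
        StringBuilder sb = new StringBuilder();
        for (Object part : new Object[] {model, voice, text, speed, pitch, volume, format}) {
            String s = String.valueOf(part); // null becomes "null"
            sb.append(s.length()).append(':').append(s).append('|');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            String ext = format == null ? "mp3" : format;
            return "tts/" + HexFormat.of().formatHex(digest) + "." + ext;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```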

Conclusion

With UnifiedTTS, switching between MiniMax, CosyVoice, or even ElevenLabs only requires changing the model and voice fields in the request. The unified interface reduces the maintenance overhead of supporting multiple TTS engines and lets you balance cost, voice style, and audio quality. Further enhancements such as robust error handling, caching, and concurrency control can turn this example into a production-grade TTS service.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

