How to Seamlessly Integrate MiniMax & CosyVoice TTS into Spring Boot with UnifiedTTS
This guide walks you through building a Spring Boot application, registering a UnifiedTTS API key, configuring MiniMax or CosyVoice models, implementing the service layer, running unit tests, and handling production concerns to achieve high‑quality text‑to‑speech synthesis without changing client code.
In scenarios that require high‑quality text‑to‑speech (TTS) such as audiobooks or podcasts, the previously introduced EdgeTTS solution may not deliver satisfactory results. MiniMax and CosyVoice provide more natural, human‑like voices. By using the UnifiedTTS unified interface, you can switch between these engines without modifying client code. This article guides you from zero to integrating MiniMax and CosyVoice synthesis capabilities into a Spring Boot application, and includes a complete runnable example.
Practical Steps
1. Build a Spring Boot Application
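Create a Spring Boot project via start.spring.io or any other method, and add the necessary dependencies such as spring-boot-starter-web and lombok. The service in section 3.3 uses RestClient, which was introduced in Spring Framework 6.1, so the project needs Spring Boot 3.2 or later: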
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>
</dependencies>
2. Register UnifiedTTS and Obtain an API Key
Visit the UnifiedTTS console, create an API key, and record it for later configuration.
3. Integrate UnifiedTTS API (MiniMax / CosyVoice)
3.1 Configuration File (application.properties)
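unified-tts.host=https://unifiedtts.com
unified-tts.api-key=${UNIFIEDTTS_API_KEY}
The ${UNIFIEDTTS_API_KEY} placeholder is resolved from an environment variable of that name. Set that variable to the key you created, or replace the placeholder with the key itself for local testing only.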
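Because the key is supplied from outside the application, an unset variable is easy to miss until the first request fails. The following is a minimal sketch of a fail-fast check in plain Java; the class and method names (ApiKeyCheck, requireApiKey) are hypothetical and not part of UnifiedTTS or Spring, and only the variable name UNIFIEDTTS_API_KEY comes from the configuration above.

```java
import java.util.Map;

class ApiKeyCheck {

    // Hypothetical helper: returns the trimmed key, or fails with a clear message if it is missing.
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("UNIFIEDTTS_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                    "UNIFIEDTTS_API_KEY is not set; export it before starting the application");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String key = requireApiKey(System.getenv());
            // Print only the length so the secret never ends up in logs.
            System.out.println("API key loaded (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

A check like this could run once at startup, before any synthesis request is attempted.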
3.2 Configuration Class and DTOs
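// src/main/java/com/example/tts/config/UnifiedTtsProperties.java
package com.example.tts.config;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@Component // registers the class as a bean so it can be injected into the service
@ConfigurationProperties(prefix = "unified-tts")
public class UnifiedTtsProperties {
    private String host;   // bound from unified-tts.host
    private String apiKey; // bound from unified-tts.api-key
}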
// src/main/java/com/example/tts/dto/UnifiedTtsRequest.java
package com.example.tts.dto;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class UnifiedTtsRequest {
    private String model;  // e.g., speech-02-turbo (see the official docs for supported models)
    private String voice;  // e.g., Chinese (Mandarin)_Gentle_Youth
    private String text;   // text to synthesize
    private Double speed;  // optional
    private Double pitch;  // optional
    private Double volume; // optional
    private String format; // mp3/wav/ogg
}
// src/main/java/com/example/tts/dto/UnifiedTtsResponse.java
package com.example.tts.dto;

import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class UnifiedTtsResponse {
    private boolean success;
    private String message;
    private long timestamp;
    private UnifiedTtsResponseData data;

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class UnifiedTtsResponseData {
        @JsonProperty("request_id")
        private String requestId;
        @JsonProperty("audio_url")
        private String audioUrl;
        @JsonProperty("file_size")
        private long fileSize;
    }
}
3.3 Service Implementation (RestClient Synchronous Synthesis)
// src/main/java/com/example/tts/service/UnifiedTtsService.java
package com.example.tts.service;

import com.example.tts.dto.UnifiedTtsRequest;
import com.example.tts.config.UnifiedTtsProperties;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

@Service
public class UnifiedTtsService {

    private final RestClient restClient;
    private final UnifiedTtsProperties properties;

    public UnifiedTtsService(UnifiedTtsProperties properties) {
        this.properties = properties;
        this.restClient = RestClient.builder()
                .baseUrl(properties.getHost())
                .build();
    }

    // Posts the request to the synchronous endpoint and returns the raw audio bytes.
    public byte[] synthesize(UnifiedTtsRequest request) {
        ResponseEntity<byte[]> response = restClient
                .post()
                .uri("/api/v1/common/tts-sync")
                .contentType(MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_OCTET_STREAM, MediaType.valueOf("audio/mpeg"), MediaType.valueOf("audio/mp3"))
                .header("X-API-Key", properties.getApiKey())
                .body(request)
                .retrieve()
                .toEntity(byte[].class);
        // retrieve() already throws for 4xx/5xx by default, so this mainly guards against an empty body.
        if (response.getStatusCode().is2xxSuccessful() && response.getBody() != null) {
            return response.getBody();
        }
        throw new IllegalStateException("UnifiedTTS synthesize failed: " + response.getStatusCode());
    }

    // Synthesizes the audio and writes it to outputPath, creating parent directories if needed.
    public Path synthesizeToFile(UnifiedTtsRequest request, Path outputPath) {
        byte[] data = synthesize(request);
        try {
            if (outputPath.getParent() != null) {
                Files.createDirectories(outputPath.getParent());
            }
            Files.write(outputPath, data);
            return outputPath;
        } catch (IOException e) {
            throw new RuntimeException("Failed to write TTS output to file: " + outputPath, e);
        }
    }
}
3.4 Unit Test (MiniMax / CosyVoice)
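// src/test/java/com/example/tts/UnifiedTtsServiceTest.java
package com.example.tts;

import com.example.tts.dto.UnifiedTtsRequest;
import com.example.tts.service.UnifiedTtsService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import static org.junit.jupiter.api.Assertions.assertTrue;

@SpringBootTest
class UnifiedTtsServiceTest {

    @Autowired
    private UnifiedTtsService unifiedTtsService;

    @Test
    void testSynthesizeToFileWithMiniMax() throws Exception {
        UnifiedTtsRequest req = new UnifiedTtsRequest(
                "speech-02-turbo",
                "Chinese (Mandarin)_Gentle_Youth",
                // Sample text: "Hello, welcome to UnifiedTTS voice-over with the MiniMax model."
                "你好,欢迎使用 UnifiedTTS 的 MiniMax 模型配音。",
                1.0,
                0.0,
                1.0,
                "mp3"
        );
        // Write the result to <project root>/test-result/<timestamp>.mp3
        Path projectDir = Paths.get(System.getProperty("user.dir"));
        Path resultDir = projectDir.resolve("test-result");
        Files.createDirectories(resultDir);
        Path out = resultDir.resolve(System.currentTimeMillis() + ".mp3");
        Path written = unifiedTtsService.synthesizeToFile(req, out);
        assertTrue(Files.exists(written), "Output file should exist");
        assertTrue(Files.size(written) > 0, "Output file size should be > 0");
    }
}
4. Run and Verify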
After the unit test runs, the generated audio file appears in the test-result directory under the project root, named with the current timestamp and the .mp3 extension.
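The test only asserts that the file exists and is non-empty, so a JSON error body saved with an .mp3 extension would still pass. As a rough extra check, the sketch below looks for an ID3v2 tag or an MPEG audio frame sync at the start of the file. The class name Mp3SanityCheck is hypothetical, and the check is a heuristic rather than a full MP3 validation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Mp3SanityCheck {

    // True if the data starts with an ID3v2 tag ("ID3") or an MPEG frame sync (11 set bits).
    static boolean looksLikeMp3(byte[] data) {
        if (data == null || data.length < 3) {
            return false;
        }
        boolean id3 = data[0] == 'I' && data[1] == 'D' && data[2] == '3';
        boolean frameSync = (data[0] & 0xFF) == 0xFF && (data[1] & 0xE0) == 0xE0;
        return id3 || frameSync;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java Mp3SanityCheck.java <path-to-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println(looksLikeMp3(data)
                ? "Looks like MP3 audio"
                : "Not recognised as MP3; the file may contain an error response");
    }
}
```

Run it against a file in test-result, or call looksLikeMp3 on the byte array returned by synthesize before writing it to disk.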
5. Common Parameters and Voice Choices
model: e.g., speech-02-turbo (refer to the official docs for supported models).
voice: e.g., Chinese (Mandarin)_Gentle_Youth.
speed: speech rate (recommended 0.8–1.2).
pitch: pitch adjustment (recommended -3 to +3).
volume: volume level (recommended 0.8–1.2).
format: mp3 (default), wav (lossless), or ogg.
MiniMax and CosyVoice models are recommended for the best quality.
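One way to keep requests inside the recommended ranges is to normalize the optional numeric fields before building the request. The sketch below is an illustration only: the class name TtsParamRanges is hypothetical, the defaults (1.0, 0.0, 1.0) are assumptions taken from the values used in the unit test, and clamping is a design choice, since the ranges above are recommendations rather than documented hard limits.

```java
class TtsParamRanges {

    // Restricts value to the closed interval [min, max].
    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    // Speech rate: assumed default 1.0, recommended range 0.8 to 1.2.
    static double speed(Double requested) {
        return requested == null ? 1.0 : clamp(requested, 0.8, 1.2);
    }

    // Pitch adjustment: assumed default 0.0, recommended range -3 to +3.
    static double pitch(Double requested) {
        return requested == null ? 0.0 : clamp(requested, -3.0, 3.0);
    }

    // Volume level: assumed default 1.0, recommended range 0.8 to 1.2.
    static double volume(Double requested) {
        return requested == null ? 1.0 : clamp(requested, 0.8, 1.2);
    }

    public static void main(String[] args) {
        System.out.println("speed=" + speed(2.0) + ", pitch=" + pitch(null) + ", volume=" + volume(0.5));
    }
}
```

The normalized values can then be passed to the UnifiedTtsRequest constructor in place of raw user input. Rejecting out-of-range values with an error instead of clamping them is an equally reasonable policy.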
6. Error Handling and Retry Recommendations
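Timeout and network errors: set connect and read timeouts on the request factory used by RestClient, and catch RestClientException around the call to log the failure reason (onErrorResume applies only to the reactive WebClient, not to the synchronous RestClient used here).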
4xx/5xx responses: distinguish authentication failures, rate‑limiting, and server errors, and report accordingly.
Retry strategy: use exponential back‑off with jitter for transient errors.
Concurrency and throttling: implement a queue or token‑bucket for high‑throughput scenarios.
Caching: cache results keyed by text+voice+params to reduce cost and latency.
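The retry recommendation above can be sketched in plain Java as follows. This is one possible shape, not a prescribed implementation: the class RetryWithBackoff, its parameters, and the isTransient predicate are all hypothetical, and the "full jitter" variant (sleep a random time between zero and the exponential cap) is just one common choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

class RetryWithBackoff {

    // Upper bound of the delay after failed attempt `attempt` (0-based): base * 2^attempt, capped at maxMillis.
    static long backoffCapMillis(int attempt, long baseMillis, long maxMillis) {
        long exponential = baseMillis << Math.min(attempt, 20); // shift is capped to avoid overflow
        return Math.min(maxMillis, exponential);
    }

    // Runs the task, retrying only failures the predicate marks as transient.
    static <T> T call(Callable<T> task, int maxAttempts, long baseMillis, long maxMillis,
                      Predicate<Exception> isTransient) throws Exception {
        if (maxAttempts < 1 || baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and delays must be >= 0");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Give up on non-transient errors or when the attempt budget is spent.
                if (!isTransient.test(e) || attempt >= maxAttempts - 1) {
                    throw e;
                }
                long cap = backoffCapMillis(attempt, baseMillis, maxMillis);
                // Full jitter: sleep a random duration in [0, cap].
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }
}
```

In the service, a call such as synthesize(request) could be wrapped in RetryWithBackoff.call with a predicate that accepts timeouts, rate-limit responses, and 5xx errors while rejecting authentication failures, which retrying will not fix.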
7. Production Recommendations
Security: inject API keys via environment variables or secret‑management services.
Monitoring: record synthesis latency, failure reasons, and retry ratios.
Storage: persist audio to local disk or object storage (e.g., S3) with lifecycle policies.
Standardization: unify DTOs and response structures to simplify adding new models.
Extensibility: enable configuration‑driven switching among Azure, Edge, ElevenLabs, MiniMax, CosyVoice, etc.
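Configuration-driven switching can be as simple as a lookup from an engine name to the model and voice pair sent in the request. The sketch below assumes a hypothetical EngineProfiles class with hard-coded entries. The MiniMax values are the ones used in the unit test, while the CosyVoice entries are placeholders that must be replaced with identifiers from the UnifiedTTS documentation. In a real application the map would more likely be loaded from application.properties.

```java
import java.util.Locale;
import java.util.Map;

class EngineProfiles {

    // The two request fields that differ between engines.
    record Profile(String model, String voice) {}

    // Hypothetical profile table; the CosyVoice values are placeholders, not real identifiers.
    static final Map<String, Profile> PROFILES = Map.of(
            "minimax", new Profile("speech-02-turbo", "Chinese (Mandarin)_Gentle_Youth"),
            "cosyvoice", new Profile("<cosyvoice-model-id>", "<cosyvoice-voice-id>"));

    // Looks up a profile by case-insensitive engine name.
    static Profile resolve(String engine) {
        Profile profile = engine == null ? null : PROFILES.get(engine.toLowerCase(Locale.ROOT));
        if (profile == null) {
            throw new IllegalArgumentException("Unknown TTS engine profile: " + engine);
        }
        return profile;
    }

    public static void main(String[] args) {
        String engine = args.length > 0 ? args[0] : "minimax";
        Profile profile = resolve(engine);
        System.out.println(profile.model() + " / " + profile.voice());
    }
}
```

The resolved profile would then feed the request, for example new UnifiedTtsRequest(profile.model(), profile.voice(), text, 1.0, 0.0, 1.0, "mp3"), so that changing engines becomes a configuration change rather than a code change.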
Conclusion
With UnifiedTTS, switching between MiniMax, CosyVoice, or even ElevenLabs only requires changing the model and voice fields in the request. The unified interface reduces maintenance overhead across multiple TTS engines and lets you balance cost, voice style, and audio quality. Further enhancements such as robust error handling, caching, and concurrency control can turn this example into a production‑grade TTS service.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
