
Deploy GPT‑SoVITS for Text‑to‑Speech on Alibaba Cloud Function Compute – Step‑by‑Step Guide

This guide walks you through deploying the GPT‑SoVITS text‑to‑speech model on Alibaba Cloud Function Compute, covering application creation, quick voice synthesis, advanced model fine‑tuning, NAS file management, and optional promotional tasks for earning rewards.

Alibaba Cloud Developer

Nov 29, 2024

Introduction

If you need to generate speech from text and want a fast way to create a personalized voice, consider deploying the GPT‑SoVITS model on Function Compute. GPT‑SoVITS is a popular large‑scale text‑to‑speech model that can synthesize a voice closely resembling the speaker from only a small amount of reference audio. Deploying it on Function Compute removes the need to manage GPU servers or configure the environment, and you get pay‑as‑you‑go billing and elastic scaling, which keep costs low.

Overview

This activity lets users quickly try GPT‑SoVITS voice synthesis and offers a chance to win a prize by completing two tasks: deploying the GPT‑SoVITS application and uploading a screenshot of the synthesized voice.

Steps to Deploy GPT‑SoVITS

1. Access the Function Compute application template

Open the Function Compute application template. Only the East China 1 (Hangzhou) and East China 2 (Shanghai) regions are supported. Choose East China 1 (Hangzhou), keep the other settings at their defaults, and click Create Application. The model download may take about 15 minutes.

Grant the required role permissions if prompted.

2. Agree and continue deployment

In the dialog, read the creation reminder, check the billing items, select I have read and agree, and then click Agree and Continue.

3. Access the domain

Wait about one minute until the deployment status changes to Success. Click the domain link in the Environment Info section to start using the application. The first visit may take about 30 seconds before the Function Compute (FC) version of GPT‑SoVITS appears.

Keep the domain private to avoid unexpected charges, since requests sent to it invoke the pay‑as‑you‑go application.

The provided *.devsapp.net domain is for learning and testing only and is reclaimed after 30 days; consider binding a custom domain for a better experience.

If you have not bound a custom domain and the application has been running for more than 30 days, redeploy the application and re‑mount the NAS.
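Because the first visit can take about 30 seconds, a script that calls the application may need to wait for it to respond. The following is a minimal sketch, assuming the application answers a plain GET request on its domain with HTTP 200 once it is ready. The function names, the retry count, and the delay are illustrative choices, not part of the Function Compute template.

```python
import time
import urllib.request


def fetch_status(url, timeout=60):
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_until_ready(url, fetch=fetch_status, attempts=5, delay=10.0):
    """Poll url until it answers with HTTP 200 or the attempts run out.

    fetch is injectable so the retry logic can be tested without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive
            # from OSError; treat them as "not ready yet" and retry.
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Call it with your own application domain, for example `wait_until_ready("https://<your-app-domain>")`, where the URL is a placeholder. If you have put authentication in front of the domain, the check would need to send credentials as well.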

Quick Experience: Synthesizing Voice

In the FC version of GPT‑SoVITS, select the Voice Clone & Inference tab, choose a template audio or upload your own reference audio, enter the text, and click Synthesize Voice to generate speech.

Template audios include “Little Elf” and “Sweet Girl”.

To clone a custom voice, upload a reference audio clip of 3 to 10 seconds, provide its transcript, and select the language.
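The 3 to 10 second requirement for reference audio can be checked locally before uploading. The sketch below uses Python's standard `wave` module and assumes the clip is an uncompressed PCM WAV file; other formats would need a different reader. The function names are illustrative.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_reference(path, min_seconds=3.0, max_seconds=10.0):
    """Check that the clip falls inside the 3 to 10 second window."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

For example, `is_valid_reference("my_voice.wav")` returns `False` for a clip that is shorter than 3 seconds or longer than 10 seconds (the file name is a placeholder).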

After synthesis, click the play button or the download icon to obtain the audio file.

Advanced: Fine‑Tuning GPT‑SoVITS

You can fine‑tune the large model with your own voice data. Intermediate outputs are stored in the NAS /output folder. By default, training uses UVR5 for denoising and an ASR model for transcription. To swap in other models, place ASR models in tools/asr/models and UVR5 weights in tools/uvr5/uvr5_weights.

NAS File Management

Open the NAS file system from the application’s Resource Info section, then use the NAS browser to view and manage files.

Data Pre‑processing

Select the Data Pre‑processing tab, specify the folder containing audio files in NAS (or upload directly), choose the model and output format, then click Start Data Pre‑processing. The results are saved under /output in the NAS.

Optional Text Correction

If the ASR transcription is inaccurate, use the Training Voice Text Correction tab to edit the denoise_opt.list file and save changes.
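Before correcting transcripts by hand, it can help to find the lines that most likely need attention. The sketch below assumes denoise_opt.list uses the `audio_path|speaker|language|text` layout described for annotation files in the upstream GPT‑SoVITS README; check a few lines of your own file before relying on it. The function names are illustrative.

```python
def parse_list_line(line):
    """Split one annotation line into its four assumed fields."""
    # Split at most three times so a '|' inside the transcript is preserved.
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    audio_path, speaker, language, text = parts
    return {
        "audio_path": audio_path,
        "speaker": speaker,
        "language": language,
        "text": text,
    }


def find_suspect_lines(lines):
    """Return 1-based numbers of lines that are malformed or lack a transcript."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = parse_list_line(line)
        except ValueError:
            suspects.append(number)
            continue
        if not entry["text"].strip():
            suspects.append(number)
    return suspects
```

To use it on a local copy of the file, read the lines with `open(path, encoding="utf-8")` and pass them to `find_suspect_lines`. The result only flags structural problems; a transcript that is present but wrong still needs a manual review.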

Model Training

In the Model Fine‑tuning tab, enter a model name and start either SoVITS training or GPT training. Trained models are stored in the NAS GPT_weights and SoVITS_weights directories.

After training, you can use the Voice Clone & Inference tab with your custom model to synthesize speech.
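To confirm that training produced model files, you can list the contents of the two weights directories. This is a minimal sketch that assumes the NAS is mounted on, or has been copied to, a machine where you can run Python; the `nas_root` argument is a hypothetical mount path, and only the directory names GPT_weights and SoVITS_weights come from this guide.

```python
from pathlib import Path

# Directory names as given in this guide.
WEIGHT_DIRS = ("GPT_weights", "SoVITS_weights")


def list_trained_models(nas_root):
    """Map each weights directory to the sorted file names found in it."""
    root = Path(nas_root)
    found = {}
    for name in WEIGHT_DIRS:
        folder = root / name
        if folder.is_dir():
            found[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[name] = []  # directory missing: nothing trained yet
    return found
```

An empty list for a directory suggests that the corresponding training run has not finished or did not save a model.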

Promotional Activity

The tutorial is part of an activity where completing the deployment and uploading a synthesis screenshot earns a storage box prize (limited to 50 per workday). Users can claim the prize by clicking the “Read Original” link and following the instructions.

Reference links: Function Compute template, Custom domain guide, NAS configuration, GPT‑SoVITS README, NAS console.
