Deploy GPT‑SoVITS for Real‑Time Voice Synthesis on Alibaba Cloud Function Compute

This guide walks you through deploying the GPT‑SoVITS text‑to‑speech model on Alibaba Cloud Function Compute. It covers the architecture (Function Compute, NAS storage, and VPC networking), step‑by‑step deployment, quick voice synthesis, fine‑tuning with your own voice data, and binding a custom domain for production use.

Alibaba Cloud Native

Solution Overview

The GPT‑SoVITS model provides high‑quality, natural‑sounding voice generation for customer service, game characters, and other AI voice interaction scenarios. By leveraging Alibaba Cloud Function Compute, developers can quickly create a scalable, on‑demand text‑to‑speech service.

Technical Architecture

Function Compute: Hosts the GPT‑SoVITS inference service. Users select a GPU model, upload a 3‑10 s reference audio clip, and submit a text prompt to generate speech.

NAS File Storage: Stores the pre‑trained GPT‑SoVITS model files and the generated audio outputs.

VPC (Virtual Private Cloud): Provides a private network so that Function Compute can securely access the NAS.

Deploying the GPT‑SoVITS Application

Open the Function Compute application template (only the East China 1 and East China 2 regions are supported) and click Create Application. The model download may take about 15 minutes.

If the default role lacks permissions, click Go to Authorization and grant the required rights.

Read the deployment warnings, check the billing items, confirm the agreement, and click Agree and Continue Deployment.

Wait roughly one minute until the deployment status changes to Deployment Successful. Then click the domain link in the Environment Information section to access the application.

The first request may take about 30 seconds before the FC‑based GPT‑SoVITS UI loads.
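Because the first request can take around 30 seconds, a script that opens the UI right after deployment may want to poll until the page responds. The following is a minimal sketch of that idea, not part of the template. The helper names, the retry budget (12 attempts, 5 s apart), and the assumption that the UI answers with HTTP 200 once it is ready are all illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 502).
        return err.code
    except (urllib.error.URLError, TimeoutError, OSError):
        # DNS failure, refused connection, or timeout during a cold start.
        return 0


def wait_until_ready(
    url: str,
    probe: Callable[[str], int] = http_status,
    attempts: int = 12,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until the probe reports 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # give the instance time to finish starting
    return False
```

To use it, pass the domain shown in the Environment Information section, for example `wait_until_ready("https://<your-domain>")`. The `probe` and `sleep` parameters exist only so the loop can be exercised without network access.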

Quick Experience: Voice Synthesis

In the FC UI, select the Voice Clone & Inference tab.

Choose a template audio (e.g., “little sprite” or “sweet girl”) or upload a personal 3‑10 s reference clip, fill in the corresponding text and language, then click Synthesize Voice.

After synthesis completes, click the playback button or the download icon to retrieve the generated audio.

Important: If synthesis fails, enable function logs, retry, and use the logs to diagnose the issue.
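The 3‑10 s length requirement for a personal reference clip can be checked locally before uploading. The sketch below uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. The function names and the `make_silent_wav` helper are hypothetical and exist only for illustration.

```python
import io
import wave

MIN_SECONDS = 3.0   # lower bound stated in this guide
MAX_SECONDS = 10.0  # upper bound stated in this guide


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV given as a file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_clip(source) -> bool:
    """True if the clip length falls inside the 3-10 s window."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(is_valid_reference_clip(make_silent_wav(5.0)))   # inside the window
    print(is_valid_reference_clip(make_silent_wav(12.0)))  # too long
```

For other formats such as MP3 or FLAC, the duration would have to be read with a different tool; the check itself stays the same.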

Advanced: Fine‑Tuning GPT‑SoVITS with Your Own Voice Data

Step 1 – Visual Management of NAS Files

Open the NAS console from the application detail page, navigate to File System > File System List, locate the NAS instance linked to Function Compute, and click Browser to open a file‑browser view.

Step 2 – Data Pre‑Processing

Select the Data Pre‑Processing tab in the FC UI.

Enter the path of the audio folder stored in NAS (or upload files directly), choose the desired model and output format, then click Start Data Pre‑Processing.

Processed files appear under <function‑name>/output/ in NAS, including denoised audio, sliced segments, ASR‑generated transcripts, and UVR5‑separated vocal/accompaniment files.

Denoised audio: <NAS‑url>:/<function‑name>/output/denoise_opt

Audio slices: <NAS‑url>:/<function‑name>/output/slicer_opt

ASR transcripts: <NAS‑url>:/<function‑name>/output/asr_opt

UVR5 separation: <NAS‑url>:/<function‑name>/output/uvr5_opt
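When scripting against these folders, it can help to derive all four paths from the function name in one place. This is a small sketch under stated assumptions: `mount_root` stands for wherever the NAS is mounted on the machine running the script (the `/mnt/nas` value in the usage note is a placeholder, not something the template defines), and only the four subdirectory names come from this guide.

```python
import posixpath

# Subdirectory names as listed in this guide.
OUTPUT_SUBDIRS = {
    "denoised": "denoise_opt",
    "slices": "slicer_opt",
    "asr": "asr_opt",
    "uvr5": "uvr5_opt",
}


def output_dirs(mount_root: str, function_name: str) -> dict[str, str]:
    """Map each pre-processing stage to its output directory."""
    base = posixpath.join(mount_root, function_name, "output")
    return {label: posixpath.join(base, sub) for label, sub in OUTPUT_SUBDIRS.items()}
```

For example, `output_dirs("/mnt/nas", "my-func")["asr"]` yields `/mnt/nas/my-func/output/asr_opt`.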

Step 3 – Optional Text Correction

If ASR transcripts contain errors, use the Training Voice Text Correction tab to edit them.

Change Index / Refresh: Switch pages after correcting a batch.

Submit Text: Save corrected text for the current audio.

Merge Audio: Combine audio segments.

Delete Audio: Remove unwanted audio (irreversible).

Previous / Next Index: Navigate between pages.

Split Audio: Manually split an audio file.

Save File: Persist the corrected transcript.

Invert Selection: Reverse the current selection.
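Before correcting transcripts by hand in the UI, it may be useful to find the entries most likely to need attention. The sketch below assumes the ASR output is a text file with one entry per line in the form `audio_path|speaker|language|text`, which is the annotation format described in the upstream GPT‑SoVITS README. Verify this against the actual file in asr_opt before relying on it. The names `Entry`, `parse_list_lines`, and `needs_review` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    audio_path: str
    speaker: str
    language: str
    text: str


def parse_list_lines(lines: Iterable[str]) -> list[Entry]:
    """Parse 'audio_path|speaker|language|text' lines, skipping blank ones."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split at most three times so a '|' inside the transcript survives.
        parts = line.split("|", 3)
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        entries.append(Entry(*parts))
    return entries


def needs_review(entries: Iterable[Entry], min_chars: int = 2) -> list[Entry]:
    """Return entries whose transcript is empty or suspiciously short."""
    return [e for e in entries if len(e.text.strip()) < min_chars]
```

A typical use would be `needs_review(parse_list_lines(open(path, encoding="utf-8")))`, then fixing the flagged segments in the Training Voice Text Correction tab.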

Step 4 – Model Training

Open the Model Fine‑Tuning tab, enter a model name, and click Start SoVITS Training or Start GPT Training.

Trained weights are saved in NAS under the GPT_weights and SoVITS_weights directories.

After training, return to the Voice Clone & Inference tab and select your custom model for synthesis.
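To confirm that training produced weight files before switching tabs, the two directories can be listed from any machine that has the NAS mounted. This is a sketch, not part of the application. The directory names come from this guide, while the file extensions (`.ckpt` for GPT weights, `.pth` for SoVITS weights) follow the upstream GPT‑SoVITS project and should be treated as an assumption here, as should the `root` path you pass in.

```python
from pathlib import Path

# Directory names from this guide; extensions are an assumption (see above).
WEIGHT_DIRS = {"GPT_weights": ".ckpt", "SoVITS_weights": ".pth"}


def list_weights(root) -> dict[str, list[str]]:
    """Return the weight file names found in each weights directory."""
    root = Path(root)
    found = {}
    for dirname, suffix in WEIGHT_DIRS.items():
        folder = root / dirname
        if folder.is_dir():
            found[dirname] = sorted(p.name for p in folder.glob(f"*{suffix}"))
        else:
            found[dirname] = []  # directory missing: nothing trained yet
    return found
```

An empty list for either directory suggests that the corresponding training run has not finished or did not save a checkpoint.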

Important Notices

Keep the provided *.devsapp.net domain private; if you share it, requests from other people may incur unexpected charges on your account.

The sandbox domain is reclaimed after 30 days; bind a custom domain for production use.

If the application exceeds 30 days without a custom domain, it will become inaccessible and must be redeployed.
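The 30‑day limit is easy to lose track of, so a reminder date can be computed at deployment time. The sketch below assumes the 30 days are counted from the deployment date; this guide does not state the exact starting point, so check the console for the authoritative expiry. The function names are illustrative.

```python
from datetime import date, timedelta

SANDBOX_LIFETIME = timedelta(days=30)  # lifetime stated in this guide


def sandbox_expiry(deployed_on: date) -> date:
    """Date on which the sandbox domain would be reclaimed (assumed start: deployment)."""
    return deployed_on + SANDBOX_LIFETIME


def days_remaining(deployed_on: date, today: date) -> int:
    """Whole days left before reclamation, never negative."""
    return max((sandbox_expiry(deployed_on) - today).days, 0)
```

For a deployment on 2024-01-01, `sandbox_expiry` returns 2024-01-31, which leaves time to bind a custom domain before the application becomes inaccessible.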

References

Function Compute application template: https://fcnext.console.aliyun.com/applications/ai/create?template=68&from=solution

Custom domain configuration guide: https://help.aliyun.com/zh/functioncompute/fc-3-0/user-guide/configure-custom-domain-names

NAS file system configuration: https://help.aliyun.com/zh/functioncompute/fc-3-0/user-guide/configure-a-nas-file-system-1

Official GPT‑SoVITS README (Chinese): https://github.com/RVC-Boss/GPT-SoVITS/blob/main/docs/cn/README.md

NAS console: https://nasnext.console.aliyun.com/overview

Quick‑start tutorial for GPT‑SoVITS voice synthesis: https://help.aliyun.com/document_detail/2805773.html
