Build a Simple Voice Synthesis System with Python and Baidu AI

This article walks you through creating a Python‑based voice synthesis tool using Baidu's AI platform, covering account setup, SDK installation, essential parameters, GUI design with Tkinter, code implementation, error handling, and generating audio files from custom text.

Python Crawling & Data Mining

Introduction

This article describes how to build a lightweight voice synthesis system in Python that uses Baidu's speech synthesis API to turn custom text into audio and save it to a file.

Software Dependencies

Development is done with Sublime Text 3; you need a Baidu Open Platform account to obtain API credentials.

Specific Implementation

1. Create an application on the Baidu Open Platform and copy its three credentials: APP_ID, API_KEY and SECRET_KEY.

2. Install the Baidu Python SDK.

3. Review the SDK usage documentation.

4. With the credentials in hand, you can start coding.

Downloading and Configuring Baidu Speech Client

pip install baidu-aip

Configure the client with your credentials:

from aip import AipSpeech

# Your APP_ID, API_KEY and SECRET_KEY from the Baidu console
APP_ID = 'your_app_id'
API_KEY = 'your_api_key'
SECRET_KEY = 'your_secret_key'
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
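
Hard-coding the keys is fine for a quick test, but they are easy to leak when the script is shared. As one possible alternative, the sketch below reads the three credentials from environment variables. The variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY and the helpers load_credentials and make_client are assumptions made for illustration; they are not part of the Baidu SDK.

```python
import os


def load_credentials(environ=os.environ):
    """Return (APP_ID, API_KEY, SECRET_KEY) read from environment variables.

    The BAIDU_* variable names are hypothetical; use any names you like.
    """
    names = ('BAIDU_APP_ID', 'BAIDU_API_KEY', 'BAIDU_SECRET_KEY')
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(environ[name] for name in names)


def make_client(environ=os.environ):
    """Build an AipSpeech client from the environment."""
    # Imported here so load_credentials works even without baidu-aip installed.
    from aip import AipSpeech
    return AipSpeech(*load_credentials(environ))
```

Calling make_client() then replaces the hard-coded AipSpeech(...) line, and a missing variable fails early with a message naming it.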

Parameter Overview

The synthesis request accepts the following parameters; only the text is required:

text (String, required): the text to synthesize, UTF‑8, < 1024 bytes.

cuid (String, optional): a unique user identifier, up to 60 characters.

spd (String, optional): speed, 0‑9, default 5.

pit (String, optional): pitch, 0‑9, default 5.

vol (String, optional): volume, 0‑15, default 5.

per (String, optional): voice type (0 female, 1 male, 3‑4 emotional), default female.
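
The limits above are easy to check locally before a request is sent. The following is a minimal sketch, assuming the ranges are inclusive as listed; check_text and build_options are hypothetical helper names, not SDK functions.

```python
# Allowed ranges taken from the parameter list above (inclusive).
RANGES = {'spd': (0, 9), 'pit': (0, 9), 'vol': (0, 15)}
VOICES = (0, 1, 3, 4)   # values listed for `per`
MAX_TEXT_BYTES = 1024   # text must be shorter than this once UTF-8 encoded


def check_text(text):
    """Return the UTF-8 byte length of text, or raise if it is too long."""
    size = len(text.encode('utf-8'))
    if size >= MAX_TEXT_BYTES:
        raise ValueError(f'text is {size} bytes; it must be under {MAX_TEXT_BYTES}')
    return size


def build_options(spd=5, pit=5, vol=5, per=0):
    """Validate the optional settings and return them as an options dict."""
    options = {'spd': spd, 'pit': pit, 'vol': vol, 'per': per}
    for name, (low, high) in RANGES.items():
        if not low <= options[name] <= high:
            raise ValueError(f'{name} must be between {low} and {high}')
    if per not in VOICES:
        raise ValueError(f'per must be one of {VOICES}')
    return options
```

Note that the text limit is measured in bytes, not characters: check_text('你好') returns 6, because each of these two characters takes three bytes in UTF-8.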

Error Handling

If the request fails, Baidu returns a JSON object with error_code and error_msg. Common error codes include:

500 – unsupported input.

501 – invalid parameters.

502 – token verification failed.

503 – synthesis backend error.
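
In the Python SDK, a failed synthesis call returns a dict in place of the audio bytes, so the result type can be checked before writing a file. The sketch below turns such a dict into a readable message. The helper name describe_failure is hypothetical, and since the exact key names may vary, it accepts either error_code/error_msg or err_no/err_msg; treat that as an assumption to verify against a real response.

```python
# Descriptions copied from the error-code list above.
ERROR_MESSAGES = {
    500: 'unsupported input',
    501: 'invalid parameters',
    502: 'token verification failed',
    503: 'synthesis backend error',
}


def describe_failure(result):
    """Return None if result is audio bytes, else a readable error string."""
    if not isinstance(result, dict):
        return None
    # Assumption: the code may arrive under either of these key names.
    code = result.get('error_code', result.get('err_no'))
    detail = result.get('error_msg', result.get('err_msg', ''))
    summary = ERROR_MESSAGES.get(code, 'unknown error')
    return f'{code}: {summary} ({detail})' if detail else f'{code}: {summary}'
```

A caller would write the audio only when describe_failure(result) is None and show the returned string to the user otherwise.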

GUI Implementation

The application uses Tkinter to provide a simple interface where users select a voice style, input text, and generate an MP3 file.

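import os
import sys
import tkinter as tk
from tkinter import ttk, messagebox

from aip import AipSpeech

class play:
    # Replace these with the credentials from your Baidu application.
    APP_ID = 'your_app_id'
    API_KEY = 'your_api_key'
    SECRET_KEY = 'your_secret_key'

    def __init__(self):
        self.root = tk.Tk()
        self.root.title("语音合成系统")  # "Voice Synthesis System"
        self.root.geometry("700x700")
        self.lb = tk.Label(self.root, text='请选择语音类型')  # "Select a voice type"
        self.tt = tk.Text(self.root, width=80, height=30)  # text to synthesize
        self.cb = ttk.Combobox(self.root, width=12)
        # "Please select", sweet, cute girl, mature man, energetic young man
        self.cb['values'] = ('请选择-----','甜美型','萝莉型','大叔型','精神小伙型')
        self.cb.current(0)
        self.cb.bind('<<ComboboxSelected>>', self.go)
        self.lb1 = tk.Label(self.root, text='请输入文件名:')  # "Enter a file name:"
        self.e = tk.Entry(self.root, width=30)
        self.b1 = tk.Button(self.root, text='生成音频文件', command=self.sc)  # "Generate audio file"
        # layout omitted for brevity
        self.root.mainloop()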

The go method selects the appropriate synthesis parameters based on the chosen voice style and calls the Baidu API.

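def go(self, *arg):
    # Also bound to <<ComboboxSelected>>, so Tkinter may pass an event argument.
    self.client = AipSpeech(self.APP_ID, self.API_KEY, self.SECRET_KEY)
    # '1.0' is the first character; 'end-1c' drops the newline Tk appends.
    text = self.tt.get('1.0', 'end-1c')
    if self.cb.get() == '甜美型':  # "sweet" style
        # Returns MP3 bytes on success, or a dict describing the error.
        self.res = self.client.synthesis(text, 'zh', 1,
            {'vol': 3, 'spd': 3, 'pit': 4, 'per': 0})
    # other styles omitted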
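
As the number of styles grows, a chain of if branches becomes repetitive. One alternative, sketched here under assumptions, is to keep the settings in a dict keyed by the style name. Only the '甜美型' values appear in this article, so the other entries are left out rather than guessed; STYLE_OPTIONS, options_for and synthesize are hypothetical names.

```python
# Only the '甜美型' ("sweet") settings appear in the article; the other
# styles are omitted there, so they are left out here as well.
STYLE_OPTIONS = {
    '甜美型': {'vol': 3, 'spd': 3, 'pit': 4, 'per': 0},
}


def options_for(style):
    """Return a copy of the options for a style, or None if it is unknown."""
    options = STYLE_OPTIONS.get(style)
    return dict(options) if options is not None else None


def synthesize(client, text, style):
    """Call client.synthesis for a known style; return None otherwise."""
    options = options_for(style)
    if options is None:
        return None
    # Same positional arguments as the call in go(): text, 'zh', 1, options.
    return client.synthesis(text, 'zh', 1, options)
```

Adding a style then means adding one dict entry, and the placeholder item '请选择-----' simply yields None instead of triggering a request.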

The sc method checks the text length (must be < 1024 bytes), writes the returned binary audio data to an .mp3 file, and shows success or error dialogs.

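def sc(self):
    txt = self.tt.get('1.0', 'end-1c')  # 'end-1c' skips Tk's trailing newline
    # The limit is in bytes, so measure the UTF-8 encoding before calling the API.
    if len(txt.encode('utf-8')) >= 1024:
        messagebox.showerror('出错了!', '^_^最多不超过1024个字节^_^')  # "Error!", "At most 1024 bytes"
        return
    self.res = None
    self.go()
    # synthesis() returns a dict instead of audio bytes when the request fails;
    # res stays None if no voice style was selected.
    if self.res is None or isinstance(self.res, dict):
        messagebox.showerror('出错了!', str(self.res or '请选择语音类型'))
        return
    filename = os.path.join(os.path.dirname(sys.argv[0]), self.e.get()+'.mp3')
    if not os.path.exists(filename):
        with open(filename, 'wb') as f:
            f.write(self.res)
        messagebox.showinfo('完毕!', '生成完毕,文件在程序目录下')  # "Done!", "File saved in the program directory"
    else:
        messagebox.showerror('出错了!', '文件名已存在')  # "File name already exists"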
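
Checking os.path.exists and then opening the file are two separate steps, so a file created in between would be overwritten. A hedged alternative is Python's exclusive-creation mode 'x', which fails if the file already exists. The helper name save_new_mp3 is hypothetical.

```python
import os


def save_new_mp3(directory, name, audio):
    """Write audio bytes to <directory>/<name>.mp3, refusing to overwrite.

    Returns the path written. Raises FileExistsError if the file exists.
    """
    path = os.path.join(directory, name + '.mp3')
    # 'x' creates the file and fails if it already exists, so the existence
    # check and the write happen in one step.
    with open(path, 'xb') as f:
        f.write(audio)
    return path
```

In the GUI, the FileExistsError branch would show the same "file name already exists" dialog that the else branch shows now.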

Running play() launches the full application.

Conclusion

The tutorial demonstrates a practical use of Python and Baidu's speech synthesis API to create a functional voice synthesis tool with a graphical interface, covering setup, parameter configuration, error handling, and file generation.
