Build a Voice-Enabled Chatbot in Python Using Baidu AI and Qingyunke

This tutorial walks through creating a Python program that captures spoken input, converts it to text with Baidu AI, sends the text to the free Qingyunke chatbot API for a response, and finally synthesizes the reply back into speech, complete with code snippets and setup instructions.

Python Programming Learning Circle

Brief Description

Over the past two days I needed to build a small Python program that lets a person hold a spoken conversation with a chatbot: the user talks into the microphone, and the bot answers aloud. The goal is turn-by-turn voice interaction between a user and a chatbot.

Overall Idea

The process can be broken down into four main steps:

Capture the user's voice input.

Convert the captured voice into text.

Send the text to an intelligent dialogue API and receive a response in text form.

Convert the response text back into voice output.

Many existing libraries can help implement these steps.
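
The four steps above can be viewed as one conversation turn that chains four functions. The following is a minimal sketch, assuming each step is passed in as a callable; the names `run_turn`, `record`, `transcribe`, `chat`, and `speak` are placeholders invented for this illustration and do not come from any library. Stubs stand in for the microphone and the web services, so the flow can be checked before any hardware or API key is available.

```python
from typing import Callable, Tuple

def run_turn(record: Callable[[], bytes],
             transcribe: Callable[[bytes], str],
             chat: Callable[[str], str],
             speak: Callable[[str], None]) -> Tuple[str, str]:
    """Run one capture -> text -> reply -> speech turn; return (heard, reply)."""
    audio = record()            # step 1: capture voice input
    heard = transcribe(audio)   # step 2: speech to text
    reply = chat(heard)         # step 3: ask the dialogue API
    speak(reply)                # step 4: text back to speech
    return heard, reply

# Stub implementations (hypothetical, for illustration only)
spoken = []
heard, reply = run_turn(
    record=lambda: b"\x00\x00" * 16000,       # one second of 16-bit silence
    transcribe=lambda audio: "hello",
    chat=lambda text: "echo: " + text,
    speak=spoken.append,                       # collect instead of playing audio
)
print(heard, "->", reply)
```

Passing the steps in as arguments keeps each one replaceable, which is convenient when a different recognition or speech service is swapped in later.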

Required Environment

Install the following Python dependencies:

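pip install pyaudio – for recording from the microphone (the WAV file itself is written with the standard-library wave module)

pip install baidu-aip – Baidu AI SDK for speech-to-text

pip install pyttsx3 – for converting text to speech

pip install requests – for calling the Qingyunke HTTP API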
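
Before running the rest of the code, it can help to confirm that the packages are importable. The sketch below is one way to do that with the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are made up for this example. Note that the pip name and the import name differ for the Baidu SDK (`baidu-aip` is imported as `aip`), and `requests` is listed because the chatbot call later in the article uses it.

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "pyaudio": "pyaudio",
    "baidu-aip": "aip",
    "pyttsx3": "pyttsx3",
    "requests": "requests",
}

def missing_packages(required):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All voice-chatbot dependencies are importable.")
```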

The sections below implement each step separately; the final section combines them.

Capture User Voice and Save as Audio File

import time
import wave
from pyaudio import PyAudio, paInt16

framerate = 16000  # sample rate
num_samples = 2000  # frames per read
channels = 1        # mono
sampwidth = 2       # 2 bytes per sample
FILEPATH = '../voices/myvoices.wav'  # ensure directory exists

class Speak():
    # Save audio data to a WAV file
    def save_wave_file(self, filepath, data):
        # The context manager closes the file even if writing fails
        with wave.open(filepath, 'wb') as wf:
            wf.setnchannels(channels)
            wf.setsampwidth(sampwidth)
            wf.setframerate(framerate)
            wf.writeframes(b''.join(data))

    # Record audio for a fixed duration (5 seconds)
    def my_record(self):
        pa = PyAudio()
        stream = pa.open(format=paInt16, channels=channels, rate=framerate, input=True, frames_per_buffer=num_samples)
        my_buf = []
        t = time.time()
        print('Recording... speak now')
        try:
            while time.time() < t + 5:
                string_audio_data = stream.read(num_samples)
                my_buf.append(string_audio_data)
        finally:
            # Release the audio device even if reading fails
            stream.stop_stream()
            stream.close()
            pa.terminate()
        print('Recording finished')
        self.save_wave_file(FILEPATH, my_buf)
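
With the settings above (16000 samples per second, 2 bytes per sample, 1 channel), the recording produces 16000 × 2 × 1 = 32000 bytes of audio per second, so a 5-second clip holds about 160000 bytes of sample data. The sketch below checks those parameters without a microphone: it writes 5 seconds of silence to a temporary file and reads the header back. The helper names `write_silence` and `describe` are invented for this example.

```python
import os
import tempfile
import wave

FRAMERATE = 16000   # samples per second, as in the recorder
CHANNELS = 1        # mono
SAMPWIDTH = 2       # bytes per sample (16-bit)
SECONDS = 5

def write_silence(path, seconds):
    """Write `seconds` of silence using the recorder's WAV parameters."""
    n_frames = FRAMERATE * seconds
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPWIDTH)
        wf.setframerate(FRAMERATE)
        wf.writeframes(b"\x00" * (n_frames * CHANNELS * SAMPWIDTH))

def describe(path):
    """Read the WAV header back and report its parameters."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "framerate": wf.getframerate(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }

path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silence(path, SECONDS)
info = describe(path)
print(info)
```

Running `describe` on the file produced by `my_record` is a quick way to confirm that it matches the format passed to the recognition step. Because the recorder reads whole 2000-frame chunks until 5 seconds have elapsed, a real recording may come out slightly longer than 5 seconds.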

Call Baidu AI for Speech Recognition

First, create an application on the Baidu AI Open Platform, enable the Speech Recognition service, and obtain the AppID, API Key, and Secret Key. Then use the SDK to send the recorded WAV file and get the recognized text.

[Screenshot: Baidu AI console]

from aip import AipSpeech

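# Replace these placeholders with the credentials of your own Baidu AI application;
# avoid publishing real keys in source code
APP_ID = ''      # fill in your AppID
API_KEY = ''     # fill in your API Key
SECRET_KEY = ''  # fill in your secret key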

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

class ReadWav():
    # Read file content
    def get_file_content(self, filePath):
        with open(filePath, 'rb') as fp:
            return fp.read()

    # Recognize speech from the local WAV file.
    # 16000 must match the sample rate used when recording; dev_pid 1537 selects Mandarin.
    def predict(self):
        return client.asr(self.get_file_content('../voices/myvoices.wav'), 'wav', 16000, {'dev_pid': 1537})

readWav = ReadWav()
print(readWav.predict())

Sample result:

{'corpus_no': '7087884083428433929', 'err_msg': 'success.', 'err_no': 0, 'result': ['你叫什么名字呀?'], 'sn': '255158586831650276613'}
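
The sample result shows that the recognized text sits in the first element of `result`, and that `err_no` is 0 on success. A minimal sketch of pulling the text out defensively follows; the helper name `extract_text` is made up for this example, and the failure dictionary is an illustrative stand-in rather than a real response from the service.

```python
def extract_text(asr_result):
    """Return the recognized text, or None if recognition did not succeed."""
    if asr_result.get("err_no") != 0:
        return None
    candidates = asr_result.get("result") or []
    return candidates[0] if candidates else None

# The sample response shown above
ok = {
    "corpus_no": "7087884083428433929",
    "err_msg": "success.",
    "err_no": 0,
    "result": ["你叫什么名字呀?"],
    "sn": "255158586831650276613",
}

# Hypothetical failure shape, for illustration only
failed = {"err_no": 1, "err_msg": "illustrative failure"}

print(extract_text(ok))
print(extract_text(failed))
```

Indexing `['result'][0]` directly raises a `KeyError` when the response carries no `result` key, which is why the check on `err_no` comes first.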

Send Text to Qingyunke Chatbot

Qingyunke provides a free, no‑registration API that returns a chatbot reply for a given message.

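import urllib.parse

import requests

def talkWithRobot(msg):
    # URL-encode the message so non-ASCII text is transmitted safely
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    response = requests.get(url, timeout=10)
    return response.json()["content"]

print(talkWithRobot("你好呀!"))  # "Hello there!"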

Example output: 哟~ 都好都好
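
The request is a plain GET with three query parameters (`key`, `appid`, `msg`), and the reply text is read from the `content` field of the JSON body. The sketch below separates URL construction from reply clean-up so both can be checked offline. The helper names `build_url` and `clean_reply` are invented here. Two points are assumptions to verify against the live service: that replies may contain a literal `{br}` marker standing for a line break, and that the server accepts `+` for spaces, which is how `urlencode` encodes them (the original code uses `quote`, which produces `%20`).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "http://api.qingyunke.com/api.php"

def build_url(msg):
    """Build the request URL; urlencode handles non-ASCII characters."""
    query = urlencode({"key": "free", "appid": 0, "msg": msg})
    return API_URL + "?" + query

def clean_reply(payload):
    """Extract the reply text from a parsed JSON payload."""
    content = payload.get("content", "")
    # Assumption: "{br}" marks a line break in replies
    return content.replace("{br}", "\n").strip()

url = build_url("你好 呀")
print(url)
# Round-trip check: decoding the query recovers the original message
print(parse_qs(urlsplit(url).query)["msg"][0])
print(clean_reply({"result": 0, "content": "line one{br}line two"}))
```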

Convert Response to Speech

import pyttsx3

class RobotSay():
    def __init__(self):
        self.engine = pyttsx3.init()
        self.rate = self.engine.getProperty('rate')
        self.engine.setProperty('rate', self.rate - 50)

    def say(self, msg):
        self.engine.say(msg)
        self.engine.runAndWait()

robotSay = RobotSay()
robotSay.say("你好呀")
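
pyttsx3 speaks with whatever voices the operating system provides, so the default voice may not pronounce Chinese replies well. One possible approach is to search the installed voices for a matching name and select it. The helper `pick_voice` below is a hypothetical function written for this example; it is tested against stand-in objects so it runs without a speech engine, and the commented lines show how it might be wired to a real engine through `getProperty('voices')` and `setProperty('voice', ...)`.

```python
from types import SimpleNamespace

def pick_voice(voices, keyword):
    """Return the id of the first voice whose name or id contains keyword, else None."""
    keyword = keyword.lower()
    for voice in voices:
        name = (getattr(voice, "name", "") or "").lower()
        voice_id = (getattr(voice, "id", "") or "").lower()
        if keyword in name or keyword in voice_id:
            return voice.id
    return None

# Stand-in voice objects (real ones come from the speech engine)
fake_voices = [
    SimpleNamespace(id="voice.en", name="English"),
    SimpleNamespace(id="voice.zh", name="Chinese (Mandarin)"),
]
chosen = pick_voice(fake_voices, "chinese")
print(chosen)

# With a real engine (requires pyttsx3 and an installed Chinese voice):
# engine = pyttsx3.init()
# voice_id = pick_voice(engine.getProperty('voices'), 'chinese')
# if voice_id is not None:
#     engine.setProperty('voice', voice_id)
```

Voice names and ids vary by platform, so the keyword that works on one machine may need adjusting on another.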

Combine into an Automatic Voice Chatbot

import urllib.parse

import requests

def talkWithRobot(msg):
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    response = requests.get(url, timeout=10)
    return response.json()["content"]

robotSay = RobotSay()
speak = Speak()
readTalk = ReadWav()
while True:
    speak.my_record()                              # Record voice
    result = readTalk.predict()                    # Speech-to-text via Baidu
    if result.get('err_no') != 0:                  # Recognition failed; record again
        print("Recognition failed:", result.get('err_msg'))
        continue
    text = result['result'][0]
    print("You said:", text)
    response_dialogue = talkWithRobot(text)        # Chatbot reply
    print("Qingyunke said:", response_dialogue)
    robotSay.say(response_dialogue)                # Text-to-speech

Running the program yields a back‑and‑forth conversation where the user speaks, the system recognizes the speech, queries the chatbot, and plays the reply aloud.

Future Work

The current implementation is a simple command‑line prototype. Future improvements include adding a graphical user interface, handling longer sessions, and extending functionality with more AI services.

Written by Python Programming Learning Circle
