Build a Voice-Enabled Chatbot in Python Using Baidu AI and Qingyunke
This tutorial walks through creating a Python program that captures spoken input, converts it to text with Baidu AI, sends the text to the free Qingyunke chatbot API for a response, and finally synthesizes the reply back into speech, complete with code snippets and setup instructions.
Brief Description
Over the past two days I needed to build a small Python program that lets a person hold a spoken conversation with a chatbot, i.e., a speech dialogue interface. The goal is real-time voice interaction between a user and the bot.
Overall Idea
The process can be broken down into four main steps:
Capture the user's voice input.
Convert the captured voice into text.
Send the text to an intelligent dialogue API and receive a smart response in text form.
Convert the response text back into voice output.
Many existing libraries can help implement these steps.
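The four steps can be sketched as a single function that chains four callables. This is only an illustration of the data flow: `run_turn` and its parameter names are hypothetical, and the lambdas are stand-ins so the sketch runs without a microphone or network access.

```python
from typing import Callable


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    reply: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one conversational turn and return the chatbot's reply text."""
    audio = record()           # step 1: capture the user's voice
    text = transcribe(audio)   # step 2: speech to text
    answer = reply(text)       # step 3: ask the dialogue API
    speak(answer)              # step 4: text to speech
    return answer


# Stand-in callables (hypothetical) so the flow can be exercised offline.
spoken = []
result = run_turn(
    record=lambda: b"fake-audio",
    transcribe=lambda audio: "hello",
    reply=lambda text: text.upper(),
    speak=spoken.append,
)
```

Passing the steps in as callables keeps each one replaceable, so a different recognizer or chatbot could be swapped in without touching the flow.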
Required Environment
Install the following Python dependencies:
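pip install pyaudio – for recording audio from the microphone (the WAV file itself is written with the standard-library wave module)
pip install baidu-aip – Baidu AI SDK for speech-to-text
pip install pyttsx3 – for converting text to speech
pip install requests – for calling the Qingyunke HTTP API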
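Before going further, it may be worth confirming that the packages import. The sketch below is one way to do that with the standard library; `missing_modules` and `REQUIRED` are hypothetical names, and the list includes `requests` because the chatbot step later in this article uses it.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip package names: baidu-aip installs the `aip` module.
REQUIRED = ["pyaudio", "aip", "pyttsx3", "requests"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", missing or "none")
```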
Below is the implementation of each function, which will later be combined.
Capture User Voice and Save as Audio File
import time
import wave
from pyaudio import PyAudio, paInt16

framerate = 16000    # sample rate in Hz
num_samples = 2000   # frames per read
channels = 1         # mono
sampwidth = 2        # 2 bytes (16 bits) per sample
FILEPATH = '../voices/myvoices.wav'  # ensure the directory exists

class Speak():
    # Save audio data to a WAV file
    def save_wave_file(self, filepath, data):
        wf = wave.open(filepath, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b''.join(data))
        wf.close()

    # Record audio for a fixed duration (5 seconds)
    def my_record(self):
        pa = PyAudio()
        stream = pa.open(format=paInt16, channels=channels, rate=framerate, input=True, frames_per_buffer=num_samples)
        my_buf = []
        t = time.time()
        print('Speak now...')
        while time.time() < t + 5:
            string_audio_data = stream.read(num_samples)
            my_buf.append(string_audio_data)
        print('Recording finished')
        self.save_wave_file(FILEPATH, my_buf)
        stream.close()
        pa.terminate()  # release the audio device

Call Baidu AI for Speech Recognition
First, create an application on the Baidu AI Open Platform, enable the Speech Recognition service, and obtain the AppID, API Key, and Secret Key. Then use the SDK to send the recorded WAV file and get the recognized text.
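Because the recognizer is told that the audio is a 16 kHz WAV file, it may help to confirm that the file on disk really has that format before sending it. The following is a minimal sketch using only the standard-library `wave` module; `wav_matches` is a hypothetical helper, and its defaults mirror the recording settings used in this article.

```python
import wave


def wav_matches(path, rate=16000, channels=1, sampwidth=2):
    """Return True if the WAV file at `path` has the expected format."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == rate
            and wf.getnchannels() == channels
            and wf.getsampwidth() == sampwidth
        )
```

Calling `wav_matches('../voices/myvoices.wav')` after recording would catch a mismatched sample rate locally instead of through a recognition error.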
from aip import AipSpeech

# Replace these with the credentials of your own Baidu AI application
APP_ID = '25990397'
API_KEY = 'iS91n0uEOujkMIlsOTLxiVOc'
SECRET_KEY = ''  # fill in your secret key
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

class ReadWav():
    # Read the file content as bytes
    def get_file_content(self, filePath):
        with open(filePath, 'rb') as fp:
            return fp.read()

    # Recognize speech from the local WAV file
    def predict(self):
        # 16000 is the sample rate; dev_pid 1537 selects the Mandarin model
        return client.asr(self.get_file_content('../voices/myvoices.wav'), 'wav', 16000, {'dev_pid': 1537})

readWav = ReadWav()
print(readWav.predict())

Sample result (the recognized text means "What's your name?"):
{'corpus_no': '7087884083428433929', 'err_msg': 'success.', 'err_no': 0, 'result': ['你叫什么名字呀?'], 'sn': '255158586831650276613'}

Send Text to Qingyunke Chatbot
Qingyunke provides a free, no‑registration API that returns a chatbot reply for a given message.
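import urllib.parse
import requests

def talkWithRobot(msg):
    # URL-encode the message so non-ASCII text is safe in the query string
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    html = requests.get(url, timeout=10)
    return html.json()["content"]

print(talkWithRobot("你好呀!"))  # "Hello!"

Example output: 哟~ 都好都好 (roughly "Yo~ all good, all good")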
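The request and the response handling can also be split into two small functions, which makes the parsing testable without a network call. This is a sketch under assumptions: `build_url` and `extract_reply` are hypothetical names, and the code assumes the JSON carries a `result` field that is 0 on success and that `{br}` in `content` marks a line break.

```python
from urllib.parse import urlencode

API_URL = "http://api.qingyunke.com/api.php"


def build_url(msg):
    """Build the request URL; urlencode percent-encodes the message."""
    return API_URL + "?" + urlencode({"key": "free", "appid": 0, "msg": msg})


def extract_reply(payload):
    """Pull the reply text out of a decoded JSON response.

    Assumption: a non-zero "result" signals an error, and "{br}"
    in the content stands for a line break.
    """
    if payload.get("result") != 0:
        raise ValueError("chatbot returned an error: {!r}".format(payload))
    return payload["content"].replace("{br}", "\n")
```

With `requests`, the two pieces would combine as `extract_reply(requests.get(build_url(msg), timeout=10).json())`.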
Convert Response to Speech
import pyttsx3

class RobotSay():
    def __init__(self):
        self.engine = pyttsx3.init()
        # Slow the default speaking rate down by 50 words per minute
        self.rate = self.engine.getProperty('rate')
        self.engine.setProperty('rate', self.rate - 50)

    def say(self, msg):
        self.engine.say(msg)
        self.engine.runAndWait()  # block until the speech has finished

robotSay = RobotSay()
robotSay.say("你好呀")  # "Hello"

Combine into an Automatic Voice Chatbot
import urllib.parse
import requests

# Speak, ReadWav, and RobotSay are the classes defined in the sections above

def talkWithRobot(msg):
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    html = requests.get(url, timeout=10)
    return html.json()["content"]

robotSay = RobotSay()
speak = Speak()
readTalk = ReadWav()

while True:
    speak.my_record()                        # Record voice
    result = readTalk.predict()              # Speech‑to‑text via Baidu
    # A failed recognition has no usable 'result', so skip this turn
    if result.get('err_no') != 0 or not result.get('result'):
        print("Recognition failed:", result.get('err_msg'))
        continue
    text = result['result'][0]
    print("Me:", text)
    response_dialogue = talkWithRobot(text)  # Chatbot reply
    print("Qingyunke:", response_dialogue)
    robotSay.say(response_dialogue)          # Text‑to‑speech

Running the program yields a back‑and‑forth conversation where the user speaks, the system recognizes the speech, queries the chatbot, and plays the reply aloud.
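The loop above runs until the process is interrupted. One possible way to end a session by voice is to check the recognized text for a stop phrase and `break` out of the loop when one is heard. The sketch below is not part of the original program: `should_stop` is a hypothetical helper, and the phrases ("goodbye", "exit", "bye") are arbitrary choices.

```python
# Hypothetical stop phrases: "goodbye" and "exit" in Chinese, plus "bye".
STOP_PHRASES = ("再见", "退出", "bye")


def should_stop(text, phrases=STOP_PHRASES):
    """Return True if the recognized text contains any stop phrase."""
    normalized = text.strip().lower()
    return any(phrase in normalized for phrase in phrases)
```

Inside the loop, the check would sit right after the recognition step, for example `if should_stop(text): break`.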
Future Work
The current implementation is a simple command‑line prototype. Future improvements include adding a graphical user interface, handling longer sessions, and extending functionality with more AI services.