How to Programmatically Retrieve Baidu Index Data with Python (No UI Automation)
Learn step‑by‑step how to fetch Baidu Index search volume data for any keyword using Python: discover the API endpoints, decrypt the encrypted response with a custom function, and wrap the process into reusable code without relying on UI automation tools.
In this tutorial, the author demonstrates how to collect Baidu Index search volume data directly via Python, avoiding heavyweight UI automation tools.
Baidu Index is a data analysis platform built on Baidu's large-scale user search behavior data. It provides keyword search volume, trends, related news, and audience demographics.
The approach consists of three main steps:
1. Call the index API to obtain a uniqid and the encrypted index data (userIndexes).
2. Call the ptbk API with the uniqid to retrieve the decryption key (ptbk).
3. Decrypt the encrypted data with a custom Python function that mirrors the original JavaScript logic.
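The three steps above can be sketched as one pipeline in which the HTTP layer is passed in as a function, so the parsing and decryption logic can be exercised offline. This is a minimal sketch under assumptions: `fetch_json`, `run_pipeline`, `fake_fetch`, and the toy key and payload are hypothetical. Only the field names (uniqid, userIndexes, word, all, data) and the two endpoint URLs come from this article.

```python
from typing import Callable, Dict

# Endpoint URLs as used in this article (they may change without notice).
INDEX_URL = "http://index.baidu.com/api/SearchApi/index"
PTBK_URL = "http://index.baidu.com/Interface/ptbk"


def decrypt(ptbk: str, index_data: str) -> str:
    # First half of ptbk: cipher characters; second half: their plaintext.
    n = len(ptbk) // 2
    table = dict(zip(ptbk[:n], ptbk[n:]))
    return "".join(table[ch] for ch in index_data)


def run_pipeline(fetch_json: Callable[[str, dict], dict], word_param: str,
                 start: str, end: str) -> Dict[str, str]:
    # Step 1: index API -> uniqid plus the encrypted userIndexes.
    data = fetch_json(INDEX_URL, {"area": 0, "word": word_param,
                                  "startDate": start, "endDate": end})["data"]
    # Step 2: ptbk API, keyed by the uniqid returned in step 1.
    ptbk = fetch_json(PTBK_URL, {"uniqid": data["uniqid"]})["data"]
    # Step 3: decrypt the overall ("all") series of every keyword.
    return {u["word"][0]["name"]: decrypt(ptbk, u["all"]["data"])
            for u in data["userIndexes"]}


# Offline usage with a made-up key and payload (no network access).
TOY_KEY = "ABCDEFGHIJK" + "0123456789,"  # "A" -> "0", "C" -> "2", "K" -> ","


def fake_fetch(url: str, params: dict) -> dict:
    if url == INDEX_URL:
        return {"data": {"uniqid": "u1", "userIndexes": [
            {"word": [{"name": "python"}], "all": {"data": "CDEDIKCDFBA"}}]}}
    return {"data": TOY_KEY}


print(run_pipeline(fake_fetch, "[]", "2020-01-01", "2020-01-07"))
# {'python': '23438,23510'}
```

For real requests, `fetch_json` could be something like `lambda url, params: requests.get(url, params=params, headers=headers).json()`, assuming `headers` carries the cookies of a logged-in Baidu session. Injecting the fetcher keeps the substitution logic testable without hitting Baidu.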
The decryption function translates the JavaScript algorithm into Python:
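```python
def decrypt(ptbk, index_data):
    # The first half of ptbk lists the cipher characters; the second half
    # lists the plaintext characters they stand for, in the same order.
    n = len(ptbk) // 2
    a = dict(zip(ptbk[:n], ptbk[n:]))
    # Substitute every character of the encrypted string.
    return "".join([a[s] for s in index_data])
```

With the key in hand, the script iterates over each keyword, decrypts its overall ("all") series, and prints the result: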
```python
# data is the 'data' object of the index API response; ptbk is the key.
for userIndexe in data['userIndexes']:
    name = userIndexe['word'][0]['name']
    index_data = userIndexe['all']['data']
    r = decrypt(ptbk, index_data)
    print(name, r)
```

Example output for the keywords python and java:
```text
python 23438,23510,23514,24137,22538,17964,15860
java 8925,8779,9040,9055,9110,6312,5333
```

A complete, reusable function, get_index_data, wraps the whole process. It accepts a list of keywords and an optional date range, makes both API calls, decrypts the data, and returns a dictionary of daily index values for each platform (all, pc, and wise, i.e. overall, PC, and mobile):
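```python
import json
from datetime import date, timedelta

import requests

# Placeholder headers: the API is expected to need the cookies of a
# logged-in Baidu session; copy them from your browser (assumption).
headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "<cookie string from a logged-in Baidu session>",
}


def get_index_data(keys, start=None, end=None):
    # word parameter: a compact JSON list with one keyword group per key.
    words = [[{"name": key, "wordType": 1}] for key in keys]
    words = json.dumps(words, ensure_ascii=False, separators=(",", ":"))
    # Default window: 8 days ago to 2 days ago, as YYYY-MM-DD strings.
    today = date.today()
    if start is None:
        start = str(today - timedelta(days=8))
    if end is None:
        end = str(today - timedelta(days=2))
    # Step 1: uniqid plus the encrypted series.
    params = {"area": 0, "word": words, "startDate": start, "endDate": end}
    res = requests.get('http://index.baidu.com/api/SearchApi/index',
                       params=params, headers=headers)
    res.raise_for_status()
    data = res.json()['data']
    uniqid = data['uniqid']
    # Step 2: the decryption key for this uniqid.
    ptbk_res = requests.get('http://index.baidu.com/Interface/ptbk',
                            params={"uniqid": uniqid}, headers=headers)
    ptbk_res.raise_for_status()
    ptbk = ptbk_res.json()['data']
    # Step 3: decrypt each platform series (decrypt() is defined above).
    result = {"startDate": start, "endDate": end}
    for userIndexe in data['userIndexes']:
        name = userIndexe['word'][0]['name']
        tmp = {}
        for platform in ['all', 'pc', 'wise']:  # overall, PC, mobile
            enc = userIndexe[platform]['data']
            tmp[platform] = [int(e) for e in decrypt(ptbk, enc).split(',')]
        result[name] = tmp
    return result
```

Running get_index_data(["python", "java"]) returns a dictionary with startDate, endDate, and one entry per keyword holding its all, pc, and wise daily series for the requested period.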
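The returned dictionary stores each series as a plain list of integers. Pairing those values with calendar dates is a common next step. The sketch below assumes one value per day from startDate to endDate inclusive (consistent with the default 7-day window and the 7-value sample output, but not stated by the article, so the helper checks it). The helper name to_dated_rows is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def to_dated_rows(result: dict, keyword: str,
                  platform: str = "all") -> List[Tuple[date, int]]:
    # Assumes one value per day from startDate to endDate, both inclusive.
    start = date.fromisoformat(result["startDate"])
    end = date.fromisoformat(result["endDate"])
    values = result[keyword][platform]
    expected = (end - start).days + 1
    if len(values) != expected:
        raise ValueError(f"expected {expected} daily values, got {len(values)}")
    return [(start + timedelta(days=i), v) for i, v in enumerate(values)]


# Illustrative input: the dates are made up; the values are the article's
# sample output for "python".
sample = {
    "startDate": "2020-01-01",
    "endDate": "2020-01-07",
    "python": {"all": [23438, 23510, 23514, 24137, 22538, 17964, 15860]},
}
for day, value in to_dated_rows(sample, "python"):
    print(day.isoformat(), value)
```

Raising on a length mismatch, rather than silently truncating, surfaces the case where the API returns a coarser granularity than daily.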
The full source code and additional details are available on the original blog post.
Python Crawling & Data Mining