How to Programmatically Retrieve Baidu Index Data with Python (No UI Automation)

Learn step‑by‑step how to fetch Baidu Index search volume data for any keyword using Python: discover the API endpoints, decrypt the encrypted response with a custom function, and wrap the process into reusable code without relying on UI automation tools.

Python Crawling & Data Mining

In this tutorial, the author demonstrates how to collect Baidu Index search volume data directly via Python, avoiding heavyweight UI automation tools.

Baidu Index is a data analysis platform built on Baidu's large-scale user behavior data. It provides keyword search volume, search trends, related news, and demographic information.

The approach consists of three main steps:

1. Call the index API to obtain a uniqid along with the encrypted index data (userIndexes).

2. Call the ptbk API with that uniqid to retrieve the decryption key (ptbk).

3. Decrypt the encrypted data with a custom Python function that mirrors the site's original JavaScript logic.
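Step 1 depends on the `word` query parameter, which, judging from the get_index_data code in this article, is a compact JSON array with one group per keyword. The sketch below builds that request URL with the standard library. It assumes the endpoint and parameter names used later in this article, and `build_index_url` is a hypothetical helper name. It only constructs the URL and sends nothing.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the article's own code
INDEX_API = "http://index.baidu.com/api/SearchApi/index"


def build_index_url(keys, start, end):
    """Hypothetical helper: build the step-1 request URL for a list of keywords."""
    # One word group per keyword, mirroring the structure used in get_index_data
    words = [[{"name": key, "wordType": 1}] for key in keys]
    # Compact JSON; ensure_ascii=False keeps Chinese keywords as-is before percent-encoding
    word_param = json.dumps(words, separators=(",", ":"), ensure_ascii=False)
    query = urlencode({"area": 0, "word": word_param, "startDate": start, "endDate": end})
    return f"{INDEX_API}?{query}"


url = build_index_url(["python", "java"], "2021-01-01", "2021-01-07")
print(url)
# Decoding the query string recovers the JSON that the server receives
print(parse_qs(urlparse(url).query)["word"][0])
```

Letting `urlencode` handle the escaping avoids building the query string by hand. A hand-built string can break on keywords that contain quotes or non-ASCII characters.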

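The decryption function ports the JavaScript algorithm to Python. The key is a string of even length. Its first half lists the cipher characters, and its second half lists the plaintext characters they stand for, so decryption is a character-by-character substitution: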

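def decrypt(ptbk, index_data):
    # First half of ptbk = cipher characters, second half = plaintext characters
    n = len(ptbk) // 2
    table = dict(zip(ptbk[:n], ptbk[n:]))
    # Replace every cipher character with its plaintext counterpart
    return "".join(table[s] for s in index_data)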
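The following example shows how the substitution works, using a made-up key. The key `toy_ptbk` is hypothetical and is not a real ptbk value. It is built from 11 cipher symbols followed by the 11 plaintext symbols (ten digits and a comma). The `encrypt` function is also a hypothetical inverse, added only to demonstrate the round trip.

```python
def decrypt(ptbk: str, index_data: str) -> str:
    # First half of ptbk = cipher characters, second half = plaintext characters
    n = len(ptbk) // 2
    table = dict(zip(ptbk[:n], ptbk[n:]))
    return "".join(table[ch] for ch in index_data)


def encrypt(ptbk: str, plain: str) -> str:
    # Hypothetical inverse mapping (plaintext -> cipher), for demonstration only
    n = len(ptbk) // 2
    table = dict(zip(ptbk[n:], ptbk[:n]))
    return "".join(table[ch] for ch in plain)


# Made-up key: cipher symbols "ABCDEFGHIJK" map to "5,381047269" position by position
toy_ptbk = "ABCDEFGHIJK" + "5,381047269"

cipher = encrypt(toy_ptbk, "23438,23510")
print(cipher)                      # ICGCDBICAEF
print(decrypt(toy_ptbk, cipher))   # 23438,23510
```

Because the key changes with each uniqid, the mapping has to be rebuilt for every response. The lookup itself stays a plain dictionary substitution.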

With the key in hand, the script iterates over the keywords in the response, decrypts each keyword's overall ("all") series, and prints it as a comma-separated string of daily values:

# Decrypt the overall ('all') series for every keyword in the response
for user_index in data['userIndexes']:
    name = user_index['word'][0]['name']
    index_data = user_index['all']['data']
    r = decrypt(ptbk, index_data)
    print(name, r)
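The loop can be run offline against a mocked response. In the sketch below, the response shape is an assumption based only on the fields the loop reads (`userIndexes`, `word`, `all`, `data`). The key and encrypted strings are made up, and `mock_data` is a hypothetical name.

```python
def decrypt(ptbk, index_data):
    # First half of ptbk = cipher characters, second half = plaintext characters
    n = len(ptbk) // 2
    table = dict(zip(ptbk[:n], ptbk[n:]))
    return "".join(table[ch] for ch in index_data)


# Hypothetical key and response; only the fields used by the loop are included
toy_ptbk = "ABCDEFGHIJK" + "5,381047269"
mock_data = {
    "uniqid": "hypothetical-uniqid",
    "userIndexes": [
        {"word": [{"name": "python", "wordType": 1}], "all": {"data": "ICGCDBICAEF"}},
        {"word": [{"name": "java", "wordType": 1}], "all": {"data": "DKIABDHHK"}},
    ],
}

decoded = {}
for user_index in mock_data["userIndexes"]:
    name = user_index["word"][0]["name"]
    decoded[name] = decrypt(toy_ptbk, user_index["all"]["data"])
    print(name, decoded[name])
```

This prints `python 23438,23510` and `java 8925,8779`. Testing against a fixture like this lets you check the parsing logic without sending requests to Baidu.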

Example output for the keywords python and java:

python 23438,23510,23514,24137,22538,17964,15860
java 8925,8779,9040,9055,9110,6312,5333

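The complete, reusable function get_index_data accepts a list of keywords and an optional date range (by default, the seven days ending two days ago). It makes both API calls, decrypts the overall, PC, and mobile ("wise") series, and returns a dictionary of daily index values per keyword and platform. It relies on the decrypt function above and on a headers dict for the HTTP requests.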

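import json
from datetime import date, timedelta

import requests

# Assumption: Baidu Index only serves logged-in sessions, so the Cookie
# should hold your own session value (e.g. BDUSS) copied from a browser.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "BDUSS=<your cookie value>",
}


def get_index_data(keys, start=None, end=None):
    # One word group per keyword, serialized as compact JSON for the `word` parameter
    words = [[{"name": key, "wordType": 1}] for key in keys]
    words = json.dumps(words, separators=(",", ":"), ensure_ascii=False)
    # Default window: the seven days ending two days ago
    today = date.today()
    if start is None:
        start = str(today - timedelta(days=8))
    if end is None:
        end = str(today - timedelta(days=2))
    # Step 1: uniqid plus the encrypted series
    params = {"area": 0, "word": words, "startDate": start, "endDate": end}
    res = requests.get("http://index.baidu.com/api/SearchApi/index",
                       params=params, headers=headers)
    res.raise_for_status()
    data = res.json()["data"]
    uniqid = data["uniqid"]
    # Step 2: the decryption key tied to this uniqid
    ptbk_res = requests.get("http://index.baidu.com/Interface/ptbk",
                            params={"uniqid": uniqid}, headers=headers)
    ptbk_res.raise_for_status()
    ptbk = ptbk_res.json()["data"]
    # Step 3: decrypt each platform series: 'all' (overall), 'pc', 'wise' (mobile)
    result = {"startDate": start, "endDate": end}
    for user_index in data["userIndexes"]:
        name = user_index["word"][0]["name"]
        result[name] = {
            platform: [int(v) for v in decrypt(ptbk, user_index[platform]["data"]).split(",")]
            for platform in ("all", "pc", "wise")
        }
    return result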

Running get_index_data(["python", "java"]) yields a dictionary containing daily index values for the specified period.
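For storage or plotting, it can help to flatten that dictionary into dated rows. The sketch below does this with the standard `csv` module. It assumes that each series holds exactly one value per consecutive day starting at `startDate`, which matches the seven values shown for the default seven-day window. The helpers `result_to_rows` and `rows_to_csv` are hypothetical. The sample input uses the python "all" figures from the example output, but its dates are made up.

```python
import csv
import io
from datetime import date, timedelta


def result_to_rows(result):
    """Hypothetical helper: (keyword, platform, ISO date, value) rows from get_index_data output."""
    # Assumption: one value per consecutive day, starting at startDate
    start = date.fromisoformat(result["startDate"])
    rows = []
    for key, platforms in result.items():
        if key in ("startDate", "endDate"):
            continue  # metadata, not a keyword
        for platform, values in platforms.items():
            for offset, value in enumerate(values):
                day = start + timedelta(days=offset)
                rows.append((key, platform, day.isoformat(), value))
    return rows


def rows_to_csv(rows):
    """Hypothetical helper: serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["keyword", "platform", "date", "value"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical input: made-up dates, 'all' values from the python example output
sample = {
    "startDate": "2021-01-01",
    "endDate": "2021-01-07",
    "python": {"all": [23438, 23510, 23514, 24137, 22538, 17964, 15860]},
}
rows = result_to_rows(sample)
print(rows_to_csv(rows))
```

If Baidu Index returns coarser-grained data for longer ranges, the day-by-day date assignment above would no longer hold. In that case, compare the number of values with the length of the date range before trusting the dates.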


The full source code and additional details are available on the original blog post.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
