Multithreaded Python Crawl of Xiaomi App Store Games

This tutorial shows how to use Python's requests, threading, and queue modules to build a multithreaded crawler that extracts app names and download-page links from the Xiaomi App Store and reports the total execution time, with code examples and performance tips.

Python Crawling & Data Mining

Project Background

The Xiaomi App Store offers a wide range of Android apps and games, but searching for each one by hand is time-consuming and the site can be slow.

Project Goal

Automatically retrieve app information for a category (here "Chat & Social"), specifically each app's name and download link, and print the results to the console so the user can follow the links.

Libraries and Tools

Requests

Threading

Queue

JSON

Time

PyCharm (IDE)

Project Analysis

The site loads its data dynamically, so the listing is not in the initial HTML. Inspecting the network requests in Chrome DevTools reveals the JSON API endpoint that the page calls.

Example API URL:

http://app.mi.com/categotyAllListApi?page={}&categoryId=2&pageSize=30

The key query parameters are page (the page index, starting at 0), categoryId (which category to list), and pageSize (entries per page). Iterating over the page value fetches successive pages of JSON data.
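As a minimal sketch of the pagination idea, the page URLs can be generated from these three parameters. The helper name `build_page_url` and its default values are assumptions made for illustration; the endpoint and parameter names are copied from the example URL.

```python
from urllib.parse import urlencode

# Endpoint copied verbatim from the example URL (spelling included).
BASE_URL = 'http://app.mi.com/categotyAllListApi'


def build_page_url(page, category_id=2, page_size=30):
    """Return the API URL for one page (hypothetical helper)."""
    query = urlencode({
        'page': page,
        'categoryId': category_id,
        'pageSize': page_size,
    })
    return '{}?{}'.format(BASE_URL, query)


# Build the URLs for the first three pages.
urls = [build_page_url(page) for page in range(3)]
for url in urls:
    print(url)
```

Keeping the parameters in a dictionary makes it easy to switch categories or page sizes without editing a URL string by hand.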

Implementation

1. Define the Spider Class

import requests
from threading import Thread
from queue import Queue
import json
import time

class XiaomiSpider(object):
    def __init__(self):
        self.headers = {'User-Agent': 'Mozilla/5.0'}
        self.url = 'http://app.mi.com/categotyAllListApi?page={}&categoryId=15&pageSize=30'

    def main(self):
        pass

if __name__ == '__main__':
    spider = XiaomiSpider()
    spider.main()

2. URL Queue

# Add this line to __init__: a thread-safe queue shared by all worker threads
self.url_queue = Queue()

3. Enqueue URLs

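def url_in(self):
    # Generate URLs for pages 0-66 and put them into the queue
    for i in range(67):
        # Format into a local variable: overwriting self.url would consume
        # the {} placeholder, and every later URL would repeat page 0
        url = self.url.format(i)
        self.url_queue.put(url)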
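One detail is worth checking when filling the queue: `str.format` returns a new string with the placeholder already filled, so the template itself has to stay unchanged between iterations. The standalone sketch below illustrates this with module-level names (`template`, `url_queue`) that are assumptions for the demonstration, not part of the spider class.

```python
from queue import Queue

template = 'http://app.mi.com/categotyAllListApi?page={}&categoryId=15&pageSize=30'

# Pitfall: once the placeholder has been filled, a second format() call
# has nothing left to replace, so the page number never changes again.
filled_once = template.format(0)
filled_twice = filled_once.format(1)
print(filled_twice == filled_once)  # True

# Formatting the untouched template gives one distinct URL per page.
url_queue = Queue()
for page in range(67):
    url_queue.put(template.format(page))

print(url_queue.qsize())  # 67
```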

4. Thread Worker to Fetch Pages

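from queue import Empty  # place with the other imports at the top of the file

def get_page(self):
    while True:
        try:
            # Non-blocking get: a separate empty() check could race with other threads
            url = self.url_queue.get_nowait()
        except Empty:
            break  # queue drained, so this thread can finish
        # A timeout keeps a stalled connection from hanging the thread forever
        html = requests.get(url, headers=self.headers, timeout=10).text
        self.parse_page(html)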
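One point to keep in mind with a queue shared by several threads: calling `empty()` and then a blocking `get()` is two separate steps, so another thread can take the last URL in between and leave `get()` waiting forever. The sketch below exercises a non-blocking worker loop offline. `fake_fetch` is a hypothetical stand-in for `requests.get(url).text`, and the `page-N` strings are placeholder URLs.

```python
import threading
from queue import Queue, Empty


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).text (no network needed)."""
    return 'body of ' + url


def worker(url_queue, results, lock):
    while True:
        try:
            # get_nowait() either returns an item or raises Empty at once.
            url = url_queue.get_nowait()
        except Empty:
            break
        body = fake_fetch(url)
        with lock:
            results.append(body)


url_queue = Queue()
for page in range(20):
    url_queue.put('page-{}'.format(page))

results = []
lock = threading.Lock()
threads = [
    threading.Thread(target=worker, args=(url_queue, results, lock))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 20
```

Every URL is handled exactly once, and every thread exits once the queue is drained.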

5. Parse JSON and Extract Data

def parse_page(self, html):
    # The response body is JSON; the app entries are under the 'data' key
    app_json = json.loads(html)
    for app in app_json['data']:
        name = app['displayName']
        link = 'http://app.mi.com/details?id={}'.format(app['packageName'])
        print({'name': name, 'link': link})
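The parsing step can be tried without touching the network. The payload below is a hypothetical sample that contains only the fields the parser reads (`data`, `displayName`, `packageName`); the real response may carry more keys. `parse_apps` is an illustrative variant that returns the records instead of printing them.

```python
import json

# Hypothetical payload shaped like the fields used by the parser.
sample = (
    '{"data": ['
    '{"displayName": "Example App", "packageName": "com.example.app"}'
    ']}'
)


def parse_apps(text):
    """Return a list of {'name', 'link'} dicts from one JSON page."""
    apps = []
    # .get() tolerates a page without a 'data' key.
    for app in json.loads(text).get('data', []):
        apps.append({
            'name': app['displayName'],
            'link': 'http://app.mi.com/details?id={}'.format(app['packageName']),
        })
    return apps


apps = parse_apps(sample)
print(apps)
```

Returning a list rather than printing inside the loop makes the parser easier to test and lets the caller decide how to store the results.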

6. Launch Multiple Threads

def main(self):
    self.url_in()
    t_list = []
    for i in range(10):
        t = Thread(target=self.get_page)
        t.start()
        t_list.append(t)
    for t in t_list:
        t.join()
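As an alternative to creating and joining `Thread` objects by hand, the standard library's `concurrent.futures.ThreadPoolExecutor` manages the pool itself. This is a different technique from the one used in this tutorial, shown only as a sketch. `fake_fetch` is a hypothetical stand-in for the real request, and the `page-N` strings are placeholder URLs.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for downloading and parsing one page."""
    return url.upper()


urls = ['page-{}'.format(i) for i in range(20)]

# The with-block waits for all submitted work before exiting.
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() returns results in the same order as the input URLs.
    bodies = list(pool.map(fake_fetch, urls))

print(len(bodies))  # 20
```

The explicit queue-and-threads version gives finer control over each worker, while the executor keeps the launch code short.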

7. Measure Execution Time

if __name__ == '__main__':
    start = time.time()
    spider = XiaomiSpider()
    spider.main()
    end = time.time()
    print('Execution time: %.2f s' % (end - start))
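To see the effect of threading on wait-bound work without hitting the real site, the same timing pattern can be applied to a simulated download. In this sketch `fake_download` is a hypothetical stand-in that only sleeps, and the 0.05 s delay is an arbitrary assumption. `time.perf_counter` is used here because it is intended for measuring short intervals.

```python
import time
from threading import Thread


def fake_download(delay=0.05):
    """Hypothetical stand-in for a network request: it only waits."""
    time.sleep(delay)


def run_sequential(n):
    start = time.perf_counter()
    for _ in range(n):
        fake_download()
    return time.perf_counter() - start


def run_threaded(n):
    start = time.perf_counter()
    threads = [Thread(target=fake_download) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(10)
threaded = run_threaded(10)
print('sequential: %.2f s, threaded: %.2f s' % (sequential, threaded))
```

The sequential run takes roughly the sum of all delays, while the threaded run overlaps the waits.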

Result Display

Running the script prints each app's name and link as a dictionary, one per line, followed by the total execution time in seconds. The links point to the app's page on app.mi.com and can be opened directly from the console output.

Conclusion

Do not overload the server with excessive requests; a moderate crawl is sufficient.

Python multithreading can significantly speed up I/O‑bound tasks like web crawling.

Threads help here because a thread that is waiting for a network response sits idle; while it waits, the other threads can send and process their own requests.

Feel free to adapt this approach to other categories; hands‑on practice deepens understanding.
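One way to keep a multithreaded crawl moderate is to share a small rate limiter between the worker threads. The sketch below is an assumption-laden illustration, not part of the original spider: the `Throttle` class, its `wait` method, and the 0.05 s interval are all hypothetical names and values.

```python
import threading
import time


class Throttle:
    """Grant at most one request slot per `interval` seconds (hypothetical)."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self):
        # Holding the lock while sleeping makes the threads take turns.
        with self._lock:
            now = time.monotonic()
            if now < self._next_allowed:
                time.sleep(self._next_allowed - now)
                now = time.monotonic()
            self._next_allowed = now + self.interval
            return now  # time at which this slot was granted


throttle = Throttle(interval=0.05)
granted = []


def polite_worker():
    # A real worker would issue its request right after wait() returns.
    granted.append(throttle.wait())


threads = [threading.Thread(target=polite_worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

granted.sort()
gaps = [later - earlier for earlier, later in zip(granted, granted[1:])]
print(['%.2f' % gap for gap in gaps])
```

Each worker calls `wait()` before its request, so the overall request rate stays bounded no matter how many threads are running.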
