How to Build a Hearthstone Streamer Ranking Site with Python, Django, and LeanCloud
This article walks through creating a video‑aggregation platform for Hearthstone streamers by scraping data from sites like Douyu, processing it with Python regular expressions, storing results in Redis or LeanCloud, and exposing the rankings via a Django‑based web front‑end with periodic cloud‑function refreshes.
Introduction
As a Hearthstone player, the author often watches top streamers on several live‑streaming platforms. Because the streamers are spread across different services, switching between them is cumbersome, which motivated building a single video‑aggregation site.
Basic Idea
The aggregator collects streamer information from the target sites, extracts the needed fields, stores them, and displays the ranking on a web page. Because live‑stream data changes constantly, it has to be re‑collected periodically; the results are kept in Redis (in the Redis version of the project) or in LeanCloud storage.
Features
Data collection and parsing
Data storage
Web presentation
Technology Stack
Language (Python)
Python is chosen for its lightweight nature and strong library support for web crawling and web development, and LeanCloud also supports Python deployment.
Data Collection (requests)
The requests library provides a simple, lightweight way to fetch pages; the project is small enough that Scrapy is unnecessary.
Web Framework (Django)
Django supplies a robust framework and template engine. Django REST framework, a separate package built on top of Django, is used to expose API endpoints, and a React Native mobile client is planned.
Storage (LeanCloud)
LeanCloud’s built‑in storage service is used directly for persisting the collected data.
Deployment (LeanCloud Engine)
The project follows LeanCloud’s official project skeleton for deployment.
Front‑end (PureCSS)
PureCSS is used for a simple, responsive UI with basic components.
Environment Setup
Set up a Python virtual environment (e.g., using virtualenv) and install dependencies listed in requirements.txt. See Liao Xuefeng’s blog for details.
Analysis and Collection
Video Site Parsing
The target is Douyu’s Hearthstone section; the goal is to collect each streamer’s URL, title, screenshot, popularity, and name.
Sample Page Structure
Regular‑Expression Extraction
All parsing code resides in fetch.py. Example snippets:
re.finditer(r'<a class="play-list-link" .*?>([\s\S]*?)</a>', response.content.decode('utf8'))
Explanation: the response body is decoded as UTF‑8, and the regular expression matches each anchor tag that carries a streamer's information. In the snippets that follow, group is taken to be the full text of one such match.
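As a small illustration of this step, the sketch below runs the same pattern over a hand-written HTML fragment. The fragment and the helper name iter_anchor_blocks are assumptions made for this example; they are not Douyu's actual markup or the project's actual code.

```python
import re

# Same pattern as above, as a raw string so \s and \S reach the regex engine unchanged.
ANCHOR_RE = re.compile(r'<a class="play-list-link" .*?>([\s\S]*?)</a>')

# Hypothetical fragment, shaped only to match the pattern.
SAMPLE_HTML = '''
<ul>
  <li><a class="play-list-link" href="/1001" title="Room one">
    <span class="dy-name ellipsis fl">alice</span>
  </a></li>
  <li><a class="play-list-link" href="/1002" title="Room two">
    <span class="dy-name ellipsis fl">bob</span>
  </a></li>
</ul>
'''


def iter_anchor_blocks(html):
    """Yield the full text of every matching anchor tag."""
    for match in ANCHOR_RE.finditer(html):
        # group() with no argument returns the whole match, attributes included.
        yield match.group()


blocks = list(iter_anchor_blocks(SAMPLE_HTML))
```

Each element of blocks contains both the tag's attributes (href, title) and its inner spans, so all later field lookups can search the same string.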
href = re.search(r'href="(.*?)"', group).group(1)
Extracts the streamer’s room URL.
title = re.search(r'title="(.*?)"', group).group(1)
Extracts the room title.
img = re.search(r'data-original="(.*?)"', group).group(1)
Extracts the screenshot URL.
name = re.search(r'<span class="dy-name ellipsis fl">(.*?)</span>', group).group(1)
Extracts the streamer’s name.
num = re.search(r'<span class="dy-num fr.*?>([\s\S]*?)</span>', group).group(1)
Extracts the popularity number; the Chinese “万” unit is removed and the value is converted to an integer:
int(round(float(num.replace('万', '').replace('\r', '').replace('\n', '')) * 10000))
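The conversion can also be wrapped in a small helper. The sketch below assumes the count appears either as a plain integer (for example "8523") or as a decimal followed by 万 (for example "1.2万"), and it multiplies by 10000 only when the unit is present. That branch and the name parse_popularity are assumptions added for this example; the one‑liner above always multiplies.

```python
def parse_popularity(raw):
    """Convert a popularity string such as '1.2万' or '8523' to an int."""
    # Drop line breaks and surrounding whitespace left over from the HTML.
    text = raw.replace('\r', '').replace('\n', '').strip()
    if text.endswith('万'):
        # 万 means ten thousand; round() absorbs floating-point noise.
        return int(round(float(text[:-1]) * 10000))
    return int(text)
```

For example, parse_popularity('1.2万') gives 12000 and parse_popularity('8523') gives 8523.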
Storage and Refresh
Collected data is stored via LeanCloud’s API. The data model includes the fields id, name, title, href, num, and img.
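One way to picture a record with these fields is a plain dataclass. This is only a sketch: the class name Streamer, the field types, and the sample values are assumptions, and the LeanCloud SDK calls that actually save the record are not shown.

```python
from dataclasses import dataclass, asdict


@dataclass
class Streamer:
    id: str      # assumed to be a string identifier
    name: str    # streamer's display name
    title: str   # room title
    href: str    # room URL
    num: int     # popularity, already converted to an integer
    img: str     # screenshot URL


# Hypothetical sample values.
record = Streamer(id='1', name='alice', title='Room one',
                  href='/1001', num=12000, img='https://example.com/1001.jpg')

# A plain dict is convenient for handing to a storage client or a template.
payload = asdict(record)
```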
API Design
/fetch
Triggers the whole workflow (clear, collect, parse, store). It is intended to be called by a scheduled cloud function.
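The four steps can be sketched as one function that receives each step as a callable. Every name here (run_fetch and its parameters) is hypothetical, and an in-memory list stands in for the real storage back end.

```python
def run_fetch(clear, collect, parse, store):
    """Run clear -> collect -> parse -> store; return how many records were stored."""
    clear()                    # drop the previous ranking
    html = collect()           # download the listing page
    records = parse(html)      # extract one dict per streamer
    for record in records:
        store(record)
    return len(records)


# Usage with in-memory stand-ins instead of real network and storage calls.
storage = [{'name': 'stale'}]
count = run_fetch(
    clear=storage.clear,
    collect=lambda: 'alice,bob',
    parse=lambda html: [{'name': n} for n in html.split(',')],
    store=storage.append,
)
```

Passing the steps in as arguments keeps the orchestration independent of whether the records end up in Redis or in LeanCloud.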
/chairmans (Redis version only)
Provides paginated access to stored streamer records via Django REST framework.
/chairman/{id} (Redis version only)
Returns details of a single streamer.
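In the project these two endpoints are served by Django REST framework. The framework‑free sketch below only illustrates the kind of result each one returns; the function names, the page size, and the count/results response shape are assumptions for illustration.

```python
def paginate(records, page=1, page_size=10):
    """Return one page of records plus the total count (pages are 1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError('page and page_size must be positive')
    start = (page - 1) * page_size
    return {'count': len(records), 'results': records[start:start + page_size]}


def find_by_id(records, record_id):
    """Return the record with the given id, or None if there is no match."""
    return next((r for r in records if r['id'] == record_id), None)


# Hypothetical data: 25 streamer records with ids 1..25.
records = [{'id': i, 'name': 'streamer-%d' % i} for i in range(1, 26)]
second_page = paginate(records, page=2, page_size=10)
```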
Refresh Mechanism
A LeanCloud cloud function (defined in cloud.py) is scheduled like a cron job to periodically invoke /fetch and keep the ranking up‑to‑date.
Web Presentation
The front‑end displays a simple list of streamers with their ranking, title, and a link to the live room. The page is responsive for mobile devices.
List Page
Django’s template engine renders the list; because LeanCloud storage differs from Django ORM, the data is passed as a plain list of dictionaries.
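A minimal sketch of preparing such a list is shown below. It assumes the ranking is ordered by the num field, highest first; the function name build_ranking, the rank key, and the sample data are hypothetical.

```python
def build_ranking(records):
    """Sort by popularity (highest first) and attach a 1-based rank."""
    ordered = sorted(records, key=lambda r: r['num'], reverse=True)
    # dict(r, rank=i) copies each record, so the input list is left untouched.
    return [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]


# Hypothetical sample data.
sample = [
    {'name': 'a', 'num': 5},
    {'name': 'b', 'num': 30},
    {'name': 'c', 'num': 12},
]
ranking = build_ranking(sample)
```

In a Django view, a list like this could be handed to the template through the context argument of django.shortcuts.render, for example under a key such as streamers (a name chosen here for illustration).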
Deployment
The application is deployed to LeanCloud using the lean CLI. Deployment steps include configuring the Git repository, adding a deploy key, setting the domain, and enabling a scheduled task.
Conclusion
The project is a lightweight, hands‑on example of building a video‑aggregation service with Python, Django, and LeanCloud. It demonstrates data crawling, regex parsing, cloud storage, API design, and responsive front‑end rendering. Questions and issues are welcome on the GitHub repository.
MaGe Linux Operations
