Build a Full‑Stack Zhihu Hot List: From Scraper to Mini‑App API
This tutorial walks through creating an end‑to‑end Zhihu hot‑list project that scrapes the billboard page, stores the data with Flask‑SQLAlchemy, schedules periodic updates, exposes RESTful APIs, and finally displays the results in a uni‑app mini‑program with interactive charts.
Data Crawling
We start by fetching the Zhihu hot list from https://www.zhihu.com/billboard. The page ships its 50 hot items as JSON embedded in a script tag with the id js-initialData, so parsing the contents of that script yields the hot-list data.
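The extraction step can be tried offline before touching the network. The sketch below uses only the standard library (html.parser instead of BeautifulSoup) and a hand-written HTML sample. The InitialDataParser and extract_hot_list names, the sample payload, and its cardId field are hypothetical; only the js-initialData id and the initialState, topstory, hotList path are taken from this tutorial.

```python
import json
from html.parser import HTMLParser


class InitialDataParser(HTMLParser):
    """Collect the text of <script id="js-initialData"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "js-initialData":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_hot_list(html):
    """Return the embedded hotList array, or [] if the script tag is absent."""
    parser = InitialDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    if not raw:
        return []
    state = json.loads(raw)
    return state["initialState"]["topstory"]["hotList"]


# Made-up sample page; the field inside each item is an assumption.
SAMPLE_HTML = """
<html><body>
<script id="js-initialData" type="text/json">
{"initialState": {"topstory": {"hotList": [{"cardId": "Q_1"}, {"cardId": "Q_2"}]}}}
</script>
</body></html>
"""

hot_list = extract_hot_list(SAMPLE_HTML)
print(len(hot_list))
```

Returning an empty list when the script tag is missing is a design choice of this sketch: it lets a caller tell "no data on this page" apart from a crash, which may matter if the site serves a different page to requests without valid headers.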
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.zhihu.com/billboard'
# Fill in a real browser User-Agent and, if needed, your own Cookie.
headers = {"User-Agent": "", "Cookie": ""}

def get_hot_zhihu():
    res = requests.get(url, headers=headers)
    content = BeautifulSoup(res.text, "html.parser")
    # The hot list is embedded as JSON in <script id="js-initialData">.
    hot_data = content.find('script', id='js-initialData').string
    hot_json = json.loads(hot_data)
    hot_list = hot_json['initialState']['topstory']['hotList']
    return hot_list

Clicking a hot item triggers a request that returns JSON with the answer details, which we parse as well.
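The filtering logic is easier to check when it is separated from the HTTP call. Here is a minimal sketch, assuming each entry under data carries author, voteup_count, comment_count, content and (usually) reward_info fields. The parse_answers name and the sample payload are hypothetical, and falling back to 0 when reward_info is missing is an added assumption of this sketch.

```python
def parse_answers(data_json):
    """Keep answers without paid_info and flatten the fields of interest."""
    answers = []
    for item in data_json.get("data", []):
        if "paid_info" in item:
            continue  # skip entries flagged with paid_info
        reward = item.get("reward_info") or {}
        answers.append({
            "author": item["author"]["name"],
            "voteup_count": item["voteup_count"],
            "comment_count": item["comment_count"],
            "content": item["content"],
            # Assumption: treat a missing reward block as zero rewards.
            "reward_info": reward.get("reward_member_count", 0),
        })
    return answers


# Made-up payload for illustration only.
sample = {
    "data": [
        {"author": {"name": "alice"}, "voteup_count": 10, "comment_count": 2,
         "content": "<p>first</p>", "reward_info": {"reward_member_count": 3}},
        {"author": {"name": "bob"}, "voteup_count": 99, "comment_count": 5,
         "content": "<p>paid</p>", "paid_info": {}, "reward_info": {}},
        {"author": {"name": "carol"}, "voteup_count": 1, "comment_count": 0,
         "content": "<p>third</p>"},
    ]
}

parsed = parse_answers(sample)
print([answer["author"] for answer in parsed])
```

Because the function takes an already-decoded dictionary, it can be exercised against saved responses without sending any requests.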
def get_answer_zhihu(id):
    url = f'https://www.zhihu.com/api/v4/questions/{id}/answers?include='
    headers = {"User-Agent": "", "Cookie": ""}
    # Config.ZHIHU_QUERY (defined elsewhere, not shown) supplies the value of include=.
    res = requests.get(url + Config.ZHIHU_QUERY, headers=headers)
    data_json = res.json()
    answer_info = []
    for i in data_json['data']:
        # Skip entries that carry a paid_info field.
        if 'paid_info' in i:
            continue
        answer_info.append({
            'author': i['author']['name'],
            'voteup_count': i['voteup_count'],
            'comment_count': i['comment_count'],
            'content': i['content'],
            'reward_info': i['reward_info']['reward_member_count']
        })
    return answer_info

Data Storage
Collected data is persisted using Flask‑SQLAlchemy. Three tables are defined: ZhihuDetails for hot‑list items, ZhihuMetrics for hotness metrics, and ZhihuContent for answer details.
class ZhihuDetails(db.Model):
    __tablename__ = 'ZhihuDetails'
    id = db.Column(db.Integer, primary_key=True)
    hot_id = db.Column(db.String(32), unique=True, index=True)
    hot_name = db.Column(db.Text)
    hot_link = db.Column(db.String(64))
    hot_cardid = db.Column(db.String(32))

class ZhihuMetrics(db.Model):
    __tablename__ = 'ZhihuMetrics'
    id = db.Column(db.Integer, primary_key=True)
    hot_metrics = db.Column(db.String(64))
    hot_cardid = db.Column(db.String(32), index=True)
    update_time = db.Column(db.DateTime)

class ZhihuContent(db.Model):
    __tablename__ = 'ZhihuContent'
    id = db.Column(db.Integer, primary_key=True)
    answer_id = db.Column(db.Integer, index=True)
    author = db.Column(db.String(32), index=True)
    voteup_count = db.Column(db.Integer)
    comment_count = db.Column(db.Integer)
    reward_info = db.Column(db.Integer)
    content = db.Column(db.Text)

Scheduled Tasks
To keep the hot‑list up‑to‑date we use flask_apscheduler. The scheduler runs in the Flask app context and periodically calls the crawling functions, inserting or updating records in the database.
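The write path of such a task could look roughly like the following. This is a sketch with sqlite3 standing in for the Flask‑SQLAlchemy models so that it runs on its own. The store_snapshot function, the shape of the items argument, the sample values, and the choice to append one metrics row per run are all assumptions, not the original task implementation.

```python
import sqlite3
from datetime import datetime, timezone

# Simplified stand-ins for the ZhihuDetails and ZhihuMetrics tables.
SCHEMA = """
CREATE TABLE ZhihuDetails (
    id INTEGER PRIMARY KEY,
    hot_id TEXT UNIQUE,
    hot_name TEXT,
    hot_link TEXT,
    hot_cardid TEXT
);
CREATE TABLE ZhihuMetrics (
    id INTEGER PRIMARY KEY,
    hot_metrics TEXT,
    hot_cardid TEXT,
    update_time TEXT
);
"""


def store_snapshot(conn, items, now=None):
    """Insert unseen hot items once and append a metrics row for every item."""
    now = now or datetime.now(timezone.utc).isoformat(timespec="seconds")
    for item in items:
        # INSERT OR IGNORE leaves an existing row with the same hot_id untouched.
        conn.execute(
            "INSERT OR IGNORE INTO ZhihuDetails "
            "(hot_id, hot_name, hot_link, hot_cardid) VALUES (?, ?, ?, ?)",
            (item["hot_id"], item["hot_name"], item["hot_link"], item["hot_cardid"]),
        )
        # Metrics form a time series, so each run adds a new row.
        conn.execute(
            "INSERT INTO ZhihuMetrics (hot_metrics, hot_cardid, update_time) "
            "VALUES (?, ?, ?)",
            (item["hot_metrics"], item["hot_cardid"], now),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up item; the metrics text format is an assumption.
item = {"hot_id": "1001", "hot_name": "Example question",
        "hot_link": "https://example.com/q/1001", "hot_cardid": "Q_1001",
        "hot_metrics": "120 万热度"}

store_snapshot(conn, [item], now="2024-01-01T00:00:00")
store_snapshot(conn, [dict(item, hot_metrics="150 万热度")], now="2024-01-01T00:10:00")

details_count = conn.execute("SELECT COUNT(*) FROM ZhihuDetails").fetchone()[0]
metrics_count = conn.execute("SELECT COUNT(*) FROM ZhihuMetrics").fetchone()[0]
print(details_count, metrics_count)
```

Running the task twice for the same item leaves one details row and two metrics rows, which is the property a trend chart needs. If titles or links can change between runs, an explicit update of the existing details row would be needed instead of ignoring the duplicate.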
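from flask import Flask
from flask_apscheduler import APScheduler

# db and config come from the project's own modules (not shown).
scheduler = APScheduler()

def opera_db():
    # Scheduled jobs run outside a request, so enter the app context explicitly.
    with scheduler.app.app_context():
        # task implementation
        pass

def create_app(config_name):
    app = Flask(__name__)
    app.config.from_object(config[config_name])
    config[config_name].init_app(app)
    db.init_app(app)
    scheduler.init_app(app)
    # Jobs only fire after scheduler.start() is called, here or in the launch script.
    return app

API Development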
Two REST endpoints are provided. /api/zhihu/hot/ returns the latest hot‑list with title, link, metrics, and timestamps. /api/zhihu/detail/<id>/ returns the metric trend for a specific hot item, suitable for charting.
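The query helpers that feed these endpoints are not shown in the source. For the detail endpoint, one plausible chart-ready shape is a list of time labels plus a single numeric series. The sketch below assumes metric strings begin with a number (for example "120 万热度"). The build_trend name, the sample rows, and the time label format are illustrative assumptions, not the original implementation.

```python
import re
from datetime import datetime


def build_trend(rows, name="hotness"):
    """Turn (update_time, hot_metrics) rows into categories and one series."""
    categories, values = [], []
    # Sort by timestamp so the line chart reads left to right in time order.
    for update_time, hot_metrics in sorted(rows, key=lambda row: row[0]):
        match = re.match(r"\s*(\d+(?:\.\d+)?)", hot_metrics or "")
        if match is None:
            continue  # skip rows whose metric text has no leading number
        categories.append(update_time.strftime("%H:%M"))
        values.append(float(match.group(1)))
    return {"categories": categories, "series": [{"name": name, "data": values}]}


# Made-up rows, deliberately out of order and with one unparsable value.
rows = [
    (datetime(2024, 1, 1, 10, 10), "150 万热度"),
    (datetime(2024, 1, 1, 10, 0), "120 万热度"),
    (datetime(2024, 1, 1, 10, 20), "n/a"),
]

trend = build_trend(rows)
print(trend)
```

The categories and series keys mirror the chartData fields that the front end passes to its chart, so a payload of this shape could be handed over without further reshaping on the client.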
from flask import jsonify

# api is the project's blueprint; zhihudata() and zhihudetail() are its
# query helpers (none of them are shown in this article).
@api.route('/api/zhihu/hot/')
def zhihu_api_data():
    zhihu_data = zhihudata()
    data_list = []
    for data in zhihu_data:
        data_dict = {
            'title': data[0],
            'link': data[1],
            'metrics': data[2],
            'hot_id': data[3],
            'update_time': data[4]
        }
        data_list.append(data_dict)
    return jsonify({'code': 0, 'content': data_list}), 200

@api.route('/api/zhihu/detail/<id>/')
def zhihu_api_detail(id):
    zhihu_detail = zhihudetail(id)
    return jsonify({'code': 0, 'data': zhihu_detail}), 200

Mini‑Program Integration
The front end is built with uni‑app, which lets the same codebase run on multiple platforms. We create a project in HBuilder, modify index.nvue to define two tabs (Zhihu and Weibo hot lists), and adjust news-page.nvue to request our Flask API.
data() {
    return {
        tabList: [
            {id: "tab01", name: '知乎热榜', newsid: 0},   // "Zhihu Hot List"
            {id: "tab02", name: '微博热榜', newsid: 23}   // "Weibo Hot List"
        ]
    }
}

Network requests point to http://127.0.0.1:5000/api/zhihu/hot/ and http://127.0.0.1:5000/api/zhihu/detail/<id>/. The detail page uses the uCharts plugin to render a column chart of the hotness distribution and a line chart of the trend.
uni.request({
    // The Flask route ends with a slash, so include it to avoid a redirect.
    url: 'http://127.0.0.1:5000/api/zhihu/detail/' + this.details.hot_id + '/',
    success: function(res) {
        // process data and render charts
    }
});

The chart is then configured with uCharts so the data can be explored interactively.
showColumn(canvasId, chartData) {
    // canvaColumn and _self are expected to be declared elsewhere in the page script (not shown).
    canvaColumn = new uCharts({
        $this: _self,
        canvasId: canvasId,
        type: 'column',
        legend: {show: true},
        categories: chartData.categories,
        series: chartData.series,
        // other options
    });
}

With these steps the complete pipeline, from web scraping to a functional mini‑program UI, is operational.
MaGe Linux Operations
