Build a Full‑Stack Zhihu Hot List: From Scraper to Mini‑App API

This tutorial walks through creating an end‑to‑end Zhihu hot‑list project that scrapes the billboard page, stores the data with Flask‑SQLAlchemy, schedules periodic updates, exposes RESTful APIs, and finally displays the results in a uni‑app mini‑program with interactive charts.

MaGe Linux Operations

Dec 24, 2019

Data Crawling

We start by fetching the Zhihu hot‑list from https://www.zhihu.com/billboard. The page embeds its 50 items as JSON inside a script tag whose id is js-initialData, so parsing that script yields the hot‑list data.

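import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.zhihu.com/billboard'
# Fill in a real browser User-Agent and, if required, a logged-in Cookie.
headers = {"User-Agent": "", "Cookie": ""}

def get_hot_zhihu():
    res = requests.get(url, headers=headers)
    content = BeautifulSoup(res.text, "html.parser")
    # The hot list is embedded as JSON in <script id="js-initialData">.
    hot_data = content.find('script', id='js-initialData').string
    hot_json = json.loads(hot_data)
    hot_list = hot_json['initialState']['topstory']['hotList']
    return hot_list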
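To see the extraction step in isolation, here is a minimal sketch that runs without network access. It swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependencies; the sample HTML and the cardId values are made up for illustration and are not real Zhihu output.

```python
import json
from html.parser import HTMLParser


class InitialDataParser(HTMLParser):
    """Collects the text inside <script id="js-initialData">."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self.payload = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "js-initialData":
            self._in_target = True

    def handle_data(self, data):
        # Script content may arrive in several chunks, so concatenate.
        if self._in_target:
            self.payload = (self.payload or "") + data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_target = False


def extract_hot_list(html):
    """Return the hotList array, or an empty list if the script is missing."""
    parser = InitialDataParser()
    parser.feed(html)
    parser.close()
    if parser.payload is None:
        return []
    state = json.loads(parser.payload)
    return state.get("initialState", {}).get("topstory", {}).get("hotList", [])


# Hypothetical page fragment, shaped like the path used in get_hot_zhihu().
SAMPLE_HTML = (
    '<html><body><script id="js-initialData" type="text/json">'
    '{"initialState": {"topstory": {"hotList": '
    '[{"cardId": "Q_1"}, {"cardId": "Q_2"}]}}}'
    "</script></body></html>"
)

print([item["cardId"] for item in extract_hot_list(SAMPLE_HTML)])
```

Returning an empty list when the script tag is absent is a design choice of this sketch; it lets a scheduled job skip a run quietly if Zhihu serves a login page instead of the billboard.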

Opening a hot item triggers a request to the answers API, which returns JSON with the answer details. We parse that response as well, skipping paid answers.

def get_answer_zhihu(question_id):
    # Config.ZHIHU_QUERY (from the project's config) supplies the value of the `include` parameter.
    url = f'https://www.zhihu.com/api/v4/questions/{question_id}/answers?include='
    headers = {"User-Agent": "", "Cookie": ""}
    res = requests.get(url + Config.ZHIHU_QUERY, headers=headers)
    data_json = res.json()
    answer_info = []
    for answer in data_json['data']:
        # Skip paid answers.
        if 'paid_info' in answer:
            continue
        answer_info.append({
            'author': answer['author']['name'],
            'voteup_count': answer['voteup_count'],
            'comment_count': answer['comment_count'],
            'content': answer['content'],
            'reward_info': answer['reward_info']['reward_member_count']
        })
    return answer_info
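The filtering logic can be exercised offline against a hand-written payload. The sketch below assumes the response has the same shape the function above reads; the sample records are invented, and the fallback to 0 when reward_info is missing is an extra guard added here, not something the original code does.

```python
def summarize_answers(data_json):
    """Keep free answers only and flatten the fields the article stores."""
    answers = []
    for item in data_json.get("data", []):
        if "paid_info" in item:
            continue  # paid answers are skipped, as in get_answer_zhihu()
        reward = item.get("reward_info") or {}  # assumption: the block may be absent
        answers.append({
            "author": item["author"]["name"],
            "voteup_count": item["voteup_count"],
            "comment_count": item["comment_count"],
            "content": item["content"],
            "reward_info": reward.get("reward_member_count", 0),
        })
    return answers


# Hypothetical payload for illustration only.
SAMPLE = {
    "data": [
        {"author": {"name": "alice"}, "voteup_count": 12, "comment_count": 3,
         "content": "<p>free answer</p>",
         "reward_info": {"reward_member_count": 2}},
        {"author": {"name": "bob"}, "voteup_count": 99, "comment_count": 8,
         "content": "<p>paid answer</p>", "paid_info": {},
         "reward_info": {"reward_member_count": 0}},
        {"author": {"name": "carol"}, "voteup_count": 5, "comment_count": 0,
         "content": "<p>no reward block</p>"},
    ]
}

result = summarize_answers(SAMPLE)
print([a["author"] for a in result])
```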

Data Storage

Collected data is persisted using Flask‑SQLAlchemy. Three tables are defined: ZhihuDetails for hot‑list items, ZhihuMetrics for hotness metrics, and ZhihuContent for answer details.

class ZhihuDetails(db.Model):
    __tablename__ = 'ZhihuDetails'
    id = db.Column(db.Integer, primary_key=True)
    hot_id = db.Column(db.String(32), unique=True, index=True)
    hot_name = db.Column(db.Text)
    hot_link = db.Column(db.String(64))
    hot_cardid = db.Column(db.String(32))

class ZhihuMetrics(db.Model):
    __tablename__ = 'ZhihuMetrics'
    id = db.Column(db.Integer, primary_key=True)
    hot_metrics = db.Column(db.String(64))
    hot_cardid = db.Column(db.String(32), index=True)
    update_time = db.Column(db.DateTime)

class ZhihuContent(db.Model):
    __tablename__ = 'ZhihuContent'
    id = db.Column(db.Integer, primary_key=True)
    answer_id = db.Column(db.Integer, index=True)
    author = db.Column(db.String(32), index=True)
    voteup_count = db.Column(db.Integer)
    comment_count = db.Column(db.Integer)
    reward_info = db.Column(db.Integer)
    content = db.Column(db.Text)
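One plausible way these tables work together is to store each hot item once and append a new metrics row on every crawl, linked through hot_cardid. The sketch below illustrates that idea with the standard library's sqlite3 instead of Flask‑SQLAlchemy so it can run on its own; the save_snapshot helper, the sample item, and the metrics text are assumptions, not the author's implementation.

```python
import sqlite3
from datetime import datetime


def init_db(conn):
    # Mirrors the ZhihuDetails and ZhihuMetrics columns defined above.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS ZhihuDetails (
            id INTEGER PRIMARY KEY,
            hot_id TEXT UNIQUE,
            hot_name TEXT,
            hot_link TEXT,
            hot_cardid TEXT
        );
        CREATE TABLE IF NOT EXISTS ZhihuMetrics (
            id INTEGER PRIMARY KEY,
            hot_metrics TEXT,
            hot_cardid TEXT,
            update_time TEXT
        );
    """)


def save_snapshot(conn, item, now):
    """Insert the item once, then record its current hotness."""
    # INSERT OR IGNORE relies on the UNIQUE constraint on hot_id.
    conn.execute(
        "INSERT OR IGNORE INTO ZhihuDetails "
        "(hot_id, hot_name, hot_link, hot_cardid) VALUES (?, ?, ?, ?)",
        (item["hot_id"], item["hot_name"], item["hot_link"], item["hot_cardid"]),
    )
    conn.execute(
        "INSERT INTO ZhihuMetrics (hot_metrics, hot_cardid, update_time) "
        "VALUES (?, ?, ?)",
        (item["hot_metrics"], item["hot_cardid"], now.isoformat(timespec="seconds")),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)

# Hypothetical hot-list item, crawled twice ten minutes apart.
item = {"hot_id": "1001", "hot_name": "Sample question",
        "hot_link": "https://www.zhihu.com/question/1001",
        "hot_cardid": "Q_1001", "hot_metrics": "120 万热度"}
save_snapshot(conn, item, datetime(2019, 12, 24, 10, 0))
item["hot_metrics"] = "150 万热度"
save_snapshot(conn, item, datetime(2019, 12, 24, 10, 10))

details_count = conn.execute("SELECT COUNT(*) FROM ZhihuDetails").fetchone()[0]
metrics_count = conn.execute("SELECT COUNT(*) FROM ZhihuMetrics").fetchone()[0]
print(details_count, metrics_count)
```

Keeping metrics in their own table means each crawl adds history rather than overwriting it, which is what makes a trend chart possible later.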

Scheduled Tasks

To keep the hot‑list up‑to‑date we use flask_apscheduler. The scheduler runs in the Flask app context and periodically calls the crawling functions, inserting or updating records in the database.

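from flask import Flask
from flask_apscheduler import APScheduler

scheduler = APScheduler()

def opera_db():
    # Jobs run outside a request, so push the app context before touching the database.
    with scheduler.app.app_context():
        # task implementation
        pass

def create_app(config_name):
    app = Flask(__name__)
    app.config.from_object(config[config_name])
    config[config_name].init_app(app)
    db.init_app(app)
    scheduler.init_app(app)
    # Start the scheduler here unless the project already starts it elsewhere.
    scheduler.start()
    return app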
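The article does not show how opera_db is registered as a job. One option flask_apscheduler supports is a JOBS list in the Flask config, which scheduler.init_app(app) reads. The fragment below is a sketch under that assumption; the module path app.tasks, the job id, and the ten-minute interval are placeholders.

```python
class Config:
    # Optional: expose flask_apscheduler's REST endpoints for inspecting jobs.
    SCHEDULER_API_ENABLED = True

    # Each entry describes one job; "func" uses the "module:function" form.
    JOBS = [
        {
            "id": "refresh_zhihu_hot_list",   # hypothetical job id
            "func": "app.tasks:opera_db",     # hypothetical module path
            "trigger": "interval",
            "minutes": 10,                    # assumed refresh interval
        }
    ]
```

Because create_app loads settings with app.config.from_object, uppercase attributes such as JOBS on the chosen config class would be picked up without further wiring.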

API Development

Two REST endpoints are provided. /api/zhihu/hot/ returns the latest hot‑list with title, link, metrics, and timestamps. /api/zhihu/detail/<id>/ returns the metric trend for a specific hot item, suitable for charting.

@api.route('/api/zhihu/hot/')
def zhihu_api_data():
    zhihu_data = zhihudata()
    data_list = []
    for data in zhihu_data:
        data_dict = {
            'title': data[0],
            'link': data[1],
            'metrics': data[2],
            'hot_id': data[3],
            'update_time': data[4]
        }
        data_list.append(data_dict)
    return jsonify({'code': 0, 'content': data_list}), 200

@api.route('/api/zhihu/detail/<id>/')
def zhihu_api_detail(id):
    zhihu_detail = zhihudetail(id)
    return jsonify({'code': 0, 'data': zhihu_detail}), 200
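The zhihudetail helper is not shown, but for charting it would need to turn stored metric rows into a categories list plus a series list. The sketch below is one way to do that conversion. It assumes hot_metrics holds text such as "150 万热度" (万 meaning ten thousand) and that rows arrive oldest first; both the function names and the sample rows are hypothetical.

```python
import re


def parse_metrics(text):
    """Convert text such as '150 万热度' to an integer; return 0 if no number is found."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(万)?", text)
    if not match:
        return 0
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000  # 万 = ten thousand
    return int(round(value))


def to_chart_data(rows, name="热度"):
    """rows: list of (update_time, hot_metrics) tuples, oldest first."""
    categories = [update_time for update_time, _ in rows]
    data = [parse_metrics(metrics) for _, metrics in rows]
    return {"categories": categories, "series": [{"name": name, "data": data}]}


# Hypothetical rows as they might come back from ZhihuMetrics.
rows = [("10:00", "120 万热度"), ("10:10", "150 万热度")]
chart = to_chart_data(rows)
print(chart)
```

Doing the numeric conversion on the server keeps the mini‑program code simple: the client can pass categories and series straight to the chart component.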

Mini‑Program Integration

The front‑end is built with uni‑app, which lets the same codebase run on multiple platforms. We create a project in HBuilder, modify index.nvue to define two tabs (Zhihu and Weibo hot‑lists), and adjust news-page.nvue to request our Flask API.

data() {
    return {
        tabList: [
            {id: "tab01", name: '知乎热榜', newsid: 0},
            {id: "tab02", name: '微博热榜', newsid: 23}
        ]
    }
}

Network requests point to http://127.0.0.1:5000/api/zhihu/hot/ and http://127.0.0.1:5000/api/zhihu/detail/<id>/. The detail page uses the uCharts plugin to render a column chart for hotness distribution and a line chart for the trend.

uni.request({
    // Trailing slash matches the Flask route and avoids a redirect.
    url: 'http://127.0.0.1:5000/api/zhihu/detail/' + this.details.hot_id + '/',
    success: function(res) {
        // process data and render charts
    }
});

Chart configuration is set up with uCharts to display the data interactively.

showColumn(canvasId, chartData) {
    canvaColumn = new uCharts({
        $this: _self,
        canvasId: canvasId,
        type: 'column',
        legend: {show: true},
        categories: chartData.categories,
        series: chartData.series,
        // other options
    });
}

With these steps, the complete pipeline is operational, from web scraping to a functional mini‑program UI.
