How I Built a Personal Zhihu Collection Viewer with Flask and Vue.js

This article explains how to scrape all of your Zhihu saved collections using a Python crawler, store the data in JSON files, and then serve it through a Flask backend API and a Vue.js single‑page front‑end, complete with screenshots and full source code.

Python Programming Learning Circle

Dec 25, 2019

Tired of scrolling through Zhihu without finding anything worthwhile, I decided to go back through the collections I had saved over the years and refresh my memory.

Result

I used a Python crawler to fetch all of my Zhihu collections, a Flask backend to expose them through an API, and a Vue.js front end to display them as a simple single-page application. The result is shown in the screenshots below.

Crawler

Initially I looked for open‑source Zhihu crawlers on GitHub, but most high‑rated ones were no longer maintained after Zhihu’s redesign. Therefore I wrote a simple crawler myself using Python 3.

The crawler logs in with an email address and password (no captcha is needed on a personal device), keeps the session alive with requests.Session, and saves the cookies to cookies.json so that later runs can skip the login. It walks the collection list page by page using the ?page= parameter, extracts each collection's URL, and then fetches that collection's question-and-answer list the same way. Because the data volume is small, the requests are made sequentially in a single thread and the results are written straight to JSON files.
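
The page-by-page loop described above can be sketched on its own, without any network access. This is a minimal illustration, not the crawler's actual code: iter_pages, fake_fetch, FAKE_SITE, and the example.com URLs are all hypothetical names introduced here, and the stand-in fetch function simply returns canned lists.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[str], List[str]], base_url: str) -> Iterator[str]:
    """Yield items from base_url?page=1, ?page=2, ... until a page is empty."""
    page_num = 0
    while True:
        page_num += 1
        items = fetch_page('%s?page=%d' % (base_url, page_num))
        if not items:  # an empty page marks the end of the listing
            break
        yield from items


# Hypothetical stand-in for the network call: two pages of results, then nothing.
FAKE_SITE = {
    'https://example.com/collections?page=1': ['a', 'b'],
    'https://example.com/collections?page=2': ['c'],
}


def fake_fetch(url: str) -> List[str]:
    return FAKE_SITE.get(url, [])


all_items = list(iter_pages(fake_fetch, 'https://example.com/collections'))
print(all_items)  # ['a', 'b', 'c']
```

Stopping on the first empty page is the same termination rule the crawler uses for both the collection list and the questions inside each collection.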

Two JSON files are generated: 知乎收藏文章.json ("Zhihu saved articles"), which contains every collection together with its questions, and url_answer.json, which maps each answer URL to the answer text.
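
To make the relationship between the two files concrete, here is a small sketch of their shapes. The field names match the ones the crawler writes; every value is an invented placeholder, and answer_text is a hypothetical helper that does not appear in the original code.

```python
import json

# Shape of 知乎收藏文章.json: a list of collections, each with its questions.
collections = [
    {
        'ctitle': 'Example collection',
        'clist': [
            {
                'title': 'Example question',
                'question_url': '/question/1',
                'answer_vote': '42',
                'answer_url': '/question/1/answer/2',
                'author': 'someone',
                'author_url': '/people/someone',
                'author_img': 'https://example.com/avatar.jpg',
            }
        ],
    }
]

# Shape of url_answer.json: answer URL without its leading slash -> answer text.
url_answer = {'question/1/answer/2': '<p>Example answer body</p>'}


def answer_text(qa: dict, answers: dict) -> str:
    """Look up the answer body for one question record."""
    return answers[qa['answer_url'][1:]]  # drop the leading '/' to form the key


first_qa = collections[0]['clist'][0]
body = answer_text(first_qa, url_answer)

# Both structures survive a JSON round trip unchanged.
assert json.loads(json.dumps(collections)) == collections
```

Keeping the long answer bodies in a separate file means the collection overview stays small, and the answer text only has to be looked up when a reader opens a particular question.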

import os
import json

import requests
import requests_cache
import urllib3
from bs4 import BeautifulSoup

# isLogin() calls Zhihu with verify=False; silence the resulting warning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Cache responses on disk so repeated runs do not re-download every page.
requests_cache.install_cache('demo_cache')

Cookie_FilePlace = r'.'
Default_Header = {'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36",
                 'Host': "www.zhihu.com",
                 'Origin': "http://www.zhihu.com",
                 'Pragma': "no-cache",
                 'Referer': "http://www.zhihu.com/"}
Zhihu_URL = 'https://www.zhihu.com'
Login_URL = Zhihu_URL + '/login/email'
Profile_URL = 'https://www.zhihu.com/settings/profile'
Collection_URL = 'https://www.zhihu.com/collection/%d'
Cookie_Name = 'cookies.json'

os.chdir(Cookie_FilePlace)
r = requests.Session()
r.headers.update(Default_Header)
if os.path.isfile(Cookie_Name):
    with open(Cookie_Name, 'r') as f:
        cookies = json.load(f)
        r.cookies.update(cookies)

def login(r):
    print('====== zhihu login =====')
    email = input('email: ')
    password = input('password: ')
    print('====== logging.... =====')
    data = {'email': email, 'password': password, 'remember_me': 'true'}
    value = r.post(Login_URL, data=data).json()
    print('====== result:', value['r'], '-', value['msg'])
    if int(value['r']) == 0:
        with open(Cookie_Name, 'w') as f:
            json.dump(r.cookies.get_dict(), f)

def isLogin(r):
    # Without a valid session, the profile page answers with a redirect.
    value = r.get(Profile_URL, allow_redirects=False, verify=False)
    status_code = value.status_code
    if status_code in (301, 302):
        print('Not logged in')
        return False
    elif status_code == 200:
        return True
    else:
        print('Network error')
        return False

if not isLogin(r):
    login(r)

url_answer_dict = {}

def getCollectionsList():
    collections_list = []
    content = r.get(Profile_URL).content
    soup = BeautifulSoup(content, 'lxml')
    own_collections_url = 'http://' + soup.select('#js-url-preview')[0].text + '/collections'
    page_num = 0
    while True:
        page_num += 1
        url = own_collections_url + '?page=%d' % page_num
        content = r.get(url).content
        soup = BeautifulSoup(content, 'lxml')
        data = soup.select_one('#data').attrs['data-state']
        collections_dict_raw = json.loads(data)['entities']['favlists'].values()
        if not collections_dict_raw:
            break
        for i in collections_dict_raw:
            collections_list.append({'title': i['title'], 'url': Collection_URL % i['id']})
    print('====== prepare Collections Done =====')
    return collections_list


def getQaDictListFromOneCollection(collection_url='https://www.zhihu.com/collection/71534108'):
    qa_dict_list = []
    page_num = 0
    while True:
        page_num += 1
        url = collection_url + '?page=%d' % page_num
        content = r.get(url).content
        soup = BeautifulSoup(content, 'lxml')
        titles = soup.select('.zm-item-title a')
        # A page without question titles means we have gone past the last page.
        if len(titles) == 0:
            break
        votes = soup.select('.js-vote-count')
        answer_urls = soup.select('.toggle-expand')
        answers = soup.select('textarea')
        authors = soup.select('.author-link-line .author-link')
        for title, vote, answer_url, answer, author in zip(titles, votes, answer_urls, answers, authors):
            author_img = getAuthorImage(author['href'])
            qa_dict_list.append({'title': title.text,
                                 'question_url': title['href'],
                                 'answer_vote': vote.text,
                                 'answer_url': answer_url['href'],
                                 'author': author.text,
                                 'author_url': author['href'],
                                 'author_img': author_img})
            # Key the answer text by its URL without the leading slash.
            url_answer_dict[answer_url['href'][1:]] = answer.text
    return qa_dict_list

def getAuthorImage(author_url):
    url = Zhihu_URL + author_url
    content = r.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    # Return an empty string if the profile page has no avatar element.
    avatar = soup.select_one('.AuthorInfo-avatar')
    return avatar['src'] if avatar else ''

def getAllQaDictList():
    '''The final result must be a nested structure of lists and dicts so that the front end can parse it.'''
    all_qa_dict_list = []
    collections_list = getCollectionsList()
    for collection in collections_list:
        all_qa_dict_list.append({'ctitle': collection['title'],
                                 'clist': getQaDictListFromOneCollection(collection['url'])})
        print('====== getQa from %s Done =====' % collection['title'])
    return all_qa_dict_list

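# ensure_ascii=False keeps the Chinese text readable in the UTF-8 output files.
with open('知乎收藏文章.json', 'w', encoding='utf-8') as f:
    json.dump(getAllQaDictList(), f, ensure_ascii=False)
with open('url_answer.json', 'w', encoding='utf-8') as f:
    json.dump(url_answer_dict, f, ensure_ascii=False)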
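
The least obvious step in getCollectionsList is reading the JSON that the page embeds in the data-state attribute of the element with id "data". The sketch below isolates that step. It swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, and both the DataStateParser class and the sample HTML are invented for illustration; the real page carries far more data.

```python
import json
from html.parser import HTMLParser


class DataStateParser(HTMLParser):
    """Capture the data-state attribute of the element whose id is 'data'."""

    def __init__(self):
        super().__init__()
        self.state = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get('id') == 'data' and 'data-state' in attr_map:
            # html.parser has already unescaped &quot; and similar entities.
            self.state = json.loads(attr_map['data-state'])


# Invented sample page with a single collection in entities.favlists.
SAMPLE_HTML = (
    '<html><body>'
    '<div id="data" data-state="{&quot;entities&quot;: {&quot;favlists&quot;: '
    '{&quot;101&quot;: {&quot;id&quot;: 101, &quot;title&quot;: &quot;Python&quot;}}}}">'
    '</div></body></html>'
)

parser = DataStateParser()
parser.feed(SAMPLE_HTML)
favlists = parser.state['entities']['favlists'].values()
collections_list = [
    {'title': fav['title'], 'url': 'https://www.zhihu.com/collection/%d' % fav['id']}
    for fav in favlists
]
print(collections_list)
# [{'title': 'Python', 'url': 'https://www.zhihu.com/collection/101'}]
```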

Frontend

The front end is a single-page app built with Vue.js, Bootstrap, and the iView UI library. It shows an overview of the collections, uses a carousel for images, and lists the questions in two columns with vote counts and author avatars. The page loads Vue 1.x together with vue-resource, and the data is fetched asynchronously via JSONP.

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>My Zhihu Collections</title>
    <link rel="stylesheet" href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css">
    <link rel="stylesheet" href="http://v3.bootcss.com/examples/jumbotron-narrow/jumbotron-narrow.css">
    <link rel="stylesheet" href="http://unpkg.com/iview/dist/styles/iview.css">
</head>
<body>
    <div id="app">
        <div class="container">
            <div class="header clearfix">
                <h3 class="text-muted">My Zhihu Collections</h3>
            </div>
            <div class="jumbotron">
                <h1>Collection Overview</h1>
                <p class="lead">{{ description }}</p>
                <my-carousel></my-carousel>
            </div>
            <div class="row marketing">
                <div class="col-lg-6">
                    <my-card :collection="collection" v-for="collection in left"></my-card>
                </div>
                <div class="col-lg-6">
                    <my-card :collection="collection" v-for="collection in right"></my-card>
                </div>
            </div>
            <i-button @click="showLeave">That's all!</i-button>
            <Modal :visible.sync="visible" :title="modalTitle">{{ modalMessage }}</Modal>
        </div>
    </div>
    <script src="http://v1.vuejs.org/js/vue.min.js"></script>
    <script src="https://cdn.jsdelivr.net/vue.resource/1.2.0/vue-resource.min.js"></script>
    <script src="http://unpkg.com/iview/dist/iview.min.js"></script>
    <script>
        // Vue components and app initialization (omitted for brevity)
    </script>
</body>
</html>

Backend

The backend provides JSONP APIs using Flask and the Flask‑Jsonpify extension. It serves the static HTML file at the root, returns the collection list at /collections, and looks up answer text at /find/<path:answer_url>.
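
For readers unfamiliar with JSONP, the idea is that the server wraps its JSON in a call to a function named by the client, so the response can be loaded through a script tag. The following sketch shows that wrapping using only the standard library. It is an illustration of the technique, not the code of Flask-Jsonpify; the to_jsonp helper, the callback name handleAnswer, and the validation rule are all assumptions made for this example.

```python
import json
import re

# Accept only identifier-like callback names (dots allowed), so a request
# cannot inject arbitrary script through the callback parameter.
_CALLBACK_RE = re.compile(r'[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*')


def to_jsonp(payload, callback=None):
    """Return (body, content_type) for a JSON or JSONP response."""
    body = json.dumps(payload, ensure_ascii=False)
    if callback is None:
        return body, 'application/json'
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError('invalid callback name: %r' % callback)
    return '%s(%s);' % (callback, body), 'application/javascript'


body, content_type = to_jsonp({'answer': 'hello'}, callback='handleAnswer')
print(body)  # handleAnswer({"answer": "hello"});
```

When no callback is supplied, the helper falls back to plain JSON, which makes the same endpoint usable from a browser script tag and from an ordinary HTTP client.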

# -*- coding: utf-8 -*-
from flask import Flask
import json
from flask_jsonpify import jsonpify

app = Flask(__name__)

with open(u'知乎收藏文章.json', 'r', encoding='utf-8') as f:
    collections = json.load(f)
with open('url_answer.json', 'r', encoding='utf-8') as f:
    qa_dict = json.load(f)

with open('zhihuCollection.html', 'r', encoding='utf-8') as f:
    index_html = f.read()

@app.route('/')
def index():
    return index_html

@app.route('/collections')
def collectionsApi():
    return jsonpify(collections)

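@app.route('/find/<path:answer_url>')
def answersApi(answer_url):
    # Keys in url_answer.json are answer URLs without the leading slash.
    # Fall back to an empty string so an unknown URL does not raise KeyError.
    return jsonpify({'answer': qa_dict.get(answer_url, '')})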

@app.route('/test')
def test():
    return jsonpify(qa_dict)

if __name__ == '__main__':
    app.run(host='0.0.0.0')

Links to the libraries and frameworks used are listed at the end of the article.
