
How to Crawl Zhihu’s Funniest Answers with Python: A Simple Two‑Step Guide

This article shows how to use Python to scrape Zhihu answers, store them in MongoDB, filter for short high‑upvote replies, and then presents a collection of programmer‑centric jokes that illustrate the kind of "god replies" the crawler can retrieve.

Efficient Ops

Scrape Zhihu Answers

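First, we fetch answers from selected topics with the function get_answers_by_page(topic_id, page_no). It builds the URL, sends a request with a browser User‑Agent header, parses the JSON response, and stores the results in MongoDB. It returns the is_end flag from the response's paging data, which a caller can use to decide when to stop requesting further pages.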

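<code>import json

import pymongo
import requests


def get_answers_by_page(topic_id, page_no):
    # Each page holds 10 answers; offset is meant to be interpolated into the URL.
    offset = page_no * 10
    url = <topic_url>  # placeholder: the topic URL built from topic_id and offset
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    }
    # verify=False disables TLS certificate verification, as in the original script.
    r = requests.get(url, verify=False, headers=headers)
    content = r.content.decode("utf-8")
    data = json.loads(content)
    is_end = data["paging"]["is_end"]
    items = data["data"]
    client = pymongo.MongoClient()
    db = client["zhihu"]
    if len(items) > 0:
        db.answers.insert_many(items)
        # insert_one replaces Collection.insert, which was removed in PyMongo 4.
        db.saved_topics.insert_one({"topic_id": topic_id, "page_no": page_no})
    return is_end
</code>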
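
Because the function returns is_end, a caller can keep requesting pages until the API reports the last one. The following is a minimal sketch of such a driver loop, not the article's own code: crawl_topic, max_pages, and delay_seconds are hypothetical names introduced here, and starting at page 0 is an assumption based on offset = page_no * 10.

```python
import time


def crawl_topic(topic_id, fetch_page, max_pages=100, delay_seconds=1.0):
    """Fetch pages of one topic until the fetcher reports the last page.

    fetch_page is any callable (topic_id, page_no) -> bool that returns
    True once the last page has been reached, for example the article's
    get_answers_by_page. max_pages and delay_seconds are safety limits
    assumed for this sketch; they do not appear in the original script.
    """
    pages_fetched = 0
    for page_no in range(max_pages):  # assumption: pages are numbered from 0
        is_end = fetch_page(topic_id, page_no)
        pages_fetched += 1
        if is_end:
            break
        # Pause between requests so the crawl does not hammer the server.
        time.sleep(delay_seconds)
    return pages_fetched
```

Passing the fetcher in as an argument keeps the loop independent of the network and of MongoDB, so it can be exercised with a stub before running a real crawl.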

The important fields are question.title (question title), content (answer body), and voteup_count (number of up‑votes). These fields are later used for filtering.

Filter Answers

After crawling, we filter the stored answers with a MongoDB aggregation pipeline, keeping those that have more than 1000 up‑votes and fewer than 50 characters: the typical short and witty Zhihu “god replies”. The full script is available on GitHub:

https://github.com/pythonml/answer
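
The filter described above might be sketched as the pipeline below. This is an illustration under stated assumptions rather than the script from the repository: build_god_reply_pipeline and find_god_replies are hypothetical names, the length is measured on the raw content field (so any markup stored in it counts toward the 50 characters), and db is expected to be a PyMongo database handle such as pymongo.MongoClient()["zhihu"].

```python
def build_god_reply_pipeline(min_votes=1000, max_chars=50):
    """Return an aggregation pipeline for short, highly up-voted answers."""
    return [
        # Keep heavily up-voted answers; the $type check guards $strLenCP,
        # which raises an error on missing or non-string values.
        {"$match": {"voteup_count": {"$gt": min_votes},
                    "content": {"$type": "string"}}},
        # Compute the answer length in Unicode code points.
        {"$addFields": {"content_length": {"$strLenCP": "$content"}}},
        {"$match": {"content_length": {"$lt": max_chars}}},
        # Most up-voted answers first.
        {"$sort": {"voteup_count": -1}},
        # Return only the three fields the article singles out.
        {"$project": {"_id": 0, "question.title": 1,
                      "content": 1, "voteup_count": 1}},
    ]


def find_god_replies(db, min_votes=1000, max_chars=50):
    """Run the pipeline against db.answers and return the matches as a list."""
    pipeline = build_god_reply_pipeline(min_votes, max_chars)
    return list(db.answers.aggregate(pipeline))
```

Building the pipeline in its own function makes the thresholds easy to adjust and lets the stages be inspected without a running MongoDB instance.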

Running the script on programmer‑related topics yields dozens of short, funny replies, which are displayed as examples.

Collection of Humorous Q&A

The article then presents a long list of humorous question‑answer pairs that reflect typical programmer jokes and cultural references, such as:

Q: "What is recursion?" A: "The definition of recursion is political content that should not be discussed publicly."

Q: "Why do programmers carry a laptop bag even when it’s empty?" A: "Because they have no other bag."

Q: "Why does iPhone’s icon shake when deleting an app?" A: "Third‑party apps are scared, system apps are proud."

Q: "Why do programmers often say they are 'bugs'?" A: "Because they think the operating system won’t let users modify core files."

… (many more Q&A covering topics from coding myths to social interactions) …

These jokes illustrate the style of “god replies” that the crawler aims to collect.

Python, MongoDB, Web Scraping, Zhihu, programmer jokes
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
