
Scraping Zhihu "Beauty" Topic Images with Python and Baidu AI Face Detection

This article explains how to collect images from Zhihu's "beauty" topic using Python's Requests and lxml libraries, filter them with Baidu AI's AipFace face detection service, and save the images that pass the filter locally. It covers the required environment, the implementation logic, and the preparation steps.

Python Programming Learning Circle

1. Data Source – The target data are all images appearing in answers to questions under Zhihu's "beauty" topic.

2. Scraping Tool – Implemented with Python 3 and the third‑party libraries requests, lxml, and Baidu's AipFace SDK; the script consists of about 100 lines of code.

3. Required Environment

Operating system: macOS, Linux (untested, but expected to work), or Windows (characters that Windows does not allow in filenames are handled with a regex).

No Zhihu login needed.

A Baidu Cloud account is required for the face‑detection service.
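The Windows filename restriction mentioned above could be handled along these lines. This is a minimal sketch, not the article's code: the function name sanitize_filename, the "_" replacement character, and the "untitled" fallback are all assumptions made for illustration.

```python
import re

# Characters Windows does not allow in file names: \ / : * ? " < > |
# plus ASCII control characters (0x00-0x1f).
_WINDOWS_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Return a version of `name` that is safe to use as a file name on Windows.

    Hypothetical helper: the original script's regex is not shown in the article.
    """
    cleaned = _WINDOWS_FORBIDDEN.sub(replacement, name)
    # Windows also rejects names that end with a space or a dot.
    cleaned = cleaned.rstrip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"
```

Question titles often contain "?" or quotation marks, so a helper like this would typically be applied to the title and author name before they are joined into a filename.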

4. Face‑Detection Library – AipFace is the face‑detection client in Baidu AI's Python SDK. It calls the detection service over HTTP, and the service is free to use.
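A call to the service might look like the sketch below. It assumes the v3 interface of the baidu-aip package, in which AipFace.detect takes a base64 string, an image type, and an options dictionary, and returns a dictionary with error_code and result keys; the article does not say which SDK version it used, so treat the call shape and field names as assumptions. The helper name detect_faces is hypothetical, and the client is passed in so the function can be exercised without network access.

```python
import base64


def detect_faces(client, image_bytes: bytes) -> list:
    """Send one image to the face-detection service and return the detected faces.

    `client` is assumed to behave like aip.AipFace (v3 interface), e.g.:
        from aip import AipFace
        client = AipFace(APP_ID, API_KEY, SECRET_KEY)  # values from the Baidu Cloud console
    """
    # Assumption: the v3 API expects the image as a base64-encoded string.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # Ask only for the attributes the filter needs.
    options = {"face_field": "gender,beauty", "max_face_num": 1}
    response = client.detect(image_b64, "BASE64", options)
    # Assumption: a non-zero error_code or an empty result means no usable face.
    if response.get("error_code") or not response.get("result"):
        return []
    return response["result"].get("face_list", [])
```

Returning an empty list on any error keeps the caller simple: an image with no detected face and an image the service could not process are both skipped.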

5. Filtering Conditions

Discard images without any detected face (e.g., landscapes, non‑portrait photos).

Keep only female faces; male images are mostly celebrities and are ignored.

Exclude faces that are not real people, such as anime characters (AipFace confidence < 0.6).

Discard images with a low beauty score (beauty < 45) to save storage.
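The four conditions above can be expressed as a single predicate. The sketch below is illustrative only: the thresholds 0.6 and 45 come from the article, but the field names (gender.type, face_probability, beauty) follow the v3 face-detection response format as an assumption, and the article does not state which response field its 0.6 confidence threshold applies to. The name keep_image is hypothetical.

```python
MIN_FACE_PROBABILITY = 0.6  # below this, the face is treated as not a real person
MIN_BEAUTY = 45             # below this, the image is not worth storing


def keep_image(faces: list) -> bool:
    """Return True if the first detected face passes all four filtering conditions.

    `faces` is a list of face dictionaries; the key names are assumptions.
    """
    # Condition 1: discard images without any detected face.
    if not faces:
        return False
    face = faces[0]
    # Condition 2: keep only female faces.
    if face.get("gender", {}).get("type") != "female":
        return False
    # Condition 3: exclude faces that are probably not real people.
    if face.get("face_probability", 0.0) < MIN_FACE_PROBABILITY:
        return False
    # Condition 4: drop low beauty scores.
    if face.get("beauty", 0.0) < MIN_BEAUTY:
        return False
    return True
```

Checking the cheapest and most common rejection first (no face at all) means most landscape and object photos are dropped before any other field is read.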

6. Implementation Logic

Use requests to fetch a list of discussions under the "beauty" topic.

Parse each discussion's HTML with lxml to extract all img tag src URLs.

Download each image via requests (ignoring animated GIFs).

Send the image to AipFace for face detection.

Apply the filtering rules from step 5.

Save the remaining images to the local file system with filenames composed of beauty score, author, question title, and an index.

Return to the first step of this list and repeat with the next batch of discussions.
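The parsing and GIF-skipping steps of this loop could be sketched as follows. This is not the article's code: the function names are hypothetical, and the choices to skip non-HTTP sources (such as inline placeholder images) and to drop duplicate URLs are assumptions added for robustness.

```python
from urllib.parse import urlparse

from lxml import html


def is_gif(url: str) -> bool:
    """Return True if the URL path ends in .gif, ignoring case and any query string."""
    return urlparse(url).path.lower().endswith(".gif")


def extract_image_urls(page_html: str) -> list:
    """Collect the src of every img tag in the page, skipping GIFs."""
    tree = html.fromstring(page_html)
    # XPath: the src attribute of every img element; tags without src are ignored.
    urls = tree.xpath("//img/@src")
    wanted = [
        url for url in urls
        # Assumption: only absolute http(s) URLs are downloadable images.
        if url.startswith(("http://", "https://")) and not is_gif(url)
    ]
    # Remove duplicates while keeping the order in which they appear on the page.
    return list(dict.fromkeys(wanted))
```

Each returned URL would then be downloaded, for example with requests.get(url, timeout=10).content, and the resulting bytes passed on to face detection.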

7. Scraping Results – Images are stored in a folder; the highest beauty score observed (aside from a celebrity) is 88. The author notes personal disagreement with the ranking order.

8. Preparation for Running

Install Python 3.

Install the required libraries with a single command: pip install requests lxml baidu-aip

Sign up for Baidu Cloud's free face‑detection service (Baidu AI – Face Recognition).

