Fix Chinese Encoding Issues in Python Web Scraping: Practical Tips & Code

This article walks through a common Chinese character garbling problem in Python web crawlers, explains why automatic encoding detection can fail, and provides clear code examples—including manual GBK setting and re‑encoding tricks—to reliably extract readable text.

Python Crawling & Data Mining

Sep 5, 2025

Introduction

The author ran into garbled Chinese characters while scraping a page with a Python crawler, and shared the original request code along with a screenshot of the malformed output.

Implementation

The initial request code is shown below:

import requests
import parsel

url = 'https://news.p2peye.com/article-514723-1.html'
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Cookie': '... (omitted for brevity) ...',
    'Host': 'news.p2peye.com',
    'Referer': 'https://news.p2peye.com/article-514723-1.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
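res = requests.get(url=url, headers=headers)
print(res.status_code)
# Let requests guess the charset from the response bytes; this is the line that misfires here
res.encoding = res.apparent_encoding
selector_1 = parsel.Selector(res.text)
# .get() returns the first matching element as an HTML string, or None if nothing matches
title = selector_1.css('#plat-title').get()
print(title)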

The printed title comes out garbled (the original post illustrates this with a screenshot).

To fix the issue, the author manually set the response encoding to gbk:

res.encoding = 'gbk'

After this change, the page renders correctly (the original post confirms this with a second screenshot).
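The effect of choosing the codec by hand can be reproduced offline. The sketch below is illustrative only: the HTML fragment is made up, and decode_body is a hypothetical helper that approximates what res.text does once res.encoding is set (decode the raw bytes with that codec, replacing anything undecodable).

```python
# Offline sketch: no network access. The bytes below stand in for res.content.
sample_html = '<h1 id="plat-title">网贷平台新闻</h1>'  # made-up page fragment
raw_bytes = sample_html.encode('gbk')                 # what a GBK server would send


def decode_body(content: bytes, encoding: str) -> str:
    """Hypothetical helper: roughly what res.text does for a given res.encoding."""
    return content.decode(encoding, errors='replace')


wrong = decode_body(raw_bytes, 'iso-8859-1')  # wrong codec: mojibake
right = decode_body(raw_bytes, 'gbk')         # correct codec: readable Chinese

print(repr(wrong))
print(right)
```

The bytes are identical in both calls; only the codec used to interpret them changes, which is why setting res.encoding before reading res.text is enough.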

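The author also explains why res.encoding = res.apparent_encoding can be unreliable: apparent_encoding is a guess that a charset-detection library makes from the raw response bytes, and when that guess lands on the wrong charset, res.text is decoded with the wrong codec.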
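One way to depend less on detection is to prefer the charset the page itself declares and use the detector's guess only as a fallback. This is a minimal sketch, not the author's method: pick_encoding and META_CHARSET are hypothetical names, and it assumes the page carries a charset declaration in a meta tag, which the source does not confirm for this site.

```python
import codecs
import re
from typing import Optional

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def pick_encoding(content: bytes, detected: Optional[str] = None,
                  default: str = 'utf-8') -> str:
    """Prefer the charset the page declares; fall back to the detector's guess."""
    match = META_CHARSET.search(content[:4096])  # declarations sit near the top
    if match:
        declared = match.group(1).decode('ascii')
        try:
            codecs.lookup(declared)  # make sure Python knows this codec name
            return declared
        except LookupError:
            pass  # unknown label: ignore it and fall through
    return detected or default


page = b'<html><head><meta charset="gbk"></head><body></body></html>'
print(pick_encoding(page, detected='ISO-8859-1'))
```

In the crawler above this would be used as res.encoding = pick_encoding(res.content, res.apparent_encoding), so the automatic guess only applies when the page declares nothing usable.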

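An alternative fix is to re-encode the extracted title string:

title.encode('iso-8859-1').decode('gbk')

This works when the text was first decoded as ISO-8859-1, which is what requests falls back to for text responses whose Content-Type header names no charset. Because ISO-8859-1 maps every byte to exactly one character, encoding the string back recovers the original bytes, which can then be decoded as GBK. The author reports that this method also yields the correct result (shown in a further screenshot in the original post).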
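The round trip can be checked without a network request. In this sketch the title text is made up and repair_mojibake is a hypothetical helper wrapping the one-liner above; the second half shows a case where the trick cannot work, because a lossy decode has already discarded the original bytes.

```python
def repair_mojibake(text: str, real_encoding: str = 'gbk') -> str:
    """Undo a wrong ISO-8859-1 decode: recover the bytes, then decode them again."""
    return text.encode('iso-8859-1').decode(real_encoding)


original = '网贷平台新闻'  # made-up title text
garbled = original.encode('gbk').decode('iso-8859-1')  # simulate the wrong decode
print(repair_mojibake(garbled) == original)

# A lossy decode cannot be undone: the replacement character U+FFFD
# has no ISO-8859-1 byte, so the original bytes are gone.
lossy = original.encode('gbk').decode('utf-8', errors='replace')
try:
    repair_mojibake(lossy)
    recoverable = True
except UnicodeError:
    recoverable = False
print(recoverable)
```

Setting res.encoding before reading res.text avoids the problem at its source, so the re-encoding trick is best kept for strings that have already been extracted.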

Conclusion

The article summarizes the Chinese encoding problem in Python web crawling, provides concrete code solutions—setting res.encoding = 'gbk' or re‑encoding the extracted text—and demonstrates that these approaches restore readable Chinese characters.
