Fix Chinese Encoding Issues in Python Web Scraping: Practical Tips & Code
This article walks through a common Chinese character garbling problem in Python web crawlers, explains why automatic encoding detection can fail, and provides clear code examples—including manual GBK setting and re‑encoding tricks—to reliably extract readable text.
Introduction
The author encountered garbled Chinese characters when using a Python web crawler and posted the original request code and a screenshot of the malformed output.
Implementation
The initial request code is shown below:
import requests
import parsel
url = 'https://news.p2peye.com/article-514723-1.html'
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Cookie': '... (omitted for brevity) ...',
    'Host': 'news.p2peye.com',
    'Referer': 'https://news.p2peye.com/article-514723-1.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
res = requests.get(url=url, headers=headers)
print(res.status_code)
# Let requests guess the charset from the response bytes.
res.encoding = res.apparent_encoding
# Parse the decoded HTML and extract the element with id "plat-title".
selector_1 = parsel.Selector(res.text)
title = selector_1.css('#plat-title').get()
print(title)
The printed title is garbled; the original post illustrates this with a screenshot.
To fix the issue, the author manually set the response encoding to GBK before res.text is read:
res.encoding = 'gbk'
After this change, the text is decoded correctly and the title prints as readable Chinese, as a second screenshot in the original post shows.
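What the manual setting changes can be seen without any network access. The following is a minimal sketch, not the author's code: the HTML fragment is a made-up sample standing in for res.content, and the decode calls mirror what res.text does with whatever codec res.encoding names.

```python
# Stand-in for res.content: a GBK-encoded HTML fragment (hypothetical sample).
raw = '<h1 id="plat-title">网贷新闻</h1>'.encode('gbk')

# Decoding with the wrong codec does not raise an error; it silently
# produces mojibake, which is what the garbled crawler output looks like.
garbled = raw.decode('iso-8859-1')

# Decoding with the page's real codec recovers the text. This mirrors
# what res.text returns once res.encoding = 'gbk' has been set.
readable = raw.decode('gbk', errors='replace')

print(garbled)
print(readable)
```

The bytes are identical in both cases; only the codec used to interpret them differs, which is why changing res.encoding alone is enough to fix the output.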
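The author also explains why res.encoding = res.apparent_encoding can be unreliable: apparent_encoding is only a statistical guess made from the response bytes by a charset detection library (chardet or charset_normalizer, depending on the requests installation), so it may choose the wrong charset.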
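One way to avoid guessing is to read the charset the page declares about itself. The sketch below is an illustration rather than anything from the original post: declared_charset is a hypothetical helper, the sample HTML is invented, and it assumes the page carries a meta charset declaration near the top, which has not been verified for the site in this article.

```python
import re

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(raw: bytes, default: str = 'utf-8') -> str:
    """Return the charset named in the first 2 KB of the page, else default."""
    match = _META_CHARSET.search(raw[:2048])
    return match.group(1).decode('ascii').lower() if match else default


sample = b'<html><head><meta charset="GBK"></head><body>...</body></html>'
print(declared_charset(sample))            # gbk
print(declared_charset(b'<html></html>'))  # utf-8
```

With requests, the result could be applied as res.encoding = declared_charset(res.content) before reading res.text. If the declaration is missing or wrong, hard-coding the known charset as the author did remains the simpler choice.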
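An alternative fix is to re-encode the extracted title string:
title.encode('iso-8859-1').decode('gbk')
This works when the text was first decoded as ISO-8859-1, which requests falls back to for text responses whose Content-Type header names no charset. Encoding back to ISO-8859-1 recovers the original bytes, which are then decoded as GBK. This method also yields the correct result, as another screenshot in the original post shows.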
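The re-encoding trick can be checked with a round trip on a known string. This is a minimal sketch under stated assumptions: repair_mojibake is a hypothetical helper name, the sample title is invented, and the bad decode is simulated locally instead of coming from a real response.

```python
def repair_mojibake(text: str, wrong: str = 'iso-8859-1', right: str = 'gbk') -> str:
    """Undo a decode done with `wrong` by re-decoding the same bytes as `right`.

    Returns the input unchanged if the round trip is not possible.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = '中文标题'
# Simulate the bad decode: GBK bytes interpreted as ISO-8859-1.
garbled = original.encode('gbk').decode('iso-8859-1')

print(repair_mojibake(garbled))   # 中文标题
print(repair_mojibake(original))  # already readable, returned unchanged
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character. The same trick does not work after a decode that replaced or dropped bytes, so fixing res.encoding before reading res.text is the more robust of the two approaches.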
Conclusion
In summary, garbled Chinese text in a Python crawler here came from decoding the response bytes with the wrong charset. Two fixes restore readable Chinese characters: setting res.encoding = 'gbk' before reading res.text, or re-encoding the extracted text with encode('iso-8859-1').decode('gbk').
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
