Fix Chinese Character Garbling in Python Web Scraping: Simple Encoding Hacks
This article explains why Chinese characters become garbled during Python web scraping, demonstrates the problematic code, and provides clear encoding adjustments and alternative solutions to reliably extract readable text.
1. Introduction
Hello, I'm PiPi. Recently, someone in a Python community asked why Chinese text came out garbled when scraping a page with a web crawler.
Original code:
import requests
import parsel
url='https://news.p2peye.com/article-514723-1.html'
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Cookie': '...',
    'Host': 'news.p2peye.com',
    'Referer': 'https://news.p2peye.com/article-514723-1.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
res = requests.get(url=url, headers=headers)
print(res.status_code)
res.encoding = res.apparent_encoding
# print(res.text)
selector_1 = parsel.Selector(res.text)
title = selector_1.css('#plat-title').get()
print(title)
The output shows the title as garbled characters instead of readable Chinese.
2. Implementation Process
Manually setting the response encoding to the page's actual charset, in place of the res.encoding = res.apparent_encoding line, resolves the issue:
res.encoding = 'gbk'
After this change the title is displayed correctly.
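To see why the explicit charset matters, here is a small offline sketch. It assumes the page body is GBK-encoded, and the sample title string is made up for illustration rather than taken from the site. It imitates what happens when the same bytes are decoded with the wrong charset and with the right one.

```python
# Hypothetical sample text standing in for the page title (not from the real site).
sample_title = "中文标题"

# What the server would send: the title as GBK-encoded bytes.
raw_bytes = sample_title.encode("gbk")

# Decoding with the wrong charset never raises for ISO-8859-1,
# it just silently produces unreadable text.
wrong = raw_bytes.decode("iso-8859-1")

# Decoding with the charset the page actually uses restores the text.
right = raw_bytes.decode("gbk")

print(repr(wrong))
print(right)
```

Setting res.encoding = 'gbk' before reading res.text makes requests perform the second decode instead of the first.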
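Using res.encoding = res.apparent_encoding may fail because apparent_encoding is only a guess: requests infers it from the response bytes with a character-detection library (chardet or charset_normalizer), and the guess can be wrong. When the page's charset is known, specifying it explicitly is more reliable.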
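If you would rather not hard-code the charset, one option is to read the charset the page declares in its own meta tag instead of guessing from the bytes. The sketch below is an assumption-laden illustration: sniff_meta_charset is a hypothetical helper, not part of requests or parsel, and the sample HTML is invented. It only handles simple meta declarations near the top of the document.

```python
import re

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in the HTML head, or `default` if none is found.

    Hypothetical helper for illustration; only the first 2048 bytes are scanned.
    """
    match = _META_CHARSET.search(raw[:2048])
    return match.group(1).decode("ascii").lower() if match else default


# Invented sample page, encoded as GBK to mimic the situation in the article.
html = '<html><head><meta charset="gbk"><title>中文标题</title></head></html>'.encode("gbk")

charset = sniff_meta_charset(html)
print(charset)
print(html.decode(charset))
```

With requests, the same helper could be applied to the raw body, for example res.encoding = sniff_meta_charset(res.content), before accessing res.text.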
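An alternative approach is to repair the extracted title after the fact by reversing the wrong decode:
title.encode('iso-8859-1').decode('gbk')
This applies when the text was decoded as ISO-8859-1, the encoding requests falls back to for text responses whose Content-Type header names no charset. It also yields the correct result.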
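The re-encoding trick can be wrapped in a small guard so that text which is already correct is left alone. This is a minimal sketch: repair_mojibake is a hypothetical helper name, the sample string is invented, and the defaults assume the ISO-8859-1 / GBK pair discussed above.

```python
def repair_mojibake(text: str, wrong: str = "iso-8859-1", actual: str = "gbk") -> str:
    """Undo a wrong decode by re-encoding with `wrong` and decoding with `actual`.

    Returns the text unchanged if the round trip is not possible,
    for example when the text was decoded correctly in the first place.
    """
    try:
        return text.encode(wrong).decode(actual)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the garbling: GBK bytes decoded as ISO-8859-1.
garbled = "中文标题".encode("gbk").decode("iso-8859-1")

print(repair_mojibake(garbled))      # readable Chinese again
print(repair_mojibake("中文标题"))    # already correct, returned as-is
```

The guard matters because encoding real Chinese characters as ISO-8859-1 raises UnicodeEncodeError, so calling the bare one-liner on already-correct text would crash.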
3. Conclusion
The garbling comes from decoding the response with the wrong charset. Two fixes work here: set the correct encoding manually (res.encoding = 'gbk') or re‑encode the extracted text. Either one gives readable output.
