How to Fix Garbled Chinese Filenames When Unzipping with Python's zipfile
This article explains why Chinese filenames become corrupted when extracted with Python's zipfile module and provides a complete, step‑by‑step solution that re‑encodes the names correctly using standard libraries.
Introduction
Hello everyone, I am a Python enthusiast. Recently a member of the Python automation group asked how to unzip a file whose Chinese filenames appear as garbled characters.
Problem
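The original code uses zipfile.ZipFile to extract the files, but the extracted filenames come out corrupted. When an archive entry is not flagged as UTF‑8, the zipfile module decodes its name using the CP437 code page, so names that were actually stored in another charset such as GBK are misread.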
import zipfile


def unzip_file(zip_file_path, output_folder_path):
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_folder_path)


# usage example
zip_file_path = 'example.zip'
output_folder_path = 'output_folder'
unzip_file(zip_file_path, output_folder_path)

Solution
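Before looking at the fix, it may help to reproduce the corruption in isolation. The sketch below assumes the archive was created by a tool that stored names in GBK; the filename is purely illustrative. It shows that decoding GBK bytes as CP437 produces the garbled name, and that reversing the same two steps recovers the original.

```python
# Illustrative name only; any Chinese filename behaves the same way.
original = '中文.txt'

# Assumption: the archiver stored the name as GBK bytes.
stored_bytes = original.encode('gbk')

# zipfile decodes those bytes as CP437 when the UTF-8 flag is not set.
garbled = stored_bytes.decode('cp437')

# Undo the wrong decoding, then decode with the charset actually used.
recovered = garbled.encode('cp437').decode('gbk')

# ascii() keeps the output printable on any console.
print(ascii(garbled))
print(recovered == original)  # True
```

The round trip is lossless because Python's CP437 codec maps every byte value to a distinct character, so no information is discarded by the wrong decoding.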
The teacher examined the zipfile source and confirmed that it decodes names as CP437. Encoding each garbled name back to CP437 bytes and then decoding those bytes as GBK (or whichever charset the archive was created with) recovers the original name. The revised function extracts the archive and then walks the output directory, renaming any non‑ASCII filenames.
import os
import zipfile


def unzip_file(zip_file_path, output_folder_path, encoding='gbk'):
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_folder_path)
    # Walk bottom-up so that entries are renamed before their parent directory
    for root, dirs, files in os.walk(output_folder_path, topdown=False):
        for bad_name in files + dirs:
            if bad_name.isascii():  # str.isascii() requires Python 3.7+
                continue
            try:
                # Undo the CP437 decoding, then decode with the real charset
                true_name = bad_name.encode('cp437').decode(encoding)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue  # not a CP437-garbled name; leave it unchanged
            os.rename(os.path.join(root, bad_name), os.path.join(root, true_name))


# usage example
zip_file_path = 'example.zip'
output_folder_path = 'output_folder'
unzip_file(zip_file_path, output_folder_path)

The solution relies only on the standard zipfile and os modules; pathlib is not needed.
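The rename pass above runs after extraction. An alternative, sketched here and not taken from the original discussion, is to correct each name before anything is written to disk. The function name unzip_with_encoding is hypothetical. The sketch assumes that entries without the UTF‑8 flag (bit 11, value 0x800, of the entry's flag bits) were stored in the given legacy encoding. Because it writes files itself instead of calling extractall, it also adds its own check that no entry escapes the output folder.

```python
import os
import shutil
import zipfile


def unzip_with_encoding(zip_file_path, output_folder_path, encoding='gbk'):
    """Extract an archive, decoding legacy (non-UTF-8) entry names with `encoding`."""
    output_root = os.path.realpath(output_folder_path)
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        for info in zip_ref.infolist():
            name = info.filename
            # Names flagged as UTF-8 were decoded correctly; only the
            # unflagged ones went through the CP437 decoding.
            if not info.flag_bits & 0x800:
                try:
                    name = name.encode('cp437').decode(encoding)
                except (UnicodeEncodeError, UnicodeDecodeError):
                    pass  # keep the name exactly as zipfile produced it
            target = os.path.realpath(os.path.join(output_root, name))
            # Refuse entries that would land outside the output folder.
            if os.path.commonpath([output_root, target]) != output_root:
                raise ValueError(f'unsafe path in archive: {name!r}')
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zip_ref.open(info) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
```

Calling unzip_with_encoding('example.zip', 'output_folder') mirrors the usage example above. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument when reading, which may make the manual re‑encoding unnecessary.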
Discussion
Fans asked which library is best for unzipping. Answers mentioned the built‑in zipfile module and calling WinRAR from the command line. Any graphical unzip tool can also be used for a single archive, but the code is more convenient for batch processing multiple archives.
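The batch case mentioned in the discussion could look roughly like the sketch below. The names unzip_all and extract_plain are hypothetical, and the one‑subfolder‑per‑archive layout is an assumption, not something the group specified. extract_plain is only a stand‑in so the block runs on its own; in practice the encoding‑aware unzip_file from the Solution section would be passed as the extract argument.

```python
import os
import zipfile


def extract_plain(zip_path, output_dir):
    """Stand-in extractor; replace with the encoding-aware unzip_file."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)


def unzip_all(archive_dir, output_root, extract=extract_plain):
    """Extract every .zip in archive_dir into output_root/<archive name>."""
    extracted = []
    for file_name in sorted(os.listdir(archive_dir)):
        if not file_name.lower().endswith('.zip'):
            continue  # skip anything that is not a zip archive
        target = os.path.join(output_root, os.path.splitext(file_name)[0])
        extract(os.path.join(archive_dir, file_name), target)
        extracted.append(target)
    return extracted
```

For example, unzip_all('archives', 'output_folder', extract=unzip_file) would process each archive in the archives folder in turn.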
Conclusion
This article presented a specific Python automation problem, explained why Chinese filenames become garbled, and provided a complete, runnable solution that correctly handles the encoding.
Python Crawling & Data Mining