How to Back Up WeChat Public Account Articles to PDF with Python
This guide explains why backing up WeChat public account articles matters and presents a Python solution that captures article content, fixes image URLs, and generates PDF files (either one per article or a combined document), with code examples and usage notes.
Background
Content creators on WeChat worry about losing their articles, especially after investing significant effort and earning some income. Backing up these articles is crucial to prevent data loss.
Solution Overview
The author built a Python tool that packages WeChat public account articles into PDF files, preserving images and a table of contents. In the raw article HTML, images are referenced through data-src attributes rather than src, so they do not appear in the PDF; the tool renames those attributes so the images render correctly.
Key Code Snippets
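def create_article_content(self, url, text):
    """Build the article HTML: a link to the original post plus the body."""
    # The link text means "Click to view the original article".
    link = '<span style="font-size:30px; padding:10px"><a href="{}">点击查看公众号原文</a></span>'.format(url)
    # WeChat lazy-loads images through data-src; rename it to src so they render in the PDF.
    return link + text.replace('data-src', 'src')

This function receives the article URL and HTML content, prepends a hyperlink to the original article, and renames the data-src attribute to src so that images are displayed in the PDF.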
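A plain string replacement also rewrites any occurrence of "data-src" in the article's visible text. As a minimal sketch of a narrower alternative (not the author's code), the hypothetical helper fix_lazy_images below renames the attribute only inside img tags. It assumes each img tag carries data-src and no separate src attribute.

```python
import re

# Match "data-src" only when it appears as an attribute inside an <img ...> tag.
_DATA_SRC = re.compile(r'(<img\b[^>]*?\s)data-src(\s*=)', re.IGNORECASE)


def fix_lazy_images(html: str) -> str:
    """Rename data-src to src on <img> tags; leave all other text untouched."""
    return _DATA_SRC.sub(r'\1src\2', html)


if __name__ == "__main__":
    sample = '<p>data-src is lazy</p><img class="x" data-src="http://a/b.png">'
    print(fix_lazy_images(sample))
```

If a tag already has both src and data-src, this sketch would leave it with two src attributes, so such pages would need extra handling.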
# Requires at module level: import os, sys, pdfkit
def creat_pdf_file(self, title, html_content):
    """Convert one article's HTML into a PDF named after its title."""
    html = 'tmp.html'  # temporary HTML file
    with open(html, 'w', encoding='utf-8') as f:
        f.write(html_content)
    try:
        output_file = 'D:/gzh2/{}.pdf'.format(title)
        # Skip articles that have already been exported.
        if not os.path.exists(output_file):
            pdfkit.from_file(html, output_file, configuration=self.config)
    except Exception as e:
        # Print the name of the failing function, then the error.
        print(sys._getframe().f_code.co_name)
        print(e)
    finally:
        os.remove(html)

This creates a single-article PDF by writing the HTML to a temporary file and converting it with pdfkit. It skips the conversion when the PDF already exists and always deletes the temporary file afterwards.
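Because the output path is built directly from the article title, a title containing characters such as "/" or "?" can make the path invalid, particularly on Windows. The following is a small sketch of one way to guard against that. The helper name safe_pdf_path and the "untitled" fallback are assumptions for illustration and are not part of the original tool.

```python
import re
from pathlib import Path

# Characters that are not allowed in Windows file names, plus line breaks and tabs.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]+')


def safe_pdf_path(title: str, out_dir: str = "D:/gzh2") -> Path:
    """Return out_dir/<cleaned title>.pdf with illegal characters replaced by '_'."""
    cleaned = _ILLEGAL.sub("_", title).strip(" ._")
    return Path(out_dir) / f"{cleaned or 'untitled'}.pdf"


if __name__ == "__main__":
    print(safe_pdf_path("A/B: what?"))
```

The returned path could then be used in place of the formatted output_file string before the existence check.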
def creat_pdf_file(self):
    """Merge the collected articles in self.html_contents into one PDF."""
    htmls = []
    # Write each article to its own temporary HTML file.
    for index, file in enumerate(self.html_contents):
        html = '{}.html'.format(index)
        with open(html, 'w', encoding='utf-8') as f:
            f.write(file)
        htmls.append(html)
    try:
        # The file name means "<account name>'s original articles, nos. [first-last]".
        output_file = 'D:/gzh2/{}_的原创文章_第【{}-{}】篇.pdf'.format(self.gzh_name, (self.index_part - 1) * self.part_offset + 1, self.index_part * self.part_offset)
        if not os.path.exists(output_file):
            # Passing a list of files makes pdfkit combine them into one PDF.
            pdfkit.from_file(htmls, output_file, configuration=self.config)
    except Exception as e:
        print(sys._getframe().f_code.co_name)
        print(e)
    finally:
        # Reset the buffer and delete the temporary files.
        self.html_contents = []
        for file in htmls:
            os.remove(file)

This version merges a batch of articles into a single PDF, which is faster to generate but may be less convenient to read.
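The article numbers in the merged file name come from the expression (index_part - 1) * part_offset + 1 through index_part * part_offset. The sketch below isolates that arithmetic, assuming index_part is a 1-based batch number and part_offset is the number of articles per batch (an inference from the formula, not something the source states). The helper names part_range and split_into_parts are hypothetical.

```python
def part_range(index_part: int, part_offset: int) -> tuple:
    """Return the (first, last) article numbers covered by a 1-based batch."""
    first = (index_part - 1) * part_offset + 1
    last = index_part * part_offset
    return first, last


def split_into_parts(items: list, part_offset: int) -> list:
    """Split a list of articles into consecutive batches of part_offset items."""
    return [items[i:i + part_offset] for i in range(0, len(items), part_offset)]


if __name__ == "__main__":
    print(part_range(3, 50))
    print(split_into_parts(["a", "b", "c", "d", "e"], 2))
```

Note that the last batch may hold fewer than part_offset articles, in which case the upper number in the file name overstates the actual count.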
Important Notes
The tool relies on data captured with Charles (or similar) to obtain the article URLs and cookies. Since these values change per account, you must replace the example url and cookie with your own captured values.
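A captured cookie is usually copied from the proxy as one raw "name=value; name=value" header string. As a hedged sketch, the hypothetical helper parse_cookie_header below turns such a string into a dictionary, which most HTTP clients accept. The source does not show how the original tool sends its requests, so this is only one possible approach.

```python
def parse_cookie_header(raw: str) -> dict:
    """Parse a raw Cookie header string into a {name: value} dictionary."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first "=" only, since values may themselves contain "=".
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    # Placeholder values; substitute the cookie string captured for your own account.
    print(parse_cookie_header("key_one=abc; key_two=x=y"))
```

With the requests library, for example, the resulting dictionary can be passed as the cookies argument of requests.get together with your captured URL.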
MaGe Linux Operations
