
How to Back Up WeChat Official Account Articles to PDF with Python

This guide explains why backing up WeChat Official Account articles matters and presents a Python solution that captures article content, fixes image URLs, and generates PDF files (either one per article or one combined document), with code examples and usage tips.

MaGe Linux Operations

Jun 30, 2019


Background

Content creators on WeChat worry about losing their articles, especially after investing significant effort and earning some income. Backing up these articles is crucial to prevent data loss.

Solution Overview

The author built a Python tool that packages WeChat Official Account articles into PDF files, preserving images and a table of contents. WeChat articles lazy-load their images: the real image URL is stored in a data-src attribute instead of src, so the images are missing from a generated PDF unless those references are rewritten. The tool rewrites them so the images render correctly.
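The snippets that follow work on article HTML the tool has already captured; how the article body is cut out of the downloaded page is not shown. A minimal standard-library sketch, assuming the page is already downloaded and that the body sits in a div with id="js_content" (the container WeChat article pages commonly use). JsContentExtractor and extract_article_body are hypothetical names, not part of the original tool.

```python
from html.parser import HTMLParser


class JsContentExtractor(HTMLParser):
    """Collect the raw inner HTML of the <div> whose id is target_id.

    Assumption: the article body sits in <div id="js_content">.
    Only <div> tags are counted to find where the target ends, so the
    page is assumed to have properly nested <div> elements.
    """

    def __init__(self, target_id="js_content"):
        # Keep entities such as &amp; as-is so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.div_depth = 0  # open <div>s since the target opened; 0 = outside
        self.parts = []     # raw HTML fragments of the body

    def handle_starttag(self, tag, attrs):
        if self.div_depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.div_depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.div_depth = 1  # entering the body; its own tag is not copied

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.div_depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.div_depth:
            return
        if tag == "div":
            self.div_depth -= 1
            if self.div_depth == 0:
                return  # closing tag of the target itself is not copied
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.div_depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.div_depth:
            self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        if self.div_depth:
            self.parts.append("&#{};".format(name))


def extract_article_body(page_html):
    """Return the inner HTML of the article body, or '' if it is not found."""
    parser = JsContentExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts).strip()
```

The returned string is the kind of HTML that create_article_content receives as its text argument. A real HTML parser is used here rather than a regular expression because the body contains nested div elements, which a non-greedy regex would cut off at the first closing tag.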

Key Code Snippets

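def create_article_content(self, url, text):
    """Prepend a link to the original article and fix lazy-loaded image URLs."""
    link = '<span style="font-size:30px; padding:10px"><a href="{}">Click to view the original article</a></span>'.format(url)
    # WeChat keeps the real image URL in data-src; wkhtmltopdf only loads src,
    # so rename the attribute to make the images appear in the PDF.
    return link + text.replace('data-src', 'src')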

This function takes the article URL and its HTML content, prepends a hyperlink to the original article, and renames each image's data-src attribute to src so the images are downloaded and displayed in the PDF.
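The one-line replace is enough when an image carries only data-src. If an img tag also has a placeholder src, the replace produces two src attributes, and which one the renderer uses depends on the parser. A minimal regex sketch that keeps exactly one src per lazy-loaded image, assuming attribute values are quoted and contain no ">" character; promote_data_src is a hypothetical helper, not part of the original tool.

```python
import re

# An <img ...> start tag (assumes no '>' inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# A quoted src="..." attribute; the leading \s keeps it from matching data-src.
SRC_ATTR = re.compile(r"""\ssrc\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)
# The name part of a data-src= attribute (data-srcset= does not match).
DATA_SRC_NAME = re.compile(r"\sdata-src(\s*=)", re.IGNORECASE)


def promote_data_src(html):
    """Make each lazy-loaded <img> use its data-src URL as its only src."""
    def fix(match):
        tag = match.group(0)
        if not DATA_SRC_NAME.search(tag):
            return tag                            # ordinary image: leave it alone
        tag = SRC_ATTR.sub("", tag)               # drop any placeholder src
        return DATA_SRC_NAME.sub(r" src\1", tag)  # rename data-src to src
    return IMG_TAG.sub(fix, html)
```

Unlike text.replace, this only touches img tags, so the string "data-src" appearing in article text or in other attributes is left alone.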

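import os
import pdfkit

def create_pdf_file(self, title, html_content):
    """Convert one article's HTML into D:/gzh2/<title>.pdf."""
    output_file = 'D:/gzh2/{}.pdf'.format(title)  # output directory; adjust for your system
    if os.path.exists(output_file):
        return  # already backed up: skip the slow conversion
    html = 'tmp.html'  # temporary HTML file handed to wkhtmltopdf
    with open(html, 'w', encoding='utf-8') as f:
        f.write(html_content)
    try:
        # self.config is assumed to be pdfkit.configuration(wkhtmltopdf=<path to the binary>)
        pdfkit.from_file(html, output_file, configuration=self.config)
    except Exception as e:
        print('create_pdf_file failed for {!r}: {}'.format(title, e))
    finally:
        os.remove(html)  # always clean up the temporary file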

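This creates a PDF for a single article: it writes the HTML to a temporary file, converts it with pdfkit, and deletes the temporary file afterwards whether or not the conversion succeeded. If a PDF with the same title already exists, the conversion is skipped, so rerunning the tool does not regenerate articles that are already backed up.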
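The output path above is built directly from the article title. Titles containing characters such as /, :, ? or | produce a path that is invalid on Windows (a / even points into a nonexistent subdirectory), so those articles would fail to convert. A minimal sketch of a hypothetical pdf_path_for helper that replaces such characters, assuming an underscore is an acceptable substitute; the default directory mirrors the one used above.

```python
import os
import re

# Characters Windows forbids in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def pdf_path_for(title, out_dir="D:/gzh2"):
    """Build a safe output path from an article title (hypothetical helper)."""
    # Replace forbidden characters; Windows also rejects trailing dots and spaces.
    name = _ILLEGAL.sub("_", title).strip().rstrip(". ")
    if not name:
        name = "untitled"  # title consisted only of dots/spaces
    return os.path.join(out_dir, name + ".pdf")
```

Inside create_pdf_file, output_file = pdf_path_for(title) would take the place of the format call. Chinese and other non-ASCII characters are left untouched, since they are valid in file names.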

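import os
import pdfkit

def create_pdf_file(self):
    """Alternative version: merge the buffered articles (self.html_contents) into one PDF."""
    htmls = []
    for index, content in enumerate(self.html_contents):
        html = '{}.html'.format(index)  # one temporary file per article
        with open(html, 'w', encoding='utf-8') as f:
            f.write(content)
        htmls.append(html)
    # This batch covers articles (index_part - 1) * part_offset + 1 .. index_part * part_offset
    first = (self.index_part - 1) * self.part_offset + 1
    last = self.index_part * self.part_offset
    # self.gzh_name: the Official Account's name
    output_file = 'D:/gzh2/{}_original_articles_{}-{}.pdf'.format(self.gzh_name, first, last)
    try:
        if not os.path.exists(output_file):
            # pdfkit accepts a list of input files and merges them into one PDF
            pdfkit.from_file(htmls, output_file, configuration=self.config)
    except Exception as e:
        print('create_pdf_file failed for {!r}: {}'.format(output_file, e))
    finally:
        self.html_contents = []  # reset the buffer for the next batch
        for html in htmls:
            os.remove(html)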

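This version writes each buffered article to its own temporary HTML file and passes the whole list to pdfkit, which merges them into a single PDF named after the account and the range of article numbers it covers. Generation is faster because wkhtmltopdf runs once per batch instead of once per article, but one large file is less convenient to read.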
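The combined file name is computed from self.index_part and self.part_offset, which implies the article list is processed in batches of part_offset articles; how those batches are formed is not shown. A minimal sketch of a hypothetical iter_parts generator, assuming the articles are simply cut into consecutive slices. Unlike the name formula above, it clamps the last number of the final, shorter batch to the real article count.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_parts(items: Sequence[str], part_offset: int) -> Iterator[Tuple[int, int, int, List[str]]]:
    """Yield (index_part, first, last, batch) for consecutive batches of part_offset items.

    first follows the file-name formula (index_part - 1) * part_offset + 1;
    last is first + len(batch) - 1, so the final batch is not over-counted.
    """
    if part_offset < 1:
        raise ValueError("part_offset must be >= 1")
    for start in range(0, len(items), part_offset):
        index_part = start // part_offset + 1
        batch = list(items[start:start + part_offset])
        first = (index_part - 1) * part_offset + 1
        last = first + len(batch) - 1
        yield index_part, first, last, batch
```

For each yielded batch, the tool would set self.index_part, fill self.html_contents with the batch's HTML, and call the combined create_pdf_file, which then empties the buffer for the next round.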

Important Notes

The tool relies on traffic captured with Charles (or a similar HTTP proxy) to obtain the article URLs and cookies. Because these values are specific to each account, replace the example URL and cookie with the values you captured yourself.
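A captured cookie usually arrives as one raw Cookie header line. A minimal sketch, assuming you paste that raw value into your script, that turns it into a dict and reattaches it to a standard-library request for your captured URL. parse_cookie_header and build_request are hypothetical helpers, and the cookie names in the usage line are placeholders, not real WeChat cookie names.

```python
import urllib.request
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Turn a raw 'Cookie:' header value copied from Charles into a dict."""
    if raw.lower().startswith("cookie:"):
        raw = raw[len("cookie:"):]  # tolerate a pasted header name
    cookies = {}
    for pair in raw.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:           # skip empty fragments and pairs without '='
            cookies[name] = value  # partition keeps any '=' inside the value
    return cookies


def build_request(url: str, cookies: Dict[str, str]) -> urllib.request.Request:
    """Create (but do not send) a GET request carrying the captured cookies."""
    header = "; ".join("{}={}".format(k, v) for k, v in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": header})
```

Example: cookies = parse_cookie_header("Cookie: session=abc; token=ab==") gives {"session": "abc", "token": "ab=="}; the request is then sent with urllib.request.urlopen(build_request(url, cookies)). Captured cookies expire, so they need to be recaptured when requests start failing.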


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

backup WeChat pdf-generation web-scraping pdfkit

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
