How to Add Automatic Numbered Bookmarks to PDFs with Python and PyPDF2
This guide shows how to enable automatic numbering of PDF table‑of‑contents entries exported from Typora by adding a custom CSS file and using a Python script with PyPDF2 to read, modify, and rewrite the PDF bookmarks, producing a numbered outline.
Typora can automatically number the heading titles in the table of contents of an exported PDF if you add a custom CSS file to the theme folder.
However, the bookmark outline of the exported PDF still lacks numbering.
The following Python script processes the PDF to add numbered bookmarks.
# Blog: https://blog.csdn.net/as604049322
__author__ = '小小明-代码实体'
__date__ = '2023/8/31'

from PyPDF2 import PdfReader, PdfWriter


def get_pdf_Bookmark(filename):
    "Author's CSDN: https://blog.csdn.net/as604049322"
    # Accept either a file path or an existing PdfReader
    if isinstance(filename, str):
        pdf_reader = PdfReader(filename)
    else:
        pdf_reader = filename
    pagecount = len(pdf_reader.pages)
    # Map each page object's id number to its zero-based page index
    idnum2pagenum = {}
    for i in range(pagecount):
        page = pdf_reader.pages[i]
        idnum2pagenum[page.indirect_ref.idnum] = i
    # Bookmark data for each title: level, title, and page index (page number - 1)
    bookmark = []

    def get_pdf_Bookmark_inter(outlines, tab=0):
        for outline in outlines:
            if isinstance(outline, list):
                # A nested list holds the children of the preceding entry
                get_pdf_Bookmark_inter(outline, tab + 1)
            else:
                bookmark.append((tab, outline['/Title'], idnum2pagenum[outline.page.idnum]))

    get_pdf_Bookmark_inter(pdf_reader.outline)
    return bookmark


def pdf_write_bookmark(bookmark, pdf_file, compress=True):
    pdf_reader = PdfReader(pdf_file)
    num_pages = len(pdf_reader.pages)
    pdf_writer = PdfWriter()
    for page in pdf_reader.pages:
        if compress:
            page.compress_content_streams()
        pdf_writer.add_page(page)
    # last_cache[level] holds the most recently added outline item at that level;
    # default=0 keeps this from raising when the bookmark list is empty
    last_cache = [None] * (max((tab for tab, _, _ in bookmark), default=0) + 1)
    for tab, title, pagenum in bookmark:
        # Skip bookmarks that point past the last page
        if pagenum >= num_pages:
            continue
        parent = last_cache[tab - 1] if tab > 0 else None
        indirect_id = pdf_writer.add_outline_item(title, pagenum, parent=parent)
        last_cache[tab] = indirect_id
    # Ask the viewer to show the outline panel when the file is opened
    pdf_writer.page_mode = "/UseOutlines"
    # Note: this overwrites the input file in place
    with open(pdf_file, "wb") as out:
        pdf_writer.write(out)
    print("Bookmarks successfully written to", pdf_file)


if __name__ == '__main__':
    file = r"C:\Users\sj\Desktop\集团管理层培训.pdf"
    bookmark = get_pdf_Bookmark(file)
    # One counter per outline level (supports up to 7 levels)
    num_cache = [0] * 7
    last_tab = 0
    new_bookmark = []
    for tab, title, pagenum in bookmark:
        if tab > last_tab:
            # First child under a new parent: restart this level at 1
            num_cache[tab] = 1
        else:
            # Next sibling, or a return to a shallower level
            num_cache[tab] += 1
        new_title = title
        # Leave titles that already start with a digit untouched
        if not title[:1].isdigit():
            new_title = ".".join(map(str, num_cache[:tab + 1])) + " " + title
        new_bookmark.append((tab, new_title, pagenum))
        last_tab = tab
    pdf_write_bookmark(new_bookmark, file)

After processing, the PDF outline includes numbers.
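To see what the numbering step does without touching a PDF, the counter logic can be pulled out into a standalone function. The following is a minimal sketch, not part of the original script: the function name `number_bookmarks`, the `max_depth` parameter, and the sample data are all assumptions made for illustration. It works on the same `(level, title, page index)` tuples the script uses, and it checks `title[:1]` so that an empty title does not raise an error.

```python
def number_bookmarks(bookmark, max_depth=7):
    """Return a copy of `bookmark` with hierarchical numbers prefixed to titles.

    `bookmark` is a list of (level, title, page_index) tuples in document order.
    `number_bookmarks` and `max_depth` are hypothetical names for this sketch.
    """
    counters = [0] * max_depth   # one running counter per outline level
    last_level = 0
    numbered = []
    for level, title, page_index in bookmark:
        if level > last_level:
            counters[level] = 1      # first child under a new parent
        else:
            counters[level] += 1     # next sibling, or back at a shallower level
        if title[:1].isdigit():
            new_title = title        # already starts with a digit: leave as is
        else:
            prefix = ".".join(str(n) for n in counters[:level + 1])
            new_title = f"{prefix} {title}"
        numbered.append((level, new_title, page_index))
        last_level = level
    return numbered


# Made-up outline used only to demonstrate the numbering
sample = [
    (0, "Introduction", 0),
    (1, "Background", 0),
    (1, "Scope", 1),
    (0, "Methods", 2),
    (1, "Setup", 2),
    (2, "Hardware", 3),
]

for level, title, page_index in number_bookmarks(sample):
    print("  " * level + title)
```

For the sample outline this prints `1 Introduction`, `1.1 Background`, `1.2 Scope`, `2 Methods`, `2.1 Setup` and `2.1.1 Hardware`, indented by level. Keeping the numbering separate from the PDF reading and writing makes it easy to check the result on a small list before overwriting a real file.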
