
Extract Text from PowerPoint to Word with Python in Six Simple Lines

This tutorial shows how to use python-pptx and python-docx to extract the text from a PowerPoint presentation and write it into a Word document. It explains the underlying file structures and the required modules, then provides a concise script whose core loop is six lines.

Python Crawling & Data Mining

Hello, this article continues the Python office automation series, showing how to extract all text from a PowerPoint file and write it into a Word document using the python-pptx and python-docx libraries.

Requirement

Given a PPT that contains an introduction to Python, the goal is to pull every text element and insert it into a Word file.

Key Knowledge

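The implementation needs only two modules: python-pptx and python-docx. The core script is only six lines, but to follow it you must understand how each file is structured. In PowerPoint, a presentation contains slides and each slide contains shapes; a shape that holds text has a text frame made up of paragraphs. In Word, a document contains paragraphs and each paragraph contains runs.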
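
To make the presentation‑slide‑shape hierarchy concrete, the sketch below walks it and records how many paragraphs each shape holds. The helper name describe_structure and the tuple layout are hypothetical; the function relies only on the slides, shapes, name, has_text_frame, and text_frame.paragraphs attributes of python-pptx objects, so it expects the object returned by Presentation(...).

```python
def describe_structure(presentation):
    """Return (slide_number, shape_name, paragraph_count) for each shape.

    `presentation` is assumed to be a python-pptx Presentation object;
    the function name and the tuple layout are illustrative only.
    """
    rows = []
    for slide_number, slide in enumerate(presentation.slides, start=1):
        for shape in slide.shapes:
            # Only shapes with a text frame contain paragraphs
            if shape.has_text_frame:
                count = len(shape.text_frame.paragraphs)
            else:
                count = 0
            rows.append((slide_number, shape.name, count))
    return rows
```

Printing each returned tuple gives one line per shape, which is a quick way to see which shapes on a slide actually carry text before writing the extraction loop.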

The package names used at install time differ from the names used in import statements: install python-pptx and python-docx (for example with pip install python-pptx python-docx), but import them as pptx and docx.
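
Because the install name and the import name differ, a small check before running the script can save some confusion. The following is a minimal sketch using only the standard library; the REQUIRED mapping and the missing_packages helper are hypothetical names introduced for illustration.

```python
from importlib.util import find_spec

# Maps each import name to the name used with pip install
REQUIRED = {"pptx": "python-pptx", "docx": "python-docx"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for import_name, pip_name in required.items()
            if find_spec(import_name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("python-pptx and python-docx are importable")
```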

Python Implementation

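Import the modules, create an empty Word document, and open the presentation:

from pptx import Presentation
from docx import Document

# Document() with no argument creates a new, empty Word document
wordfile = Document()
# Path to the source file (placeholder). It must be a .pptx file;
# python-pptx cannot read the legacy .ppt format.
filepath = r'xxxxxxxx'
pptx = Presentation(filepath)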

Iterate over slides, shapes, and text frames, writing each paragraph to the Word document:

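for slide in pptx.slides:
    for shape in slide.shapes:
        # Shapes without a text frame (for example pictures) are skipped
        if shape.has_text_frame:
            text_frame = shape.text_frame
            # Each PowerPoint paragraph becomes one Word paragraph
            for paragraph in text_frame.paragraphs:
                wordfile.add_paragraph(paragraph.text)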
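
The loop above only visits shapes whose has_text_frame is true, so text inside tables and grouped shapes is not copied. The sketch below is one possible extension, not part of the original script: the helper names iter_shape_texts and copy_all_text are hypothetical, and it assumes that only group shapes expose a nested shapes collection and that table shapes report has_table as true.

```python
def iter_shape_texts(shapes):
    """Yield paragraph and table-cell text, descending into group shapes."""
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Assumption: only group shapes expose a nested .shapes collection
            yield from iter_shape_texts(shape.shapes)
        elif getattr(shape, "has_text_frame", False):
            for paragraph in shape.text_frame.paragraphs:
                yield paragraph.text
        elif getattr(shape, "has_table", False):
            # A table is a grid of cells; each cell exposes its text
            for row in shape.table.rows:
                for cell in row.cells:
                    yield cell.text


def copy_all_text(presentation, document):
    """Append every text found on every slide to `document`."""
    for slide in presentation.slides:
        for text in iter_shape_texts(slide.shapes):
            document.add_paragraph(text)
```

With the objects from the script, this would be called as copy_all_text(pptx, wordfile) in place of the nested loop. Keeping the traversal in a generator separates finding text from writing it, so the same traversal could feed a different output later.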

Finally, save the Word file:

save_path = r'xxxxxxxx'
wordfile.save(save_path)
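
Once the open, loop, and save steps work for one file, they can be repeated over a whole folder. The following is a rough sketch under stated assumptions: convert_folder is a hypothetical helper, and convert_one stands for any function you write that takes a source .pptx path and a target .docx path and performs the three steps shown above.

```python
from pathlib import Path


def convert_folder(source_dir, output_dir, convert_one):
    """Call convert_one(pptx_path, docx_path) for each .pptx in source_dir.

    `convert_one` is a hypothetical callable that wraps the open, loop,
    and save steps for a single file. Returns the list of output paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pptx_path in sorted(Path(source_dir).glob("*.pptx")):
        # Skip the "~$" lock files PowerPoint creates while a deck is open
        if pptx_path.name.startswith("~$"):
            continue
        docx_path = output_dir / (pptx_path.stem + ".docx")
        # Paths are passed as plain strings to stay conservative
        convert_one(str(pptx_path), str(docx_path))
        written.append(docx_path)
    return written
```

Passing the per-file conversion in as a parameter keeps the folder logic independent of python-pptx and python-docx, so it can be tried out with a stand-in function first.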

Conclusion

This real‑world example shows how Python can automate repetitive office tasks and free up time for more valuable work. Understand the underlying file formats before writing scripts, and remember that batch operations are the core of Python office automation.
