Automate Extraction of Meeting Details from Word Docs to Excel with Python in 30 Lines

This tutorial shows how to use Python's glob, python-docx, and openpyxl libraries to automatically scan a folder of Word meeting notices, pull out study time, content, format, and host information, and populate an Excel spreadsheet, saving hours of manual work.

Python Crawling & Data Mining

Jan 27, 2021

A reader needed to process thousands of meeting‑notice Word documents and extract four key fields—study time, study content, study format, and host—into an Excel sheet. The article demonstrates a concise Python solution that automates this repetitive task.

Basic Logic

Use glob to collect all .docx files in the Notice folder.

Parse each Word file with python-docx to locate the four pieces of information.

Write the extracted data into an Excel workbook using openpyxl.

The workflow can be broken down into three steps: file discovery, data extraction, and Excel output.
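
The file discovery step can be tried on its own before any Word parsing is involved. The sketch below is one possible variation, not the article's code: it uses pathlib instead of glob, sorts the results so rows come out in a predictable order, and skips the temporary lock files (names starting with ~$) that Word creates while a document is open. The function name find_notices is hypothetical.

```python
from pathlib import Path


def find_notices(folder):
    """Return the .docx files in `folder`, sorted by name.

    Files whose names start with "~$" are skipped: Word creates these
    lock files while a document is open, and they are not valid documents.
    """
    return sorted(
        p for p in Path(folder).glob("*.docx")
        if not p.name.startswith("~$")
    )
```

Each returned Path can be passed to Document() after converting it with str(), for example Document(str(p)).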

Code Implementation

First, import the required libraries:

from docx import Document
from openpyxl import load_workbook
import glob

Load the Excel template:

path = r'C:\Users\xxx'  # adjust to your actual path
workbook = load_workbook(path + r'\Meeting_temp.xlsx')
sheet = workbook.active

Parse a single document to understand its structure (each paragraph corresponds to a line of text):

wordfile = Document(path + r'\Notice\会议通知 1.docx')
for paragraph in wordfile.paragraphs:
    print(paragraph.text)  # print the paragraph's text, not the Paragraph object

Extract the four fields from each paragraph:

content_lst = []  # collects the numbered content items ("1、...", "2、...")
for paragraph in wordfile.paragraphs:
    if paragraph.text[0:5] == '学习时间：':    # "Study time:"
        study_time = paragraph.text[5:]
    if paragraph.text[0:4] == '主持人：':      # "Host:"
        host = paragraph.text[4:]
    if paragraph.text[0:5] == '学习形式：':    # "Study format:"
        study_type = paragraph.text[5:]
    if len(paragraph.text) >= 2:
        # Content items start with a digit followed by the list separator '、'
        if paragraph.text[0].isdigit() and paragraph.text[1] == '、':
            content_lst.append(paragraph.text)
content = ' '.join(content_lst)
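
The prefix checks above can also be written as a small function that works on plain strings, which makes the matching logic easy to test without opening a Word file. This is a sketch under assumptions rather than the article's method: the name parse_notice and the dictionary layout are hypothetical, and the regular expression deliberately widens the original rule so that items numbered 10 and above also match.

```python
import re

# Field name -> label prefix expected at the start of a paragraph
FIELD_PREFIXES = {
    "study_time": "学习时间：",  # "Study time:"
    "host": "主持人：",          # "Host:"
    "study_type": "学习形式：",  # "Study format:"
}

# One or more digits followed by the list separator '、'
ITEM_PATTERN = re.compile(r"^\d+、")


def parse_notice(lines):
    """Extract the four fields from a list of paragraph strings."""
    record = {name: "" for name in FIELD_PREFIXES}  # missing fields stay empty
    items = []
    for line in lines:
        text = line.strip()
        for name, prefix in FIELD_PREFIXES.items():
            if text.startswith(prefix):
                record[name] = text[len(prefix):].strip()
                break
        else:
            # Not a labelled field: keep it if it looks like a numbered item
            if ITEM_PATTERN.match(text):
                items.append(text)
    record["content"] = " ".join(items)
    return record
```

With python-docx it could be called as parse_notice([p.text for p in wordfile.paragraphs]).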

Append the extracted data to the Excel sheet as a new row, with number serving as a running row index:

number = 0
number += 1
sheet.append([number, study_time, content, study_type, host])
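
To see how append behaves without touching the template, the following sketch builds a workbook in memory. The header names and the two sample records are invented for illustration; the real template's columns may differ.

```python
from openpyxl import Workbook

workbook = Workbook()      # in-memory workbook, nothing is written to disk
sheet = workbook.active

# Hypothetical header row standing in for the template's headings
sheet.append(["No.", "Study time", "Content", "Format", "Host"])

# Made-up records: (study_time, content, study_type, host)
records = [
    ("2021-01-27 14:00", "1、Item one 2、Item two", "Group study", "Zhang San"),
    ("2021-02-03 09:30", "1、Item three", "Self-study", "Li Si"),
]

# enumerate supplies the running index that `number` provides in the article
for number, record in enumerate(records, start=1):
    sheet.append([number, *record])  # each call writes below the last used row

rows = list(sheet.iter_rows(values_only=True))
```

Because append always writes to the row after the last one in use, a template that already contains a header row keeps it, and the data rows follow underneath.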

Combine everything to process all files in the folder:

from docx import Document
from openpyxl import load_workbook
import glob

path = r'C:\Users\xxx'
workbook = load_workbook(path + r'\Meeting_temp.xlsx')
sheet = workbook.active
number = 0

for file in glob.glob(path + r'\Notice\*.docx'):
    wordfile = Document(file)
    # Reset per file so a missing field never inherits the previous document's value
    study_time = host = study_type = ''
    content_lst = []
    for paragraph in wordfile.paragraphs:
        text = paragraph.text
        if text[0:5] == '学习时间：':    # "Study time:"
            study_time = text[5:]
        if text[0:4] == '主持人：':      # "Host:"
            host = text[4:]
        if text[0:5] == '学习形式：':    # "Study format:"
            study_type = text[5:]
        if len(text) >= 2:
            # Content items start with a digit followed by '、'
            if text[0].isdigit() and text[1] == '、':
                content_lst.append(text)
    content = ' '.join(content_lst)
    number += 1
    sheet.append([number, study_time, content, study_type, host])

workbook.save(path + r'\Meeting_notice.xlsx')

The solution processes each document in a few seconds and writes all extracted rows to Meeting_notice.xlsx with only about thirty lines of code.
