
Extract Specific Sections from a Text File Using Python Regex – Step‑by‑Step Guide

This article walks through a real‑world Python text‑processing task, showing how to use regular expressions to extract desired passages from a large document, presenting initial code, identified issues, an improved solution, and the resulting output.

Python Crawling & Data Mining

Introduction

Hello, I am PiPi. A fellow member of a Python community asked how to process a text file to extract specific sections using Python.

The original requirement is illustrated in the following screenshots:

Implementation Process

First, a straightforward script using a regular expression was written:

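import re

# Read the whole novel into memory as a single string.
with open('西游记.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# re.S lets '.' match newlines, so a section can span several lines.
regex = re.compile(r'.*?《》(.*?)《》.*?', re.S)
result = regex.findall(text)
print(len(result))
for item in result:
    print(item)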
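
To see how a pattern of this shape behaves, here is a minimal sketch on made-up sample text (the string `sample` and the names `consuming` and `lookahead` are illustrative and not from the original post). When the same marker both opens and closes a section, each match consumes its closing marker, so every other section can be skipped. This may or may not be the exact problem the reviewer had in mind.

```python
import re

# Made-up sample; the real file's layout is not shown in the article.
sample = "《》A《》B《》C《》"

# Same shape as the first attempt: each match consumes its closing 《》,
# so that marker cannot also open the next section.
consuming = re.compile(r'.*?《》(.*?)《》.*?', re.S)
consumed = consuming.findall(sample)
print(consumed)  # ['A', 'C']

# A lookahead checks for the closing marker without consuming it,
# so every section between two markers is returned.
lookahead = re.compile(r'《》(.*?)(?=《》)', re.S)
kept = lookahead.findall(sample)
print(kept)  # ['A', 'B', 'C']
```

The lookahead variant is only one possible remedy. The revised code in this article instead anchors on different start and end markers.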

This first script produced the output shown below:

A reviewer noticed a problem and suggested improvements. The revised code is:

import re

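# Read-only access is enough here, so open with 'r' rather than 'r+'.
with open('西游记.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

# Each section runs from the marker up to the next run of three newlines.
rex1 = r'《》目录 (.*?)\n\n\n'
# Chapter 100 is captured separately, up to the book's closing sentence.
rex2 = r'《》目录 (第一百回.*?《西游记》至此终。)'
result = re.findall(rex1, txt, re.S)
temp = re.findall(rex2, txt, re.S)
result += temp
# print(len(result))
for item in result:
    print(item)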
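
The two-pattern approach can be tried without the full novel. The sketch below runs the same two patterns on a tiny in-memory sample. The sample layout (sections separated by three newlines, with the last section ending in the closing sentence) and the helper name `extract_sections` are assumptions made for illustration, not details confirmed by the original post.

```python
import re

# Hypothetical miniature of the file layout: each section starts with the
# marker "《》目录 " and is followed by three newlines, except the last one,
# which ends with the book's closing sentence. This layout is an assumption.
sample = (
    "《》目录 第一回 opening chapter text\n\n\n"
    "《》目录 第二回 second chapter text\n\n\n"
    "《》目录 第一百回 final chapter text 《西游记》至此终。"
)


def extract_sections(text):
    """Return the regular sections first, then the final chapter."""
    regular = re.findall(r'《》目录 (.*?)\n\n\n', text, re.S)
    final = re.findall(r'《》目录 (第一百回.*?《西游记》至此终。)', text, re.S)
    return regular + final


sections = extract_sections(sample)
for section in sections:
    print(section)
```

On this sample the first pattern finds the two sections that are followed by blank lines, and the second pattern picks up the final chapter, which has no trailing newlines to match against.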

Running the improved script yields the following result:

Summary

The problem was solved by applying regular expressions to locate and extract the required text fragments from the source file. The article demonstrates the initial approach, identifies its shortcomings, and presents a refined solution that successfully meets the user's needs.

Written by

Python Crawling & Data Mining