Extract Specific Sections from a Text File Using Python Regex – Step‑by‑Step Guide
This article walks through a real-world Python text-processing task: using regular expressions to extract specific passages from a large document. It presents an initial script, the problem found with it, an improved solution, and the resulting output.
Introduction
Hello, I'm PiPi. A member of a Python community group asked how to extract specific sections from a text file with Python.
The asker's requirement was illustrated with screenshots in the original post.
Implementation Process
First, a straightforward script using a regular expression was written:
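```python
import re

# Read the whole file into one string
with open('西游记.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# re.S lets '.' match newlines, so a passage may span several lines
regex = re.compile(r'.*?《》(.*?)《》.*?', re.S)
result = regex.findall(text)
print(len(result))  # number of captured passages
for item in result:
    print(item)
```

The script printed the number of matches followed by each extracted passage; the original post showed this output as a screenshot.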
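The post does not spell out which problem the reviewer found. One pitfall of the first pattern, which may or may not be the one meant, is that each match consumes both the opening and the closing '《》' marker, so findall resumes after the closing marker and skips the passage that follows it. The sketch below shows this on a made-up string (the name sample and its contents are hypothetical) and, as an alternative technique not taken from the post, a lookahead (?=《》|\Z) that ends each passage without consuming the next marker.

```python
import re

# Hypothetical miniature text: four passages separated by the same marker
sample = "《》A《》B《》C《》D"

# Pattern from the first script: opening and closing marker in one match,
# so the closing marker is consumed and the next passage is skipped
paired = re.findall(r'.*?《》(.*?)《》.*?', sample, re.S)

# Alternative (not from the original post): stop before the next marker,
# or at the end of the text, without consuming it
lookahead = re.findall(r'《》(.*?)(?=《》|\Z)', sample, re.S)

print(paired)     # ['A', 'C']
print(lookahead)  # ['A', 'B', 'C', 'D']
```

Because a lookahead only checks what comes next, the marker that ends one passage is still available to start the following one.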
A reviewer noticed a problem and suggested improvements. The revised code is:
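```python
import re

# Read the whole novel into memory (plain read mode; nothing is written)
with open('西游记.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

# Sections: text after a '《》目录 ' marker, up to the next newline
rex1 = r'《》目录 (.*?)\n'
# Final chapter (第一百回, Chapter 100): up to the closing sentence
# '《西游记》至此终。' ("Journey to the West ends here.")
rex2 = r'《》目录 (第一百回.*?《西游记》至此终。)'

result = re.findall(rex1, txt, re.S)
temp = re.findall(rex2, txt, re.S)
result += temp  # append the final chapter to the other sections
# print(len(result))
for item in result:
    print(item)
```

Running the improved script printed each extracted section; the original post showed this result as a screenshot.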
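To try the revised approach without the full novel, the two patterns can be wrapped in a function and run on a string. This is a minimal sketch: the function name extract_sections and the three-line SAMPLE text are hypothetical, and the layout they assume (each section on one line after a '《》目录 ' marker, with no newline after the closing sentence) is a guess, not something the post confirms. It also assumes the line break inside rex1 stands for a newline character, written here as \n.

```python
import re

# Hypothetical layout (an assumption, not taken from the real file):
# each section follows a '《》目录 ' marker on its own line, and the
# text ends right after the closing sentence with no trailing newline.
SAMPLE = (
    "《》目录 第一回 A\n"
    "《》目录 第二回 B\n"
    "《》目录 第一百回 C《西游记》至此终。"
)


def extract_sections(txt):
    """Hypothetical helper: apply the revised script's two patterns to a string."""
    rex1 = r'《》目录 (.*?)\n'
    rex2 = r'《》目录 (第一百回.*?《西游记》至此终。)'
    result = re.findall(rex1, txt, re.S)   # sections terminated by a newline
    result += re.findall(rex2, txt, re.S)  # final chapter, ended by its closing sentence
    return result


sections = extract_sections(SAMPLE)
for item in sections:
    print(item)
```

Under this assumed layout, rex1 alone misses the last chapter because no newline follows it, which would explain why a second pattern is needed; the post itself does not state its reasoning.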
Summary
The task was solved with regular expressions that locate and extract the required passages from the source file. The initial script paired two '《》' markers in a single pattern; the refined version matches each section after its '《》目录 ' marker and uses a separate pattern for the final chapter (第一百回), which ends with the sentence '《西游记》至此终。', then merges the two result lists.
Python Crawling & Data Mining
