Master Python Regular Expressions: From Basics to Real-World Applications
This article provides a comprehensive guide to Python's regular expression (re) module, covering fundamental concepts, key functions such as match, sub and compile, flag modifiers, pattern syntax, practical code examples, and real‑world scraping use cases.
1. Introduction
Regular expressions are special character sequences that help you check whether a string matches a pattern. This article introduces Python's regular expression capabilities and links to a series of previous articles.
2. Overview
Python has included the re module since version 1.5, providing Perl‑style regex support. The module offers full regex functionality.
3. re.match Function
re.match(pattern, string, flags=0) attempts to match a pattern at the beginning of a string only. If the pattern does not match at position 0, it returns None, even when the pattern occurs later in the string. A successful match returns a match object, from which you can retrieve captured groups using group(num) or groups(). Example output is shown in the images below.
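As a minimal sketch of the behavior described above (the sample text is made up for illustration and is not from the original article), the following shows a match at the start of the string, a failed match later in the string, and group retrieval:

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "www.example.com"

# "www" sits at position 0, so the match succeeds.
m_start = re.match(r"www", text)
print(m_start.span())  # (0, 3)

# "com" only appears later in the string, so re.match returns None.
m_later = re.match(r"com", text)
print(m_later)  # None

# Capturing groups are read with group(num) or groups().
m = re.match(r"(\w+)\.(\w+)\.(\w+)", text)
if m:
    print(m.group(0))  # www.example.com (the whole match)
    print(m.group(2))  # example
    print(m.groups())  # ('www', 'example', 'com')
```

Because re.match returns None on failure, checking the result before calling group() avoids an AttributeError.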
4. Search and Replace
The re.sub function replaces matches in a string. Syntax: re.sub(pattern, repl, string, count=0, flags=0). Parameters: pattern (regex pattern), repl (replacement string or function), string (source text), count (max replacements, 0 means all), flags (optional regex flags). Example output is shown in the images below.
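The parameters above can be illustrated with a short sketch. The sample strings and the helper name double are assumptions made for this example only:

```python
import re

# Made-up sample string for illustration.
phone = "2004-959-559 # a made-up phone number"

# Strip the trailing comment: optional whitespace, '#', and the rest.
no_comment = re.sub(r"\s*#.*$", "", phone)
print(no_comment)  # 2004-959-559

# Remove every non-digit character (count=0 replaces all matches).
digits = re.sub(r"\D", "", no_comment)
print(digits)  # 2004959559

# count=1 limits the replacement to the first match only.
first_only = re.sub(r"-", "/", no_comment, count=1)
print(first_only)  # 2004/959-559


def double(match):
    """Hypothetical repl function: double each number that was matched."""
    return str(int(match.group(0)) * 2)


# repl may be a function that receives the match object.
doubled = re.sub(r"\d+", double, "A23G4HFD567")
print(doubled)  # A46G8HFD1134
```

A function as repl is useful when the replacement depends on the matched text itself, which a fixed replacement string cannot express.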
5. compile Function
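re.compile(pattern[, flags]) compiles a regex pattern into a pattern object whose match and search methods can be reused across many strings. Flags include re.I (ignore case), re.L (locale-aware), re.M (multiline), re.S (dot matches newline), re.U (Unicode), and re.X (verbose). In Python 3, str patterns are Unicode-aware by default, so re.U is redundant there, and re.L can only be used with bytes patterns.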
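A small sketch of compiling once and reusing the result follows. The sample lines are invented for illustration:

```python
import re

# Compile once, reuse many times. re.I makes matching case-insensitive.
word = re.compile(r"python", re.I)

# Made-up sample data.
lines = ["Python is fun", "I like PYTHON", "no snakes here"]

# search() scans the whole string for the first occurrence.
hits = [line for line in lines if word.search(line)]
print(hits)  # ['Python is fun', 'I like PYTHON']

# match() only succeeds if the pattern matches at position 0.
at_start = [line for line in lines if word.match(line)]
print(at_start)  # ['Python is fun']
```

Compiling is mainly a readability and reuse convenience here: the same pattern object carries its flags with it, so they do not need to be repeated at each call.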
6. Regex Objects
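re.compile returns a compiled pattern object (named re.RegexObject in the Python 2 documentation, re.Pattern in current Python 3). Its match and search methods return a match object (re.MatchObject in Python 2, re.Match in current Python 3), which provides methods such as group(), start(), end(), and span() to access match details.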
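The match-object methods listed above can be sketched as follows, using a made-up input string:

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")

# Hypothetical input; the match begins at index 7.
m = pattern.search("range: 10-25 units")

print(m.group())    # 10-25  (whole match, same as group(0))
print(m.group(1))   # 10     (first capturing group)
print(m.start())    # 7      (index where the match begins)
print(m.end())      # 12     (index just past the match)
print(m.span())     # (7, 12)
print(m.span(2))    # (10, 12) (position of the second group)
```

start(), end(), and span() accept an optional group number, so the position of each captured field can be recovered as well as the position of the whole match.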
7. Regex Flags (Optional Modifiers)
Flags modify matching behavior and can be combined with bitwise OR (e.g., re.I | re.M). The table below summarizes common flags:
Flag    Description
re.I    Case-insensitive matching
re.L    Locale-aware matching
re.M    Multiline mode (affects ^ and $)
re.S    Dot matches all characters, including newline
re.U    Unicode-aware character classes
re.X    Verbose mode (allows whitespace and comments)
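To make the effect of these flags concrete, here is a hedged sketch using an invented three-line string. It covers re.M, re.I combined with re.M, re.S, and re.X:

```python
import re

# Made-up multi-line sample text.
text = "first line\nSecond line\nTHIRD line"

# Without re.M, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)  # ['first']

# With re.M, ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.M)
print(line_starts)  # ['first', 'Second', 'THIRD']

# Flags combine with bitwise OR.
combined = re.findall(r"^(?:second|third)", text, re.I | re.M)
print(combined)  # ['Second', 'THIRD']

# By default '.' stops at newlines; re.S lets it cross them.
no_dotall = re.match(r"first.*THIRD", text)
with_dotall = re.match(r"first.*THIRD", text, re.S)
print(no_dotall is None, with_dotall is not None)  # True True

# re.X ignores pattern whitespace and allows inline comments.
number = re.compile(r"""
    \d+   # integer part
    \.    # literal decimal point
    \d*   # optional fractional digits
""", re.X)
decimal = number.search("pi is about 3.14").group()
print(decimal)  # 3.14
```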
8. Regex Pattern Syntax
Pattern strings use special syntax: literals match themselves, backslashes escape characters, and special symbols have meanings unless escaped. Raw strings (e.g., r'\t') are recommended to avoid double escaping.
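The difference between raw and ordinary strings can be sketched with a few small checks. The sample inputs are made up for illustration:

```python
import re

# '\t' is a single tab character; r'\t' is a backslash followed by 't'.
plain_len = len('\t')
raw_len = len(r'\t')
print(plain_len, raw_len)  # 1 2

# In an ordinary string '\b' is a backspace character, so the word-boundary
# escape must be written as '\\b' or, more readably, as r'\b'.
raw_hits = re.findall(r"\bcat\b", "cat concat cat.")
escaped_hits = re.findall("\\bcat\\b", "cat concat cat.")
backspace_hits = re.findall("\bcat\b", "cat concat cat.")
print(raw_hits)        # ['cat', 'cat']
print(escaped_hits)    # ['cat', 'cat']
print(backspace_hits)  # []

# A backslash also turns a special symbol into a literal:
# r"\." matches a real dot, whereas an unescaped "." matches any character.
versions = re.findall(r"\d+\.\d+", "v1.5 and 2x5")
print(versions)  # ['1.5']
```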
9. Practical Application
Example: extracting movie information from a web page (e.g., Maoyan). A regex pattern is compiled to capture title, star, and release time, then applied to HTML content. Code snippet:
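import re

pattern = re.compile(r'<div>.*?title="(.*?)".*?class="star">(.*?)</p>.*?releasetime">(.*?)</p>', re.S)

The pattern captures the desired fields inside specific tags. Each (.*?) group lazily captures one field (title, star, release time), and re.S lets the dot span the line breaks in the HTML.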
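As a rough sketch of how such a pattern might be applied, the snippet below runs it against a small hand-written HTML fragment. The markup, movie titles, and names are invented for illustration and do not reflect the real page structure, which may differ:

```python
import re

pattern = re.compile(
    r'<div>.*?title="(.*?)".*?class="star">(.*?)</p>.*?releasetime">(.*?)</p>',
    re.S,
)

# Hypothetical HTML fragment, not taken from any real site.
html = """
<div>
  <a href="/films/1" title="Sample Movie A"></a>
  <p class="star">
    Starring: Actor One, Actor Two
  </p>
  <p class="releasetime">Release: 1993-01-01</p>
</div>
<div>
  <a href="/films/2" title="Sample Movie B"></a>
  <p class="star">Starring: Actor Three</p>
  <p class="releasetime">Release: 1994-09-10</p>
</div>
"""

# findall returns one tuple per match, with one item per capturing group.
movies = [
    {"title": title.strip(), "star": star.strip(), "time": released.strip()}
    for title, star, released in pattern.findall(html)
]

for movie in movies:
    print(movie)
```

The strip() calls remove the whitespace and line breaks that the lazy groups pick up around the text inside each tag.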
10. Summary
1) Regular expressions are suitable for scenarios requiring extraction of multiple data items. 2) This article covered regex basics, core functions, pattern syntax, and practical examples, with references to earlier tutorial series for deeper learning. 3) Additional Python resources are available at the provided URL.
Python Crawling & Data Mining