
Python Regular Expressions: From Basics to Advanced Usage

This tutorial explains how to use Python's re module for regular expression operations, covering basic string matching, character classes, quantifiers, grouping, greedy vs. non‑greedy matching, substitution, and a practical example of extracting email addresses from text.

Python Programming Learning Circle

Nov 20, 2024


Regular expressions are powerful tools for matching, searching, and replacing patterns in strings. Python's re module provides support for regular expressions. This article guides you from basic string matching to advanced pattern techniques and includes a real‑world email‑extraction case.

1. Import the re module

First, import the re module. All examples in this article assume it has already been imported.

import re

2. Basic matching

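The simplest regex matches a fixed string literally. For example, search for the word "hello" in a text. Because the text spells it "Hello", the re.IGNORECASE flag is passed so the search is case‑insensitive.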

text = "Hello, world! Hello again."
pattern = "hello"

# Use re.search() to find the first match (ignore case)
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Found match:", match.group())
else:
    print("No match found")

Output: Found match: Hello
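To see what the flag changes, the sketch below runs the same search with and without it on the sample text from above. match.span() returns the start and end offsets of the match.

```python
import re

text = "Hello, world! Hello again."

# Without the flag the search is case-sensitive, so "hello" is not found.
strict = re.search("hello", text)

# With re.IGNORECASE the first "Hello" (offsets 0 to 5) matches.
relaxed = re.search("hello", text, re.IGNORECASE)

print(strict)            # None
print(relaxed.group())   # Hello
print(relaxed.span())    # (0, 5)
```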

3. Matching multiple characters

The dot . matches any single character except a newline. For example, the pattern c.t matches a three‑character sequence that starts with "c", ends with "t", and has any character in between. In the text below, only "cat" qualifies.

text = "cat bat rat mat"
pattern = "c.t"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['cat']
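The newline exception is easy to miss. This sketch uses a made-up string with a newline between "c" and "t" and shows that the re.DOTALL flag lets . match it as well.

```python
import re

text = "c\nt cat"

# By default "." refuses to match the newline between "c" and "t".
default_matches = re.findall("c.t", text)

# re.DOTALL lets "." match a newline too.
dotall_matches = re.findall("c.t", text, re.DOTALL)

print(default_matches)  # ['cat']
print(dotall_matches)   # ['c\nt', 'cat']
```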

4. Character sets

Square brackets [] define a character set, which matches exactly one of the enclosed characters. For example, [ae]pple matches either "apple" or "epple"; only "apple" appears in the text below.

text = "apple elephant antelope"
pattern = "[ae]pple"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['apple']
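A character set always consumes exactly one character. The first call below uses an invented string to show that [ae]pple accepts "epple" but not "opple". The second call is an illustrative variant, assuming you want every whole word that begins with "a" or "e": the word-boundary anchor \b plus \w* extends the match to the rest of the word.

```python
import re

# "[ae]pple" accepts exactly one of "a" or "e" before "pple".
set_matches = re.findall("[ae]pple", "apple epple opple")

# Illustrative variant: whole words that start with "a" or "e".
# \b is a word boundary and \w* consumes the rest of the word.
word_matches = re.findall(r"\b[ae]\w*", "apple elephant antelope orange")

print(set_matches)   # ['apple', 'epple']
print(word_matches)  # ['apple', 'elephant', 'antelope']
```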

5. Ranges

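Use a hyphen - inside a character set to specify a range; for example, [a-z] matches any single lowercase letter. The + after it (see section 8) repeats the set, so whole runs of letters are matched.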

text = "abc123 def456 ghi789"
pattern = "[a-z]+"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['abc', 'def', 'ghi']
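Ranges can cover digits as well as letters, and several ranges can share one set. A short sketch on the same sample text (the mixed-case string is an added example):

```python
import re

text = "abc123 def456 ghi789"

digits = re.findall("[0-9]+", text)                       # a digit range
alnum = re.findall("[a-z0-9]+", text)                     # two ranges in one set
mixed_case = re.findall("[A-Za-z]+", "Abc DEF ghi")       # upper- and lowercase

print(digits)      # ['123', '456', '789']
print(alnum)       # ['abc123', 'def456', 'ghi789']
print(mixed_case)  # ['Abc', 'DEF', 'ghi']
```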

6. Repetition (quantifiers)

The quantifier {m,n} matches the preceding element at least m and at most n times; leaving out n, as in {m,}, means "m or more times". Example: match runs of two or more consecutive "a" characters.

text = "aa bb aaa ccc aaaa"
pattern = "a{2,}"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['aa', 'aaa', 'aaaa']
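The other forms of the quantifier behave as follows on the same text. Note that the quantifier counts characters, not words: without anchors, "aaaa" still yields a three-character match. Adding \b at both ends restricts the match to whole words.

```python
import re

text = "aa bb aaa ccc aaaa"

# {2,3}: two or three "a"s; "aaaa" yields one "aaa",
# and the leftover single "a" is too short to match.
bounded = re.findall("a{2,3}", text)

# {3}: exactly three "a"s.
exact = re.findall("a{3}", text)

# With \b anchors, only whole words of two or three "a"s match.
whole_words = re.findall(r"\ba{2,3}\b", text)

print(bounded)      # ['aa', 'aaa', 'aaa']
print(exact)        # ['aaa', 'aaa']
print(whole_words)  # ['aa', 'aaa']
```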

7. Zero or more ( * )

The asterisk * matches the preceding character zero or more times. Example: match "ab" followed by any number of "c".

text = "abc abccc ab abcccc"
pattern = "abc*"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['abc', 'abccc', 'ab', 'abcccc']
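To check which strings abc* accepts on its own, re.fullmatch() is convenient because it succeeds only when the pattern covers the entire string. The candidate strings here are illustrative.

```python
import re

# fullmatch() requires the whole string to match, so each result
# shows directly whether "abc*" accepts that string.
candidates = ["ab", "abc", "abcccc", "ac"]
results = {s: bool(re.fullmatch("abc*", s)) for s in candidates}

print(results)  # {'ab': True, 'abc': True, 'abcccc': True, 'ac': False}
```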

8. One or more ( + )

The plus + matches the preceding character one or more times. Example: match "ab" followed by at least one "c".

text = "abc abccc ab abcccc"
pattern = "abc+"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['abc', 'abccc', 'abcccc']
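The only difference between * and + is whether zero repetitions are allowed. This sketch runs both patterns over the same text and keeps the matches that * finds but + does not.

```python
import re

text = "abc abccc ab abcccc"

star = re.findall("abc*", text)  # zero or more "c"
plus = re.findall("abc+", text)  # one or more "c"

# Matches that * accepts but + rejects: those with no "c" at all.
only_star = [m for m in star if m not in plus]

print(only_star)  # ['ab']
```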

9. Zero or one ( ? )

The question mark ? matches the preceding character zero or one time. Example: match "abc" where the final "c" is optional.

text = "abc ab abc"
pattern = "abc?"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['abc', 'ab', 'abc']
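A common use of ? is allowing spelling variants. In this added example, u? makes the "u" optional, so both "color" and "colour" match, while "colouur" has more "u"s than ? permits.

```python
import re

# "u?" allows zero or one "u" between "colo" and "r".
spellings = re.findall("colou?r", "color colour colouur")

print(spellings)  # ['color', 'colour']
```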

10. Grouping

Parentheses () create groups, allowing extraction of sub‑patterns. Example: extract year, month, and day from a date string.

text = "Today is 2023-10-05"
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # raw string so "\d" reaches the regex engine unchanged

match = re.search(pattern, text)
if match:
    year, month, day = match.groups()
    print(f"Year: {year}, Month: {month}, Day: {day}")
else:
    print("No match found")

Output: Year: 2023, Month: 10, Day: 05
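Groups can also be read individually by number (group 0 is the whole match), and re.findall() returns one tuple per match when a pattern has more than one group. A sketch using the same date pattern, with a second date added for illustration:

```python
import re

pattern = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(pattern, "Today is 2023-10-05")
whole = match.group(0)          # the entire match
year = match.group(1)           # groups are numbered from 1
month_day = match.group(2, 3)   # several groups at once -> tuple

# With more than one group, findall() returns a tuple per match.
all_dates = re.findall(pattern, "2023-10-05 and 2024-01-31")

print(whole)      # 2023-10-05
print(year)       # 2023
print(month_day)  # ('10', '05')
print(all_dates)  # [('2023', '10', '05'), ('2024', '01', '31')]
```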

11. Non‑capturing groups

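A non‑capturing group (?:...) groups part of a pattern without capturing it. Example: match URLs that begin with "http://" or "https://" but capture only the host name. Because the pattern contains exactly one capturing group, re.findall() returns just that group's text for each match.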

text = "Visit https://example.com or http://example.org"
pattern = "(?:https?://)([a-zA-Z0-9.-]+)"

matches = re.findall(pattern, text)
print("All matches:", matches)

Output: All matches: ['example.com', 'example.org']
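For comparison, here is the same pattern with the protocol group made capturing. With two groups, re.findall() switches to returning a tuple per match, which is exactly what the non‑capturing version avoids.

```python
import re

text = "Visit https://example.com or http://example.org"

# Same pattern, but "(https?://)" now captures.
capturing = re.findall(r"(https?://)([a-zA-Z0-9.-]+)", text)

print(capturing)  # [('https://', 'example.com'), ('http://', 'example.org')]
```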

12. Greedy vs. non‑greedy matching

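Quantifiers are greedy by default: they match as much text as possible. Appending ? to a quantifier (for example, *? instead of *) makes it non‑greedy, so it matches as little as possible. Example: matching HTML tags. The greedy .* runs from the first < to the last > in the string, while .*? stops at the first > after each <.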

text = "<b>bold</b> and <i>italic</i>"
pattern = "<.*>"

greedy_matches = re.findall(pattern, text)
non_greedy_matches = re.findall("<.*?>", text)

print("Greedy matches:", greedy_matches)
print("Non‑greedy matches:", non_greedy_matches)

Output:
Greedy matches: ['<b>bold</b> and <i>italic</i>']
Non‑greedy matches: ['<b>', '</b>', '<i>', '</i>']
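The same effect shows up when extracting text between delimiters. In this illustrative sketch, the greedy group runs to the last quote in the string, while the non-greedy group stops at the first closing quote each time.

```python
import re

text = 'say "hi" and "bye"'

# Greedy: .* runs to the last quote in the string.
greedy = re.findall(r'"(.*)"', text)

# Non-greedy: .*? stops at the earliest closing quote.
lazy = re.findall(r'"(.*?)"', text)

print(greedy)  # ['hi" and "bye']
print(lazy)    # ['hi', 'bye']
```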

13. Substitution

Use re.sub() to replace matched substrings. Example: replace all spaces with underscores.

text = "Hello World This Is A Test"
pattern = " "

new_text = re.sub(pattern, "_", text)
print("Replaced string:", new_text)

Output: Replaced string: Hello_World_This_Is_A_Test
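re.sub() also accepts a count argument (0, the default, replaces every match), regex patterns such as \s+ that cover runs of mixed whitespace, and backreferences like \1 that reuse captured groups in the replacement. The input strings below are added for illustration.

```python
import re

text = "Hello World This Is A Test"

# count limits how many replacements are made.
first_two = re.sub(" ", "_", text, count=2)

# \s+ replaces each run of spaces/tabs with a single underscore.
collapsed = re.sub(r"\s+", "_", "Hello   World\tTest")

# \1, \2, \3 refer to captured groups; here they reorder a date.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-10-05")

print(first_two)  # Hello_World_This Is A Test
print(collapsed)  # Hello_World_Test
print(reordered)  # 05/10/2023
```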

14. Practical case: Extract email addresses

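Given a text containing multiple email addresses, extract them with a single pattern. The pattern reads: one or more local‑part characters (letters, digits, or ._%+-), an @, a domain name, a dot, and a top‑level domain of at least two letters. It is a pragmatic approximation, not a complete validator for every address the email standards allow.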

# Sample text with placeholder addresses on the reserved example.* domains
text = """
Contact us at support@example.com for any inquiries.
You can also reach out to sales@example.org or john.doe@example.net.
"""

# Define email regex pattern
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b"

# re.IGNORECASE lets [A-Z]{2,} also match lowercase top-level domains
emails = re.findall(pattern, text, re.IGNORECASE)
print("Extracted email addresses:", emails)

Output: Extracted email addresses: ['support@example.com', 'sales@example.org', 'john.doe@example.net']
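The same pattern can be reused to check a single string. This is a minimal sketch: looks_like_email is a hypothetical helper name, the pattern is compiled once with re.compile(), and the \b anchors are dropped because fullmatch() already requires the whole string to match. It inherits the approximation of the extraction pattern, so treat it as a quick sanity check rather than strict validation.

```python
import re

# Same character classes as the extraction pattern, without \b.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)


def looks_like_email(candidate: str) -> bool:
    """Hypothetical helper: True if the entire string fits EMAIL_RE."""
    return EMAIL_RE.fullmatch(candidate) is not None


checks = {s: looks_like_email(s) for s in ["user@example.com", "user@example", "user@@example.com"]}
print(checks)  # {'user@example.com': True, 'user@example': False, 'user@@example.com': False}
```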

Summary

This article introduced Python regular expressions for text matching, moving from simple literal matches through character sets, quantifiers, and groups to greedy vs. non‑greedy matching and substitution, and closed with a real‑world email‑extraction example.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
