A Comprehensive Guide to Using Regular Expressions in Python
This article introduces Python's built‑in re module: how to import it, write raw‑string patterns, and use common functions such as findall, match, search, sub, and split. It also covers compiling patterns, working with match objects, flags, meta‑characters, and Unicode encoding for robust text processing.
In Python, regular expressions (regex) are handled through the built‑in re module, which provides functions for pattern creation, compilation, and various string operations such as matching, searching, replacing, and splitting.
1. Import the re module
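import re
2. Write a regex pattern
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
Using a raw string (prefix r) keeps backslashes literal, so sequences such as \. and \d reach the regex engine unchanged instead of being interpreted as Python string escapes.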
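The effect of the r prefix is easiest to see with \b, which Python itself treats as a backspace character in an ordinary string. The following is a minimal sketch; the variable names and the sample text are made up for illustration.

```python
import re

# In a normal string, "\b" is the backspace character (0x08).
plain = "\bcat\b"
# In a raw string, the backslash is kept, so the regex engine sees \b (word boundary).
raw = r"\bcat\b"

text = "the cat sat"  # hypothetical sample text

plain_match = re.search(plain, text)
raw_match = re.search(raw, text)

print(len(plain), len(raw))  # 5 7
print(plain_match)           # None
print(raw_match.group())     # cat
```

The non-raw pattern looks for literal backspace characters and finds nothing, which is why raw strings are the usual choice for regex patterns.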
3. Common re functions
findall() – returns all non‑overlapping matches as a list.
matches = re.findall(email_pattern, "Contact us at [email protected] or [email protected]")
print(matches) # Output: ['[email protected]', '[email protected]']
match() – attempts to match a pattern at the start of a string.
result = re.match(r"Hello", "Hello world!")
if result:
    print("Match found:", result.group())
else:
    print("No match")
search() – scans the entire string and returns the first match.
result = re.search(r"world", "Hello world!")
if result:
    print("Found:", result.group()) # Output: Found: world
else:
    print("Not found")
sub() – replaces matched substrings with a new string.
new_text = re.sub(r"\d+", "number", "There are 123 apples and 456 oranges.")
print(new_text) # Output: There are number apples and number oranges.
split() – splits a string by the pattern and returns a list.
words = re.split(r"\W+", "Hello, how are you?")
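print(words) # Output: ['Hello', 'how', 'are', 'you', '']
4. Compile a regex for performance
Compiling is most useful when the same pattern is reused many times. The re module also caches recently used patterns internally, so the gain for one-off calls is small.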
compiled_pattern = re.compile(email_pattern)
matches = compiled_pattern.findall("Contact us at [email protected] or [email protected]")
print(matches) # Output: ['[email protected]', '[email protected]']
5. Using match objects for more information
result = re.search(r"(\w+) (\w+)", "John Doe")
if result:
    print("Full name:", result.group()) # Output: Full name: John Doe
    print("First name:", result.group(1)) # Output: First name: John
    print("Last name:", result.group(2)) # Output: Last name: Doe
6. Flags to modify regex behavior
case_insensitive_match = re.search("hello", "Hello World!", flags=re.IGNORECASE)
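if case_insensitive_match:
    print("Case‑insensitive match found!")
Common flags include re.IGNORECASE (or re.I), which ignores letter case; re.MULTILINE (re.M), which makes ^ and $ match at the start and end of every line; and re.DOTALL (re.S), which lets . match newline characters as well.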
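The behavior of re.MULTILINE and re.DOTALL can be sketched with a two-line string. The sample text and variable names here are illustrative assumptions, not part of the original examples.

```python
import re

text = "first line\nsecond line"  # hypothetical two-line sample

# Without re.MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, flags=re.M)
print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']

# Without re.DOTALL, . does not match a newline, so the pattern cannot span lines.
span_default = re.search(r"first.*second", text)
span_dotall = re.search(r"first.*second", text, flags=re.S)
print(span_default)               # None
print(repr(span_dotall.group()))  # 'first line\nsecond'

# Flags can be combined with the | operator.
combined = re.findall(r"^SECOND", text, flags=re.I | re.M)
print(combined)  # ['second']
```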
7. Regex meta‑characters and special sequences
Understanding symbols such as ^, $, ., *, +, ?, {m,n}, [], (), and |, and sequences like \d, \s, and \w, is essential for building effective patterns.
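As a rough illustration, each symbol above can be tried in isolation with re.findall. All input strings below, including the two email addresses, are made-up placeholders chosen for this sketch.

```python
import re

anchored_start = re.findall(r"^\d+", "42 apples, 7 pears")  # ^ : start of string
anchored_end = re.findall(r"\d+$", "room 101")              # $ : end of string
any_char = re.findall(r"c.t", "cat cot cut ct")             # . : any single character
zero_or_more = re.findall(r"ab*", "a ab abb")               # * : zero or more of b
one_or_more = re.findall(r"ab+", "a ab abb")                # + : one or more of b
optional = re.findall(r"colou?r", "color colour")           # ? : optional u
counted = re.findall(r"\d{2,3}", "1 22 333 4444")           # {m,n} : 2 to 3 digits
char_class = re.findall(r"[aeiou]", "regex")                # [] : character class
groups = re.findall(r"(\w+)@(\w+)", "a@b c@d")              # () : capture groups
either = re.findall(r"cat|dog", "cat dog bird")             # | : alternation
spaces = re.findall(r"\s", "a b\tc")                        # \s : whitespace

print(anchored_start)  # ['42']
print(anchored_end)    # ['101']
print(any_char)        # ['cat', 'cot', 'cut']
print(zero_or_more)    # ['a', 'ab', 'abb']
print(one_or_more)     # ['ab', 'abb']
print(optional)        # ['color', 'colour']
print(counted)         # ['22', '333', '444']
print(char_class)      # ['e', 'e']
print(groups)          # [('a', 'b'), ('c', 'd')]
print(either)          # ['cat', 'dog']
print(spaces)          # [' ', '\t']

# The email pattern from section 2 combines character classes, + and an escaped dot.
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
emails = re.findall(email_pattern, "Contact alice@example.com or bob@example.org")
print(emails)  # ['alice@example.com', 'bob@example.org']
```

Note that when a pattern contains capture groups, findall returns tuples of the groups rather than the full match.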
8. Handling Chinese characters and file encoding
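Save Python scripts in UTF‑8 encoding and make sure the terminal supports UTF‑8 to avoid character display issues. Python 3 already treats source files as UTF‑8 by default, so the # -*- coding: utf-8 -*- declaration at the top of the file is only needed for Python 2 or for files saved in another encoding.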
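For matching Chinese text itself, one common approach is a character range over the basic CJK Unified Ideographs block. The sketch below assumes that U+4E00 to U+9FFF covers the characters of interest (rarer characters live in other blocks), and the sample sentence is made up.

```python
import re

text = "Python 正则表达式 regex 教程"  # hypothetical mixed-language sample

# Assumption: the basic CJK Unified Ideographs block is enough for this text.
han_pattern = re.compile(r"[\u4e00-\u9fff]+")
han_words = han_pattern.findall(text)
print(han_words)  # ['正则表达式', '教程']

# For str patterns in Python 3, \w is Unicode-aware and matches Chinese characters too.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # ['Python', '正则表达式', 'regex', '教程']

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)  # ['Python', 'regex']
```

When the text comes from a file, passing encoding="utf-8" to open() avoids depending on the platform's default encoding.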
Test Development Learning Exchange