Master Python Regular Expressions: Patterns, Methods, and Real‑World Examples
This article introduces Python's regular expression engine, explains the syntax of regex patterns, lists special metacharacters, and demonstrates core re module functions such as compile, match, search, findall, finditer, split, and sub with practical code examples.
Python Regular Expressions
Regular expressions are special character sequences that let you test whether a string matches a particular pattern. Since Python 1.5, the re module provides Perl‑style regex support, giving Python full regex capabilities.
Regex Pattern Syntax
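Pattern strings use a special syntax. Letters and digits match themselves, most punctuation characters have a special meaning unless escaped with a backslash, and a backslash followed by certain letters (such as \d or \w) forms a special sequence. Write patterns as raw strings (r'...') so that Python does not interpret the backslashes first. In the list below, re stands for any sub-pattern. Metacharacters include:
^ – start of string (and of each line when re.M is set)
$ – end of string (and of each line when re.M is set)
. – any character (except newline, unless re.DOTALL is set)
[...] – character class, e.g., [amk] matches a, m or k
[^...] – negated character class
re* – zero or more repetitions (greedy)
re+ – one or more repetitions (greedy)
re? – zero or one repetition, i.e. optional (greedy)
re*?, re+?, re?? – non-greedy forms of the three quantifiers above
re{n} – exactly n repetitions
re{n,} – n or more repetitions
re{n,m} – between n and m repetitions (greedy)
a|b – alternation
(re) – grouping (capturing)
(?imx) – inline flags (ignore case, multiline, verbose), placed at the start of the pattern
(?-imx:re) – turn flags off for the enclosed sub-pattern only (Python 3.6+)
(?:re) – non-capturing group
(?=re) – positive look-ahead
(?!re) – negative look-ahead
\w – word character (letters, digits, underscore)
\W – non-word character
\s – whitespace
\S – non-whitespace
\d – digit
\D – non-digit
\b – word boundary
\B – non-word boundary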
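A few of the metacharacters listed above are easier to follow in action. The snippet below is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the original article.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.*?>", text).group()

# Character class plus counted repetition: exactly four digits.
year = re.search(r"[0-9]{4}", "Released in 2018").group()

# Word boundary: 'cat' as a whole word, not the 'cat' inside 'concatenate'.
words = re.findall(r"\bcat\b", "cat concatenate cat.")

# Positive look-ahead: digits only when followed by 'px' ('px' is not consumed).
sizes = re.findall(r"\d+(?=px)", "10px 20em 30px")

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
print(year)    # 2018
print(words)   # ['cat', 'cat']
print(sizes)   # ['10', '30']
```

Note how the only difference between the first two patterns is the trailing ?, which switches the quantifier from greedy to non-greedy.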
re Module Overview
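The re module compiles a pattern into a regex object, which can then be used for matching and substitution. The module-level functions such as re.match look the pattern up in an internal cache of compiled patterns on every call, so compiling once with re.compile and reusing the object avoids that per-call overhead when the same pattern is used many times.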
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:02
# @Author : Feng Xiaoqing
# @File : test.py
# @Function: -----------
import re
import timeit
# Time one million matches with a precompiled pattern object
print(timeit.timeit(setup=r'''import re; reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')''', stmt='''reg.match('<h1>xxx</h1>')''', number=1000000))
# Time one million matches through the module-level function
print(timeit.timeit(setup='''import re''', stmt=r'''re.match(r'<(?P<tagname>\w*)>.*</(?P=tagname)>', '<h1>xxx</h1>')''', number=1000000))
reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')
reg.match('<h1>xxx</h1>')
Typical output shows the compiled version is faster (exact timings vary by machine and Python version):
0.4229613832757271
1.0246964437151256
re.compile(pattern[, flags])
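Common flags:
re.I – ignore case
re.M – multiline (^ and $ also match at the start and end of each line)
re.S – dot matches newline
re.L – locale-dependent character classes (in Python 3, valid only with bytes patterns)
re.U – Unicode-aware character classes (already the default for str patterns in Python 3)
re.X – verbose mode (allows whitespace and comments)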
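The effect of each flag is easiest to see side by side. The following is a small sketch under the assumption of Python 3 and str patterns; the sample text and variable names are illustrative only.

```python
import re

text = "First line\nsecond LINE\nthird line"

# re.I: case-insensitive matching.
ignore_case = re.findall(r"line", text, re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string.
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: '.' also matches newline, so a match can span several lines.
spans_with_s = re.search(r"First.*third", text, re.S) is not None
spans_without_s = re.search(r"First.*third", text) is not None

# re.X: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.X)
parts = year_month.search("Posted 2018-04").groups()

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^second.line$", text, re.I | re.M)

print(ignore_case)      # ['line', 'LINE', 'line']
print(line_starts)      # ['First', 'second', 'third']
print(spans_with_s)     # True
print(spans_without_s)  # False
print(parts)            # ('2018', '04')
print(combined)         # ['second LINE']
```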
Example:
import re
pattern = re.compile(r'\d+') # match one or more digits
m = pattern.match('one12twothree34four')
print(m) # None – no match at start
m = pattern.match('one12twothree34four', 3, 10)
print(m) # <re.Match object; span=(3, 5), match='12'> (shown as <_sre.SRE_Match object ...> before Python 3.7)
print(m.group()) # '12'
print(m.start()) # 3
print(m.end()) # 5
print(m.span()) # (3, 5)
re.match
re.match(pattern, string, flags=0) attempts to match at the beginning of the string. Returns a match object or None.
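Two details of this behavior are worth a quick sketch: re.match anchors only at the start (text after the match is allowed), and a failed match returns None, so the result should be checked before calling group(). The helper leading_number below is a hypothetical name introduced for illustration, and re.fullmatch (available since Python 3.4) is shown only for contrast.

```python
import re

# re.match anchors at the start only; trailing text is allowed.
prefix = re.match(r"\d+", "123abc").group()

# No match when the pattern does not occur at position 0.
not_at_start = re.match(r"\d+", "abc123")

# re.fullmatch requires the whole string to match the pattern.
whole_fail = re.fullmatch(r"\d+", "123abc")
whole_ok = re.fullmatch(r"\d+", "123").group()


def leading_number(s):
    """Hypothetical helper: return the leading digits of s, or None."""
    m = re.match(r"\d+", s)
    # A failed match is None, so guard before calling .group().
    return m.group() if m else None


print(prefix)                    # 123
print(not_at_start)              # None
print(whole_fail)                # None
print(whole_ok)                  # 123
print(leading_number("42 km"))   # 42
print(leading_number("km 42"))   # None
```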
re.search
re.search(pattern, string, flags=0) scans the whole string and returns a match object for the first match, or None if nothing matches.
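The difference from re.match shows up as soon as the text of interest is not at the start of the string. A minimal sketch, using a made-up log line:

```python
import re

line = "Error at 14:32 on host web01"

# re.match fails because the string does not start with digits.
at_start = re.match(r"\d+:\d+", line)

# re.search finds the first occurrence anywhere in the string.
found = re.search(r"(\d+):(\d+)", line)

print(at_start)         # None
print(found.group())    # 14:32
print(found.group(1))   # 14
print(found.groups())   # ('14', '32')
print(found.span())     # (9, 14)
```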
re.findall
Returns a list of all non‑overlapping matches:
import re
pattern = re.compile(r'\d+')
print(pattern.findall('runoob 123 google 456')) # ['123', '456']
print(pattern.findall('run88oob123google456', 0, 10)) # ['88', '12']
re.finditer
Returns an iterator yielding match objects:
import re
for m in re.finditer(r"\d+", "12a32bc43jf3"):
    print(m.group())
# 12
# 32
# 43
# 3
re.split
Splits a string by the occurrences of a pattern:
import re
print(re.split(r'\W+', 'runoob, runoob, runoob.'))
# ['runoob', 'runoob', 'runoob', '']
print(re.split(r'(\W+)', ' runoob, runoob, runoob.'))
# ['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
re.sub
Replaces matches with a replacement string or function:
import re
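phone = "2004-959-559 # This is a foreign phone number"
num = re.sub(r'#.*$', "", phone) # strip the trailing comment
print("Phone number:", num) # 2004-959-559
num = re.sub(r'\D', "", phone) # remove everything that is not a digit
print("Phone number:", num) # 2004959559
Key Takeaways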
Use re.compile to create a regex object for repeated operations, choose appropriate flags for case-insensitivity or multiline handling, and select the right function (match, search, findall, finditer, split, sub) based on the task.
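As one more illustration of choosing the right tool: re.sub is described above as accepting either a replacement string or a function, but only the string form is demonstrated. The sketch below covers the function form, along with group references in the replacement string and the count argument. The callback name double and the sample inputs are hypothetical.

```python
import re


def double(match):
    """Replacement callback: receives a match object, returns a string."""
    return str(int(match.group()) * 2)


# Function form: every run of digits is replaced by twice its value.
doubled = re.sub(r"\d+", double, "A23G4HFD567")

# String form with group references: reorder year-month-day to day/month/year.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2018-04-29")

# count limits how many matches are replaced.
limited = re.sub(r"\s+", "-", "a b  c d", count=2)

print(doubled)    # A46G8HFD1134
print(reordered)  # 29/04/2018
print(limited)    # a-b-c d
```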
MaGe Linux Operations
