Master Python’s re Module: Essential Regex Techniques Explained
This article provides a comprehensive guide to Python’s re module, covering regex definitions, common methods, special character sets, pattern‑matching functions, match object attributes, and practical code examples for tasks such as validating phone numbers, IP addresses, and HTML snippets.
re Regex Handling
Regex Definition
A regular expression is a pattern that describes a set of strings. It combines ordinary characters with predefined special characters (metacharacters) into a rule string that expresses the matching or filtering logic to apply to text.
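As a minimal sketch of a rule string acting as a text filter, the snippet below keeps only the lines that contain a run of digits. The helper name `keep_matching` and the sample lines are made up for illustration and do not come from the article.

```python
import re


def keep_matching(rule, lines):
    """Return the lines in which the rule string matches somewhere."""
    compiled = re.compile(rule)
    return [line for line in lines if compiled.search(line)]


# Hypothetical sample data, not from the article.
lines = ["order 42 shipped", "no numbers here", "room 101"]
print(keep_matching(r"\d+", lines))  # ['order 42 shipped', 'room 101']
```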
Common regex methods
re.compile – compile a pattern into a regex object
pattern.match – match from the start of a string
pattern.search – find the first match anywhere
pattern.findall – return all matches
pattern.sub – replace matches
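The five methods listed above can be tried side by side on one compiled pattern. This is only a sketch: the sample text and the variable names are assumptions chosen for the demonstration.

```python
import re

# Hypothetical sample text, not from the article.
text = "tel 010-12345678, fax 021-87654321"
pattern = re.compile(r"\d{3}-\d{8}")

at_start = pattern.match(text)       # None, because the text starts with "tel"
first = pattern.search(text)         # match object for the first hit anywhere
all_numbers = pattern.findall(text)  # list of every matching substring
masked = pattern.sub("***", text)    # every match replaced

print(at_start)       # None
print(first.group())  # 010-12345678
print(all_numbers)    # ['010-12345678', '021-87654321']
print(masked)         # tel ***, fax ***
```

Compiling once and reusing the pattern object mainly helps readability when the same pattern is applied in several places.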
Special character set
Key metacharacters:
. – any character except newline
^ – start of string
$ – end of string
* – zero or more repetitions, greedy
+ – one or more repetitions
? – zero or one
{m} – exactly m repetitions
{m,n} – between m and n repetitions
*?, +?, {m,n}? – non‑greedy variants of the quantifiers above
Essential escape sequences and assertions:
\d – digits
\w – word characters
\s – whitespace
\b – word boundary (an assertion)
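The difference between greedy and non‑greedy quantifiers, and the effect of `\b`, is easiest to see on small inputs. The strings below are invented for this sketch.

```python
import re

# Hypothetical sample input, not from the article.
html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so both tags land in one match.
greedy = re.findall(r"<b>.*</b>", html)
# Non-greedy: .*? stops at the first closing tag it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

# {m,n} is greedy as well: it takes up to three digits when available.
runs = re.findall(r"\d{2,3}", "1 22 333 4444")

# \b only matches between a word character and a non-word character.
whole_words = re.findall(r"\bcat\b", "cat concat cat5 cat")

print(greedy)       # ['<b>one</b><b>two</b>']
print(lazy)         # ['<b>one</b>', '<b>two</b>']
print(runs)         # ['22', '333', '444']
print(whole_words)  # ['cat', 'cat']
```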
Regex methods
re.compile(pattern, flags=0)
<code>>>> import re
>>> comp = re.compile(r'\d+')
>>> ret = comp.match('123456')
>>> ret.group()
'123456'</code>
Equivalent to:
<code>>>> ret = re.match(r'\d+', '123456')</code>
re.search(pattern, string, flags=0) Finds the first location where the pattern matches and returns a match object, or None if there is no match.
re.match(pattern, string, flags=0) Matches only at the beginning of the string.
re.fullmatch(pattern, string, flags=0) Matches the entire string.
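The three functions differ only in where the match is allowed to start and end. The sketch below uses an assumed 4‑digit area code plus 7‑digit number pattern and made‑up inputs to show the difference, including the case where `re.match` silently accepts a prefix.

```python
import re

# Assumed pattern for the demonstration: 4-digit area code, 7-digit number.
pattern = r"\d{4}-\d{7}"
sentence = "call 0531-8286666 now"
too_long = "0531-82866666"  # one digit more than the pattern allows

found = re.search(pattern, sentence)     # scans the whole string
at_start = re.match(pattern, sentence)   # must match at index 0
prefix = re.match(pattern, too_long)     # matches, but only a prefix
whole = re.fullmatch(pattern, too_long)  # must consume the entire string

print(found.group())   # 0531-8286666
print(at_start)        # None
print(prefix.group())  # 0531-8286666
print(whole)           # None
```

For validation tasks, `re.fullmatch` is usually the safer choice, since trailing characters cause a rejection instead of being ignored.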
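re.split(pattern, string, maxsplit=0, flags=0) Splits the string at each match of the pattern; a non‑zero maxsplit limits the number of splits.
<code>>>> re.split(r'\W+', 'Words words wordS')
['Words', 'words', 'wordS']
>>> re.split(r'\W+', 'Words words wordS', maxsplit=1)
['Words', 'words wordS']
>>> re.split(r'\d+', '1q2W3e4R', flags=re.IGNORECASE)
['', 'q', 'W', 'e', 'R']</code>
re.findall(pattern, string, flags=0) Returns all non‑overlapping matches as a list; if the pattern contains groups, the groups are returned instead (as tuples when there is more than one group).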
<code>>>> re.findall(r'\d+', '123,456')
['123', '456']
>>> re.findall(r'(\d+)(\w+)', '123qw,werrc')
[('123', 'qw')]
>>> re.findall(r'(\d+)|(\w+)', '123qw,werrc')
[('123', ''), ('', 'qw'), ('', 'werrc')]</code>
re.finditer(pattern, string, flags=0) Returns an iterator that yields a match object for each non‑overlapping match.
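<code>>>> for i in re.finditer(r'\d+', '123456'):
...     print(i.group())
...
123456</code>
re.sub(pattern, repl, string, count=0, flags=0) Replaces matches of the pattern with repl; count=0 replaces every match.
<code>>>> re.sub(r'(\d+) (\w+)', r'\2 \1', '12345 asdfd')
'asdfd 12345'</code>
If repl is a function, it is called with each match object and its return value is used as the replacement.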
<code>>>> def mat(m):
...     if m.group(2) == '1234':
...         return m.group(1)
...     else:
...         return '1234'
...
>>> re.sub(r'(\d+) (\d+)', mat, '123 1234qer')
'123qer'</code>
re.subn(pattern, repl, string, count=0, flags=0) Works like re.sub but returns a tuple of (new string, number of substitutions).
<code>>>> re.subn(r'(\d+) (\d+)', mat, 'as123 1234qer')
('as123qer', 1)</code>
Match object
match.group([group1, …]) – returns the matched subgroup(s).
match.groups(default=None) – returns a tuple of all subgroups.
match.groupdict(default=None) – returns a dict of named groups.
match.start([group]) / match.end([group]) – start and end indices of a group.
match.span([group]) – tuple of (start, end) indices.
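match.lastindex – integer index of the last matched capturing group, or None if no group matched.
match.lastgroup – name of the last matched capturing group, or None if that group has no name or no group matched.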
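The attributes above can be inspected on a single match. This sketch assumes a pattern with two named groups, `area` and `number`, and a sample string invented for the purpose.

```python
import re

# Hypothetical pattern and input; the group names are assumptions.
m = re.search(r"(?P<area>\d{4})-(?P<number>\d{7})", "tel: 0531-8286666")

print(m.group())         # 0531-8286666   (whole match)
print(m.group(1, 2))     # ('0531', '8286666')
print(m.group("area"))   # 0531
print(m.groups())        # ('0531', '8286666')
print(m.groupdict())     # {'area': '0531', 'number': '8286666'}
print(m.start(), m.end())  # 5 17
print(m.span("number"))  # (10, 17)
print(m.lastindex)       # 2
print(m.lastgroup)       # number
```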
Simple examples
Match the word characters that follow "123" (positive lookbehind):
<code>>>> re.search(r'(?<=123)\w+', '123asd,wer').group(0)
'asd'</code>
Match the word characters after "123" and before "_" (lookbehind plus lookahead):
<code>>>> re.search(r'(?<=123)\w+(?=_)', '123asd_123wer').group(0)
'asd'</code>
Match mobile numbers:
<code>>>> re.match(r'1[3578]\d{9}', '13573528479').group()
'13573528479'</code>
Match telephone numbers:
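<code>>>> re.match(r'\d{3}-\d{8}|\d{4}-\d{7}', '0531-82866666').group()
'0531-8286666'</code>
Because re.match only anchors at the start of the string, the last digit of the input is left unmatched here; re.fullmatch would reject this input instead.
Match IP addresses: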
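<code>>>> re.match(r'\d+\.\d+\.\d+\.\d+', '192.168.10.25').group()
'192.168.10.25'</code>
This pattern checks only the dotted‑quad shape; it does not verify that each number is in the range 0 to 255.
Match NetEase email addresses: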
<code>>>> # user1 and user2 are placeholder addresses for illustration
>>> re.findall(r'\w+@163\.com|\w+@126\.com', 'user1@163.com user2@126.com')
['user1@163.com', 'user2@126.com']</code>
Match HTML text:
<code>>>> re.match(r'<(\w*)><(\w*)>.*</\2></\1>', '<body><h2>wahaha5354</h2></body>').group()
'<body><h2>wahaha5354</h2></body>'</code>
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.