
Master Python Regular Expressions: Patterns, Methods, and Real‑World Examples

This article introduces Python's regular expression engine, explains the syntax of regex patterns, lists special metacharacters, and demonstrates core re module functions such as compile, match, search, findall, finditer, split, and sub with practical code examples.

MaGe Linux Operations

May 31, 2018

Python Regular Expressions

Regular expressions are special character sequences that let you test whether a string matches a particular pattern. Since Python 1.5, the re module provides Perl‑style regex support, giving Python full regex capabilities.

Regex Pattern Syntax

Pattern strings use a special syntax. Letters and digits match themselves, while most punctuation characters are metacharacters with a special meaning. A backslash either escapes a metacharacter so that it matches literally or introduces a special sequence such as \d. In the list below, re stands for any regular expression.

^ – start of string
$ – end of string
. – any character (except newline, unless re.DOTALL is set)
[...] – character class, e.g., [amk] matches a, m or k
[^...] – negated character class
re* – zero or more repetitions (greedy)
re+ – one or more repetitions (greedy)
re? – zero or one occurrence (greedy); appended to another quantifier, as in re*? or re+?, it makes that quantifier non‑greedy
re{n} – exactly n repetitions
re{n,} – n or more repetitions
re{n,m} – between n and m repetitions (greedy)
a|b – alternation
(re) – grouping (capturing)
(?imx) – inline flags (ignore case, multiline, verbose), placed at the start of the pattern
(?-imx:re) – turn the listed flags off inside the group (Python 3.6+)
(?:re) – non‑capturing group
(?=re) – positive look‑ahead
(?!re) – negative look‑ahead
\w – word character (letters, digits, underscore)
\W – non‑word character
\s – whitespace
\S – non‑whitespace
\d – digit
\D – non‑digit
\b – word boundary
\B – non‑word boundary
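A few of the constructs above are easier to tell apart when run side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* consumes as much as possible, so a single match spans every tag.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first closing angle bracket.
lazy = re.findall(r"<.*?>", text)

# Positive look-ahead: digits that are followed by "px"; "px" is not consumed.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative look-ahead: "foo" that is not followed by "bar".
foo_pos = re.search(r"foo(?!bar)", "foobar foobaz").start()

# Non-capturing group: repeat "ab" without creating a capture group.
repeats = re.findall(r"(?:ab)+", "ababab ab")

# Word boundary: "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(greedy)       # ['<b>bold</b> and <i>italic</i>']
print(lazy)         # ['<b>', '</b>', '<i>', '</i>']
print(px_values)    # ['10', '30']
print(foo_pos)      # 7
print(repeats)      # ['ababab', 'ab']
print(whole_words)  # ['cat', 'cat']
```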

re Module Overview

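The re module compiles a pattern into a regex object, which can then be used for matching and substitution. The module‑level functions such as re.match have to compile the pattern, or look it up in the module's internal cache, on every call. Compiling once with re.compile and reusing the object avoids that per‑call overhead when the same pattern is used many times.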

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/4/29 22:02
# @Author  : Feng Xiaoqing
# @File    : test.py
# @Function: -----------
import re
import timeit

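# Time one million matches with a pattern that is compiled once in the setup step.
# Raw strings (r'...') keep \w from being read as a string escape sequence.
print(timeit.timeit(setup=r'''import re; reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')''', stmt='''reg.match('<h1>xxx</h1>')''', number=1000000))
# Time the same matches through the module-level function, which receives the pattern string on every call.
print(timeit.timeit(setup='''import re''', stmt=r'''re.match(r'<(?P<tagname>\w*)>.*</(?P=tagname)>', '<h1>xxx</h1>')''', number=1000000))

# Outside timeit, the compiled object is used the same way.
reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')
reg.match('<h1>xxx</h1>')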

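Typical output, in seconds per one million calls, shows the compiled version (first line) is faster than the module‑level call (second line). Exact figures vary by machine and Python version:

0.4229613832757271
1.0246964437151256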

re.compile(pattern[, flags])

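Common flags:

re.I – ignore case
re.M – multiline (^ and $ also match at the start and end of each line)
re.S – dot matches newline
re.L – locale‑dependent character classes (bytes patterns only in Python 3)
re.U – Unicode‑aware character classes (the default for str patterns in Python 3)
re.X – verbose mode (allows whitespace and comments inside the pattern)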

Example:

import re
pattern = re.compile(r'\d+')  # match one or more digits
m = pattern.match('one12twothree34four')
print(m)               # None – no match at start
m = pattern.match('one12twothree34four', 3, 10)
print(m)               # <re.Match object; span=(3, 5), match='12'> on Python 3.7+
print(m.group())       # '12'
print(m.start())       # 3
print(m.end())         # 5
print(m.span())        # (3, 5)
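The flags listed earlier can be passed to re.compile or to the module‑level functions, and several flags are combined with the bitwise OR operator. The sketch below uses an invented three‑line string to show the effect of each one; the variable names are illustrative only.

```python
import re

text = "First line\nsecond LINE\nthird line"

# re.I: ignore case.
ignore_case = re.findall(r"line", text, re.I)

# re.M: ^ and $ also match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: the dot also matches newline characters.
without_dotall = re.search(r"First.*third", text)
with_dotall = re.search(r"First.*third", text, re.S)

# re.X: whitespace and # comments inside the pattern are ignored.
date_re = re.compile(r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.X)
date = date_re.search("Published on 2018-05-31.")

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^second\sline$", text, re.I | re.M)

print(ignore_case)                  # ['line', 'LINE', 'line']
print(line_starts)                  # ['First', 'second', 'third']
print(without_dotall)               # None
print(with_dotall is not None)      # True
print(date.group("year"), date.group("month"), date.group("day"))  # 2018 05 31
print(combined)                     # ['second LINE']
```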

re.match

re.match(pattern, string, flags=0)

Attempts to match the pattern at the beginning of the string only. Returns a match object on success, or None otherwise.
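A short sketch of re.match, assuming an invented input string: the match succeeds only when the pattern fits at position 0, and captured groups are read back from the match object.

```python
import re

# The pattern matches at the very start of the string, so match() succeeds.
m = re.match(r"(\w+) (\w+)", "Hello World, regex")
whole = m.group(0)      # the entire matched text
first = m.group(1)      # the first captured group
groups = m.groups()     # all captured groups as a tuple

# "World" occurs in the string, but not at position 0, so match() returns None.
no_match = re.match(r"World", "Hello World")

print(whole)     # Hello World
print(first)     # Hello
print(groups)    # ('Hello', 'World')
print(no_match)  # None
```

Because the result may be None, checking it (for example with `if m:`) before calling group() avoids an AttributeError on inputs that do not match.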

re.search

re.search(pattern, string, flags=0)

Scans the whole string and returns a match object for the first location where the pattern matches, or None if it matches nowhere.
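The difference from re.match shows up when the text of interest is not at the start of the string. The following sketch uses a made‑up sentence containing a date.

```python
import re

text = "Order 66 shipped on 2018-05-31"
date_pattern = r"\d{4}-\d{2}-\d{2}"

# search() scans forward until the pattern fits somewhere in the string.
found = re.search(date_pattern, text)
date = found.group()
span = found.span()

# match() only looks at position 0, where this string has "Order", not a date.
at_start = re.match(date_pattern, text)

print(date)      # 2018-05-31
print(span)      # (20, 30)
print(at_start)  # None
```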

re.findall

Returns a list of all non‑overlapping matches:

import re
pattern = re.compile(r'\d+')
print(pattern.findall('runoob 123 google 456'))  # ['123', '456']
print(pattern.findall('run88oob123google456', 0, 10))  # ['88', '12']
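What findall returns depends on how many capturing groups the pattern contains: whole matches with no groups, the group's text with one group, and tuples with several groups. A minimal sketch with an invented input string:

```python
import re

text = "alice=30, bob=25"

# No capturing group: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One capturing group: each item is the text of that group.
ages = re.findall(r"\w+=(\d+)", text)

# Two capturing groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['alice=30', 'bob=25']
print(ages)   # ['30', '25']
print(pairs)  # [('alice', '30'), ('bob', '25')]
```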

re.finditer

Returns an iterator yielding match objects:

import re
for m in re.finditer(r"\d+", "12a32bc43jf3"):
    print(m.group())
# 12
# 32
# 43
# 3
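Because finditer yields match objects rather than strings, positions and named groups are available for every match. The sketch below reuses the string from the example above and adds a hypothetical log snippet; the log format and the line_re name are assumptions made for illustration.

```python
import re

# Positions of every run of digits in the string used above.
spans = [m.span() for m in re.finditer(r"\d+", "12a32bc43jf3")]

# Hypothetical access-log lines, parsed with named groups.
log = "GET /index.html 200\nPOST /login 302\nGET /missing 404"
line_re = re.compile(r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})")
records = [m.groupdict() for m in line_re.finditer(log)]

print(spans)       # [(0, 2), (3, 5), (7, 9), (11, 12)]
print(records[0])  # {'method': 'GET', 'path': '/index.html', 'status': '200'}
print(len(records))  # 3
```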

re.split

Splits a string by the occurrences of a pattern:

import re
print(re.split(r'\W+', 'runoob, runoob, runoob.'))
# ['runoob', 'runoob', 'runoob', '']
print(re.split(r'(\W+)', ' runoob, runoob, runoob.'))
# ['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
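Two further variations that often come up: limiting the number of splits with maxsplit, and dropping the empty strings that appear when the input starts or ends with a delimiter. A minimal sketch; the second input string is invented for illustration.

```python
import re

# maxsplit=1 splits at the first delimiter only; the rest stays intact.
head_tail = re.split(r"\W+", "runoob, runoob, runoob.", maxsplit=1)

# A character class handles several delimiters at once.
fields = re.split(r"[,;|]\s*", "a, b;c| d")

# Filter out the empty string produced by the trailing ".".
words = [part for part in re.split(r"\W+", "runoob, runoob, runoob.") if part]

print(head_tail)  # ['runoob', 'runoob, runoob.']
print(fields)     # ['a', 'b', 'c', 'd']
print(words)      # ['runoob', 'runoob', 'runoob']
```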

re.sub

Replaces matches with a replacement string or function:

import re
phone = "2004-959-559 # This is a foreign phone number"
num = re.sub(r'#.*$', "", phone)   # strip the trailing comment
print("Phone number:", num)        # 2004-959-559 
num = re.sub(r'\D', "", phone)     # remove every non-digit character
print("Phone number:", num)        # 2004959559
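The replacement can also be a function that receives each match object, or a string that refers back to captured groups. The sketch below illustrates both, along with the count argument and re.subn; the double helper and the sample strings are hypothetical.

```python
import re

def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group()) * 2)

# Function replacement: called once per match.
doubled = re.sub(r"\d+", double, "A23G4HFD567")

# Backreferences \1, \2, \3 reorder the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2018-05-31")

# count limits how many matches are replaced.
first_only = re.sub(r"-", "", "2004-959-559", count=1)

# subn() also reports how many replacements were made.
digits, n = re.subn(r"\D", "", "2004-959-559")

print(doubled)     # A46G8HFD1134
print(reordered)   # 31/05/2018
print(first_only)  # 2004959-559
print(digits, n)   # 2004959559 2
```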

Key Takeaways

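Use re.compile to create a regex object when a pattern is reused, choose flags such as re.I or re.M for case‑insensitive or multiline matching, and select the function that fits the task: match to test the start of a string, search to find the first occurrence, findall or finditer to collect every occurrence, split to break a string apart, and sub to replace matches.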
