
Master Python’s re Module: Essential Regex Techniques Explained

This article provides a comprehensive guide to Python’s re module, covering regex definitions, common methods, special character sets, pattern‑matching functions, match object attributes, and practical code examples for tasks such as validating phone numbers, IP addresses, and HTML snippets.

Ops Development Stories

Feb 24, 2019

re Regex Handling

Regex Definition

A regular expression is a pattern that describes a set of strings. It combines ordinary characters with predefined special characters into a rule string that can be used to search, filter, or transform text.

Common regex methods

re.compile – compile a pattern into a regex object

pattern.match – match from the start of a string

pattern.search – find the first match anywhere

pattern.findall – return all matches

pattern.sub – replace matches
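As a rough illustration of how the methods listed above fit together, the sketch below compiles one pattern and reuses it. The sample text and variable names are made up for the example.

```python
import re

# Compile once, then reuse the pattern object for several operations.
pattern = re.compile(r'\d+')
text = 'order 66, item 1138'

at_start = pattern.match(text)        # None: the string does not start with a digit
first = pattern.search(text)          # first run of digits anywhere in the string
all_numbers = pattern.findall(text)   # every non-overlapping run of digits
masked = pattern.sub('#', text)       # replace each run of digits with '#'

print(at_start)        # None
print(first.group())   # 66
print(all_numbers)     # ['66', '1138']
print(masked)          # order #, item #
```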

Special character set

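Key metacharacters:

. – any character except a newline

^ – start of the string (also the start of each line with re.MULTILINE)

$ – end of the string, or just before a newline at the end of the string

* – zero or more repetitions

+ – one or more repetitions

? – zero or one repetition

{m} – exactly m repetitions

{m,n} – between m and n repetitions

*?, +?, ??, {m,n}? – non‑greedy variants of the quantifiers above

All quantifiers are greedy by default: they match as much text as possible. The non‑greedy variants match as little as possible. Escape sequences such as \d (digits), \w (word characters), and \s (whitespace), and assertions such as \b (word boundary), are also essential.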
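A small sketch of how greedy and non‑greedy quantifiers, \b, and {m,n} behave in practice. The sample strings are invented for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r'<.*?>', html)

# \b keeps 'cat' from matching inside 'concatenate'.
whole_words = re.findall(r'\bcat\b', 'cat concatenate cat.')

# {2,3} matches runs of two or three digits.
runs = re.findall(r'\d{2,3}', '1 22 333 4444')

print(greedy)       # ['<b>bold</b> and <i>italic</i>']
print(lazy)         # ['<b>', '</b>', '<i>', '</i>']
print(whole_words)  # ['cat', 'cat']
print(runs)         # ['22', '333', '444']
```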

Regex methods

re.compile(pattern, flags=0)

>>> import re
>>> comp = re.compile(r'\d+')
>>> ret = comp.match('123456')
>>> ret.group()
'123456'

Equivalent to:

>>> ret = re.match(r'\d+', '123456')

re.search(pattern, string, flags=0) Scans the string for the first location where the pattern matches and returns a match object, or None if there is no match.

re.match(pattern, string, flags=0) Matches only at the beginning of the string.

re.fullmatch(pattern, string, flags=0) Matches the entire string.
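The difference between the three functions is easiest to see on the same input. A minimal sketch, with sample strings chosen for the example:

```python
import re

pattern = r'\d+'

# match anchors at the start only, search scans, fullmatch needs the whole string.
m_start = re.match(pattern, 'abc123')       # None: no digits at position 0
m_any = re.search(pattern, 'abc123')        # finds '123'
m_prefix = re.match(pattern, '123abc')      # matches the '123' prefix, ignores the rest
m_full_bad = re.fullmatch(pattern, '123abc')  # None: trailing 'abc' is not digits
m_full_ok = re.fullmatch(pattern, '123')    # whole string is digits

print(m_start, m_any.group(), m_prefix.group(), m_full_bad, m_full_ok.group())
```

For validation tasks, fullmatch is usually the safer choice, because match accepts any string that merely starts with a valid value.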

re.split(pattern, string, maxsplit=0, flags=0)

>>> re.split(r'\W+', 'Words words wordS')
['Words', 'words', 'wordS']
>>> re.split(r'\W+', 'Words words wordS', maxsplit=1)
['Words', 'words wordS']
>>> re.split(r'\d+', '1q2W3e4R', flags=re.IGNORECASE)
['', 'q', 'W', 'e', 'R']
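One detail of re.split that the examples above do not show: if the pattern contains a capturing group, the delimiters are kept in the result. A short sketch with a made-up input:

```python
import re

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r'([,;])', 'a,b;c')

# Without the group, the delimiters are dropped.
without_delims = re.split(r'[,;]', 'a,b;c')

# maxsplit, passed by keyword, limits the number of splits.
limited = re.split(r'[,;]', 'a,b;c', maxsplit=1)

print(with_delims)     # ['a', ',', 'b', ';', 'c']
print(without_delims)  # ['a', 'b', 'c']
print(limited)         # ['a', 'b;c']
```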

re.findall(pattern, string, flags=0)

>>> re.findall(r'\d+', '123,456')
['123', '456']
>>> re.findall(r'(\d+)(\w+)', '123qw,werrc')
[('123', 'qw')]
>>> re.findall(r'(\d+)|(\w+)', '123qw,werrc')
[('123', ''), ('', 'qw'), ('', 'werrc')]

re.finditer(pattern, string, flags=0)

>>> for i in re.finditer(r'\d+', '123456'):
...     print(i.group())
...
123456
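Because finditer yields match objects rather than strings, it is a convenient way to get positions as well as text. A minimal sketch on an invented string:

```python
import re

text = 'a1 bb22 ccc333'

# Collect each match together with its (start, end) span.
hits = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

print(hits)  # [('1', (1, 2)), ('22', (5, 7)), ('333', (11, 14))]
```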

re.sub(pattern, repl, string, count=0, flags=0)

>>> re.sub(r'(\d+) (\w+)', r'\2 \1', '12345 asdfd')
'asdfd 12345'

If repl is a function, it receives a match object.

>>> def mat(m):
...     if m.group(2) == '1234':
...         return m.group(1)
...     else:
...         return '1234'
...
>>> re.sub(r'(\d+) (\d+)', mat, '123 1234qer')
'123qer'

re.subn(pattern, repl, string, count=0, flags=0)

>>> re.subn(r'(\d+) (\d+)', mat, 'as123 1234qer')
('as123qer', 1)
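A further sketch of a replacement function, this time computing the replacement from the matched text and using count to limit the number of substitutions. The helper name double and the input string are hypothetical.

```python
import re

def double(m):
    """Return the matched number multiplied by two, as a string."""
    return str(int(m.group()) * 2)

# Replace every number with its double.
doubled = re.sub(r'\d+', double, 'a1 b20 c3')

# count=1 stops after the first replacement; subn also reports how many were made.
first_only, n = re.subn(r'\d+', double, 'a1 b20 c3', count=1)

print(doubled)        # a2 b40 c6
print(first_only, n)  # a2 b20 c3 1
```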

Match object

match.group([group1, …]) – returns the matched subgroup(s).

match.groups(default=None) – returns a tuple of all subgroups.

match.groupdict(default=None) – returns a dict of named groups.

match.start([group]) / match.end([group]) – start and end indices of a group.

match.span([group]) – tuple of (start, end) indices.

match.lastindex – index of the last matched group.

match.lastgroup – name of the last matched named group.
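The attributes above can be explored on a single match. The sketch below uses named groups on an invented date string; the group names year, month, and day are chosen for the example.

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
              'Published 2019-02-24.')

whole = m.group()                 # the full match
pair = m.group('year', 'day')     # several groups at once, as a tuple
all_groups = m.groups()           # every subgroup, in order
by_name = m.groupdict()           # named groups as a dict
bounds = (m.start(), m.end())     # span of the full match
month_span = m.span('month')      # span of one group

print(whole)        # 2019-02-24
print(pair)         # ('2019', '24')
print(all_groups)   # ('2019', '02', '24')
print(by_name)      # {'year': '2019', 'month': '02', 'day': '24'}
print(bounds)       # (10, 20)
print(month_span)   # (15, 17)
print(m.lastindex, m.lastgroup)  # 3 day
```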

Simple examples

Match characters following "123":

>>> re.search(r'(?<=123)\w+', '123asd,wer').group(0)
'asd'

Match characters after "123" and before "_":

>>> re.search(r'(?<=123)\w+(?=_)', '123asd_123wer').group(0)
'asd'

Match mobile numbers:

>>> re.match(r'1[3578]\d{9}', '13573528479').group()
'13573528479'
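For validation, a helper built on fullmatch rejects inputs with extra trailing characters. This is a minimal sketch: the name is_mobile is hypothetical, and the prefix set 13, 15, 17, 18 is taken from the pattern in this article rather than from any current numbering plan. Inside a character class a comma is a literal character, so the class is written [3578], without commas.

```python
import re

# Assumption: valid numbers are 11 digits starting with 13, 15, 17 or 18.
MOBILE_RE = re.compile(r'1[3578][0-9]{9}')

def is_mobile(number: str) -> bool:
    """Return True only if the whole string looks like a mobile number."""
    return MOBILE_RE.fullmatch(number) is not None

print(is_mobile('13573528479'))   # True
print(is_mobile('135735284790'))  # False: one digit too many
print(is_mobile('12573528479'))   # False: prefix 12 is not in the allowed set
print(is_mobile('1,573528479'))   # False: a comma is not a valid second character
```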

Match telephone numbers:

>>> re.match(r'\d{3}-\d{8}|\d{4}-\d{7,8}', '0531-82866666').group()
'0531-82866666'

Match IP addresses:

>>> re.match(r'\d+\.\d+\.\d+\.\d+', '192.168.10.25').group()
'192.168.10.25'
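The pattern above checks only the shape of the address, so it would also accept something like 999.1.1.1. A stricter sketch, assuming dotted-decimal IPv4 with each octet in the range 0 to 255 and no leading zeros (OCTET, IPV4_RE, and is_ipv4 are names introduced for this example):

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
IPV4_RE = re.compile(rf'{OCTET}(?:\.{OCTET}){{3}}')

def is_ipv4(text: str) -> bool:
    """Return True only if the whole string is a dotted-decimal IPv4 address."""
    return IPV4_RE.fullmatch(text) is not None

print(is_ipv4('192.168.10.25'))  # True
print(is_ipv4('0.0.0.0'))        # True
print(is_ipv4('999.1.1.1'))      # False: 999 is out of range
print(is_ipv4('256.0.0.1'))      # False: 256 is out of range
print(is_ipv4('1.2.3'))          # False: only three octets
```

In production code the standard library's ipaddress module is usually a better fit than a hand-written pattern.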

Match NetEase email addresses:

>>> re.findall(r'\w+@163\.com|\w+@126\.com', '[email protected] [email protected]')
['[email protected]', '[email protected]']
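The addresses in the example above appear only as "[email protected]" placeholders, so the call cannot be reproduced as printed. The sketch below uses made-up addresses (alice, bob, and carol are hypothetical) and folds the two alternatives into one group.

```python
import re

# Hypothetical sample text; only the 163.com and 126.com addresses should match.
text = 'alice@163.com bob@126.com carol@example.com'

found = re.findall(r'\w+@(?:163|126)\.com', text)

print(found)  # ['alice@163.com', 'bob@126.com']
```

The group is non-capturing, (?:...), so findall returns the full addresses rather than just the domain prefix.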

Match HTML text:

>>> re.match(r'<(\w*)><(\w*)>.*</\2></\1>', '<body><h2>wahaha5354</h2></body>').group()
'<body><h2>wahaha5354</h2></body>'
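The same idea can be written with named groups and named backreferences, which makes the pairing of opening and closing tags easier to read. A minimal sketch; the group names outer, inner, and text are chosen for the example.

```python
import re

# (?P=name) matches the same text that the named group captured earlier.
TAG_PAIR = re.compile(
    r'<(?P<outer>\w+)><(?P<inner>\w+)>(?P<text>.*)</(?P=inner)></(?P=outer)>'
)

ok = TAG_PAIR.match('<body><h2>wahaha5354</h2></body>')
bad = TAG_PAIR.match('<body><h2>wahaha5354</h3></body>')  # closing tag does not match

print(ok.group('outer', 'inner', 'text'))  # ('body', 'h2', 'wahaha5354')
print(bad)                                 # None
```

This works for small, fixed snippets like the one here. For real HTML documents a dedicated parser is a more reliable tool than a regular expression.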

