
Understanding Greedy and Non‑Greedy Matching in Regular Expressions

This article explains the difference between greedy and non‑greedy (lazy) matching in regular expressions, describes how quantifiers behave by default, shows how to switch to lazy mode using a trailing question mark, and provides multiple Python code examples illustrating both approaches.

Test Development Learning Exchange

May 9, 2025

Regular expressions use quantifiers such as *, +, ?, and {n,m} to specify how many times the preceding element may repeat. By default these quantifiers are greedy: they consume as many repetitions as possible while still letting the overall pattern match. Adding a trailing question mark makes them non‑greedy (lazy), so they consume as few repetitions as possible.
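To make the default behavior concrete, the sketch below runs each quantifier in both modes against the same made-up string. The string "aaaa" and the `pairs` and `results` names are illustrative assumptions, not part of the original examples.

```python
import re

text = "aaaa"  # hypothetical input: four identical characters

# Each pair is (greedy pattern, lazy pattern); the lazy form only adds a trailing '?'.
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the string, so only the quantifier decides the length.
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()
    print(f"{greedy!r:12} -> {results[greedy]!r:8} {lazy!r:12} -> {results[lazy]!r}")
```

In each pair the greedy form takes the upper end of its allowed range and the lazy form takes the lower end, which for *? and ?? is the empty string.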

Greedy Matching

In greedy mode the engine expands the match to the longest possible string that still satisfies the pattern. The following Python example demonstrates this behavior when searching for HTML‑like tags.

import re

text = "Here is some text with <tag1> and <tag2>."

pattern = r'<.*>'

match = re.search(pattern, text)

print(match.group())  # output: <tag1> and <tag2>

The pattern <.*> starts at the first '<' and, because .* is greedy, runs to the last '>' on the same line (by default . does not match a newline), capturing everything in between.
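One way to check where the greedy match begins and ends is to compare its span with the positions of the first '<' and the last '>'. The sketch below does that, then shows the effect of a newline using a hypothetical two-line string (`two_lines` is an assumption added for illustration).

```python
import re

text = "Here is some text with <tag1> and <tag2>."
match = re.search(r"<.*>", text)

first_open = text.index("<")       # offset of the first '<'
last_close = text.rindex(">") + 1  # one past the last '>'
print(match.span() == (first_open, last_close))  # True

# By default '.' does not match a newline, so a greedy match cannot cross lines.
two_lines = "<tag1>\n<tag2>"  # hypothetical two-line input
single_line = re.search(r"<.*>", two_lines).group()
across_lines = re.search(r"<.*>", two_lines, re.DOTALL).group()
print(repr(single_line))   # '<tag1>'
print(repr(across_lines))  # '<tag1>\n<tag2>'
```

With re.DOTALL the dot also matches newlines, so the greedy match again stretches to the last '>' in the whole string.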

Non‑Greedy (Lazy) Matching

Appending ? after a quantifier forces the engine to stop as soon as the rest of the pattern can be satisfied. The example below extracts each tag individually.

import re

text = "Here is some text with <tag1> and <tag2>."

pattern = r'<.*?>'

matches = re.findall(pattern, text)

print(matches)  # output: ['<tag1>', '<tag2>']

Here the engine stops at the first closing '>', returning separate matches for each tag.
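A lazy quantifier does not always produce a short match: it expands one repetition at a time until the rest of the pattern can succeed. The following sketch, using a made-up digit string, contrasts a lazy quantifier with nothing after it against the same quantifier followed by an end-of-string anchor.

```python
import re

digits = "12345"  # hypothetical input

# Nothing follows the quantifier, so the lazy form stops after the one required digit.
alone = re.search(r"\d+?", digits).group()

# '$' must match at the end of the string, so the lazy form has to keep expanding.
anchored = re.search(r"\d+?$", digits).group()

print(alone)     # 1
print(anchored)  # 12345
```

The second result is as long as a greedy match would be, because only one repetition count lets the anchor succeed from that starting position.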

Key Points Summary

• Greedy quantifiers (*, +, ?, {n,m}) are the default and match as many repetitions as the rest of the pattern allows.
• Non‑greedy quantifiers (*?, +?, ??, {n,m}?) match as few repetitions as the rest of the pattern allows.
• Choosing between them depends on the structure of the data you need to extract.

Additional Illustrative Examples

1. HTML tag matching (greedy vs lazy)

import re

text = "FirstSecond"

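pattern_greedy = r'.*'

print(re.findall(pattern_greedy, text))  # ['FirstSecond', ''] (findall also reports the empty match at the end)

pattern_lazy = r'.*?'

# With nothing after it, lazy .*? prefers the empty match, so on Python 3.7+ the result
# alternates empty strings and single characters: ['', 'F', '', 'i', '', 'r', ...]

print(re.findall(pattern_lazy, text))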
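The difference is easier to put to use when literal delimiters surround the quantifier. The sketch below assumes a string containing two <div> elements; that markup and the variable names are illustrative assumptions rather than the article's original sample.

```python
import re

# Assumed sample markup, chosen only for illustration.
html = "<div>First</div><div>Second</div>"

greedy_blocks = re.findall(r"<div>.*</div>", html)
lazy_blocks = re.findall(r"<div>.*?</div>", html)
lazy_contents = re.findall(r"<div>(.*?)</div>", html)  # a group returns only the inner text

print(greedy_blocks)  # ['<div>First</div><div>Second</div>']
print(lazy_blocks)    # ['<div>First</div>', '<div>Second</div>']
print(lazy_contents)  # ['First', 'Second']
```

The greedy pattern runs to the last closing tag and returns one block, while the lazy pattern stops at the nearest closing tag and returns each element separately.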

2. Matching repeated words

import re

text = "This is a test test sentence."

pattern_greedy = r"(\b\w+\b)\s+\1"

match = re.search(pattern_greedy, text)

print("Greedy:", match.group(0))  # Greedy: test test

pattern_lazy = r"(\b\w+\b)\s+?\1"

match = re.search(pattern_lazy, text)

print("Lazy:", match.group(0))  # Lazy: test test

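Both patterns produce the same result here because the backreference \1 must begin with a word character, so \s+ has to consume all of the whitespace between the two words whether it is greedy or lazy. Only one repetition count lets the rest of the pattern match, and the two modes differ only in the order in which they try the candidates.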
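Greedy and lazy forms give different results only when more than one repetition count allows the rest of the pattern to match. A small sketch with a made-up string (`sample` is an assumption) shows both situations for a whitespace quantifier.

```python
import re

sample = "a   b"  # hypothetical string: 'a', three spaces, 'b'

# Nothing follows the quantifier: any count from 1 to 3 works, so the modes differ.
greedy_open = re.search(r"a\s+", sample).group()
lazy_open = re.search(r"a\s+?", sample).group()

# A literal 'b' must follow: only all three spaces work, so the modes agree.
greedy_closed = re.search(r"a\s+b", sample).group()
lazy_closed = re.search(r"a\s+?b", sample).group()

print(repr(greedy_open), repr(lazy_open))      # 'a   ' 'a '
print(repr(greedy_closed), repr(lazy_closed))  # 'a   b' 'a   b'
```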

3. Extracting file names from paths

import re

path = "/home/user/documents/report.docx"

pattern_greedy = r".*/(.*)"

match = re.search(pattern_greedy, path)

print("Greedy file name:", match.group(1))  # report.docx

pattern_lazy = r".*?/(.*)"

match = re.search(pattern_lazy, path)

print("Lazy file name:", match.group(1))  # home/user/documents/report.docx

The greedy .* runs to the end of the string and backtracks to the last slash, so the group captures only the file name. The lazy .*? matches nothing before the leading slash, so the group captures everything after the first slash. The choice of quantifier decides which slash the pattern splits on.
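The same contrast can be checked on a relative path and compared with plain string methods. This is a minimal sketch; the path "docs/reports/q1.txt" and the variable names are hypothetical.

```python
import re

path = "docs/reports/q1.txt"  # hypothetical relative path

greedy_name = re.search(r".*/(.*)", path).group(1)  # splits on the last slash
lazy_rest = re.search(r".*?/(.*)", path).group(1)   # splits on the first slash

print(greedy_name)  # q1.txt
print(lazy_rest)    # reports/q1.txt

# The same splits expressed with plain string methods.
print(path.rsplit("/", 1)[-1])  # q1.txt
print(path.split("/", 1)[1])    # reports/q1.txt
```

When only the final path component is needed, the greedy prefix is the form that does the job. The lazy prefix is useful for dropping just the first component.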

Conclusion

Greedy matching (*, +, ?, {n,m}) is the default behavior and takes as many repetitions as the rest of the pattern allows; non‑greedy matching (*?, +?, ??, {n,m}?) takes as few as the rest of the pattern allows. Neither mode changes where a match starts, only how far it extends. Understanding and selecting the appropriate mode allows precise control over pattern extraction in regular expressions.
