Master Python Regex: Unlock $ and ? for Precise String Matching
This article explains Python regular expression special characters such as $ and ?, demonstrates greedy and non‑greedy matching with step‑by‑step examples and screenshots, and shows how to extract substrings correctly using appropriate patterns, helping readers master precise string extraction for web crawling.
Previously we introduced three special characters in Python regular expressions (^, . and *); today we continue with additional special characters.
1. The special character $ anchors a match to the end of the string (or to just before a trailing newline at the very end). For example, with re.search the pattern 3$ matches any string that ends with the digit 3.
2. The pattern .*3$ matches an entire string that ends with 3. When it is applied with re.match, which only tries to match at the start of the string, the leading .* consumes everything before the final 3. Changing the pattern to .*4$ requires the string to end with 4, so an input that ends in 3 produces no match.
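3. The special character ?, placed directly after a quantifier such as * or +, switches that quantifier to non‑greedy (lazy) matching, so it matches as few characters as possible. (On its own, ? means "zero or one of the preceding item".) By default, quantifiers are greedy: they match as many characters as possible while still letting the overall pattern succeed.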
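4. In the pattern .*(p.*p).*, the parentheses capture a substring that starts and ends with p. Because the leading .* is greedy, it consumes as much of the string as it can and gives back only enough for (p.*p) to match. The group is therefore pushed toward the end of the string and captures only the span from the second‑to‑last p to the last p, which in the article's example is the unexpected output pp.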
5. Adding ? after the first .* (.*?(p.*p).*) makes the prefix lazy, so it consumes as little as possible and the group now begins at the first p. This is closer to the desired output, but the inner .* is still greedy, so the group stretches all the way to the last p in the string and can include more p characters than intended.
6. Making the inner .* lazy as well (.*?(p.*?p).*) yields the expected match: the group begins at the first p, and the inner .*? stops at the very next p.
7. With both quantifiers lazy, as in step 6, the pattern returns pcccp, the shortest substring running from the first p to the next one.
8. Mixing greedy and non‑greedy quantifiers, as in the step 5 pattern, can instead return pcccpcccccccpppp, which runs from the first p to the last p. The mode chosen for each quantifier directly changes what gets extracted.
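The $ behavior from steps 1 and 2 can be checked with a short sketch. The input string bobby123 and the helper ends_with are hypothetical, not taken from the original article. The last two checks show a documented detail of Python's re module: $ also matches just before a single trailing newline, whereas \Z matches only at the very end of the string.

```python
import re


def ends_with(digit: str, text: str) -> bool:
    """Hypothetical helper: True if text ends with digit, using the $ anchor."""
    # re.search scans the whole string; $ pins the digit to the end.
    return re.search(re.escape(digit) + "$", text) is not None


# Hypothetical input, not the article's own data.
line = "bobby123"

# re.match only tries position 0, so .* must consume everything before the final digit.
m3 = re.match(r".*3$", line)
m4 = re.match(r".*4$", line)

print(m3.group() if m3 else None)  # bobby123
print(m4)                          # None
print(ends_with("3", line), ends_with("4", line))  # True False

# Caveat: $ also matches just before one trailing newline; \Z does not.
print(re.search(r"3$", "bobby123\n") is not None)   # True
print(re.search(r"3\Z", "bobby123\n") is not None)  # False
```

If "ends with" must mean the literal last character, \Z is the stricter choice; $ is usually fine for lines already stripped of newlines.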
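Steps 4 to 8 can be reproduced with a minimal sketch. The input string below is made up (the article's own string is not reproduced here) and was chosen so that it contains pcccpcccccccpppp. Each pattern is applied with re.match, and the sketch records what the capturing group holds.

```python
import re

# Hypothetical input; chosen to contain the substring pcccpcccccccpppp.
line = "xxpcccpcccccccppppyy"

patterns = {
    "greedy prefix, greedy inner": r".*(p.*p).*",
    "lazy prefix, greedy inner": r".*?(p.*p).*",
    "lazy prefix, lazy inner": r".*?(p.*?p).*",
}

results = {}
for label, pattern in patterns.items():
    m = re.match(pattern, line)
    # group(1) is the text captured by the parentheses around p...p
    results[label] = m.group(1) if m else None
    print(f"{label}: {pattern} -> {results[label]}")
```

On this input the three captures are pp, pcccpcccccccpppp and pcccp respectively: the greedy prefix pushes the group to the end, the lazy prefix anchors it at the first p, and only the lazy inner quantifier stops it at the next p.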
Non‑greedy matching is often essential for accurate string extraction in web crawling; mastering $ and ? makes data retrieval considerably more reliable.
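To make the web-crawling point concrete, here is a hedged sketch that pulls cell text out of a hypothetical HTML fragment with re.findall. The greedy version produces a single match running from the first <td> to the last </td>; the lazy version stops at each closing tag.

```python
import re

# Hypothetical HTML fragment standing in for part of a crawled page.
html = "<td>alpha</td><td>beta</td><td>gamma</td>"

# Greedy: .* runs to the last </td>, swallowing the tags in between.
greedy = re.findall(r"<td>(.*)</td>", html)
# Lazy: .*? stops at the first </td> after each <td>.
lazy = re.findall(r"<td>(.*?)</td>", html)

print(greedy)  # ['alpha</td><td>beta</td><td>gamma']
print(lazy)    # ['alpha', 'beta', 'gamma']
```

This works for small, predictable fragments; for arbitrary HTML, a real parser such as the standard library's html.parser is usually more robust.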
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
