Fundamentals 9 min read

Avoid Catastrophic Backtracking: How Regex Can Crash Your Server

A detailed exploration of how a complex regular expression used for shop‑name validation caused CPU spikes due to catastrophic backtracking, explaining DFA vs NFA engines, greedy, lazy and possessive quantifiers, and offering practical tips to write performant regexes.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
Avoid Catastrophic Backtracking: How Regex Can Crash Your Server

1. Origin of the Issue

During development of a self‑service registration feature for Lazada Seller Center, a complex shop‑name validation rule was created, allowing letters, digits, Vietnamese characters and special symbols such as “&”, “-”, “_”. The author wrote a long regular expression to enforce it.

<code>^([A-Za-z0-9._()&'\- ]|[aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ])+$</code>

In the test environment the regex worked, but after deployment to production in Malaysia the CPU spiked to 100 % and the site became sluggish. Thread dumps showed all threads stuck in the regex validation.

2. Regex Engines

Regex engines are mainly of two types: DFA (deterministic finite automaton) and NFA (nondeterministic finite automaton). DFA processes the input text left‑to‑right, never backtracks, and runs in polynomial time, but supports few features. NFA drives the matching from the pattern, may backtrack, and can have exponential worst‑case complexity.

Example

<code>text = 'after tonight' regex = 'to(nite|nighta|night)' </code>

Explanation of NFA vs DFA matching steps omitted for brevity.

3. Backtracking

The catastrophic case occurs when the engine repeatedly tries alternatives and rolls back, consuming large CPU cycles.

4. Greedy, Lazy and Possessive Quantifiers

Standard quantifiers ( ?, +, *, {min,max}) are greedy by default, trying to match as much as possible. Adding ? makes them lazy (minimal match). Adding + after a quantifier makes it possessive, which disables backtracking.

<code>ab{1,3}c # greedy ab{1,3}?c # lazy ab{1,3}+bc # possessive </code>

Diagrams illustrating greedy vs lazy vs possessive matching are shown below.

5. Summary

The original long regex can be simplified to ^[allowed‑characters]+, where the allowed set contains about 250 characters. In NFA greedy mode, matching an excessively long input can trigger massive backtracking and CPU exhaustion. Switching to a possessive quantifier resolved the 100 % CPU issue. When writing regexes, always consider performance implications in addition to functional correctness.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

NFAregexpossessive quantifierDFAcatastrophic backtrackinggreedy quantifierlazy quantifier
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.