Master JavaScript Regular Expressions: From Basics to Advanced Patterns
This comprehensive guide explains what regular expressions are, their practical uses in data validation and string manipulation, the syntax of literals and constructors, key String and RegExp methods, character classes, quantifiers, anchors, groups, assertions, flags, and the behavior of JavaScript's NFA engine, all illustrated with clear code examples.
What is a Regular Expression
A regular expression (regex or regexp) is a pattern that describes a set of strings. It is used to search, match, replace, or extract substrings from text.
Common Uses
Data validation (e.g., email, phone numbers)
Complex search‑and‑replace operations
Extracting substrings based on a pattern
Basic Syntax
Ordinary characters match themselves. Meta‑characters have special meanings:
. – any character except a newline
\d – digit, equivalent to [0-9]
\D – non‑digit
\w – word character, equivalent to [A-Za-z0-9_]
\W – non‑word character
\s – whitespace
\S – non‑whitespace
\b – word boundary
\B – non‑boundary
To match a meta‑character literally, escape it with a backslash, e.g. \+ matches a plus sign.
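As a small sketch of the escaping rule above, the helper below backslash-escapes every meta-character in a string so that the string can be embedded in a pattern and matched literally. The name escapeRegExp is a hypothetical helper for illustration, not a built-in function.

```javascript
// Hypothetical helper (not a built-in): escape every regex meta-character
// in `text` so the result matches `text` literally when used as a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern from arbitrary text that happens to contain "+".
const literalPlus = new RegExp(escapeRegExp('1+1=2'));

console.log(/\d\+\d/.test('1+1')); // true: \+ matches a literal plus sign
console.log(/\d+\d/.test('1+1')); // false: an unescaped + is a quantifier
console.log(literalPlus.test('1+1=2')); // true
console.log(escapeRegExp('a.b')); // a\.b
```

Escaping matters most when a pattern is built from user input with the RegExp constructor, since any meta-character in that input would otherwise change the meaning of the pattern.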
Creating Regular Expressions in JavaScript
var r1 = new RegExp('xyz', 'i'); // constructor with a pattern string
var r2 = new RegExp(/xyz/i); // constructor with a RegExp literal
var r3 = /xyz/i; // literal syntax
// ES6: a second argument overrides the flags of a RegExp passed as the first,
// and the resulting flags can be read via .flags
new RegExp(/abc/ig, 'i').flags; // "i"
String Methods for Pattern Matching
search() – returns the index of the first match, or -1 if there is none. It ignores the g flag and always searches from the start of the string.
replace() – replaces matches. The first argument can be a string or a RegExp; the second argument can be a string or a function. With a string pattern, or a RegExp without the g flag, only the first match is replaced. Replacement patterns include $1 ‑ $99 (captured groups), $& (whole match), $` (text before the match), $' (text after the match), and $$ (a literal $). Example:
'abc'.replace(/b/g, "{$$$`$&$'}"); // "a{$abc}c"
match() – returns an array of matches, or null if nothing matches. With the g flag it returns only the matched strings; without g it returns an array that also contains the captured groups and the index and input properties.
split() – splits a string into an array using a RegExp or plain string as the delimiter.
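The four String methods above can be compared side by side. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```javascript
const text = 'Tel: 555-1234, Fax: 555-9876';

// search(): index of the first match, or -1
const firstDigit = text.search(/\d/);

// replace() with a function: it receives the whole match, then each group
const masked = text.replace(/(\d{3})-(\d{4})/g, (whole, prefix) => prefix + '-****');

// match() without g: one match plus captured groups, index, and input
const single = text.match(/(\d{3})-(\d{4})/);

// match() with g: every matched string, but no captured groups
const all = text.match(/\d{3}-\d{4}/g);

// split() with a RegExp delimiter: commas, semicolons, or runs of whitespace
const parts = 'a, b;c  d'.split(/[,;\s]+/);

console.log(firstDigit); // 5
console.log(masked); // Tel: 555-****, Fax: 555-****
console.log(single.index, single[1], single[2]); // 5 555 1234
console.log(all); // [ '555-1234', '555-9876' ]
console.log(parts); // [ 'a', 'b', 'c', 'd' ]
```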
RegExp Object Methods
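exec() – executes a search and returns an array with match details, or null if there is no match (similar to String.match() for non‑global searches). With the g or y flag, each call resumes from the lastIndex property.
test() – returns true if the pattern matches the given string, otherwise false.
toString() – returns the expression in literal form, including the delimiters and flags (for example "/xyz/i"). The pattern alone is available through the source property.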
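A short sketch of these methods follows, assuming a made-up input string. It shows how exec() with the g flag walks through all matches by way of lastIndex.

```javascript
const re = /(\w+)@(\w+)\.com/g;
const input = 'amy@example.com, bob@test.com';

// With the g flag, exec() resumes from re.lastIndex,
// so a loop visits every match together with its groups.
const found = [];
let m;
while ((m = re.exec(input)) !== null) {
  found.push({ user: m[1], host: m[2], index: m.index });
}

// After the final failed exec(), lastIndex is reset to 0.
const resetIndex = re.lastIndex;

const hasMatch = /\.com$/.test(input);
const asText = /xyz/i.toString();
const sourceOnly = /xyz/i.source;

console.log(found);
console.log(resetIndex); // 0
console.log(hasMatch); // true
console.log(asText); // /xyz/i
console.log(sourceOnly); // xyz
```

Because a global RegExp object keeps lastIndex between calls, reusing one with test() or exec() on different strings can give surprising results unless lastIndex is reset first.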
Quantifiers
Quantifiers specify how many times the preceding token may occur.
{n} – exactly n times
{n,} – at least n times
{n,m} – between n and m times (inclusive)
* – same as {0,} (greedy)
+ – same as {1,} (greedy)
? – same as {0,1} (greedy)
Appending ? makes a quantifier lazy (non‑greedy). Example:
var greedy = /a+/; // matches "aaa" in "aaab"
var lazy = /a+?/; // matches only the first "a"
'aaab'.match(greedy); // ["aaa"]
'aaab'.match(lazy); // ["a"]
Anchors and Boundaries
^ – start of string (or start of a line with the m flag)
$ – end of string (or end of a line with the m flag)
\b – word boundary
\B – non‑word boundary
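The effect of the m flag on ^ and $, and the difference between \b and \B, can be seen in a brief sketch. The sample strings are made up for illustration.

```javascript
const lines = 'first line\nsecond line';

// Without m, ^ matches only at the start of the whole string.
const startNoM = lines.match(/^\w+/g);

// With m, ^ and $ also match at the start and end of each line.
const startWithM = lines.match(/^\w+/gm);
const endWithM = lines.match(/\w+$/gm);

// \b matches between a word character and a non-word character (or string edge).
const wholeWord = /\bcat\b/.test('a cat sat');
const insideWord = /\bcat\b/.test('concatenate');

// \B matches where there is no word boundary.
const nonBoundary = /\Bcat\B/.test('concatenate');

console.log(startNoM); // [ 'first' ]
console.log(startWithM); // [ 'first', 'second' ]
console.log(endWithM); // [ 'line', 'line' ]
console.log(wholeWord, insideWord, nonBoundary); // true false true
```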
Groups and Assertions
Parentheses ( ) capture sub‑patterns. Non‑capturing groups use (?: ). Look‑ahead and look‑behind assertions:
(?=pattern) – positive look‑ahead
(?!pattern) – negative look‑ahead
(?<=pattern) – positive look‑behind (ES2018+)
(?<!pattern) – negative look‑behind (ES2018+)
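As a minimal sketch of the constructs listed above, the snippet below contrasts capturing and non-capturing groups and shows look-ahead and look-behind on made-up sample strings. The look-behind lines assume an engine with ES2018 support.

```javascript
const date = '2024-03-15';

// Capturing groups are numbered by the position of their opening parenthesis.
const captured = date.match(/(\d{4})-(\d{2})-(\d{2})/);

// A non-capturing group groups tokens without adding an entry to the result.
const grouped = 'ababab'.match(/(?:ab)+/);

// Positive look-ahead: digits followed by "px"; "px" itself is not consumed.
const pixels = '12px 5em 30px'.match(/\d+(?=px)/g);

// Positive look-behind: digits preceded by "$".
const dollars = 'cost $40 or 30 EUR'.match(/(?<=\$)\d+/g);

// Negative look-behind: whole numbers not preceded by "$".
const notDollars = 'cost $40 or 30 EUR'.match(/(?<!\$)\b\d+/g);

console.log(captured[1], captured[2], captured[3]); // 2024 03 15
console.log(grouped[0], grouped.length); // ababab 1
console.log(pixels); // [ '12', '30' ]
console.log(dollars); // [ '40' ]
console.log(notDollars); // [ '30' ]
```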
The following example uses a negative look‑ahead to match every tag except <img>:
var html = `<div class="o2">
<div class="o2_team">
<img src="img/logo.jpg" />
</div>
</div>`;
var re = /<(?!img)(?:.|\r|\n)*?>/gi;
console.log(html.match(re));
// ["<div class=\"o2\">", "<div class=\"o2_team\">", "</div>", "</div>"]
Backreferences
Inside a pattern, \n (where n is a number) refers to the text matched by the n‑th captured group. Inside a replacement string, the same group is written $n.
var re = /(Mike)(\1)(s)/;
var str = "MikeMikes";
console.log(str.replace(re, "$1$2'$3")); // "MikeMike's"
Other Escape Sequences
\cX – control character (X must be A‑Z or a‑z)
\xhh – two‑digit hexadecimal escape
\uhhhh – four‑digit Unicode escape
\n – newline (not a backreference; backreferences are a backslash followed by a digit, such as \1)
\r – carriage return
\t – tab
\v – vertical tab
\f – form feed
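A few of these escapes can be checked directly. This is a small sketch with made-up sample strings.

```javascript
// \x41 and \u0041 both denote the character "A".
const hexEscape = /\x41/.test('A');
const unicodeEscape = /\u0041/.test('A');

// \t matches a tab character.
const tabEscape = /a\tb/.test('a\tb');

// \cJ is Ctrl+J, the line feed character (code 10), the same as \n.
const controlEscape = /\cJ/.test('\n');

// \r?\n splits on both Windows (CRLF) and Unix (LF) line endings.
const rows = 'a\r\nb\nc'.split(/\r?\n/);

console.log(hexEscape, unicodeEscape, tabEscape, controlEscape); // true true true true
console.log(rows); // [ 'a', 'b', 'c' ]
```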
Flags
i – case‑insensitive matching
g – global search (find all matches)
m – multiline mode; ^ and $ match at line boundaries
u – Unicode mode; correctly handles code points above \uFFFF
y – sticky mode; matches only at the lastIndex position
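The i, g, and m flags combine freely, and the y flag depends on lastIndex. The sketch below uses a made-up three-line sample to show each effect.

```javascript
const sample = 'Cat\ncat\nCAT';

// No flags: the first case-sensitive match only.
const noFlags = sample.match(/cat/);

// i: ignore case, so the first match is now "Cat".
const ignoreCase = sample.match(/cat/i);

// g + i: every match, regardless of case.
const everyMatch = sample.match(/cat/gi);

// g + i + m: ^ and $ apply to each line, so whole lines can be matched.
const wholeLines = sample.match(/^cat$/gim);

// The flags property lists the active flags in alphabetical order.
const flagsText = /cat/gim.flags;

// y: the match must start exactly at lastIndex.
const sticky = /b/y;
sticky.lastIndex = 1;
const stickyHit = sticky.test('aba'); // matches "b" at index 1; lastIndex becomes 2
const stickyMiss = sticky.test('aba'); // index 2 is "a", so this fails

console.log(noFlags.index, ignoreCase.index); // 4 0
console.log(everyMatch); // [ 'Cat', 'cat', 'CAT' ]
console.log(wholeLines); // [ 'Cat', 'cat', 'CAT' ]
console.log(flagsText); // gim
console.log(stickyHit, stickyMiss); // true false
```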
// Unicode flag example
/^\uD83D/u.test('\uD83D\uDC2A'); // false (with u, the surrogate pair is one code point, so a lone \uD83D does not match)
/^\uD83D/.test('\uD83D\uDC2A'); // true (without u, each surrogate is a separate code unit)
// Sticky flag example
/b/y.exec('aba'); // null (no match at position 0)
/b/.exec('aba'); // ["b"]
Operator Precedence
From highest to lowest precedence:
Escape character \
Parentheses, non‑capturing groups, look‑arounds, character classes
Quantifiers * + ? {n} {n,} {n,m}
Anchors ^ $ \b \B
Alternation |
JavaScript RegExp Engine
JavaScript uses a nondeterministic finite automaton (NFA) engine. It matches greedily, prefers the leftmost alternative, and may backtrack, which can affect performance.
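These engine behaviors have practical consequences for how patterns are written. The sketch below, using made-up sample strings, shows that alternatives are tried left to right (so a longer alternative should come first) and that a lazy quantifier or a negated character class can replace a greedy .* that would otherwise overshoot and backtrack.

```javascript
const word = 'JavaScript';

// Alternatives are tried left to right; the first one that succeeds wins.
const firstWins = word.match(/Java|JavaScript/)[0];

// Putting the longer alternative first lets it match.
const longestFirst = word.match(/JavaScript|Java/)[0];

const tag = '<b>bold</b>';

// Greedy .* runs to the end of the string, then backtracks to the last ">".
const greedyTag = tag.match(/<.*>/)[0];

// Lazy .*? expands one character at a time and stops at the first ">".
const lazyTag = tag.match(/<.*?>/)[0];

// A negated class cannot run past ">", so no backtracking is needed here.
const classTag = tag.match(/<[^>]*>/)[0];

console.log(firstWins, longestFirst); // Java JavaScript
console.log(greedyTag); // <b>bold</b>
console.log(lazyTag, classTag); // <b> <b>
```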
'nfa not'.match(/nfa|nfa not/); // ["nfa"] – leftmost alternative wins
"AB01CD23CD45CEff".match(/AB.*CD/); // ["AB01CD23CD"] – greedy .* consumes as much as possible, then backtracks
Aotu Lab
Aotu Lab, founded in October 2015, is a front-end engineering team serving multi-platform products. The articles in this public account are intended to share and discuss technology, reflecting only the personal views of Aotu Lab members and not the official stance of JD.com Technology.