Mastering Java Regular Expressions: Theory, Syntax, and Practical Examples
This article provides a comprehensive guide to Java regular expressions, covering their theoretical background, essential syntax elements, quantifiers, boundary matches, logical operators, and a wide range of practical examples for validation, replacement, splitting, and pattern matching of strings.
1. Theory
Regular expressions are a standard tool for validating, splitting, and replacing text. Java has supported them since Java 1.4 through the java.util.regex package, whose main classes are Pattern and Matcher.
2. Core Syntax Elements
2.1 Characters
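Single characters match themselves, e.g., a matches the letter "a". The backslash \ escapes special characters (\. matches a literal dot), \t matches a tab, and \n matches a newline. In a Java string literal the backslash itself must be doubled, so the regular expression \d is written as "\\d".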
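The escaping rules can be tried out with a small sketch. The class and method names here are illustrative and do not come from the original article.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The Java literal "\\." is the two-character regex \. which matches a literal dot.
    static boolean hasLiteralDot(String s) {
        return Pattern.compile("\\.").matcher(s).find();
    }

    // The Java literal "\\t" is the regex \t, which matches a tab character.
    static boolean hasTab(String s) {
        return Pattern.compile("\\t").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLiteralDot("a.b")); // true
        System.out.println(hasLiteralDot("ab"));  // false
        System.out.println(hasTab("a\tb"));       // true
    }
}
```

Without the escape, a bare . would match any character, so hasLiteralDot("ab") would wrongly return true.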
2.2 Character Classes
[abc] matches any one of the characters a, b, or c.
[^abc] matches any single character except a, b, or c.
[a-zA-Z] matches any ASCII letter, upper or lower case.
[^a-zA-Z] matches any single character that is not an ASCII letter.
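A minimal sketch of these classes, using String.matches, which requires the whole input to match. The helper names are hypothetical.

```java
class CharClassDemo {
    // True only when the input is exactly one of a, b, or c.
    static boolean isOneOfAbc(String s) {
        return s.matches("[abc]");
    }

    // True only when the input is exactly one character other than a, b, or c.
    static boolean isNotAbc(String s) {
        return s.matches("[^abc]");
    }

    // True only when the input is exactly one ASCII letter.
    static boolean isAsciiLetter(String s) {
        return s.matches("[a-zA-Z]");
    }

    public static void main(String[] args) {
        System.out.println(isOneOfAbc("b"));          // true
        System.out.println(isOneOfAbc("ab"));         // false: a class matches a single character
        System.out.println(isNotAbc("d"));            // true
        System.out.println(isAsciiLetter("Q"));       // true
        System.out.println(isAsciiLetter("\u00e9"));  // false: accented letters are outside a-zA-Z
    }
}
```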
2.3 Shorthand Character Sets
\d – any digit (equivalent to [0-9]).
\D – any non‑digit.
\w – word characters (ASCII letters, digits, underscore; equivalent to [a-zA-Z_0-9]).
\W – non‑word characters.
\s – whitespace characters (space, tab, newline, carriage return, form feed, vertical tab).
\S – non‑whitespace characters.
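As a rough illustration, the shorthand sets can be used to classify a single character. The classify helper is an assumption made for this sketch; it tests \d before \w because digits are also word characters.

```java
class ShorthandDemo {
    // Classifies one character using the shorthand sets \d, \s, and \w.
    static String classify(char c) {
        String s = String.valueOf(c);
        if (s.matches("\\d")) return "digit";
        if (s.matches("\\s")) return "whitespace";
        if (s.matches("\\w")) return "word";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify('7'));  // digit
        System.out.println(classify('\t')); // whitespace
        System.out.println(classify('_'));  // word
        System.out.println(classify('-'));  // other
    }
}
```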
2.4 Boundary Anchors
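^ asserts the start of the input and $ asserts the end of the input (or the position just before a final line terminator). With the Pattern.MULTILINE flag they match at the start and end of each line instead. Both anchors are routinely needed in JavaScript, where matching searches for a substring. In Java they are redundant with String.matches and Matcher.matches(), which already require the whole input to match, but they still matter with Matcher.find().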
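The difference between anchored and unanchored searches can be sketched as follows. The method names are illustrative; Pattern.MULTILINE is the standard flag that makes ^ and $ apply per line.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // find() looks for the pattern anywhere in the input.
    static boolean findUnanchored(String s) {
        return Pattern.compile("\\d+").matcher(s).find();
    }

    // With ^ and $, find() succeeds only when the whole input is digits.
    static boolean findAnchored(String s) {
        return Pattern.compile("^\\d+$").matcher(s).find();
    }

    // With MULTILINE, ^ and $ match at the boundaries of every line.
    static boolean findAnchoredPerLine(String s) {
        return Pattern.compile("^\\d+$", Pattern.MULTILINE).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(findUnanchored("abc123"));        // true
        System.out.println(findAnchored("abc123"));          // false
        System.out.println(findAnchored("abc\n123"));        // false
        System.out.println(findAnchoredPerLine("abc\n123")); // true
        System.out.println("abc123".matches("\\d+"));        // false: matches() is implicitly anchored
    }
}
```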
2.5 Quantifiers
? – 0 or 1 occurrence.
+ – 1 or more occurrences.
* – 0 or more occurrences.
{n} – exactly n occurrences.
{n,} – at least n occurrences.
{n,m} – between n and m occurrences, inclusive.
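A short sketch of the quantifiers in action. The fullMatch helper is a hypothetical wrapper around String.matches, and the sample patterns are chosen only for illustration.

```java
class QuantifierDemo {
    // Returns true when the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));  // true: u occurs 0 times
        System.out.println(fullMatch("colou?r", "colour")); // true: u occurs once
        System.out.println(fullMatch("a+", ""));            // false: + needs at least one a
        System.out.println(fullMatch("a*", ""));            // true: * accepts zero occurrences
        System.out.println(fullMatch("\\d{3}", "12"));      // false: exactly three digits required
        System.out.println(fullMatch("a{2,3}", "aaa"));     // true
        System.out.println(fullMatch("a{2,3}", "aaaa"));    // false: more than three
    }
}
```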
2.6 Logical Operators
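XY – pattern X followed immediately by pattern Y.
X|Y – either pattern X or pattern Y.
(...) – groups sub‑patterns so that a quantifier or alternation applies to the group as a whole, and captures the matched text.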
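The interaction between alternation and grouping is easy to get wrong, because | has the lowest precedence. The following sketch, with illustrative names and sample patterns, shows the effect.

```java
class LogicalOperatorDemo {
    // Returns true when the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Without a group, the alternatives are "ab" and "cd" as a whole.
        System.out.println(fullMatch("ab|cd", "ab"));     // true
        System.out.println(fullMatch("ab|cd", "abd"));    // false
        // With a group, only the middle character varies.
        System.out.println(fullMatch("a(b|c)d", "abd"));  // true
        System.out.println(fullMatch("gr(a|e)y", "grey")); // true
        // A quantifier after a group repeats the whole group.
        System.out.println(fullMatch("(ab)+", "abab"));   // true
        System.out.println(fullMatch("(ab)+", "aba"));    // false
    }
}
```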
3. Practical String Operations Using Regular Expressions
The article demonstrates several real‑world scenarios, each illustrated with code snippets (shown as images in the original source).
3.1 Replacement Example
Goal: Remove all non‑alphabetic characters from a string. The regular expression [^a-zA-Z] is used with String.replaceAll to keep only letters.
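The original snippet is only available as an image, so the following is a reconstruction under that caveat. The class name, method name, and sample input are assumptions.

```java
class LettersOnly {
    // Replaces every character that is not an ASCII letter with the empty string.
    static String keepLetters(String input) {
        return input.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLetters("J4a_v-a 8!")); // Java
    }
}
```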
3.2 Splitting Example
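Goal: Split a string by digits. The pattern \d+ is applied to String.split. The article notes common pitfalls when the split result contains empty strings: if the input starts with digits, the first element of the result is an empty string, while trailing empty strings are dropped by default.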
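A sketch of the split and of one way to discard empty elements. The names are hypothetical, and filtering with a stream is one option among several rather than the original article's approach.

```java
import java.util.Arrays;

class SplitByDigits {
    // Splits on every run of one or more digits.
    static String[] splitOnDigits(String input) {
        return input.split("\\d+");
    }

    // Same split, but with empty elements filtered out.
    static String[] splitOnDigitsSkippingEmpty(String input) {
        return Arrays.stream(input.split("\\d+"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDigits("a1b22c333d")));        // [a, b, c, d]
        // Input that starts with a digit yields a leading empty string.
        System.out.println(Arrays.toString(splitOnDigits("1a2b")));              // [, a, b]
        System.out.println(Arrays.toString(splitOnDigitsSkippingEmpty("1a2b"))); // [a, b]
    }
}
```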
3.3 Numeric Validation Example
Goal: Validate a decimal number such as 10.2. The pattern \d+\.\d+ ensures a digit sequence before and after a literal dot. Edge cases like 10. are discussed and shown to be invalid under strict validation.
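The strict check can be sketched as below. The helper name and test inputs are illustrative.

```java
class DecimalCheck {
    // Requires at least one digit, a literal dot, and at least one more digit.
    static boolean isStrictDecimal(String s) {
        return s.matches("\\d+\\.\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isStrictDecimal("10.2")); // true
        System.out.println(isStrictDecimal("10."));  // false: no digits after the dot
        System.out.println(isStrictDecimal(".5"));   // false: no digits before the dot
        System.out.println(isStrictDecimal("10"));   // false: the dot is mandatory
    }
}
```

Note that this pattern also rejects plain integers and signed values; accepting them would need a looser expression.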
3.4 Date Matching Example
Goal: Convert a string to a date using SimpleDateFormat. The regular expression should enforce a specific date format (e.g., \d{4}-\d{2}-\d{2}) before parsing. It checks only the shape of the input, not whether the month and day are valid calendar values.
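One way to combine the shape check with parsing is sketched below. The method name and the Optional return type are assumptions, and calling setLenient(false) is an addition not mentioned in the original; it makes SimpleDateFormat reject impossible dates instead of rolling them over.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Optional;

class DateParseDemo {
    // Returns the parsed date, or empty when the text is not a valid yyyy-MM-dd date.
    static Optional<Date> parseDate(String text) {
        // Step 1: the regular expression checks the shape only.
        if (text == null || !text.matches("\\d{4}-\\d{2}-\\d{2}")) {
            return Optional.empty();
        }
        // Step 2: strict parsing rejects values such as month 13 or February 30.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        try {
            return Optional.of(format.parse(text));
        } catch (ParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDate("2024-02-29").isPresent()); // true
        System.out.println(parseDate("2024-1-5").isPresent());   // false: fails the shape check
        System.out.println(parseDate("2024-02-30").isPresent()); // false: not a real date
    }
}
```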
3.5 Phone Number Matching Example
Goal: Validate Chinese phone numbers. Two approaches are presented: a simple length check (7‑8 digits) and a more precise pattern that handles optional area codes and hyphens, e.g., ^(0\d{2,3}-)?\d{7,8}$.
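Both approaches can be sketched as follows. The method names and sample numbers are made up for illustration; the second pattern is the one quoted above.

```java
class PhoneCheck {
    // Simple check: 7 or 8 digits and nothing else.
    static boolean isLocalNumber(String s) {
        return s.matches("\\d{7,8}");
    }

    // Optional area code (0 plus 2 or 3 digits, then a hyphen) followed by 7 or 8 digits.
    static boolean isLandline(String s) {
        return s.matches("^(0\\d{2,3}-)?\\d{7,8}$");
    }

    public static void main(String[] args) {
        System.out.println(isLocalNumber("1234567"));    // true
        System.out.println(isLandline("010-12345678"));  // true
        System.out.println(isLandline("0571-1234567"));  // true
        System.out.println(isLandline("12345678"));      // true: the area code is optional
        System.out.println(isLandline("010-123456"));    // false: only 6 digits after the hyphen
    }
}
```

Because String.matches already requires a full match, the ^ and $ in the second pattern are redundant here, but they keep the expression portable to Matcher.find() and to other languages.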
3.6 Email Validation Example
Two levels of validation are shown:
Simple check: ^[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z]{2,}$.
Full validation: username may contain letters, digits, underscores, hyphens, and dots, must start with a letter, length 5‑30, and domain suffix limited to common TLDs (e.g., com, net, org).
Corresponding regular expressions are illustrated in the images.
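Since the original expressions are only available as images, the sketch below uses the simple pattern quoted above and, for the stricter rules, one possible reading of the description. The STRICT pattern, its domain-label rule, and all names are assumptions and not the article's own expression.

```java
import java.util.regex.Pattern;

class EmailCheck {
    // The simple check quoted in the text.
    private static final Pattern SIMPLE =
            Pattern.compile("^[A-Za-z0-9_]+@[A-Za-z0-9_]+\\.[A-Za-z]{2,}$");

    // Assumed reading of the stricter rules: the username starts with a letter,
    // is 5 to 30 characters long, and may contain letters, digits, _, ., and -.
    // The domain label rule and the TLD list (com, net, org) are illustrative.
    private static final Pattern STRICT =
            Pattern.compile("^[A-Za-z][A-Za-z0-9_.-]{4,29}@[A-Za-z0-9-]+\\.(com|net|org)$");

    static boolean isSimpleEmail(String s) {
        return SIMPLE.matcher(s).matches();
    }

    static boolean isStrictEmail(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimpleEmail("user_1@example.com"));      // true
        System.out.println(isSimpleEmail("first.last@example.com"));  // false: dot not allowed in username
        System.out.println(isStrictEmail("alice.smith@example.com")); // true
        System.out.println(isStrictEmail("abcd@example.com"));        // false: username shorter than 5
        System.out.println(isStrictEmail("alice@example.io"));        // false: TLD not in the list
    }
}
```

Neither pattern covers the full range of addresses that mail systems accept, so checks like these are best treated as a first filter.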
4. java.util.regex API Overview
Although String methods cover most use‑cases, the article briefly mentions the Pattern and Matcher classes for advanced scenarios such as group extraction.
Matcher.matches() – full‑string match.
Matcher.replaceAll() – replace all occurrences.
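Group extraction with Pattern and Matcher might look like the sketch below. The date pattern, the method names, and the choice to return null on a failed match are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Compiled once and reused; each pair of parentheses is a capturing group.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns {year, month, day}, or null when the whole input does not match.
    static int[] yearMonthDay(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the whole match; groups 1..3 are the parenthesized parts.
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    // Replaces every digit with '#'.
    static String maskDigits(String text) {
        return DIGIT.matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(yearMonthDay("2024-02-29"))); // [2024, 2, 29]
        System.out.println(maskDigits("tel 555-0100"));                  // tel ###-####
    }
}
```

Keeping the compiled Pattern in a static field avoids recompiling the expression on every call, which String.matches and String.replaceAll do internally.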
For typical validation and simple replacements, using String methods is recommended.
5. Summary
Regular expressions provide powerful capabilities for string validation and manipulation across programming languages. Mastering the core symbols, quantifiers, and API methods enables developers to handle complex parsing tasks efficiently.
