Fuzzy Company Name Matching Using MySQL Regular Expressions

The article describes a MySQL‑based fuzzy matching solution for preventing duplicate company entries in a business‑approval workflow, detailing preprocessing of Chinese company names, word segmentation with IKAnalyzer, RegExp pattern generation, and a custom SQL query that ranks results by keyword‑match ratio without using external search engines.

Java Tech Enthusiast

Jul 8, 2024

This article presents a business scenario: a company approval workflow with two roles, business user and administrator. The core requirement is to prevent duplicate company entries by fuzzy-matching each submitted company name against the names already on record.

The technical implementation has three key steps: extracting the key information from the company name, performing word segmentation, and matching the resulting keywords against existing database records. Because the system is small, the solution does not introduce Elasticsearch and relies on MySQL's built-in capabilities instead.

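MySQL provides three fuzzy search methods: LIKE matching (the search term, with optional % and _ wildcards, must appear in the field as one contiguous string), REGEXP regular expression matching (the field only needs to contain a match for the pattern, so several keywords can be combined with |), and FULLTEXT indexing (requires a FULLTEXT index, which can only be created on CHAR, VARCHAR, or TEXT columns). For this use case, REGEXP matching is chosen for its flexibility, even though a REGEXP predicate cannot use an index and is therefore slower than a full-text search.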

The implementation preprocesses each company name by removing administrative regions, parenthesized text, and generic company keywords such as "Group", "Shares", and "Limited". An address utility class supplies the list of Chinese administrative divisions. IKAnalyzer is integrated for Chinese word segmentation, added as a dependency in pom.xml along with its supporting classes.
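The preprocessing step can be sketched as follows. This is a minimal illustration, not the article's code: the class name `CompanyNamePreprocessor`, the method `extractKey`, and both word lists are hypothetical, and English placeholders stand in for the Chinese region names and company keywords that the real utility class would supply.

```java
import java.util.List;

/** Illustrative preprocessing: strips regions, parenthesized text, and generic company words. */
final class CompanyNamePreprocessor {

    // Hypothetical sample lists; a real system would load the full set of
    // administrative divisions and company keywords from its own utility class.
    private static final List<String> REGIONS =
            List.of("Beijing", "Shanghai", "Shenzhen", "Guangdong");
    private static final List<String> COMPANY_WORDS =
            List.of("Group", "Shares", "Limited", "Co.", "Ltd.");

    static String extractKey(String rawName) {
        // Drop parenthesized parts such as "(Shenzhen)".
        String name = rawName.replaceAll("\\([^)]*\\)", " ");
        // Remove administrative regions.
        for (String region : REGIONS) {
            name = name.replace(region, " ");
        }
        // Remove generic company keywords that carry no identifying information.
        for (String word : COMPANY_WORDS) {
            name = name.replace(word, " ");
        }
        // Remove leftover commas and collapse runs of whitespace.
        return name.replace(",", " ").trim().replaceAll("\\s+", " ");
    }
}
```

For example, `CompanyNamePreprocessor.extractKey("Beijing Starlight Technology (Shenzhen) Group Co., Ltd.")` leaves only the identifying part of the name, which is then handed to the word segmenter. Plain substring removal is a simplification; it would also strip a region name that happens to occur inside a brand name.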

The matching logic joins the segmented keywords into a regular expression, uses it to find similar company names, and sorts the results with a custom SQL query that calculates the ratio of matched keyword text to the total length of each name. Companies whose names consist mostly of matching keywords therefore appear first in the results.
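One way to build such a pattern and ranking query is sketched below. It is an assumption-laden illustration rather than the article's query: the table `company`, its columns `id` and `name`, the alias `match_ratio`, and the class `CompanyMatchQuery` are all hypothetical, and the SQL relies on `REGEXP_REPLACE`, which is available in MySQL 8.0.4 and later. The `matchRatio` method mirrors the SQL expression in plain Java so the ranking idea can be checked without a database.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative REGEXP pattern builder and ranking query (hypothetical names throughout). */
final class CompanyMatchQuery {

    // Both placeholders take the same pattern. The ratio is the share of the
    // name's characters that were covered by keyword matches.
    static final String SQL =
            "SELECT id, name, "
          + "(CHAR_LENGTH(name) - CHAR_LENGTH(REGEXP_REPLACE(name, ?, ''))) "
          + "/ CHAR_LENGTH(name) AS match_ratio "
          + "FROM company WHERE name REGEXP ? "
          + "ORDER BY match_ratio DESC";

    /** Joins keywords into an alternation such as "Starlight|Technology". */
    static String toRegexp(List<String> keywords) {
        return keywords.stream()
                .filter(k -> !k.isBlank())
                .map(CompanyMatchQuery::escape)
                .collect(Collectors.joining("|"));
    }

    /** Backslash-escapes regex metacharacters so each keyword matches literally. */
    static String escape(String keyword) {
        return keyword.replaceAll("([\\\\.^$|?*+()\\[\\]{}])", "\\\\$1");
    }

    /** Java mirror of the SQL ratio, for illustration and testing only. */
    static double matchRatio(String name, String pattern) {
        if (name.isEmpty()) {
            return 0.0;
        }
        int unmatched = name.replaceAll(pattern, "").length();
        return (name.length() - unmatched) / (double) name.length();
    }
}
```

With JDBC, the pattern would be bound to both placeholders through `PreparedStatement.setString(1, pattern)` and `setString(2, pattern)`. The caller should skip the query when `toRegexp` returns an empty string. Note that whether MySQL's `REGEXP` matches case-insensitively depends on the column's collation, while the Java mirror is always case-sensitive.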

The original article's core code demonstrates the preprocessing, word segmentation, and matching implementation, with detailed SQL queries showing how to achieve fuzzy matching and relevance sorting within MySQL.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
