Improving International Hotel Room‑Type Merging with Text Similarity and Machine‑Learning Models

This article describes how a large‑scale international hotel platform reduced room‑type merging errors and user complaints by applying rule‑based methods, text‑similarity algorithms (Jaccard, LCS, N‑Gram) and supervised machine‑learning classifiers such as fastText to standardize and merge heterogeneous room‑type data.

Tongcheng Travel Technology Center

Project Background

A major difficulty in international hotel operations is consolidating basic hotel information, particularly merging room types. This process has historically had high error rates and generated frequent user complaints.

Existing Problems

The data come from many sources, are inconsistent, and are massive in volume: over 1.2 million overseas hotels, with millions of updates per day. Manual maintenance is costly and error‑prone, which leads to mismatches such as a "double room" being incorrectly merged with a "standard room".

Solution Overview

Three approaches were adopted in successive iterations:

1) Rule‑based method: extract keywords from room‑type strings and group them using manually crafted rules. This approach has high labor costs and limited coverage.

2) String similarity: build a keyword library from the standard room types, then compute vector similarity to match supplier room types against them. This reduces labor but still produces false matches when two room‑type texts are highly similar.

3) Machine‑learning model: train supervised classifiers (fastText and textCNN) on labeled data (approximately 10,000 samples covering 160 room‑type categories) to recognize room types automatically, achieving higher accuracy with shorter training times.
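The first two approaches can be illustrated with a short Python sketch. It is not the platform's implementation: the keyword library `KEYWORD_LIBRARY` and all function names are hypothetical, and cosine similarity over binary keyword vectors is an assumed interpretation of "vector similarity".

```python
import math
import re

# Hypothetical keyword library; a real one would be derived from the
# standard room-type data and contain many more entries.
KEYWORD_LIBRARY = ["standard", "superior", "deluxe", "suite",
                   "single", "double", "twin", "king", "queen",
                   "sea", "city", "view"]


def extract_keywords(room_name: str) -> list[str]:
    """Keep only the lowercase tokens that appear in the keyword library."""
    tokens = re.findall(r"[a-z]+", room_name.lower())
    return [t for t in tokens if t in KEYWORD_LIBRARY]


def rule_group_key(room_name: str) -> str:
    """Approach 1 (rule-based): names with the same keyword set share a group key."""
    return " ".join(sorted(set(extract_keywords(room_name))))


def keyword_vector(room_name: str) -> list[int]:
    """Approach 2: a binary vector with one slot per library keyword."""
    found = set(extract_keywords(room_name))
    return [1 if kw in found else 0 for kw in KEYWORD_LIBRARY]


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(supplier_name: str, standard_names: list[str]) -> tuple[str, float]:
    """Return the standard room type whose keyword vector is closest to the supplier's."""
    vec = keyword_vector(supplier_name)
    scored = [(name, cosine(vec, keyword_vector(name))) for name in standard_names]
    return max(scored, key=lambda pair: pair[1])
```

For example, matching "Deluxe Double Room, Sea View" against "Standard Double Room", "Deluxe Double Room", and "Deluxe Twin Room" picks "Deluxe Double Room" with a score of about 0.71, yet "Deluxe Twin Room" still scores about 0.35 because it shares the "deluxe" keyword. Near misses like this are where a purely similarity-based approach can produce false matches.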

Data Preparation

A standard room‑type library was built from hotels' official websites, and training samples were generated by annotating supplier data against this standard.
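Annotated samples could be stored in the plain-text format that fastText's supervised mode reads, where each line begins with a label carrying the default `__label__` prefix, followed by the text. The sketch below assumes each sample is a (supplier room name, standard room type) pair; the helper names and sample values are hypothetical.

```python
import re


def to_fasttext_line(supplier_name: str, standard_type: str) -> str:
    """Format one annotated sample as '__label__<type> <text>'."""
    # Labels must not contain spaces, so multi-word types are joined with underscores.
    label = re.sub(r"\s+", "_", standard_type.strip().lower())
    # Lowercase the supplier text and collapse runs of whitespace.
    text = " ".join(supplier_name.lower().split())
    return f"__label__{label} {text}"


def write_training_file(samples: list[tuple[str, str]], path: str) -> None:
    """Write all annotated samples to a UTF-8 file, one sample per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for supplier_name, standard_type in samples:
            fh.write(to_fasttext_line(supplier_name, standard_type) + "\n")


# Hypothetical annotated samples mapped onto the standard room-type library.
SAMPLES = [
    ("Deluxe  King Room", "Deluxe King"),
    ("DBL Standard Room", "Standard Double"),
]
```

A file written this way could be passed as the `input` argument of `fasttext.train_supervised`. In practice the supplier text would first go through the text-processing step described below.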

Algorithm Model

The pipeline consists of three stages:

Text processing: remove non‑room information, normalize English and Chinese text, eliminate stop words, convert synonyms, and unify numeric expressions.

Text similarity: compute Jaccard set similarity, the Longest Common Subsequence (LCS), and N‑Gram checks. A Jaccard similarity of 1 means the two token sets are identical (a perfect match), while LCS measures similarity between ordered character sequences.

Text classification: a fastText classifier (a three‑layer architecture of input, hidden, and output layers) predicts the room‑type category for cases that similarity matching cannot resolve. Its predictions are then filtered by a confidence threshold before final standardization.
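The three stages above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the normalization tables, function names, and the 0.8 threshold are hypothetical. Unicode NFKC normalization is one assumed way to fold full-width characters from Chinese-language feeds into ASCII; Chinese names would also need a word segmenter, which is omitted. N‑Gram is assumed to mean Jaccard similarity over character bigrams. Because the article does not say how the LCS and N‑Gram scores feed the merge decision, the sketch uses only an exact Jaccard match as the shortcut and leaves the other two as standalone scores. The `classify` argument stands in for a trained model.

```python
import re
import unicodedata
from typing import Callable, Optional

# --- Stage 1: text processing (all tables are hypothetical placeholders) ---
NON_ROOM_PATTERNS = [r"non[- ]?refundable", r"breakfast included", r"room only"]
STOP_WORDS = {"room", "the", "with", "and", "a"}
SYNONYMS = {"dbl": "double", "std": "standard", "dlx": "deluxe", "twn": "twin"}
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4"}


def normalize_room_name(raw: str) -> list[str]:
    # NFKC folds full-width letters and digits to ASCII before lowercasing.
    text = unicodedata.normalize("NFKC", raw).lower()
    for pattern in NON_ROOM_PATTERNS:  # strip rate and meal conditions
        text = re.sub(pattern, " ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [SYNONYMS.get(t, t) for t in tokens]      # synonym conversion
    tokens = [NUMBER_WORDS.get(t, t) for t in tokens]  # unify numbers
    return [t for t in tokens if t not in STOP_WORDS]  # drop stop words


# --- Stage 2: text similarity ---
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lcs_length(s: str, t: str) -> int:
    """Longest common subsequence length, via row-by-row dynamic programming."""
    prev = [0] * (len(t) + 1)
    for ch in s:
        curr = [0]
        for j, tch in enumerate(t, start=1):
            curr.append(prev[j - 1] + 1 if ch == tch else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def char_ngrams(s: str, n: int = 2) -> set:
    if len(s) < n:  # keep short strings as a single gram
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s: str, t: str, n: int = 2) -> float:
    return jaccard(char_ngrams(s, n), char_ngrams(t, n))


# --- Stage 3: classifier fallback with a confidence threshold ---
Classifier = Callable[[str], tuple[str, float]]


def merge_room_type(raw: str, standard_types: dict[str, set[str]],
                    classify: Classifier, threshold: float = 0.8) -> Optional[str]:
    """Return a standard room type, or None to route the name to manual review."""
    tokens = normalize_room_name(raw)
    for std_name, std_tokens in standard_types.items():
        if jaccard(set(tokens), std_tokens) == 1.0:  # perfect match
            return std_name
    label, prob = classify(" ".join(tokens))
    if prob >= threshold and label in standard_types:
        return label
    return None
```

With the `fasttext` Python package, `classify` could wrap `model.predict(text)`, which returns a tuple of labels and an array of probabilities, by stripping the `__label__` prefix from the top label.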

Model Effectiveness

Offline tests on 500 hotels showed high merge rates, and manual verification confirmed 99% accuracy. The new algorithm also reduced complaint‑related compensation by 40% and increased distribution volume.

Business Feedback

After deployment, the upgraded room‑type merging system improved overall product quality, reduced error‑induced complaints, and contributed to higher platform revenue.
