How AI Cut Hotel Review Moderation from 8 Hours to 2 Seconds

This article details how a leading online travel agency (OTA) rebuilt its hotel review pipeline with multimodal large language models, a real‑time event‑driven architecture, and automated static‑info correction, cutting moderation latency from 8 hours to about 2 seconds at 99.6 % accuracy, with measurable cost and user‑experience gains.

Qunar Tech Salon

Introduction

In online travel agencies, hotel user‑generated content (UGC) is a primary conversion driver. The platform receives tens of thousands of new reviews per day, making the traditional "human review + keyword filter" workflow infeasible: average latency is 8 hours, manual effort is high, and valuable signals in negative reviews are lost.

AI‑driven End‑to‑End Solution

The review lifecycle (ingestion → moderation → presentation → feedback) was rebuilt with large language models (LLMs) and three core applications:

Quality Guard – AI‑based second‑level moderation that replaces manual review.

Decision Acceleration – Multimodal AI summarization that produces a global text summary, scene‑tag summaries, and a curated photo album.

Value Loop – Automatic correction of static hotel information extracted from negative reviews.

1. Quality Guard – Technical Details

1.1 Business Challenges

Manual moderation suffered from limited staff, an 8‑hour average latency, inconsistent application of 60+ policies, and a serial pipeline that added tens of seconds per review. At peak traffic the backlog could exceed 10,000 pending reviews.

1.2 Two‑Step Architecture Reconstruction

Step 1: AI model replacement – Deploy multimodal LLMs that scale elastically, enforce 100 % rule consistency, and run 24/7.

Step 2: System upgrade – Switch from hourly batch pulls to real‑time MQ push; keep core checks (anti‑fraud, keyword, blacklist) synchronous (≤50 ms) and move heavyweight checks (image moiré, video safety, deep risk) to asynchronous callbacks.
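The split between synchronous and asynchronous checks in Step 2 can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the `Review` type, `fast_checks`, `heavy_checks`, `on_review_message`, the keyword and blacklist sets, and the status strings are all hypothetical names, and the message queue is represented only by a handler that would be called once per pushed message.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Review:
    review_id: str
    text: str
    image_urls: List[str] = field(default_factory=list)


# Hypothetical stand-ins for the keyword and blacklist services.
BANNED_KEYWORDS: Set[str] = {"spam-link"}
BLACKLISTED_USERS: Set[str] = set()


def fast_checks(review: Review, user_id: str) -> bool:
    """Cheap in-process checks that run synchronously on the hot path."""
    if user_id in BLACKLISTED_USERS:
        return False
    return not any(word in review.text for word in BANNED_KEYWORDS)


async def heavy_checks(review: Review) -> bool:
    """Placeholder for slow checks (image moire, video safety, deep risk)."""
    await asyncio.sleep(0.01)  # simulates a call to a remote model service
    return True


async def on_review_message(
    review: Review, user_id: str, results: Dict[str, str]
) -> Optional[asyncio.Task]:
    """Handler invoked once per pushed message (assumed MQ callback shape)."""
    if not fast_checks(review, user_id):
        results[review.review_id] = "rejected"
        return None
    results[review.review_id] = "pending"

    async def finish() -> None:
        # The heavyweight result arrives later and updates the status.
        ok = await heavy_checks(review)
        results[review.review_id] = "published" if ok else "rejected"

    # Schedule the slow work without blocking the handler.
    return asyncio.create_task(finish())
```

The handler returns as soon as the fast checks pass, so the synchronous path stays short, while the heavyweight result updates the review status when it completes.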

1.3 Performance Optimizations

Concurrency increased tenfold, supporting >500 requests / second.

Image audit time reduced from 3 s to ≤500 ms (P50) and ≤2 s (P99).
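For readers unfamiliar with the P50/P99 notation, the sketch below shows one common way to compute percentiles from latency samples and compare them against the targets quoted above. It uses the nearest-rank definition, which is an assumption; the article does not say which percentile method was used, and the function names are illustrative.

```python
import math
from typing import List


def percentile(samples_ms: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_targets(
    samples_ms: List[float],
    p50_target_ms: float = 500.0,
    p99_target_ms: float = 2000.0,
) -> bool:
    """Check the image-audit targets from this section (P50 <= 500 ms, P99 <= 2 s)."""
    return (
        percentile(samples_ms, 50) <= p50_target_ms
        and percentile(samples_ms, 99) <= p99_target_ms
    )
```

P50 describes the typical request, while P99 bounds the slow tail, which is why the two targets differ by a factor of four.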

1.4 Evaluation Results

Review latency: 8 h → 2 s; 99 % of reviews become visible to users within 1 s.

Accuracy: 99.6 % (error < 0.4 %, violation leakage < 0.1 %).

Operational cost: massive reduction in manual labor; reviewers redeployed to higher‑value tasks.

Peak handling: sustained a 5× traffic surge with 99.99 % stability.

Latency reduction diagram

2. Decision Acceleration – Multimodal Summary

2.1 Pipeline Overview

Sentiment extraction and label tagging from raw reviews.

LLM‑based text summarization that generates:

Global overview (≈10 s reading time).

Scene‑specific tag summaries.

Curated photo album.

Cache layer and front‑end integration for sub‑second retrieval.

2.2 Image Summarization

All review images receive a quality score and are classified into 17 categories (room, pool, lobby, etc.).

Top‑10 images per category form the "selected album"; the highest‑scoring three categories provide default cover images.
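The album selection described above can be sketched as a grouping and sorting step. The `ScoredImage` type and `build_album` function are hypothetical, and one detail is an assumption: the article does not say how a category is scored, so this sketch ranks each category by its best image.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ScoredImage:
    url: str
    category: str  # one of the 17 categories, e.g. "room", "pool", "lobby"
    score: float   # quality score produced upstream


def build_album(
    images: List[ScoredImage], per_category: int = 10, cover_categories: int = 3
) -> Tuple[Dict[str, List[ScoredImage]], List[ScoredImage]]:
    """Return (album, covers): the top images per category, plus one cover
    image from each of the highest-ranked categories."""
    by_category: Dict[str, List[ScoredImage]] = defaultdict(list)
    for img in images:
        by_category[img.category].append(img)

    album = {
        cat: sorted(imgs, key=lambda i: i.score, reverse=True)[:per_category]
        for cat, imgs in by_category.items()
    }

    # Assumption: a category's rank is the score of its best image.
    ranked = sorted(album, key=lambda cat: album[cat][0].score, reverse=True)
    covers = [album[cat][0] for cat in ranked[:cover_categories]]
    return album, covers
```

Ranking categories by mean score instead would favor consistently good categories over one standout photo; either choice fits the description in this section.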

2.3 Evaluation Metrics

In a 100‑hotel pilot, AI summaries passed quality checks in 88.6 % of cases, reduced user reading effort, increased page dwell time, and lifted conversion rates.

Summary quality metrics

2.4 Summary Generation Rules (excerpt)

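✅ 1. Opinion distinctiveness: clearly distinguish the views and experiences of "most users" from those of "some users" in the summary; do not conflate feedback from different groups.
✅ 2. Authenticity: base the summary strictly on real user reviews; do not invent or exaggerate any detail of the experience.
✅ 3. Narrative: extract story‑like elements from the reviews to make the summary more engaging.
✅ 4. Emotional tone: convey the genuine sentiment of the reviews in a voice close to users' everyday speech.
🔧 Content adjustment rules:
🔸 1. Remove duplicates: automatically detect repeated statements and delete or merge them.
🔸 2. Polish language: keep the wording natural and fluent, matching a real user's voice.
🔸 3. Highlight key points: give priority to the most central feedback in the reviews.
🔸 4. Add transitions: insert suitable transitions between positive and negative sentiment and between the views of different user groups.
⚠️ Prohibited:
❌ - Do not mention personal names, English names, or aliases, and do not use the first person.
❌ - Do not overemphasize negative reviews; highlight only negative feedback that is both common and important.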

3. Value Loop – Static‑Info Correction from Negative Reviews

3.1 Problem Statement

Static hotel attributes (e.g., presence of a window, parking fees) often contain errors that mislead users and generate complaints. Manual verification is costly and slow.

3.2 AI‑Powered Correction Workflow

Feature extraction – LLM identifies contradictory statements in negative reviews (e.g., "listed as having a window, but none present").

Automated reconciliation – Extracted features are compared against the hotel base‑info database.

Closed‑loop governance – When a mismatch is detected with >85 % confidence, an automatic ticket is created for the merchant or operations team to update the data.

Key attributes (windows, air‑conditioning, deposit, heating, parking, surrounding environment, area, bed type) achieved >90 % detection accuracy.
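The reconciliation and ticketing steps can be illustrated with a short sketch. Everything here is assumed for illustration: the `ExtractedClaim` type, the `find_mismatches` function, the shape of the base‑info mapping, and the ticket dictionary. Only the 85 % confidence threshold comes from the workflow described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExtractedClaim:
    hotel_id: str
    attribute: str       # e.g. "has_window", "parking_fee"
    observed_value: str  # what the reviewer reports
    confidence: float    # model confidence in [0, 1]


CONFIDENCE_THRESHOLD = 0.85  # from the article: act only above 85 %


def find_mismatches(
    claims: List[ExtractedClaim], base_info: Dict[str, Dict[str, str]]
) -> List[Dict[str, str]]:
    """Compare extracted claims with listed attributes and return one
    ticket per confident contradiction."""
    tickets = []
    for claim in claims:
        listed = base_info.get(claim.hotel_id, {}).get(claim.attribute)
        if listed is None:
            continue  # nothing listed, so nothing to contradict
        if listed != claim.observed_value and claim.confidence > CONFIDENCE_THRESHOLD:
            tickets.append(
                {
                    "hotel_id": claim.hotel_id,
                    "attribute": claim.attribute,
                    "listed": listed,
                    "observed": claim.observed_value,
                }
            )
    return tickets
```

In a real deployment, several reviews contradicting the same attribute would likely be aggregated before a ticket is raised; this sketch treats each claim independently to keep the threshold logic visible.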

3.3 Impact

Manual effort reduced from 15 person‑days/day to 0.5 person‑days/day, a reduction of roughly 97 %.

Corrected information correlated with improvements in User Retention Sentiment (URS) and User Perceived Score (UPS).

Correction workflow diagram

4. Summary and Outlook

The AI‑driven transformation delivered three major gains:

Moderation latency cut from 8 hours to about 2 seconds, with 99.6 % accuracy.

Multimodal summarization that cuts user decision time by >60 %.

Automated static‑info correction that saves massive manual effort and lifts user satisfaction metrics.

Future work will focus on further reducing bias and latency in moderation and summarization pipelines, deepening LLM integration for personalized recommendations, and strengthening monitoring and risk‑control frameworks to support scalable, repeatable deployments.

Tags: LLM, Operational Efficiency, AI moderation, multimodal summarization, hotel reviews
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
