
Ctrip Hotel Image Intelligence: From Pre‑Processing to Smart Applications

This article describes Ctrip's end‑to‑end hotel image intelligence platform, covering image pre‑audit, deduplication, watermark detection, quality enhancement, content classification, aesthetic assessment, and downstream applications such as smart display, image‑text integration, and automated video generation, all driven by computer‑vision and deep‑learning techniques.

Ctrip Technology

Li Xiang heads the Image Technology team in Ctrip's Data Intelligence division. He focuses on computer vision and machine learning research and has published more than ten papers at conferences such as ICCV and CVPR.

Ctrip, as a leading OTA, manages billions of hotel images that grow by hundreds of thousands daily. To reduce manual effort and improve the speed, accuracy, and completeness of hotel information for users, Ctrip has built a comprehensive image‑intelligence pipeline.

The overall architecture of the hotel image intelligence system is shown below.

1. Intelligent Image Processing and Mining

Image processing is the foundation and includes three stages: image pre‑audit, quality enhancement, and information mining.

Image Pre‑Audit

The first step assists humans in efficiently reviewing massive image collections by automatically removing duplicate or near‑duplicate images, detecting watermarks, and filtering other non‑compliant visual content.

Similar Image Deduplication

Duplicate detection must tolerate variations such as resizing, cropping, color changes, rotation, and viewpoint shifts. The usual pipeline extracts image features, either hand‑crafted (color, texture, HOG, SIFT, SURF) or learned deep features, and then measures similarity with a fixed distance (Euclidean, Manhattan, cosine) or a supervised, learned metric (LMNN, KISSME, LFDA, MFA). To meet industrial speed requirements, Ctrip adopts ORB features, whose binary descriptors allow fast similarity computation with the Hamming distance.

In practice, Ctrip further optimizes ORB to improve its scale invariance and robustness to blur, bringing its matching quality close to that of SIFT/SURF while keeping ORB's speed advantage and directly producing binary codes.
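To illustrate why binary descriptors are cheap to compare, here is a minimal sketch of Hamming-distance matching. It is not Ctrip's implementation: descriptors are plain Python integers, the 8-bit toy values stand in for real 256-bit ORB descriptors, and the `max_dist` and `min_ratio` thresholds are assumptions chosen only for this example.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_ratio(desc_a, desc_b, max_dist=1):
    """Fraction of descriptors in desc_a whose nearest neighbour in desc_b
    lies within max_dist bits."""
    if not desc_a or not desc_b:
        return 0.0
    hits = sum(
        1 for d in desc_a
        if min(hamming(d, e) for e in desc_b) <= max_dist
    )
    return hits / len(desc_a)


def is_near_duplicate(desc_a, desc_b, max_dist=1, min_ratio=0.8):
    """Flag two images as near-duplicates when most descriptors match.
    Both thresholds are hypothetical and would need tuning on real data."""
    return match_ratio(desc_a, desc_b, max_dist) >= min_ratio


# Toy 8-bit descriptors (assumption: real ORB descriptors are 256 bits long).
original = [0b10110010, 0b01101100, 0b11110000, 0b00011101]
recompressed = [0b10110011, 0b01101100, 0b11100000, 0b00011101]  # a few bits flipped
unrelated = [0b01001101, 0b10010011, 0b00001111, 0b11100010]

print(is_near_duplicate(original, recompressed))
print(is_near_duplicate(original, unrelated))
```

The XOR-and-count step is the whole distance computation, which is why binary features scale well to very large image collections.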

Image Watermark Detection

Watermarks are typically small, low‑visibility visual elements, so the problem is cast as a single‑class object detection task. Before deep learning, Deformable Part Models (DPM) were the popular choice. With the rise of deep learning, a series of CNN‑based detectors (R‑CNN, SPPNet, Fast R‑CNN, Faster R‑CNN, SSD, YOLO/YOLO2) became mainstream.

Because labeling a large watermark dataset is costly, Ctrip built an automatic data‑generation and labeling pipeline, creating a diversified large‑scale watermark dataset with minimal human effort. Experiments on this dataset compared Faster R‑CNN, SSD, and YOLO2; the final detector is an improved YOLO2 model that achieves high accuracy on both seen and unseen watermarks.
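The idea behind automatic data generation can be sketched as follows: blend a watermark patch onto a clean image at a random position and record the bounding box as the label. This is only an illustration under simplifying assumptions (grayscale images as nested lists, a fixed opacity, no scaling or rotation); the function name `paste_watermark` and all parameters are hypothetical, and the article does not describe Ctrip's actual generator.

```python
import random


def paste_watermark(image, mark, alpha, rng):
    """Alpha-blend `mark` onto a copy of `image` at a random position.

    Returns (new_image, (x, y, w, h)); the box doubles as the detection label,
    so no manual annotation is needed for the synthetic sample.
    """
    img_h, img_w = len(image), len(image[0])
    h, w = len(mark), len(mark[0])
    x = rng.randint(0, img_w - w)  # inclusive bounds keep the mark inside the image
    y = rng.randint(0, img_h - h)
    out = [row[:] for row in image]  # leave the input image untouched
    for dy in range(h):
        for dx in range(w):
            bg = out[y + dy][x + dx]
            out[y + dy][x + dx] = round((1 - alpha) * bg + alpha * mark[dy][dx])
    return out, (x, y, w, h)


rng = random.Random(0)                       # seeded for reproducibility
background = [[100] * 8 for _ in range(6)]   # 6x8 flat gray "photo"
logo = [[255] * 3 for _ in range(2)]         # 2x3 white "watermark"
sample, box = paste_watermark(background, logo, alpha=0.25, rng=rng)
print(box)
```

Varying the watermark, its opacity, and its position across many clean images is one way to obtain a large, diverse labeled set at low cost.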

2. Image Quality Enhancement

To improve visual quality, Ctrip applies techniques such as de‑blurring, small‑image upscaling, and intelligent beautification. For small‑image upscaling, super‑resolution methods are used.

Traditional interpolation produces jagged edges, and classic sparse‑representation methods (SR, ANR, SF, A+) learn from pairs of low‑resolution (LR) and high‑resolution (HR) patches. Deep learning approaches (SRCNN, DRCN, VDSR, SRResNet, SRGAN, SRDenseNet) instead learn an end‑to‑end mapping from LR to HR images. Ctrip selects VDSR as the backbone and trains a single multi‑scale model on samples of mixed resolutions.

Two practical issues arise: (1) training with a mean squared error (MSE) loss yields high PSNR but overly smooth textures; (2) real low‑resolution hotel images often contain compression artifacts, which naive super‑resolution can amplify.

To address these, Ctrip enhances VDSR with perceptual loss components and artifact‑suppression techniques, achieving more natural results and reducing blockiness.
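The first issue can be seen on a toy example. The sketch below computes PSNR from MSE and compares two reconstructions of a finely textured reference: a flat, texture-free estimate and a textured estimate that is misaligned by one pixel. The images and pixel values are made up for illustration; they are not taken from Ctrip's data.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    n = len(a) * len(a[0])
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    ) / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Reference: a fine stripe texture alternating between 100 and 140.
reference = [[100, 140] * 4 for _ in range(4)]
# Over-smoothed estimate: the texture is averaged away to a flat 120.
smooth = [[120] * 8 for _ in range(4)]
# Textured estimate with the stripes shifted by one pixel.
shifted = [[140, 100] * 4 for _ in range(4)]

print(psnr(reference, smooth), psnr(reference, shifted))
```

The flat image scores the higher PSNR even though it has lost all texture, which is why a purely MSE-trained model tends toward smooth outputs and why a perceptual term is a reasonable complement.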

3. Image Information Mining

Mining extracts rich semantic information from each hotel image, laying the groundwork for downstream applications. It includes content classification, multi‑object detection, and quality assessment.

Image Content Classification

Deep CNNs (AlexNet, VGG, ResNet, DenseNet, Inception) are fine‑tuned on a domain‑specific dataset. Because large‑scale manual labeling is expensive, Ctrip uses transfer learning: a VGG model pre‑trained on a natural‑scene dataset similar to hotel images is fine‑tuned with data augmentation (horizontal flip, random crop, color jitter). The resulting classifier distinguishes more than ten hotel‑image categories with high accuracy.
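The three augmentations mentioned above can be sketched in a few lines. This is a simplified illustration, not Ctrip's training code: images are grayscale nested lists, "color jitter" is reduced to a random brightness shift, and the function names, flip probability, and `max_shift` are assumptions.

```python
import random


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    y = rng.randint(0, len(img) - crop_h)
    x = rng.randint(0, len(img[0]) - crop_w)
    return [row[x:x + crop_w] for row in img[y:y + crop_h]]


def color_jitter(img, rng, max_shift=20):
    """Brightness-only jitter: add one random offset to every pixel, clamped to 0..255."""
    shift = rng.randint(-max_shift, max_shift)
    return [[min(255, max(0, p + shift)) for p in row] for row in img]


def augment(img, crop_h, crop_w, rng):
    """Apply flip (with probability 0.5), then crop, then jitter."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    img = random_crop(img, crop_h, crop_w, rng)
    return color_jitter(img, rng)


rng = random.Random(1)
image = [[10 * r + c for c in range(6)] for r in range(4)]  # 4x6 toy image
sample = augment(image, crop_h=3, crop_w=4, rng=rng)
print(len(sample), len(sample[0]))
```

Each call yields a slightly different view of the same labeled image, which helps a fine-tuned model generalize when labeled data is scarce.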

Image Quality Assessment

Beyond sharpness, aesthetic quality is evaluated. The problem is reformulated as a binary “good‑looking vs. bad‑looking” classification; the probability of the “good” class serves as an aesthetic score. A ResNet backbone provides deep features, and a linear SVM is trained on a manually annotated set (multiple reviewers per image). This model yields a quality score used in later applications.
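A minimal sketch of the scoring step follows. The article does not say how reviewer votes are merged or how the SVM output becomes a probability, so two assumptions are made here: labels come from a majority vote, and the linear SVM's decision value is mapped to a probability with Platt scaling. The weights, bias, and Platt parameters `a` and `b` are hypothetical placeholders for values that would be learned.

```python
import math


def majority_label(votes):
    """Merge reviewer votes (1 = good-looking, 0 = bad-looking).
    Ties count as 'bad' in this sketch."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def decision_value(features, weights, bias):
    """Linear SVM decision function: w . x + b."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def aesthetic_score(features, weights, bias, a=-1.0, b=0.0):
    """Platt scaling: P(good) = 1 / (1 + exp(a * d + b)), with a < 0 so that
    larger decision values give higher scores."""
    d = decision_value(features, weights, bias)
    return 1.0 / (1.0 + math.exp(a * d + b))


# Hypothetical 3-dimensional deep features and learned parameters.
weights, bias = [0.8, -0.5, 1.2], -0.1
print(majority_label([1, 1, 0]))
print(aesthetic_score([1.0, 0.2, 0.9], weights, bias))
print(aesthetic_score([0.1, 0.9, 0.0], weights, bias))
```

Treating the class probability as a continuous score lets a binary classifier serve as a ranking signal without requiring fine-grained aesthetic ratings from annotators.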

4. Intelligent Image Applications

Smart applications create value for users and hotels. They include smart display, image‑text integration, and automatic video generation.

Smart Display

First‑image (cover image) selection and image ranking combine resolution, content type, clarity, and aesthetic scores in a unified model. The model dramatically improves the quality of the primary hotel and room images, leading to higher conversion rates.
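One simple way to combine these signals is a weighted sum, sketched below. The article does not specify the form of the unified model, so the linear formula, the weights, the category priorities in `CATEGORY_PRIOR`, and the 1920x1080 reference resolution are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    width: int
    height: int
    category: str
    clarity: float    # 0..1, higher is sharper
    aesthetic: float  # 0..1, probability of "good-looking"


# Hypothetical priorities: how suitable each content type is as a cover image.
CATEGORY_PRIOR = {"exterior": 1.0, "room": 0.8, "lobby": 0.6}


def display_score(img, weights=(0.2, 0.3, 0.2, 0.3)):
    """Weighted sum of resolution, content type, clarity, and aesthetics."""
    resolution = min(1.0, (img.width * img.height) / (1920 * 1080))
    content = CATEGORY_PRIOR.get(img.category, 0.3)  # default for other types
    w_res, w_content, w_clarity, w_aes = weights
    return (w_res * resolution + w_content * content
            + w_clarity * img.clarity + w_aes * img.aesthetic)


def rank_images(images):
    """Best candidate first; the first element becomes the cover image."""
    return sorted(images, key=display_score, reverse=True)


candidates = [
    ImageInfo("bath.jpg", 640, 480, "bathroom", clarity=0.5, aesthetic=0.4),
    ImageInfo("facade.jpg", 1920, 1080, "exterior", clarity=0.9, aesthetic=0.9),
    ImageInfo("room.jpg", 1280, 720, "room", clarity=0.8, aesthetic=0.7),
]
print([img.name for img in rank_images(candidates)])
```

In a production system the weights would more plausibly be learned from user behavior than set by hand.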

Image‑Text Integration

Captions generated automatically by an image‑captioning model alone read as stiff and formulaic. Ctrip improved them by incorporating real user review text, yielding more natural, sentiment‑rich captions to accompany the images.

From Image to Video

To meet growing demand for video, Ctrip automatically generates hotel videos by selecting representative images (using the mined information) and synchronizing them with textual subtitles derived from hotel descriptions.
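The selection and synchronization steps can be sketched as a small storyboard builder. This is an assumption-laden illustration: it keeps the highest-quality image per category, uses only categories that have a subtitle, and gives every shot a fixed duration. The function names, the tuple layout, and `seconds_per_shot` are hypothetical, and the subtitle strings are invented examples.

```python
def pick_representatives(images):
    """images: list of (name, category, quality).
    Keep the best image per category, ordered by quality, best first."""
    best = {}
    for name, category, quality in images:
        if category not in best or quality > best[category][2]:
            best[category] = (name, category, quality)
    return sorted(best.values(), key=lambda item: item[2], reverse=True)


def build_storyboard(images, subtitles, seconds_per_shot=3.0):
    """Pair each representative image with the subtitle for its category
    and assign consecutive time slots."""
    shots = [s for s in pick_representatives(images) if s[1] in subtitles]
    board = []
    for i, (name, category, _quality) in enumerate(shots):
        start = i * seconds_per_shot
        board.append({
            "image": name,
            "start": start,
            "end": start + seconds_per_shot,
            "subtitle": subtitles[category],
        })
    return board


images = [
    ("facade.jpg", "exterior", 0.9),
    ("room1.jpg", "room", 0.7),
    ("room2.jpg", "room", 0.8),
    ("pool.jpg", "pool", 0.6),
]
subtitles = {  # invented sentences standing in for the hotel description
    "exterior": "A modern hotel in the city center.",
    "room": "Spacious rooms with natural light.",
}
for shot in build_storyboard(images, subtitles):
    print(shot)
```

The resulting list of timed shots is the kind of intermediate structure a rendering step could turn into an actual video.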

These videos attract tens of thousands of views per day and have significantly boosted booking conversion and room‑night metrics.

5. Summary and Outlook

Ctrip has presented a series of real‑world image‑intelligence cases, illustrating the journey from zero to one in hotel image automation. Future work will continue to explore deeper computer‑vision and machine‑learning applications across more hotel‑related scenarios.
