
Tourism Spot Recommendation System: Framework, Model Construction, Feature Engineering, and Performance Evaluation

This article describes a tourism recommendation system that addresses data sparsity, seasonality, and geographic variations by using an offline‑online architecture, GBDT+LR CTR prediction, exponential decay scoring, and extensive feature engineering, achieving a 1.6% conversion‑rate increase and high accuracy and recall.

Tongcheng Travel Technology Center

With the rise of the mobile internet, online channels have become the main sales route for scenic-spot tickets: by December 2016, 94.8% of ticket payments were made online versus 5.2% on site. To help users quickly find attractive attractions, a high‑precision, diverse recommendation system is urgently needed.

The tourism scenario presents several challenges: (1) data sparsity, because purchase frequency is low; (2) strong seasonality (e.g., hot springs in winter, water parks in summer); (3) large differences between local and non‑local users; (4) city‑level interest shifts when users change cities.

Recommendation Framework

The system consists of an offline layer (data cleaning, tag cleaning, feature extraction, model training) and an online layer (real‑time user behavior collection, preference inference, candidate generation, scoring and re‑ranking). Large‑scale features such as user profiles are stored in ElasticSearch for fast online queries, while small features (e.g., city attributes) are broadcast to nodes, keeping end‑to‑end latency under 100 ms.
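The large-feature/small-feature split can be sketched as follows. This is a minimal illustration, not the production code: `ES_PROFILES` stands in for the ElasticSearch user-profile index (a network query in production) and `CITY_ATTRS` for the small broadcast table held in-process on each scoring node; both names and the field layout are assumptions.

```python
# Hypothetical stand-in for the ElasticSearch user-profile index:
# large, per-user features that would require a remote query online.
ES_PROFILES = {
    "u123": {"pref_tags": ["hot-spring", "temple"], "is_local": True},
}

# Small features (e.g., city attributes) broadcast to every scoring node,
# so lookups are in-memory and effectively free.
CITY_ATTRS = {
    "suzhou": {"tier": 2, "tourism_city": True},
}

def build_feature_vector(user_id: str, city: str) -> dict:
    """Merge remote (large) and broadcast (small) features for online scoring."""
    user = ES_PROFILES.get(user_id, {})    # one network round-trip in production
    city_attrs = CITY_ATTRS.get(city, {})  # local in-process lookup
    return {**user, **city_attrs}

features = build_feature_vector("u123", "suzhou")
```

Keeping the small tables node-local avoids a second remote lookup per request, which is one way the sub-100 ms latency budget is preserved.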

Model Construction

Because most users have only one order within six months, traditional user‑based collaborative filtering performs poorly. Experiments with item‑based collaborative filtering and Latent Factor Model (LFM) on the TOP‑10 recommendation list yielded the following results:

| Model | Precision | Recall |
| --- | --- | --- |
| Item‑based CF | 0.45% | 17.07% |
| LFM | 0.0263% | 4.68% |

The extremely low purchase frequency makes the interaction matrix too sparse, so pure collaborative filtering is unsuitable.
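To make the sparsity problem concrete, here is a minimal item-based CF sketch on a toy purchase matrix (not the article's data): item columns are compared by cosine similarity, and a user's unseen spots are scored by similarity to what they already bought. With one purchase per user, most item vectors barely overlap, which is exactly why the measured precision collapses.

```python
import numpy as np

# Toy user-spot purchase matrix (rows: users, cols: spots).
# Most users have a single purchase, mirroring the sparsity described above.
R = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

def item_similarity(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between item (column) vectors."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0        # guard items nobody purchased
    Rn = R / norms
    return Rn.T @ Rn

S = item_similarity(R)

# Score unseen spots for user 0 by similarity to their purchase history.
user = R[0]
scores = S @ user
scores[user > 0] = -np.inf         # mask already-purchased spots
top = int(np.argmax(scores))       # best unseen spot for user 0
```

Only one co-purchase (user 2) links spots 0 and 1 here; every other pair has zero similarity, so most users simply cannot be served a personalized list.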

CTR Prediction

Click‑Through Rate (CTR) prediction, a core component of online advertising, is adopted for recommendation. Logistic Regression (LR) is used as the base model, with extensive feature engineering to enhance non‑linear learning. Gradient Boosted Decision Trees (GBDT) provide powerful non‑linear modeling and automatically generate cross‑features that can be fed into LR, eliminating much manual feature combination work.

The GBDT+LR architecture works as follows: an input sample x traverses the trained GBDT trees, landing on leaf nodes that each correspond to a one‑dimensional LR feature; these features are then fed into LR to output the predicted click probability.
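The leaf-index encoding above can be sketched with scikit-learn. This is an illustrative reconstruction on synthetic data, not the production pipeline; the tree depth and tree count match the settings reported later in the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Synthetic click data standing in for the (proprietary) impression log.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Stage 1: train the GBDT (100 trees of depth 3, as reported below).
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbdt.fit(X, y)

# Stage 2: the leaf each sample lands on, per tree, becomes a categorical
# feature; one-hot encoding turns it into the sparse LR input.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)   # (n_samples, n_trees)
enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(leaves), y)

# Predicted click probability for one sample.
p = lr.predict_proba(enc.transform(gbdt.apply(X[:1]).reshape(1, -1)))[0, 1]
```

Because each leaf encodes a whole path of feature splits, every one-hot column handed to LR is already a learned cross-feature, which is what removes the manual feature-combination work.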

Candidate Set Generation

To reduce online computation, a two‑stage candidate set is built. First, popular spots (TOP n by sales score, adjusted by local‑non‑local purchase rate) form Candidate 1, ensuring coverage of hot items while preserving long‑tail potential. Second, real‑time user behavior is scored using an exponential decay formula:

CurrentScore = PreviousScore × exp(-cooling_coefficient × time_interval)

User‑spot interaction history and inferred category preferences generate Candidate 2, which reflects personal interests. The final candidate set merges both, achieving over 70% coverage while balancing popularity and personalization.
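The decay formula is straightforward to implement; a minimal sketch (the cooling coefficient and hourly time unit here are assumptions for illustration):

```python
import math

COOLING = 0.1  # hypothetical cooling coefficient, per hour

def decayed_score(previous_score: float, hours_elapsed: float) -> float:
    """CurrentScore = PreviousScore * exp(-cooling_coefficient * time_interval)."""
    return previous_score * math.exp(-COOLING * hours_elapsed)

recent = decayed_score(1.0, 1)    # a click one hour ago stays near full weight
stale = decayed_score(1.0, 24)    # a day-old click has cooled substantially
```

Tuning the cooling coefficient trades responsiveness for stability: a larger value makes the candidate set react faster to the latest clicks but forgets session context sooner.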

Feature Selection

Features include spot attributes (stable properties), user attributes (behavior, preferences, demographics), city attributes (regional interest patterns), and cross‑features such as local‑non‑local purchase rate combined with spot tags. Preprocessing steps involve normalization (min‑max scaling), discretization of continuous attributes (e.g., city tier), and importance ranking via GBDT's relative.influence function. No extensive manual cross‑features are required because GBDT already performs feature interaction.
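The two preprocessing steps can be sketched in a few lines; the feature values and tier cut-points below are made up for illustration.

```python
import numpy as np

def min_max(x: np.ndarray) -> np.ndarray:
    """Min-max scale a feature column into [0, 1]."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x, dtype=float)

# Hypothetical continuous feature: spot sales counts.
sales = np.array([120.0, 80.0, 400.0, 40.0])
scaled = min_max(sales)

# Discretize a continuous city-level attribute into tiers with fixed bins
# (cut-points are illustrative; 0 = lowest tier, 2 = highest).
city_metric = np.array([1.2, 5.6, 9.8, 3.3])
tiers = np.digitize(city_metric, bins=[2.0, 6.0])
```

Discretizing continuous attributes into a small number of buckets keeps the downstream one-hot feature space compact and makes the LR weights easier to interpret.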

Using the selected features, a balanced dataset (positive samples = clicks, negative samples = impressions without clicks) is trained with GBDT+LR. Experiments show that a tree depth of 3 with 100 trees yields 79.2% precision and 79.3% recall, providing a good trade‑off between accuracy and online latency.

Recommendation Effect

Addressing the four challenges, the system improves local vs. non‑local recommendations and adapts to city switches. For example, in Suzhou, local users receive spots with high local purchase rates (e.g., 88% for Lingyuan Temple), while non‑local users see attractions with higher non‑local rates (e.g., 64% for Mudu Ancient Town). When a user switches from Suzhou to Hangzhou, the top recommendations shift from cultural sites to natural scenery, demonstrating effective city‑level adaptation.

Conversion rate (orders per unique visitor) increased by an average of 1.6 percentage points across weekdays and weekends, i.e., roughly 16 additional orders per 1,000 visitors.

Summary and Future Work

The core of the system is a re‑ranking model that combines CTR prediction with scenario‑specific features to solve the four identified challenges. Future improvements include collecting more data (search logs, richer tags), incorporating user geolocation, refining scenario detection (e.g., business vs. leisure travel), and exploring deep‑wide models to capture deeper non‑linear feature interactions while addressing model size and latency through compression techniques.

Tags: GBDT, machine learning, feature engineering, CTR prediction, recommendation system, logistic regression, tourism
Written by

Tongcheng Travel Technology Center

Pursue excellence, start again with Tongcheng! More technical insights to help you along your journey and make development enjoyable.
