
Poetry Generation from Images: Design, Implementation, and Evaluation of Ctrip’s “Xiao Shi Ji” System

The article presents Ctrip’s “Xiao Shi Ji” system that combines large‑scale tourism knowledge graphs, image recognition, and deep‑learning‑based poetry generation to automatically compose Chinese classical poems from photos, evaluates its performance against human poets, and discusses the underlying AI techniques.

Ctrip Technology

In early 2017, Ctrip launched “Xiao Shi Ji” (Little Poetry Machine), a system that interprets user‑uploaded photos and, drawing on a massive knowledge base, generates classical Chinese poems that match each image’s scenery and mood.

In blind tests held in Shanghai against human poets, neither professional nor public judges could reliably distinguish the machine‑generated poems from the human ones, and the system’s poems often ranked among the top entries, which suggests human‑level quality.

The system also supports functions such as image‑based poem retrieval, acrostic poems, and tower poems, showcasing AI’s challenge to human creativity in tourism contexts.

1. Overall Process

The pipeline consists of three core modules: a tourism knowledge graph, image recognition, and a poetry‑generation engine (see Figure 4).

2. Knowledge Graph Construction

Data sources include Ctrip’s proprietary tourism data, user‑generated content (reviews, travel notes), and public resources such as Wikipedia and Baidu Baike. The data are categorized as unstructured (text), semi‑structured (large encyclopedic entries), and structured (tourism entities, hotel, itinerary, user intent).

Knowledge extraction employs NLP techniques (word segmentation, POS tagging, dependency parsing, semantic role labeling, and NER using CRF++ combined with dictionaries) to extract entities and relations, and uses tf‑idf, chi‑square, TextRank, and LDA to extract topics. Fusion merges entities from multiple sources using semantic and lexical similarity with custom weighting, and symbolic logical reasoning then infers new relationships.
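
The fusion step can be illustrated with a minimal sketch. It is not Ctrip's implementation: the record layout, the weights, the threshold, and the use of keyword overlap as a stand‑in for semantic similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity of two entity names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_similarity(a: set, b: set) -> float:
    """Jaccard overlap of keywords; a crude, assumed stand-in for semantic similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_entity(e1: dict, e2: dict, w_lex: float = 0.5, w_sem: float = 0.5,
                threshold: float = 0.5) -> bool:
    """Weighted combination of both similarities (weights and threshold are hypothetical)."""
    score = (w_lex * lexical_similarity(e1["name"], e2["name"])
             + w_sem * context_similarity(e1["keywords"], e2["keywords"]))
    return score >= threshold


def fuse(entities: list) -> list:
    """Greedily merge each record into the first existing entity it matches."""
    merged = []
    for ent in entities:
        for target in merged:
            if same_entity(target, ent):
                target["keywords"] = target["keywords"] | ent["keywords"]
                target["sources"] = target["sources"] + ent["sources"]
                break
        else:
            # No match found: start a new entity from a copy of the record.
            merged.append({"name": ent["name"],
                           "keywords": set(ent["keywords"]),
                           "sources": list(ent["sources"])})
    return merged


# Hypothetical records from two sources describing overlapping entities.
records = [
    {"name": "West Lake", "keywords": {"lake", "hangzhou", "scenic"}, "sources": ["ctrip"]},
    {"name": "West Lake (Hangzhou)", "keywords": {"lake", "hangzhou", "unesco"}, "sources": ["wiki"]},
    {"name": "The Bund", "keywords": {"shanghai", "waterfront"}, "sources": ["ctrip"]},
]
fused = fuse(records)
```

Here the two West Lake records are merged into one entity that keeps both sources, while The Bund stays separate. A production system would likely replace the keyword overlap with embedding‑based similarity and tune the weights per entity type.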

3. Image Recognition

Image recognition relies on convolutional neural networks (CNNs). The article traces their evolution from LeNet‑5 through AlexNet, VGGNet, GoogLeNet, and ResNet. Ctrip adopts an Inception‑v3 model with transfer learning and trains both the high‑level and low‑level layers to cope with a relatively small, domain‑specific dataset, achieving 92.5% mAP on its internal tourism image set.
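
To make the reported metric concrete, the sketch below computes mean average precision (mAP) for a toy multi‑label image set. It assumes the article's mAP is the usual mean of per‑class average precision; the class names, scores, and labels are invented for illustration and are not Ctrip's data.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision values at each rank where a positive appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """Average the per-class AP values over all classes."""
    return sum(average_precision(s, l) for s, l in per_class.values()) / len(per_class)


# Hypothetical classifier scores and ground-truth labels for four images.
per_class = {
    "lake": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    "mountain": ([0.2, 0.7, 0.6, 0.4], [0, 1, 1, 0]),
}
map_score = mean_average_precision(per_class)
```

In this toy case the "mountain" class ranks both positives first (AP of 1.0), while "lake" has one positive at rank 1 and one at rank 3 (AP of about 0.83), so the mAP is about 0.92.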

4. Poetry Generation Engine

Traditional statistical and rule‑based methods are combined with deep learning. RNN language models alleviate data sparsity; encoder‑decoder frameworks with attention capture theme and context; hierarchical RNNs ensure global coherence. The system scores image‑theme relevance, plans topics, and then generates verses with a two‑pass algorithm (a greedy pass followed by local optimization) or with genetic algorithms, so that the verses satisfy rhyme, fluency, and relevance.
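
The two‑pass idea can be sketched on a toy quatrain. This is one possible reading of "greedy plus local optimization", not the system's actual algorithm: the candidate lines, their scores, the rhyme classes, and the rhyme bonus are all hypothetical, and English placeholder lines stand in for Chinese verses.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    rhyme: str        # rhyme class of the line ending
    relevance: float  # match with the image theme, in [0, 1]
    fluency: float    # language-model style score, in [0, 1]


RHYME_POSITIONS = (1, 3)  # assumed rule: second and fourth lines must rhyme
RHYME_BONUS = 1.0         # hypothetical reward for satisfying the rhyme rule


def poem_score(poem):
    """Sum of per-line scores plus a bonus when the rhyming lines share a rhyme class."""
    base = sum(c.relevance + c.fluency for c in poem)
    rhymes = {poem[i].rhyme for i in RHYME_POSITIONS}
    return base + (RHYME_BONUS if len(rhymes) == 1 else 0.0)


def greedy_pass(slots):
    """Pass 1: pick the best-scoring candidate for each line independently."""
    return [max(cands, key=lambda c: c.relevance + c.fluency) for cands in slots]


def local_pass(slots, poem):
    """Pass 2: swap single lines while any swap raises the whole-poem score."""
    poem = list(poem)
    improved = True
    while improved:
        improved = False
        for i, cands in enumerate(slots):
            for cand in cands:
                trial = poem[:i] + [cand] + poem[i + 1:]
                if poem_score(trial) > poem_score(poem):
                    poem, improved = trial, True
    return poem


# Hypothetical candidates for each of the four lines.
slots = [
    [Candidate("Mist over the lake at dawn", "awn", 0.9, 0.8),
     Candidate("A boat drifts slowly by", "y", 0.6, 0.7)],
    [Candidate("Willows lean toward the light", "ight", 0.8, 0.9),
     Candidate("Willows sway along the shore", "ore", 0.7, 0.8)],
    [Candidate("A heron lifts from the reeds", "eeds", 0.8, 0.8)],
    [Candidate("And ripples fade once more", "ore", 0.8, 0.8),
     Candidate("The hills grow dim at night", "ight", 0.6, 0.7)],
]
draft = greedy_pass(slots)
final = local_pass(slots, draft)
```

The greedy draft picks the strongest line for each position but leaves lines two and four unrhymed. The local pass then trades a slightly weaker second line for one that rhymes with the fourth, which raises the overall score. The loop terminates because each accepted swap strictly increases the score over a finite set of poems.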

5. Summary

The “Xiao Shi Ji” demonstrates a successful integration of large‑scale tourism knowledge graphs, computer‑vision image understanding, and AI‑driven poetry generation, achieving human‑comparable poetic quality. Future work will refine entity tagging, expand visual and knowledge coverage, and further optimize the generation engine for richer, more diverse poetic expressions.
