Intelligent Generation of Search Engine Advertising Keywords: Methods, Frameworks, and Future Directions
This article presents a comprehensive overview of automated techniques for generating high-quality search engine advertising keywords. It covers the business background, traditional manual methods, and intelligent keyword expansion built on NLP techniques (word segmentation, POS tagging, BiLSTM-CRF sequence labeling, BERT classification, and DSSM semantic matching), as well as additional approaches such as query suggestion and synonym rewriting.
The rapid international expansion of Ctrip has led to extensive overseas search‑engine advertising, where effective keyword selection, pricing, and creative design are crucial for improving ROI.
Traditional keyword generation relies on manual brainstorming (expansion) and mining existing search queries (catching), both of which are labor‑intensive and lack fine‑grained targeting.
To address these issues, an intelligent keyword generation pipeline is proposed, consisting of three core modules:
1. Product Information Supply Module – Stores product data (e.g., hotel, flight, city accommodation) and performs cleaning, tokenization, and part-of-speech (POS) tagging. Ambiguous geographic entities are resolved with a Geohash-based structured dictionary, and gaps in dictionary coverage are handled with data augmentation plus a BiLSTM-CRF sequence-labeling model.
2. Search Habit Summarization Module – Analyzes user search queries to extract common search patterns. It employs named‑entity recognition, tokenization, and POS tagging to map queries such as “Shanghai hotel accommodation” or “Hongqiao Airport inn discount” to structured entities (city, hotel name, demand terms).
3. Keyword Generation Module – Generates candidate keywords from product data and the derived rules, then filters ambiguous terms using three strategies: (a) string‑match overlap, (b) click‑through distribution across multiple products, and (c) semantic similarity scores computed by a DSSM model.
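The article does not spell out how the Geohash-based dictionary in module 1 resolves an ambiguous place name. One plausible reading, sketched below in Python purely as an assumption, is to encode every candidate location for a surface form as a Geohash and keep the candidate whose cell shares the longest prefix with a known context point (for example, the coordinates of the hotel being described). The gazetteer, its approximate coordinates, and the helper names `geohash_encode`, `common_prefix_len`, and `disambiguate` are hypothetical; only the Geohash encoding itself is the standard algorithm.

```python
# Hypothetical sketch: resolve an ambiguous place name with Geohash prefixes.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Standard Geohash: alternate longitude/latitude bisection bits, 5 bits per char."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, ch, n_bits, use_lon = [], 0, 0, True  # the first bit encodes longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            ch <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[ch])
            ch, n_bits = 0, 0
    return "".join(chars)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def disambiguate(name, context_lat, context_lon, gazetteer, precision=8):
    """Return the candidate whose Geohash shares the longest prefix with the context point."""
    candidates = gazetteer.get(name, [])
    if not candidates:
        return None
    context_hash = geohash_encode(context_lat, context_lon, precision)
    best = max(
        candidates,
        key=lambda c: common_prefix_len(geohash_encode(c[1], c[2], precision), context_hash),
    )
    return best[0]


# Hypothetical gazetteer: one surface form, two places (approximate coordinates).
GAZETTEER = {
    "Chaoyang": [
        ("Chaoyang District, Beijing", 39.92, 116.44),
        ("Chaoyang, Liaoning", 41.57, 120.45),
    ],
}

print(disambiguate("Chaoyang", 39.91, 116.40, GAZETTEER))  # Chaoyang District, Beijing
```

Prefix length is only a coarse proximity signal: two nearby points on opposite sides of a cell boundary can share a short prefix, so a fuller version would also inspect neighboring cells or compare actual distances.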
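The hand-off from module 2 to module 3 can be sketched as filling search patterns with product fields, then applying the three ambiguity filters. This is a minimal Python illustration under stated assumptions: the patterns, the thresholds `min_overlap`, `min_share`, and `min_sim`, and all function names are hypothetical, and the vectors passed to `cosine` stand in for embeddings from a trained DSSM, which is not implemented here.

```python
import math
import re
from itertools import product as cartesian

# Hypothetical patterns summarised from user queries; slot names sit in braces.
PATTERNS = ["{city} hotel", "{city} {hotel} discount", "{landmark} inn"]


def fill_patterns(patterns, slot_values):
    """Expand each pattern with every combination of slot values a product provides."""
    keywords = []
    for pattern in patterns:
        slots = re.findall(r"\{(\w+)\}", pattern)
        if not all(s in slot_values for s in slots):
            continue  # the product lacks a field this pattern needs
        for combo in cartesian(*(slot_values[s] for s in slots)):
            keywords.append(pattern.format(**dict(zip(slots, combo))))
    return keywords


def token_overlap(keyword, product_text):
    """Filter (a): share of keyword tokens that also appear in the product text."""
    kw = set(keyword.lower().split())
    prod = set(product_text.lower().split())
    return len(kw & prod) / len(kw) if kw else 0.0


def top_click_share(clicks_by_product):
    """Filter (b): fraction of a keyword's clicks that go to its most-clicked product."""
    total = sum(clicks_by_product.values())
    return max(clicks_by_product.values()) / total if total else 0.0


def cosine(u, v):
    """Filter (c): cosine similarity of keyword and product vectors (DSSM stand-in)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_unambiguous(keyword, product_text, clicks_by_product, kw_vec, prod_vec,
                   min_overlap=0.5, min_share=0.6, min_sim=0.5):
    if token_overlap(keyword, product_text) < min_overlap:
        return False
    # Keywords with no click history skip the click-distribution filter.
    if clicks_by_product and top_click_share(clicks_by_product) < min_share:
        return False
    return cosine(kw_vec, prod_vec) >= min_sim


print(fill_patterns(PATTERNS, {"city": ["Shanghai"], "hotel": ["Example Inn"]}))
# ['Shanghai hotel', 'Shanghai Example Inn discount']
```

A keyword whose clicks are spread evenly over several products (a low top-click share) is treated as ambiguous even if it matches the product text well, which is the intuition behind strategy (b).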
For the “catching” (keyword mining) scenario, a binary classifier obtained by fine-tuning BERT decides whether a query is accommodation-related. Intent recognition is then framed as semantic matching between the query and products, using a two-stage approach: offline DSSM recall followed by BERT-based re-ranking.
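The gate-then-match flow can be sketched as a cheap vector recall over all products followed by a more expensive pairwise re-ranking of the short list. In this hypothetical Python sketch, `is_accommodation` stands in for the fine-tuned BERT classifier, precomputed vectors stand in for DSSM embeddings, and `scorer` stands in for the BERT re-ranker; the toy implementations and product data are placeholders, not the models described above.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recall(query_vec, product_vecs, k):
    """Stage 1: top-k products by cosine similarity of precomputed (DSSM-style) vectors."""
    scored = ((cosine(query_vec, vec), pid) for pid, vec in product_vecs.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


def rerank(query, candidates, product_texts, scorer):
    """Stage 2: reorder the short list with a pairwise scorer (e.g. a BERT cross-encoder)."""
    return sorted(candidates, key=lambda pid: scorer(query, product_texts[pid]), reverse=True)


def match_query(query, query_vec, product_vecs, product_texts, is_accommodation, scorer, k=2):
    if not is_accommodation(query):
        return []  # the classifier gate drops queries outside the accommodation domain
    return rerank(query, recall(query_vec, product_vecs, k), product_texts, scorer)


# Toy placeholders for the trained models (hypothetical).
def toy_is_accommodation(q):
    return any(w in q.lower().split() for w in ("hotel", "inn", "hostel"))


def toy_scorer(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))


product_vecs = {"h1": [1.0, 0.0], "h2": [0.8, 0.6], "h3": [0.0, 1.0]}
product_texts = {"h1": "Bund riverside hotel", "h2": "Hongqiao Airport inn", "h3": "Disney resort hotel"}

print(match_query("Hongqiao Airport inn discount", [1.0, 0.1], product_vecs,
                  product_texts, toy_is_accommodation, toy_scorer))  # ['h2', 'h1']
```

The split reflects the usual cost trade-off: recall scores every product with one vector comparison (and can be precomputed offline), while the pairwise scorer only runs on the `k` survivors.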
Additional explored methods include:
• Query‑suggestion based keyword generation, leveraging popularity, relevance, and diversity of suggested queries.
• Synonym rewriting techniques such as query rewriting, click‑graph based rewriting, and grammatical substitution to expand high‑performing keywords.
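One way to balance popularity, relevance, and diversity when choosing among suggested queries is a greedy selection in the style of maximal marginal relevance (MMR). The article does not name a specific method, so MMR is a swapped-in technique here; token Jaccard similarity, the weights `alpha` and `lam`, and the function names are assumptions.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two query strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_suggestions(seed, suggestions, popularity, k=3, alpha=0.5, lam=0.5):
    """Greedy MMR-style pick: reward popularity and relevance, penalise redundancy."""
    if not suggestions:
        return []
    max_pop = max(popularity.get(s, 0) for s in suggestions) or 1
    selected, remaining = [], list(suggestions)

    def base(s):
        # Blend normalised popularity with relevance to the seed keyword.
        return alpha * popularity.get(s, 0) / max_pop + (1 - alpha) * jaccard(seed, s)

    def mmr(s):
        # Penalise similarity to the closest suggestion already selected.
        redundancy = max((jaccard(s, t) for t in selected), default=0.0)
        return lam * base(s) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


SUGGESTIONS = ["shanghai hotel cheap", "shanghai hotel cheap deals", "shanghai disney hotel"]
POPULARITY = {"shanghai hotel cheap": 100, "shanghai hotel cheap deals": 90, "shanghai disney hotel": 40}

print(select_suggestions("shanghai hotel", SUGGESTIONS, POPULARITY, k=2))
# ['shanghai hotel cheap', 'shanghai disney hotel']
```

Setting `lam=1.0` disables the redundancy penalty, and the example then picks the near-duplicate "shanghai hotel cheap deals" second instead of the more diverse "shanghai disney hotel".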
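Click-graph rewriting treats queries that lead to clicks on the same products as likely synonyms of each other. A minimal Python sketch, assuming a click log of `(query, product_id, clicks)` rows and using weighted Jaccard overlap of clicked products as the similarity measure (the article does not specify one); the threshold `min_sim` and the function names are hypothetical.

```python
from collections import defaultdict


def build_click_graph(click_log):
    """Aggregate (query, product_id, clicks) rows into query -> {product_id: clicks}."""
    graph = defaultdict(dict)
    for query, pid, clicks in click_log:
        graph[query][pid] = graph[query].get(pid, 0) + clicks
    return graph


def weighted_jaccard(a, b):
    """Sum of per-product minimum clicks divided by sum of per-product maximum clicks."""
    keys = set(a) | set(b)
    num = sum(min(a.get(key, 0), b.get(key, 0)) for key in keys)
    den = sum(max(a.get(key, 0), b.get(key, 0)) for key in keys)
    return num / den if den else 0.0


def rewrite_candidates(keyword, graph, min_sim=0.3):
    """Other queries whose clicked products overlap enough with the keyword's, best first."""
    if keyword not in graph:
        return []
    scored = [(weighted_jaccard(graph[keyword], graph[q]), q) for q in graph if q != keyword]
    return [q for sim, q in sorted(scored, reverse=True) if sim >= min_sim]


log = [
    ("shanghai hotel", "h1", 50), ("shanghai hotel", "h2", 50),
    ("hotels in shanghai", "h1", 40), ("hotels in shanghai", "h2", 30),
    ("shanghai flights", "f1", 80),
]
print(rewrite_candidates("shanghai hotel", build_click_graph(log)))  # ['hotels in shanghai']
```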
The article concludes with future work: extending the system to languages beyond Chinese, English, Japanese, and Korean, and investigating keyword generation driven entirely by machine understanding, which may produce terms that are not human-interpretable yet still effective.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.