How Alibaba Engineers Boost SEO with Reinforcement Learning and Attention Models
This article details Alibaba.com engineers' application of reinforcement learning, attention mechanisms, and weakly supervised techniques to extract product summaries, improve content quality, and significantly raise SEO rankings, supported by offline experiments, online A/B testing, and future research directions.
Background
Search engine optimization (SEO) is a set of techniques that follow search‑engine guidelines to improve website rankings and attract more traffic. Traditional methods such as TDK (title, description, keywords) optimization, link building, and mobile‑first design have been extensively applied at Alibaba.com. As search‑engine algorithms increasingly prioritize genuine content value, pages with richer, high‑quality text receive higher rankings.
Figure 1 shows the importance ranking of SEO factors, highlighting content construction as a major contributor.
Our work focuses on content construction, specifically extracting product summary sentences to enrich the SEO landing‑list pages. Adding concise product descriptions increases page text, improves Google rankings, attracts users, and boosts click‑through rates.
Problem Description
Given a product and its description, we aim to extract suitable sentences as a product summary. Two main challenges arise: (1) lack of strictly labeled data to guide the model, and (2) noisy descriptions containing logistics, payment, and Q&A information that obscure the core product content.
Algorithm Exploration
TextRank Model
We treat summary extraction as an unsupervised task and use the classic TextRank algorithm as a baseline. TextRank computes the similarity between every pair of sentences, uses those similarities as edge weights in a sentence graph, assigns each sentence a TextRank score, and selects the top‑N sentences. The similarity formula is shown below.
Manual review revealed that TextRank often selects noisy sentences (e.g., payment instructions) because the algorithm cannot distinguish category‑relevant content from unrelated information.
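To make the baseline concrete, here is a minimal sketch of TextRank sentence ranking. It assumes the word-overlap similarity from the original TextRank paper (shared words divided by the sum of the log sentence lengths); the article does not confirm which similarity the team used, and the function names, damping factor, and sample sentences are illustrative only.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(words_a, words_b):
    """Shared words / (log|A| + log|B|); assumed here, not confirmed by the article."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # log(1) = 0 would make the denominator vanish
    shared = len(set(words_a) & set(words_b))
    return shared / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, top_n=2, damping=0.85, iterations=50):
    """Return the top_n sentences by TextRank score, kept in document order."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights between every pair of distinct sentences.
    weights = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each neighbour j passes on a share of its score.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(ranked)]


# Made-up product description with one payment sentence as noise.
sample = [
    "This stainless steel water bottle keeps drinks cold for 24 hours.",
    "The steel bottle has a leak-proof lid and a wide mouth.",
    "We accept payment by PayPal and bank transfer.",
    "Each water bottle holds 500 ml of drinks.",
]
print(textrank(sample, top_n=2))
```

Because the score depends only on how much a sentence overlaps with its neighbours, a description with several near-duplicate payment or shipping sentences can rank that noise highly, which is the failure mode the manual review describes.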
Attention Model
To address the unsupervised challenge, we introduce product‑category tags, converting the problem into weak supervision. A text‑classification model with an attention mechanism assigns higher weights to sentences that are more category‑relevant, making them better candidates for summaries.
The attention mechanism computes a similarity α between each sentence vector and a global category vector U, producing attention weights that guide sentence selection.
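The weighting step can be sketched as follows. This is a simplified illustration, assuming a plain dot product followed by a softmax; hierarchical attention networks usually pass each sentence vector through a small tanh layer first, which is omitted here. The vectors and function names are made up for the example.

```python
import math


def attention_weights(sentence_vectors, context_vector):
    """Softmax over dot products between each sentence vector and the context vector U."""
    scores = [sum(h * u for h, u in zip(vec, context_vector))
              for vec in sentence_vectors]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_document_vector(sentence_vectors, weights):
    """Attention-weighted sum of sentence vectors, the input to the category classifier."""
    dim = len(sentence_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, sentence_vectors))
            for d in range(dim)]


def top_sentences(weights, n):
    """Indices of the n highest-weighted sentences, returned in document order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:n])


# Toy 2-dimensional sentence vectors and a toy category vector U (illustrative values).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
U = [2.0, 0.0]
alpha = attention_weights(vectors, U)
print(alpha, top_sentences(alpha, 2))
```

Note that `top_sentences` needs a fixed `n`: the weights rank sentences but give no natural cut-off for how many to keep.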
Although the attention model reduces some noise, it still cannot dynamically adjust the number of extracted sentences, and its performance gain over TextRank is modest.
Reinforcement Learning Model
We adopt a Selector‑Classifier architecture built from three networks: an Encoder, a Selector, and a Classifier. The Selector Network chooses candidate summary sentences, while the Classifier Network evaluates them and returns a reward based on classification loss. All three networks are trained jointly.
The Encoder extracts features for each sentence (Vec1‑Vec4). The Selector outputs a probability that a sentence is category‑relevant. After selection, the Classifier computes cross‑entropy loss for category prediction and feeds it back as a reward to the Selector. If the Selector rejects all sentences, the average classification loss on the training set is used as the reward.
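The reward loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Selector is reduced to a logistic model over sentence features, the reward is taken as the negative cross-entropy loss so that a higher reward is better, the update is a plain REINFORCE step without a baseline, and `toy_classifier` is a hypothetical stand-in for the real Classifier Network.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def select_sentences(probabilities, rng):
    """Sample a keep (1) or drop (0) action per sentence from the Selector's probabilities."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def cross_entropy(class_probabilities, true_label):
    """Cross-entropy loss of the Classifier for the true category."""
    return -math.log(class_probabilities[true_label])


def reward_for(actions, classifier_fn, true_label, average_loss):
    """Negative classification loss on the selected sentences (sign is an assumption).

    If every sentence is rejected, fall back to the average training-set loss.
    """
    selected = [i for i, a in enumerate(actions) if a == 1]
    if not selected:
        return -average_loss
    return -cross_entropy(classifier_fn(selected), true_label)


def reinforce_update(weights, features, actions, reward, lr=0.1):
    """One REINFORCE step for a logistic Selector: w += lr * reward * grad log pi(a | x)."""
    new_weights = list(weights)
    for x, a in zip(features, actions):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for d, xi in enumerate(x):
            # For a Bernoulli policy, d/dz log pi(a | x) = a - p.
            new_weights[d] += lr * reward * (a - p) * xi
    return new_weights


def toy_classifier(selected):
    """Hypothetical classifier: confident unless noisy sentence 2 is included."""
    return [0.5, 0.5] if 2 in selected else [0.9, 0.1]


rng = random.Random(0)
features = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0]]  # made-up sentence features
weights = [0.0, 0.0]
probs = [sigmoid(sum(w * x for w, x in zip(weights, f))) for f in features]
actions = select_sentences(probs, rng)
reward = reward_for(actions, toy_classifier, true_label=0, average_loss=1.2)
weights = reinforce_update(weights, features, actions, reward)
print(actions, reward, weights)
```

In this setup the Selector never sees a sentence-level label. It only learns that selections which make the category easier to predict earn a higher reward, which is how the category tag acts as weak supervision.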
Experimental Analysis
We conducted two offline experiments.
Experiment 1: Noise Reduction
We replaced the original noisy dataset with summaries generated by each model and trained a text‑CNN classifier for category prediction. Results are compared with the TextRank baseline.
The full‑noise dataset yields only ~47.5% accuracy, while the reinforcement‑learning model reaches ~80%, demonstrating effective noise removal. The attention model slightly outperforms TextRank (+4% accuracy).
Experiment 2: Supervised Evaluation
We manually labeled 1,000 items and evaluated all algorithms.
The reinforcement‑learning model shows clear gains in Precision and F1, while Recall is slightly lower because it extracts fewer sentences (≈30% fewer) than the baselines.
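The trade-off between extracting fewer sentences and losing recall can be illustrated with a small helper. It assumes sentence-level labels, where each item has a set of gold summary sentence indices; the article does not specify the exact evaluation protocol, and the numbers below are toy values, not the reported results.

```python
def precision_recall_f1(predicted, gold):
    """Sentence-level precision, recall, and F1 for one item.

    predicted: indices of sentences the model extracted
    gold: indices of sentences annotators marked as summary-worthy
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [0, 1, 2, 3]
broad = precision_recall_f1([0, 1, 2, 3, 5, 6], gold)  # extracts more, includes noise
selective = precision_recall_f1([0, 1, 2], gold)       # extracts fewer, all relevant
print(broad, selective)
```

In this toy case the broad extractor scores about 0.67 precision, 1.0 recall, and 0.8 F1, while the selective one scores 1.0 precision, 0.75 recall, and about 0.86 F1: recall drops, but precision and F1 rise, matching the pattern described above.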
Further Attention Analysis
Document‑level category classification tests reveal that the current attention mechanism does not provide significant improvements.
Future work may explore selective attention to better suit the large number of categories.
Online Results
A one‑month online A/B test shows a stable increase in unique visitors (UV) after deploying the reinforcement‑learning summaries.
Outlook
Current models already demonstrate strong denoising capability and produce readable summaries with minimal noise. Combining this approach with seq2seq generation could create high‑quality English training data for product recommendation scenarios.
References
[1] Feng J, Huang M, Zhao L, et al. Reinforcement Learning for Relation Classification from Noisy Data, Proceedings of AAAI, 2018.
[2] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification, Proceedings of NAACL, 2016.
[3] Lin Y, Shen S, Liu Z, et al. Neural Relation Extraction with Selective Attention over Instances, Proceedings of ACL, 2016.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
