How Alibaba Engineers Boost SEO with Reinforcement Learning and Attention Models

This article describes how Alibaba.com engineers applied reinforcement learning, attention mechanisms, and weak supervision to extract product summaries, improve page content quality, and raise SEO rankings. It covers offline experiments, an online A/B test, and future research directions.

Alibaba Cloud Developer

Background

SEO (search engine optimization) is a set of techniques that follow search‑engine guidelines to improve a website's rankings and attract more traffic. Traditional methods such as TDK (title, description, keywords) optimization, link building, and mobile‑first design have been applied extensively at alibaba.com. As search‑engine algorithms increasingly reward genuine content value, pages with richer, higher‑quality text tend to rank higher.

Figure 1 shows the importance ranking of SEO factors, highlighting content construction as a major contributor.

Our work focuses on content construction, specifically extracting product summary sentences to enrich the SEO landing‑list pages. Adding concise product descriptions increases page text, improves Google rankings, attracts users, and boosts click‑through rates.

Problem Description

Given a product and its description, we aim to extract suitable sentences as a product summary. Two main challenges arise: (1) lack of strictly labeled data to guide the model, and (2) noisy descriptions containing logistics, payment, and Q&A information that obscure the core product content.

Algorithm Exploration

TextRank Model

We treat summary extraction as an unsupervised task and use the classic TextRank algorithm as a baseline. TextRank builds a graph whose nodes are the sentences of a description and whose edges are weighted by the semantic similarity between sentence pairs. It then runs a PageRank‑style iteration to assign each sentence a TextRank score and selects the top‑N sentences.
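A minimal plain‑Python sketch of TextRank sentence selection follows. The article's own similarity formula is not reproduced here, so this sketch assumes the word‑overlap similarity from the original TextRank paper (Mihalcea and Tarau, 2004) as a stand‑in for the semantic similarity Alibaba used. The function names, the damping factor 0.85, and the iteration count are illustrative choices, not values from the article.

```python
import math
import re


def tokenize(sentence):
    """Lowercase alphanumeric tokens; deliberately simple for the sketch."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(a, b):
    """Word-overlap similarity from the original TextRank paper (a stand-in).

    sim(Si, Sj) = |{w : w in Si and w in Sj}| / (log|Si| + log|Sj|)
    """
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences are one word long, and log(1) == 0
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, top_n=2, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank; return the top_n in original order."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights, no self-loops.
    weights = [
        [overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # WS(Vi) = (1 - d) + d * sum_j w_ji / out(Vj) * WS(Vj)
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


description = [
    "This stainless steel water bottle keeps drinks cold for 24 hours.",
    "The insulated steel bottle is leak proof and easy to carry.",
    "We accept PayPal and credit card payment.",
    "Shipping takes 7 to 15 days.",
]
print(textrank(description, top_n=2))
```

Nothing in this scoring looks at the product category. A payment or shipping sentence that happens to overlap with other sentences can therefore rank as highly as a product sentence.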

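Manual review revealed that TextRank often selects noisy sentences (e.g., payment instructions). Because it scores each sentence only by its similarity to the other sentences in the same description, it has no notion of the product category and cannot distinguish category‑relevant content from unrelated information.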

Attention Model

To address the lack of labeled data, we use product‑category tags as labels, which turns the problem into a weakly supervised one. We train a text‑classification model with an attention mechanism [2] to predict a product's category from its description. The attention layer assigns higher weights to sentences that are more category‑relevant, and these sentences become candidates for the summary.

The attention mechanism computes a similarity α between each sentence vector and a global category vector U, producing attention weights that guide sentence selection.
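The weighting step can be sketched as sentence attention in the style of hierarchical attention networks [2]. Each sentence vector h_i is projected to u_i = tanh(W h_i + b) and scored against the global vector U by a dot product. A softmax over those scores gives α_i. This minimal plain‑Python sketch uses toy 2‑dimensional vectors and hand‑set W, b, and U in place of trained parameters, and its function names are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_attention(H, W, b, U):
    """HAN-style sentence attention.

    u_i = tanh(W h_i + b); alpha_i = softmax_i(u_i . U); doc = sum_i alpha_i h_i
    """
    projected = [[math.tanh(v + bk) for v, bk in zip(matvec(W, h), b)] for h in H]
    scores = [sum(u * c for u, c in zip(p, U)) for p in projected]
    alpha = softmax(scores)
    doc = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]
    return alpha, doc


# Toy sentence vectors: sentence 0 lies along the "category" direction of U,
# sentence 1 (think: a payment notice) points away from it, sentence 2 is mixed.
H = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
W = [[1.0, 0.0], [0.0, 1.0]]  # hand-set, untrained projection
b = [0.0, 0.0]
U = [2.0, -1.0]               # hand-set stand-in for the learned category vector
alpha, doc = sentence_attention(H, W, b, U)
ranking = sorted(range(len(H)), key=lambda i: alpha[i], reverse=True)
print(ranking)
```

Summary candidates are the sentences with the largest α_i. Deciding how many to keep still requires a fixed top‑N or threshold.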

Although the attention model filters out some noise, it only ranks sentences. The number of extracted sentences must therefore be fixed in advance rather than adjusted per product, and its gain over TextRank is modest.

Reinforcement Learning Model

To overcome these limitations, we adopt a Selector‑Classifier architecture following Feng et al. [1]. The Selector network chooses candidate summary sentences, and the Classifier network evaluates them and returns a reward based on the classification loss. The three networks (Encoder, Selector, and Classifier) are trained jointly.

The Encoder produces a feature vector for each sentence (Vec1 to Vec4 in a four‑sentence example). For each sentence, the Selector outputs the probability that it is category‑relevant, and sentences are selected accordingly. The Classifier then predicts the product category from the selected sentences. The resulting cross‑entropy loss is converted into a reward (lower loss, higher reward) and fed back to the Selector. If the Selector rejects every sentence, the average classification loss on the training set is used to compute the reward instead.
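Sentence selection is a discrete action, so the loss cannot be back‑propagated through it directly. The Selector's update can instead be sketched with the REINFORCE policy‑gradient estimator, the kind of approach used in [1]. This minimal plain‑Python sketch rests on several assumptions that do not come from the article:

- Sentence features are fixed toy vectors rather than Encoder outputs.
- The Classifier is a frozen, hypothetical stand‑in called `classifier_loss`.
- The reward is the negative cross‑entropy loss.
- The empty‑selection fallback uses a hypothetical `AVG_TRAIN_LOSS`.

It illustrates the training signal and is not Alibaba's implementation.

```python
import math
import random

# Hypothetical constants for the sketch (not values from the article).
AVG_TRAIN_LOSS = 1.2  # assumed average classification loss on the training set
LEARNING_RATE = 0.1


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_loss(selected, evidence):
    """Hypothetical frozen Classifier: cross-entropy of the true category.

    p(true category) is modeled as the mean category evidence of the selected
    sentences, so noisy sentences drag the probability down.
    """
    p_true = sum(evidence[i] for i in selected) / len(selected)
    return -math.log(p_true)


def reward(selected, evidence):
    """Lower loss means higher reward; an empty selection uses the average loss."""
    if not selected:
        return -AVG_TRAIN_LOSS
    return -classifier_loss(selected, evidence)


def selector_probs(w, b, features):
    """Selector: independent Bernoulli probability per sentence."""
    return [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in features]


def train_selector(features, evidence, episodes=2000, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * len(features[0]), 0.0
    baseline = 0.0  # running mean reward, reduces gradient variance
    for _ in range(episodes):
        probs = selector_probs(w, b, features)
        actions = [1 if rng.random() < p else 0 for p in probs]
        selected = [i for i, a in enumerate(actions) if a]
        r = reward(selected, evidence)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: d log Bernoulli(a | p) / d logit = a - p
        for x, a, p in zip(features, actions, probs):
            g = advantage * (a - p)
            w = [wk + LEARNING_RATE * g * xk for wk, xk in zip(w, x)]
            b += LEARNING_RATE * g
    return w, b


# Two product sentences (feature [1, 0]) and two logistics/payment sentences ([0, 1]).
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
evidence = [0.9, 0.9, 0.2, 0.2]  # hypothetical per-sentence category evidence
w, b = train_selector(features, evidence)
probs = selector_probs(w, b, features)
print([round(p, 3) for p in probs])
```

Because each sentence is kept or dropped independently, the number of selected sentences varies per product. This is the property the attention model lacked.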

Experimental Analysis

We conducted two offline experiments.

Experiment 1: Noise Reduction

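For each model, we replaced every product's original noisy description with the summary that model extracted, then trained a TextCNN classifier to predict the product category. Noisy sentences carry little category information, so higher classification accuracy indicates cleaner summaries. Results are compared with the TextRank baseline.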

Training on the full, unfiltered descriptions yields only about 47.5% accuracy, while the reinforcement‑learning summaries reach about 80%, which shows that the model removes noise effectively. The attention model slightly outperforms TextRank (+4% accuracy).

Experiment 2: Supervised Evaluation

We manually labeled the summary sentences of 1,000 items and evaluated all algorithms against these labels using Precision, Recall, and F1.

The reinforcement‑learning model shows clear gains in Precision and F1, while Recall is slightly lower because it extracts fewer sentences (≈30% fewer) than the baselines.
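The following sketch assumes that the labels mark which sentences of each description belong in the summary and that scores are micro‑averaged over items. The article does not state either choice. Under those assumptions, sentence‑level Precision, Recall, and F1 can be computed as below. The toy data is hypothetical and illustrates the trade‑off reported above: extracting fewer sentences tends to raise Precision while capping Recall.

```python
def prf1(predicted, gold):
    """Micro-averaged sentence-level precision, recall, and F1.

    predicted, gold: one set of selected sentence indices per labeled item.
    """
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{0, 1, 2}, {0, 3}]
baseline = [{0, 1, 2, 4, 5}, {0, 1, 3}]  # extracts more sentences, including noise
selector = [{0, 1}, {0, 3}]              # extracts fewer, cleaner sentences
print(prf1(baseline, gold))  # precision 0.625, recall 1.0, F1 ~0.769
print(prf1(selector, gold))  # precision 1.0, recall 0.8, F1 ~0.889
```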

Further Attention Analysis

Document‑level category classification tests show that the current attention mechanism does not significantly improve classification accuracy.

Future work may explore selective attention [3], which may better suit the large number of categories.
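Selective attention in [3] attends over the sentences that mention the same entity pair. Adapted to description sentences, it would replace the single global vector U with a query r_c tied to each candidate category. Each sentence x_i is scored as e_i = x_i A r_c, with A diagonal, and the scores are normalized with a softmax. The following minimal sketch of that adaptation uses toy vectors, a hand‑set diagonal for A, and hypothetical per‑category queries. None of these are trained, and none come from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(X, a_diag, r_c):
    """e_i = x_i A r_c with diagonal A; alpha = softmax(e); s = sum_i alpha_i x_i."""
    scores = [sum(x * a * r for x, a, r in zip(xi, a_diag, r_c)) for xi in X]
    alpha = softmax(scores)
    s = [sum(al * xi[k] for al, xi in zip(alpha, X)) for k in range(len(X[0]))]
    return alpha, s


X = [[1.0, 0.0], [0.0, 1.0]]  # toy sentence vectors
a_diag = [1.0, 1.0]           # hand-set diagonal of A
queries = {"apparel": [3.0, 0.0], "electronics": [0.0, 3.0]}  # hypothetical per-category queries
for category, r_c in queries.items():
    alpha, _ = selective_attention(X, a_diag, r_c)
    print(category, [round(a, 3) for a in alpha])
```

The intuition is that a per‑category query lets the same description be weighted differently for each candidate category. A single global U, by contrast, has to serve all categories at once.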

Online Results

A one‑month online A/B test showed a stable increase in UV (unique visitors) after the reinforcement‑learning summaries were deployed.

Outlook

Current models already demonstrate strong denoising capability and produce readable summaries with minimal noise. Combining this approach with seq2seq generation could create high‑quality English training data for product recommendation scenarios.

References

[1] Feng J, Huang M, Zhao L, et al. Reinforcement Learning for Relation Classification from Noisy Data, Proceedings of AAAI, 2018.

[2] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification, Proceedings of NAACL, 2016.

[3] Lin Y, Shen S, Liu Z, et al. Neural Relation Extraction with Selective Attention over Instances, Proceedings of ACL, 2016.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
