
Exploring Interactive BERT for Relevance in Health E‑commerce Search

This article presents a comprehensive overview of Alibaba Health's interactive BERT approach for improving relevance in health e‑commerce search, covering business background, model design, domain‑specific data construction, knowledge‑distilled twin‑tower deployment, experimental results, and a detailed Q&A session.


The presentation begins with an introduction to Alibaba Health's e‑commerce search platform, emphasizing that relevance sits in the middle of the search pipeline and must balance high recall with precise ranking for medical products.

It then outlines the three main parts of the talk: an overview of the health search business and technology, exploration of the interactive BERT algorithm, and practical model deployment.

Health Search Business and Technology – The health e‑commerce channel primarily runs on Taobao, covering medicines, medical devices, and supplements. Accurate relevance is critical because user queries contain professional medical terminology that requires domain‑aware understanding.

Interactive BERT Algorithm Exploration – The need for a deep semantic model is motivated by the limitations of pure text and entity matching. Three challenges are identified: (1) high domain specificity, (2) scarcity of high‑quality labeled samples, and (3) strict online latency requirements. The solution proposes three optimization directions: enhancing key‑attribute understanding, constructing high‑quality samples, and strengthening dual‑tower semantic representation.

The model architecture adds a keyword embedding that indicates whether each token is a key attribute, multiplies it element‑wise with the output of BERT's 11th layer, and feeds the result into an additional transformer layer before concatenating it with the original 12th‑layer output for final classification. Pre‑training is adapted to predict click likelihood, achieving an AUC of 0.927 on a held‑out set.
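The key‑attribute gating described above can be sketched in a few lines. This is a minimal illustration, not the production model: the layer outputs are random stand‑ins, the keyword embedding is simplified to one learned vector per flag, and the extra transformer layer is reduced to a single self‑attention pass.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 8, 16

# Hypothetical stand-ins for the intermediate BERT outputs.
layer11_out = rng.normal(size=(seq_len, hidden))  # output of BERT layer 11
layer12_out = rng.normal(size=(seq_len, hidden))  # output of BERT layer 12

# Keyword embedding: marks whether each token belongs to a key attribute
# (e.g. a drug name or dosage). One learned vector per binary flag.
is_key_attr = np.array([1, 0, 0, 1, 1, 0, 0, 0])  # token-level flags
keyword_emb = rng.normal(size=(2, hidden))        # embedding table of size 2
gate = keyword_emb[is_key_attr]                   # (seq_len, hidden)

# Element-wise product injects the key-attribute signal into layer-11 states.
gated = layer11_out * gate

def self_attention(x):
    """Single-head self-attention, standing in for the extra transformer layer."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

attended = self_attention(gated)

# Concatenate with the original 12-layer output for the final classifier.
features = np.concatenate([attended, layer12_out], axis=-1)
print(features.shape)  # (8, 32)
```

The design point is that the gate perturbs only a copy of the layer‑11 states; the untouched layer‑12 output is kept in the concatenation, so the attribute‑aware branch adds signal without discarding the standard BERT representation.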

Domain‑Specific Sample Generation – Positive samples are derived from high‑click queries and products, while negative samples come from exposed but unclicked items, filtered by text and entity scores. Additional domain‑knowledge samples are created using an entity dictionary and relational triples, resulting in over two billion training instances.
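The sampling rule above can be expressed as a simple filter over exposure logs. The rows and thresholds below are illustrative, not the values used in the talk: unclicked exposures are kept as negatives only when both the text‑match and entity‑match scores are low enough that the pair is confidently irrelevant.

```python
# Hypothetical exposure log rows: (query, item, clicked, text_score, entity_score)
exposures = [
    ("ibuprofen", "ibuprofen 200mg tablets", True,  0.95, 0.90),
    ("ibuprofen", "vitamin C gummies",       False, 0.10, 0.05),
    ("ibuprofen", "naproxen tablets",        False, 0.40, 0.70),
]

# Illustrative cut-offs for the matching-score filter.
TEXT_MAX, ENTITY_MAX = 0.3, 0.3

# Positives: exposed items the user actually clicked.
positives = [(q, i) for q, i, clicked, _, _ in exposures if clicked]

# Negatives: exposed-but-unclicked items whose text and entity scores are
# both low, so a high score cannot "rescue" a plausibly relevant item.
negatives = [
    (q, i) for q, i, clicked, t, e in exposures
    if not clicked and t < TEXT_MAX and e < ENTITY_MAX
]

print(positives)  # [('ibuprofen', 'ibuprofen 200mg tablets')]
print(negatives)  # [('ibuprofen', 'vitamin C gummies')]
```

Note that the unclicked "naproxen tablets" row is dropped rather than labeled negative: its entity score is high, so treating it as irrelevant would inject label noise.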

Knowledge Distillation to Twin‑Tower Model – Because the full interactive BERT is too large for online serving, a multi‑granularity twin‑tower model is distilled from it using the two billion domain samples described above. The twin towers process single‑character and bi‑character token combinations, aggregate embeddings via max and average pooling, and apply interaction operations (addition, subtraction, element‑wise max) before an MLP. The distilled model reduces latency to under 10 ms while losing only 0.6 AUC points, and online A/B tests show improvements of 2.6 percentage points in manual relevance scores and over 3 percentage points in order conversion.
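The twin‑tower interaction can be sketched as follows. This is a shape‑level illustration under stated assumptions: token embeddings are random, the two towers share one pooling function, and the MLP scoring head is stood in for by a single random projection plus sigmoid.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16

def encode(token_embs):
    """One tower: aggregate token embeddings via max and average pooling."""
    return np.concatenate([token_embs.max(axis=0), token_embs.mean(axis=0)])

# Hypothetical unigram + bigram embeddings for a query and a product title.
query_tokens = rng.normal(size=(5, dim))   # chars and char-bigrams of the query
item_tokens = rng.normal(size=(12, dim))   # chars and char-bigrams of the title

q = encode(query_tokens)  # query tower vector, (2 * dim,)
d = encode(item_tokens)   # item tower vector,  (2 * dim,)

# Late interaction: addition, subtraction, element-wise max, then an MLP
# (reduced here to one linear projection and a sigmoid).
interaction = np.concatenate([q + d, q - d, np.maximum(q, d)])
W = rng.normal(size=interaction.shape[0]) / np.sqrt(interaction.shape[0])
score = 1.0 / (1.0 + np.exp(-(W @ interaction)))
print(float(score))
```

Because each tower only sees its own side, the item vectors can be precomputed offline and the cheap interaction plus MLP is all that runs at query time, which is what brings serving latency under 10 ms.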

The session concludes with a Q&A covering long‑tail item handling, relevance standards, sample bias, negative‑sample construction, and technical details such as hash‑based bigram embeddings and the benefits of unigram‑bigram fusion.
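The hash‑based bigram embeddings raised in the Q&A can be illustrated briefly. Since the bigram vocabulary is far too large to enumerate, each bigram is hashed into a fixed‑size embedding table; the bucket count, hash choice, and fusion‑by‑concatenation below are illustrative assumptions, not details from the talk.

```python
import hashlib
import numpy as np

NUM_BUCKETS, DIM = 1 << 16, 16  # illustrative table size
table = np.random.default_rng(2).normal(size=(NUM_BUCKETS, DIM))

def bigram_ids(text):
    """Map each character bigram to a stable hash bucket in the table."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return [
        int.from_bytes(hashlib.md5(b.encode("utf-8")).digest()[:8], "big")
        % NUM_BUCKETS
        for b in bigrams
    ]

query = "阿司匹林"                 # "aspirin": 4 characters, 3 bigrams
embs = table[bigram_ids(query)]   # (3, DIM) looked-up bigram vectors

# Unigram-bigram fusion: stack both granularities along the sequence axis
# so the pooling step sees character- and bigram-level evidence together.
unigrams = table[[ord(c) % NUM_BUCKETS for c in query]]
fused = np.concatenate([unigrams, embs], axis=0)
print(fused.shape)  # (7, 16)
```

Hashing trades a small chance of bucket collisions for a bounded parameter count; the unigram channel keeps coverage robust when a rare bigram collides or was unseen in training.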

Tags: AI, knowledge distillation, BERT, search relevance, semantic modeling, health e-commerce
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
