
DR-BERT: Enhancing BERT-based Document Ranking with Task-adaptive Training and OOV Matching

DR‑BERT boosts BERT‑based document ranking on the MS MARCO benchmark by applying domain‑adaptive pretraining, a two‑stage fine‑tuning pipeline (pointwise then listwise), and OOV‑aware mechanisms—including exact‑match features and word‑recovery of sub‑tokens—achieving the first MRR@10 above 0.4 and leading the leaderboard.

Meituan Technology Team

Aug 20, 2020


The MS MARCO dataset, built from real-world Bing and Cortana queries, provides a large-scale benchmark for document ranking and retrieval tasks in open-domain question answering. Effective ranking is crucial because running question answering directly over every candidate document is infeasible. A ranking model therefore filters the candidates first, and only the top documents are passed on to downstream answer generation.

The authors propose DR-BERT, a BERT-based ranking model that achieves the first MRR@10 score above 0.4 on the official MS MARCO leaderboard (May–August 2020). DR-BERT’s core innovations include domain-adaptive pretraining on MARCO corpora, a two-stage fine‑tuning strategy (Pointwise followed by Listwise), and two mechanisms to alleviate out‑of‑vocabulary (OOV) mismatches: an exact‑match feature and a word‑recovery mechanism that merges WordPiece subtoken representations.
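
MRR@10, the leaderboard metric cited above, averages over queries the reciprocal rank of the first relevant document among the top 10 results. A query with no relevant document in its top 10 scores 0. The sketch below computes it in plain Python. The function name and the dictionary-based input format are illustrative assumptions, and it averages over the queries present in `rankings`; it is not the official MS MARCO evaluation script.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean Reciprocal Rank truncated at rank k (MRR@10 by default).

    rankings: query id -> document ids ordered by model score, best first
    relevant: query id -> set of document ids judged relevant
    Averages over the queries present in `rankings` (an assumption).
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked_docs in rankings.items():
        gold = relevant.get(qid, set())
        for rank, doc_id in enumerate(ranked_docs[:k], start=1):
            if doc_id in gold:
                total += 1.0 / rank  # only the first relevant hit counts
                break
        # a query with no relevant document in its top k contributes 0
    return total / len(rankings)


if __name__ == "__main__":
    rankings = {
        "q1": ["d3", "d1", "d7"],  # first relevant doc at rank 2 -> 1/2
        "q2": ["d2", "d9"],        # first relevant doc at rank 1 -> 1/1
        "q3": ["d4", "d5"],        # no relevant doc in top 10 -> 0
    }
    relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d8"}}
    print(mrr_at_k(rankings, relevant))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```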

In the first fine‑tuning stage, Pointwise training incorporates query‑type awareness by concatenating the query, its type label, and the document before feeding them to BERT. The resulting [CLS] representation is scored via a softmax layer optimized with cross‑entropy loss, allowing the model to learn type‑specific matching patterns.
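
A minimal sketch of the pointwise stage follows. It assumes a `[CLS] query [SEP] type [SEP] document [SEP]` layout (the article only says the three parts are concatenated), whitespace splitting in place of WordPiece, and a two-way relevant/non-relevant softmax over the [CLS] logits. The function names and the type label "location" are hypothetical, and the BERT encoder itself is omitted: its [CLS] logits are passed in directly.

```python
import math
from typing import List, Sequence


def build_pointwise_input(query: str, query_type: str, document: str) -> List[str]:
    """Concatenate query, query-type label and document into one input sequence.

    The [CLS]/[SEP] layout is an assumption; whitespace splitting stands in
    for WordPiece tokenization.
    """
    return (["[CLS]"] + query.split() + ["[SEP]", query_type, "[SEP]"]
            + document.split() + ["[SEP]"])


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def pointwise_loss(cls_logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a two-way softmax over the [CLS] logits.

    Hypothetical convention: index 1 = relevant, index 0 = not relevant.
    """
    return -math.log(softmax(cls_logits)[label])


if __name__ == "__main__":
    tokens = build_pointwise_input("what is the capital of france", "location",
                                   "paris is the capital of france")
    print(tokens)
    logits = [0.2, 1.5]  # stand-in for the classifier output on [CLS]
    print(softmax(logits)[1])  # P(relevant), a natural ranking score (assumption)
    print(pointwise_loss(logits, label=1))
```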

The second stage applies Listwise fine‑tuning: for each query, sampled positive and negative documents are encoded, their representations are reduced to scores via a single‑layer perceptron, and a listwise loss (derived from pairwise score comparisons and normalization) is optimized using negative log‑likelihood. This stage directly learns inter‑document ranking relations.
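
The listwise stage can be sketched as follows, assuming the loss is a softmax over the sampled group (one positive plus several negatives) followed by the negative log-likelihood of the positive. Written in terms of score differences, P(pos) = 1 / sum_j exp(s_j - s_pos), which is one common way pairwise comparisons and normalization combine into a listwise objective; it is not necessarily the exact DR-BERT formulation. The perceptron weights and document representations below are toy values.

```python
import math
from typing import Sequence


def perceptron_score(h: Sequence[float], w: Sequence[float], b: float) -> float:
    """Single-layer perceptron: reduce a document representation to a scalar score."""
    return sum(hi * wi for hi, wi in zip(h, w)) + b


def listwise_nll(scores: Sequence[float], pos_index: int = 0) -> float:
    """Negative log-likelihood of the positive document within its sampled group.

    P(pos) = 1 / sum_j exp(s_j - s_pos), i.e. softmax(scores)[pos_index],
    so -log P(pos) = log sum_j exp(s_j - s_pos).
    """
    s_pos = scores[pos_index]
    diffs = [s - s_pos for s in scores]  # pairwise comparisons against the positive
    m = max(diffs)                       # log-sum-exp shift for numerical stability
    return m + math.log(sum(math.exp(d - m) for d in diffs))


if __name__ == "__main__":
    # Toy [CLS] representations: one positive followed by three sampled negatives.
    reps = [[0.9, 0.1], [0.2, 0.4], [0.1, 0.3], [0.5, 0.5]]
    w, b = [2.0, -1.0], 0.0  # hypothetical perceptron parameters
    scores = [perceptron_score(h, w, b) for h in reps]
    print(scores)
    print(listwise_nll(scores, pos_index=0))
```

Computing the loss from differences against the positive makes the pairwise view explicit while remaining numerically identical to a standard softmax cross-entropy over the group.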

To combat OOV errors caused by WordPiece splitting, DR‑BERT adds two mechanisms. An exact‑match feature indicates whether a term appears in both the query and the document. A word‑recovery mechanism averages the subtoken vectors of each word (masking the non‑initial positions) to reconstruct whole‑word representations, which also provides a dropout‑like regularization effect.
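
The sketch below illustrates both OOV mechanisms on WordPiece output, where continuation subtokens carry BERT's `##` prefix. It assumes exact matching is done on whole words reconstructed from subtokens, so a shared piece such as `##nut` in "walnut" and "hazelnut" does not count as a match; the article does not specify this detail, and all function names are hypothetical. Word recovery averages each word's subtoken vectors into its first position and masks the remaining positions.

```python
from typing import List, Sequence, Tuple


def _piece(tok: str) -> str:
    """Strip BERT's '##' continuation prefix from a WordPiece subtoken."""
    return tok[2:] if tok.startswith("##") else tok


def group_subtokens(tokens: Sequence[str]) -> List[List[int]]:
    """Group WordPiece positions into whole words; '##' marks a continuation."""
    groups: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def exact_match_features(doc_tokens: Sequence[str],
                         query_tokens: Sequence[str]) -> List[int]:
    """1 for every document subtoken whose whole word also occurs in the query."""
    query_words = {"".join(_piece(query_tokens[i]) for i in g)
                   for g in group_subtokens(query_tokens)}
    feats = [0] * len(doc_tokens)
    for g in group_subtokens(doc_tokens):
        if "".join(_piece(doc_tokens[i]) for i in g) in query_words:
            for i in g:
                feats[i] = 1
    return feats


def recover_words(vectors: Sequence[Sequence[float]],
                  tokens: Sequence[str]) -> Tuple[List[List[float]], List[int]]:
    """Average each word's subtoken vectors into its first position.

    Non-initial subtoken positions are zeroed and get mask value 0.
    """
    if not vectors:
        return [], []
    dim = len(vectors[0])
    out = [[0.0] * dim for _ in vectors]
    mask = [0] * len(vectors)
    for g in group_subtokens(tokens):
        out[g[0]] = [sum(vectors[i][d] for i in g) / len(g) for d in range(dim)]
        mask[g[0]] = 1
    return out, mask


if __name__ == "__main__":
    query = ["symptoms", "of", "hazel", "##nut", "allergy"]
    doc = ["wal", "##nut", "and", "hazel", "##nut", "allergy"]
    # '##nut' appears in both, but only the whole word "hazelnut" counts as a match.
    print(exact_match_features(doc, query))  # [0, 0, 0, 1, 1, 1]
    vecs = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
    print(recover_words(vecs, ["hazel", "##nut", "allergy"]))
    # ([[2.0, 1.0], [0.0, 0.0], [5.0, 5.0]], [1, 0, 1])
```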

Experiments on MS MARCO show DR‑BERT surpasses prior neural ranking models, maintains the top position on the leaderboard for several months, and validates the effectiveness of domain‑adaptive pretraining, staged fine‑tuning, and OOV‑aware features for document retrieval tasks.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

BERT MS MARCO document ranking DR-BERT OOV matching task-adaptive training two-stage fine-tuning

Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
