
From RankNet to Boosted Decision Trees: Evolution of Bing’s Search Ranking Algorithms

Chris Burges recounts Microsoft’s transition from the early “Flying Dutchman” system to RankNet and finally to Boosted Decision Trees, explaining how fast experimentation, LambdaRank/MART innovations, and large‑scale data handling have dramatically improved Bing’s search ranking accuracy and efficiency.

Architects Research Society

Original author: Chris Burges, Chief Research Manager at Microsoft Research, Redmond.

Translator: Chen Bin.

Hello, I am Chris Burges. Over my 14‑year tenure at Microsoft and a prior 14‑year research career at Bell Labs, I have worked on machine‑learning research and tackled industry‑relevant problems. With the recent surge of interest in machine learning and its applications, it is an excellent time to explore both practical and algorithmic aspects of how machine learning operates.

In 2004, Microsoft Research teamed up with the Microsoft Web Search group to improve search relevance. We initially used a system called “The Flying Dutchman.” A few months later we built a new system, RankNet, a simple neural‑network ranker that could generate a ranking model in about an hour on a single machine, compared with the days and large clusters required by the previous system.

Science, research, algorithm design, and product development will increasingly intersect in the coming years. This article first presents how these elements interact—even for readers with no prior machine‑learning knowledge—highlighting the importance of rapid experimentation. When a good idea emerges, researchers want to test it immediately; even if an early model is not as accurate as existing ones, faster training and testing can accelerate overall progress and eventually surpass current performance.

Recently, a family of models called Boosted Decision Trees (BDTs) has become especially popular. BDTs can flexibly address various prediction tasks, such as:

Ranking: placing the most relevant web‑search results at the top;

Classification: determining whether an email is spam;

Regression: predicting the selling price of a house.
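To make the regression case concrete, here is a minimal, self-contained sketch of the idea behind boosted trees: depth-1 regression trees (stumps) are fitted one after another to the residuals of the current model, so each new tree corrects the mistakes of the ensemble so far. The toy "house price" data, learning rate, and stump learner are illustrative inventions for this example; production systems use full trees and heavily optimized trainers.

```python
# Minimal gradient boosting for squared error with decision stumps.
# Illustrative sketch only, not a production trainer.

def fit_stump(xs, residuals):
    """Find the single threshold split on xs that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_trees=200, lr=0.1):
    base = sum(ys) / len(ys)            # start from the mean prediction
    pred = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)  # fit the next tree to residuals
        trees.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(t(x) for t in trees)

# Toy data: price rises with floor area, with a jump at 100 square meters.
xs = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
ys = [150, 160, 170, 180, 190, 260, 270, 280, 290, 300]
model = boost(xs, ys)
print(model(75), model(125))
```

The same residual-fitting loop handles classification and ranking once the squared-error residuals are replaced by gradients of the appropriate loss, which is exactly the generalization LambdaMART exploits below.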

BDTs are widely used within Microsoft’s machine‑learning services—over 670,000 training runs in the past year alone. Although this figure is inflated by model‑selection experiments, it indicates a substantial scale. The popularity is not limited to Microsoft: in Yahoo’s 2010 “Learning to Rank” challenge, more than 1,000 teams entered, and the top five systems all employed decision‑tree ensembles (Microsoft’s winning entry combined BDTs with neural networks). Thus, BDTs are a solid choice for many prediction problems.

Let’s use web‑search ranking as a typical research‑to‑product development case. The biggest research challenges are formulating the right questions and validating ideas. Starting from real‑world problems is an effective approach.

When you submit a query to Bing, we efficiently scan all indexed documents. Fast filters discard many candidates (e.g., documents with no overlapping keywords). For each remaining candidate we generate thousands of features describing relevance to the query—simple ones like “does the title contain query terms?” and more advanced ones like “does the document mention any query entities?”. The ranking model maps this feature list to a relevance score. Historically we used a single metric, NDCG (Normalized Discounted Cumulative Gain), to evaluate result quality; now multiple metrics assess user satisfaction. NDCG values range from 0 to 1, with 1 indicating a perfect ranking on a labeled dataset.
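As an illustration (not Bing's internal evaluation code), NDCG can be computed from per-result relevance labels as follows, using the standard exponential gain and logarithmic position discount:

```python
# Sketch of NDCG@k: DCG rewards relevant results with gain 2^rel - 1,
# discounted by log2(position + 2); dividing by the ideal (best possible)
# DCG normalizes the score into [0, 1].
import math

def dcg(relevances, k):
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels, listed in the order the ranker returned
# the results (3 = perfect, 0 = irrelevant).
print(ndcg([3, 2, 3, 0, 1], k=5))   # imperfect order: below 1
print(ndcg([3, 3, 2, 1, 0], k=5))   # ideal order: exactly 1.0
```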

How did we evolve from RankNet to BDTs? Although RankNet was a breakthrough, it did not align well with our task because it optimized pairwise ordering without directly considering NDCG. Directly optimizing NDCG is mathematically difficult because the metric changes discontinuously with the model’s scores. We realized that for neural‑network training we only need a gradient, not the exact function value. By treating the change in NDCG as a “force” that nudges documents up or down in the ranking, we derived gradients for each document pair. This idea led to LambdaRank, which improves upon RankNet, and later to LambdaMART, which applies the same principle to tree‑based models, giving BDTs two key advantages:

1. More natural handling of a wide range of features.

2. Faster training time, accelerating the experimental turnaround.
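The “force” intuition behind LambdaRank can be sketched as follows: for every pair of documents whose order matters, the RankNet pairwise gradient is scaled by the change in NDCG that swapping the two documents would cause, and the resulting forces accumulate per document. The scores, labels, and unit scaling below are illustrative assumptions, and the documents are assumed to be listed in their current ranked order:

```python
# Sketch of LambdaRank-style per-document gradients ("lambdas").
# Illustrative only; production LambdaMART folds these into tree training.
import math

def delta_ndcg(labels, i, j, ideal_dcg):
    """|Change in NDCG| if the documents at positions i and j swap places."""
    gain = lambda l: 2 ** l - 1
    disc = lambda pos: 1.0 / math.log2(pos + 2)
    before = gain(labels[i]) * disc(i) + gain(labels[j]) * disc(j)
    after = gain(labels[i]) * disc(j) + gain(labels[j]) * disc(i)
    return abs(after - before) / ideal_dcg

def lambdas(scores, labels):
    ideal = sum((2 ** l - 1) / math.log2(p + 2)
                for p, l in enumerate(sorted(labels, reverse=True)))
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:   # doc i should rank above doc j
                # RankNet pairwise logistic gradient, scaled by |delta NDCG|
                rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                force = rho * delta_ndcg(labels, i, j, ideal)
                lam[i] += force         # push the more relevant doc up
                lam[j] -= force         # push the less relevant doc down
    return lam

scores = [0.2, 1.1, 0.3]   # hypothetical model scores, in ranked order
labels = [2, 0, 3]         # hypothetical human relevance labels
print(lambdas(scores, labels))
```

Note that the forces sum to zero across the result list: every pair pushes one document up exactly as hard as it pushes the other down, so a badly misplaced relevant document (index 2 here) accumulates a large upward force.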

Subsequently, a team led by Ofer Dekel made BDT training almost two orders of magnitude faster than neural‑network training and capable of handling much larger datasets.

In short, our love for BDTs stems from a virtuous cycle where engineering and product needs drive research, and research creates new product opportunities. For the latter two stages—RankNet and BDTs—the main contribution has been the ability to process larger data volumes more quickly. Although this article focuses on ranking, the techniques are not limited to a niche problem; they have been applied throughout Bing to substantially improve search quality.

Tags: artificial intelligence, machine learning, search engine, ranking algorithms, LambdaMART, Boosted Decision Trees, RankNet
Written by Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
