
Baidu's Large-Scale Machine Learning Technology: Enabling Trillion-Feature Processing with Minute-Level Model Updates

Baidu's Big Data Machine Learning team, led by Xia Fen, unveiled a suite of five novel algorithms that together allow trillion‑scale feature processing, minute‑level model updates, and up to thousand‑fold efficiency gains in training and inference, dramatically surpassing existing solutions such as Google's billion‑feature systems.


This article discusses Baidu's breakthrough in large-scale machine learning technology, presented by Xia Fen, the leader of Baidu's Big Data Machine Learning team, at the 48th Baidu Technical Salon.

With over 10 years of experience in machine learning, Xia Fen described how Baidu built technology that accommodates trillions of data features, improves feature-learning efficiency by 1000x, enables minute-level model updates, and accelerates model training by 10x.

The presentation addressed four key challenges in large-scale machine learning for advertising data: large feature scale, high feature complexity, high data timeliness, and frequent model training. To tackle these challenges, Baidu developed five innovative algorithms:

1. SA Algorithm: Filters random click noise by analyzing peaks and valleys across time segments, identifying and removing noisy samples before training.

2. Fea-G Algorithm: A feature selection algorithm that identifies effective features before model training, finding a minimal set that contains them. Unlike Google's heuristic approach, which can degrade accuracy, Baidu's method carries a theoretical guarantee and can remove large numbers of ineffective features without any loss in performance.

3. DANOVA Algorithm: The world's first deep feature learning algorithm directly applied to large-scale sparse features, reducing feature learning complexity and improving feature mining efficiency by over 1000x, significantly boosting CTR and CPM.

4. SOA Algorithm: A stable online algorithm that improves model stability, changing the training architecture from batch processing to online learning and saving over 80% of resources while achieving minute-level online learning on big data.

5. Shooting Algorithm: Addresses the imbalanced feature distribution in advertising data by improving iteration direction and step size, achieving 10x faster performance than the industry-standard LBFGS algorithm.
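Baidu's Shooting variant is not public, but the name echoes the classical shooting algorithm, i.e. cyclic coordinate descent for L1-regularized regression, which updates one coordinate at a time via soft-thresholding rather than taking the full-gradient steps LBFGS does. As a hedged illustration of that classical method only (not Baidu's implementation), a minimal sketch:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in lasso coordinate updates."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def shooting_lasso(X, y, lam, n_iters=100):
    """Cyclic coordinate descent ("shooting") for
    min_w 0.5*||X w - y||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)   # per-coordinate curvature, precomputed
    r = y - X @ w                   # current residual
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # partial residual correlation, excluding coordinate j's contribution
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)  # keep residual in sync
            w[j] = w_new
    return w
```

Each coordinate update is cheap and exploits sparsity, which is one reason coordinate-wise methods can outperform quasi-Newton methods such as LBFGS on the heavily imbalanced, sparse feature distributions typical of advertising data.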

These technologies enable Baidu to build a click-through rate prediction system that accommodates trillion-level feature data, achieves minute-level model updates, supports automatic and efficient deep learning, and trains efficiently—capabilities that go beyond Google's publicly reported systems, which handle billions of features with minute-level updates.
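Baidu's SOA algorithm itself is not public, but the style of system the article describes—streaming click logs into a sparse linear CTR model that updates within minutes—is commonly built in the industry from feature hashing plus per-coordinate online updates such as FTRL-Proximal. The sketch below is a generic illustration under those assumptions; the class name, hashing dimension, and hyperparameters are all illustrative, not Baidu's:

```python
import math

class HashedFTRL:
    """Sketch of an online CTR model: feature hashing plus
    per-coordinate FTRL-Proximal updates. Illustrative only."""

    def __init__(self, dim=2**20, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.dim, self.alpha, self.beta = dim, alpha, beta
        self.l1, self.l2 = l1, l2
        self.z = [0.0] * dim   # accumulated adjusted gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def _indices(self, features):
        # hash arbitrary string features into a fixed-size weight table
        return [hash(f) % self.dim for f in features]

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:          # L1 threshold induces sparsity
            return 0.0
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, features):
        s = sum(self._weight(i) for i in self._indices(features))
        return 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))

    def update(self, features, label):
        p = self.predict(features)
        g = p - label                  # gradient of log-loss wrt the margin
        for i in self._indices(features):
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
```

Because each click event touches only the handful of hashed coordinates it activates, the model can absorb a continuous log stream and serve fresh weights within minutes, which is the operating regime the article attributes to Baidu's system.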

Tags: feature engineering, deep learning, CTR prediction, online learning, Baidu, computational advertising, distributed algorithms, large-scale machine learning
Written by

Baidu Tech Salon

Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.
