
AutoML Overview: Hyperparameter Optimization, Automatic Feature Engineering, and Neural Architecture Search on Alibaba PAI

This article introduces AutoML, explaining how it automates data cleaning, feature engineering, model selection, hyperparameter optimization, and neural architecture search, and showcases Alibaba PAI's implementations of HPO, AutoFE, and NAS with practical case studies and performance results.


Introduction – In the machine‑learning industry, roughly 80% of the effort goes into data cleaning and feature engineering, while only about 20% goes into model building. AutoML aims to lower the entry barrier by automating these steps.

What is AutoML? The article covers five areas:

Automated Machine‑Learning (AutoML) basics

Hyperparameter Optimization (HPO)

Automatic Feature Engineering (AutoFE)

Neural Architecture Search (NAS)

Other related techniques

Typical Machine‑Learning Workflow – A standard pipeline includes data collection, preprocessing, feature engineering, model selection, hyperparameter tuning, model compression, and evaluation/deployment. Each stage can be optimized automatically.
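The stages above can be sketched as a chain of pluggable steps. This is a minimal toy illustration, not a PAI API: the data, stage functions, and the "model" are hypothetical stand-ins; AutoML systems automate choosing and configuring the steps in chains like this.

```python
# Each workflow stage is a pluggable function; an AutoML system searches over
# which stages to apply and how to configure them.
def preprocess(raw):
    return [r for r in raw if r is not None]        # drop missing values

def engineer(rows):
    return [(x, x * x) for x in rows]               # derive a squared feature

def train(data):
    return sum(x for x, _ in data) / len(data)      # toy "model": column mean

raw = [3, None, 5, 4]
data = raw
for stage in (preprocess, engineer):                # run the pipeline in order
    data = stage(data)
model = train(data)                                 # stand-in for a fitted model
```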

HPO – The goal is to find the best hyperparameter configuration within a large search space. Strategies include exhaustive (grid) search, random search, Bayesian optimization and other probability‑model‑based methods, and sampling algorithms that prune low‑quality configurations early, often combined with early‑stopping techniques to save training cost.
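A minimal sketch of random search with early pruning, under stated assumptions: the objective is a hypothetical quadratic stand-in for a validation score (real HPO would train a model per trial), and the low-budget evaluation and pruning threshold are illustrative, not PAI's implementation.

```python
import random

def evaluate(config, budget):
    """Toy objective: stand-in for a validation score after `budget` epochs.
    Here the budget is ignored; in practice a low budget means a cheap,
    noisy estimate of the final score."""
    # Hypothetical optimum at lr=0.1, depth=6; higher is better.
    return -((config["lr"] - 0.1) ** 2 + (config["depth"] - 6) ** 2 / 100)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {"lr": rng.uniform(1e-4, 1.0), "depth": rng.randint(2, 12)}
        # Cheap low-budget evaluation first; prune clearly poor configs
        # before spending the full training budget on them.
        if evaluate(config, budget=1) < best_score - 0.5:
            continue
        score = evaluate(config, budget=10)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(n_trials=200)
```

Bayesian optimization would replace the uniform sampling with a surrogate model that proposes promising configurations; the prune-then-evaluate structure stays the same.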

AutoFE – Focuses on generating and selecting useful features from tabular data. It defines a search space of unary and binary operations (discretization, normalization, arithmetic, statistical aggregates, logical combinations) and employs search strategies such as FeatureTools/DFS and PAI‑AutoFE (SAFE) to discover high‑value features without writing code.
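The generate-and-select loop can be sketched in a few lines. This is a toy illustration, not FeatureTools or PAI‑AutoFE: the base columns, the small set of unary/binary operations, and the correlation-based selection criterion are all assumptions chosen to keep the example self-contained.

```python
import math

# Toy tabular data: two base columns and a target driven by their ratio.
rows = [(x, y) for x in range(1, 21) for y in range(1, 6)]
target = [x / y for x, y in rows]

# Search space: a hypothetical subset of unary and binary transformations.
unary = {"log_x": lambda x, y: math.log(x), "sq_y": lambda x, y: y * y}
binary = {"x_plus_y": lambda x, y: x + y, "x_div_y": lambda x, y: x / y}

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = math.sqrt(sum((u - ma) ** 2 for u in a))
    vb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (va * vb)

# Score every candidate feature by absolute correlation with the target,
# then keep the most predictive one.
candidates = {**unary, **binary}
scores = {name: abs(pearson([f(x, y) for x, y in rows], target))
          for name, f in candidates.items()}
best_feature = max(scores, key=scores.get)
```

Real AutoFE systems compose such operations recursively (as in Deep Feature Synthesis) and use stronger selection criteria than raw correlation, but the generate-score-select skeleton is the same.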

NAS – Searches for optimal neural network structures. It defines a search space (DAGs, blocks, meta‑structures) and uses algorithms like reinforcement learning, evolutionary methods, gradient‑based search, and Bayesian optimization. Efficient variants (ENAS, DARTS, One‑Shot, MaE‑NAS) reduce training cost by weight sharing or zero‑shot evaluation.
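An evolutionary NAS loop can be sketched as mutate-evaluate-select over candidate architectures. Everything here is illustrative, not PAI's implementation: the operation set, the fitness proxy (a cheap stand-in for training and validating each candidate network), and the population parameters are all assumptions.

```python
import random

# Candidate architectures are sequences of block operations.
OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]

def fitness(arch):
    """Toy proxy score: stand-in for the validation accuracy of a trained
    network, minus a penalty for computational cost."""
    score = sum(1.0 if op == "skip" else 0.5 for op in arch)
    cost = sum(2 if op == "conv5x5" else 1 for op in arch)
    return score - 0.1 * cost

def mutate(arch, rng):
    """Replace one randomly chosen block with a random operation."""
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(OPS)
    return child

def evolve(generations=50, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(OPS) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fittest half
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in survivors]    # refill with mutants
    return max(pop, key=fitness)

best_arch = evolve()
```

Weight-sharing methods such as ENAS and one-shot NAS avoid training each candidate from scratch, and zero-shot scoring replaces training entirely; both swap out the expensive fitness evaluation while keeping a search loop of this shape.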

Case Studies – The article presents several real‑world experiments on Alibaba PAI, including HPO improving tree‑based models, AutoFE boosting CTR prediction, and NAS delivering compact, high‑performance models for video recommendation and e‑commerce search.

Conclusion – AutoML integrates HPO, AutoFE, and NAS to streamline the end‑to‑end machine‑learning pipeline, offering significant performance gains and resource savings across various business scenarios.

Tags: machine learning, feature engineering, AutoML, Neural Architecture Search, Hyperparameter Optimization, Alibaba PAI
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
