AutoX: One-Click Tabular AutoML from Feature Engineering to Model Fusion
AutoX offers a one-click solution for tabular AutoML. It defines feature operators, constructs a searchable feature space, applies efficient feature selection, tunes hyper-parameters, and ensembles models, so that users with limited ML expertise can automatically generate high-performing predictive models. The authors demonstrate this on multiple Kaggle datasets.
Principles of Tabular AutoML
Tabular AutoML differs from neural architecture search by focusing on feature engineering and pipeline optimization. The workflow consists of:
Defining a set of preprocessing operators (e.g., groupby, value_count, bag_of_words) and feature operators.
Constructing a searchable feature space that specifies which operators can be applied to which columns.
Applying an efficient feature-selection algorithm to approximately solve the NP-hard subset-selection problem and retain a high-quality feature subset.
Running hyper‑parameter optimization, model training, and model ensembling to improve predictive performance.
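To make the idea of a searchable feature space concrete, here is a minimal Python sketch. It assumes each column has already been typed as categorical ("cat") or numeric ("num"). The operator names value_count, groupby_mean, and ratio, their type rules, and the function build_feature_space are hypothetical illustrations, not AutoX's internal API.

```python
from itertools import combinations


def build_feature_space(column_types):
    """Enumerate (operator, input_columns) candidates permitted by simple type rules.

    column_types maps a column name to "cat" (categorical) or "num" (numeric).
    The three operators are hypothetical examples, not AutoX's operator set.
    """
    cat_cols = sorted(c for c, t in column_types.items() if t == "cat")
    num_cols = sorted(c for c, t in column_types.items() if t == "num")
    space = []
    # value_count: frequency of each category; takes one categorical column.
    for c in cat_cols:
        space.append(("value_count", (c,)))
    # groupby_mean: mean of a numeric column within groups of a categorical key.
    for key in cat_cols:
        for val in num_cols:
            space.append(("groupby_mean", (key, val)))
    # ratio: quotient of two numeric columns (one ordered pair per combination).
    for a, b in combinations(num_cols, 2):
        space.append(("ratio", (a, b)))
    return space


# Columns from the credit-card example later in this article.
column_types = {
    "age": "num", "income": "num", "purchase_count": "num",
    "total_spend": "num", "region": "cat", "occupation": "cat",
}
space = build_feature_space(column_types)
print(len(space))  # 16
```

With two categorical and four numeric columns this yields 2 + 8 + 6 = 16 candidates. Real systems also stack operators on derived columns, which is why the space grows fast enough to require a dedicated selection step.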
Key research questions include fast and reliable evaluation of feature subsets and the choice of search strategy (BFS, DFS, beam search, heuristic methods, or reinforcement learning).
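Beam search, one of the strategies listed above, can be sketched in a few lines. This is a hypothetical illustration rather than AutoX's selector: it grows feature subsets one feature at a time and keeps only the beam_width best subsets at each level. The score callable is an assumption standing in for the expensive part, typically a cross-validated model metric, which is exactly why fast and reliable subset evaluation is a research question.

```python
def beam_search(features, score, beam_width=2, max_size=3):
    """Return (best_subset, best_score) found by beam search over feature subsets.

    score: hypothetical callable mapping a frozenset of features to a number
    (higher is better), e.g. a validation metric in a real system.
    """
    beam = [frozenset()]
    best_subset, best_score = frozenset(), score(frozenset())
    for _ in range(max_size):
        # Expand every subset in the beam by one unused feature; the set drops duplicates.
        candidates = {s | {f} for s in beam for f in features if f not in s}
        if not candidates:
            break
        # Keep only the top-scoring subsets for the next level.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        top = score(beam[0])
        if top > best_score:
            best_subset, best_score = beam[0], top
    return best_subset, best_score


# Toy additive score: a stand-in for an expensive model evaluation.
weights = {"a": 3, "b": 2, "c": -1, "d": 1}


def toy_score(subset):
    return sum(weights[f] for f in subset)


print(beam_search(list(weights), toy_score))
```

Setting beam_width=1 reduces this to greedy forward selection; wider beams trade more evaluations for a lower chance of missing subsets whose features only help in combination.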
Four‑Stage AutoML Pipeline
Feature derivation – converting continuous values to categorical bins, aggregating statistics, creating interaction features, and encoding cross‑features.
Feature selection – using the feature‑selection algorithm to prune the large operator space.
Parameter selection – applying mature hyper‑parameter optimization techniques.
Model fusion – ensembling multiple trained models to boost accuracy.
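For the model-fusion stage, one common and simple technique is a weighted average of predicted probabilities, with the weight chosen on a validation set. The sketch below assumes validation probabilities from two binary classifiers are already available; blend, best_blend_weight, and the grid resolution are illustrative choices, and AutoX's own ensembling may work differently.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def blend(preds_a, preds_b, w):
    """Weighted average w * A + (1 - w) * B of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def best_blend_weight(y_val, preds_a, preds_b, steps=10):
    """Grid-search w in [0, 1] that minimizes validation log loss of the blend."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: log_loss(y_val, blend(preds_a, preds_b, w)))


# Two models that err in opposite directions: an equal blend is best here.
y_val = [1, 0]
print(best_blend_weight(y_val, [0.6, 0.6], [0.4, 0.4]))
```

Choosing the weight on held-out data rather than training data matters: on training data the more overfit model would tend to receive too much weight.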
Example Prediction Task
Given a table with columns such as age, income, purchase_count, total_spend, region, and occupation, the goal is to predict a binary label indicating credit-card repayment ability. The system automatically generates derived features (e.g., region-average income, spend per transaction, and a region-income level encoding) and selects the most predictive subset.
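The three derived features named above can be sketched with the Python standard library. The exact definitions are assumptions for illustration: region-average income is the mean income per region, spend per transaction is total_spend / purchase_count (0.0 when there are no purchases), and the region-income level encoding crosses the region with whether a row's income is above its regional mean. The function derive_features is hypothetical, not part of AutoX.

```python
from collections import defaultdict


def derive_features(rows):
    """Return copies of the row dicts with three illustrative derived features."""
    # Aggregate: mean income per region.
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r["region"]] += r["income"]
        counts[r["region"]] += 1
    region_avg = {k: sums[k] / counts[k] for k in sums}

    out = []
    for r in rows:
        avg = region_avg[r["region"]]
        f = dict(r)  # copy so the input rows are not mutated
        f["region_avg_income"] = avg
        # Ratio feature, guarded against customers with no purchases.
        f["spend_per_txn"] = r["total_spend"] / r["purchase_count"] if r["purchase_count"] else 0.0
        # Cross-feature: region combined with an above/below-regional-mean income level.
        level = "high" if r["income"] > avg else "low"
        f["region_income_level"] = f'{r["region"]}_{level}'
        out.append(f)
    return out


rows = [
    {"age": 30, "income": 5000, "purchase_count": 10, "total_spend": 2000, "region": "north", "occupation": "engineer"},
    {"age": 45, "income": 3000, "purchase_count": 0, "total_spend": 0, "region": "north", "occupation": "teacher"},
    {"age": 28, "income": 4000, "purchase_count": 4, "total_spend": 1000, "region": "south", "occupation": "nurse"},
]
for f in derive_features(rows):
    print(f["region_avg_income"], f["spend_per_txn"], f["region_income_level"])
```

In an AutoML setting these would be just three of many generated candidates; the feature-selection step decides which of them survive.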
AutoX Library
AutoX provides a scikit-learn-compatible API that encapsulates the entire workflow. A typical usage pattern, here for a daily time-series forecasting task, is:
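```python
from autox import AutoX

path = './data'   # placeholder: directory containing train.csv and test.csv
relations = []    # placeholder: join descriptions for any auxiliary tables (format per the AutoX README)

# Forecast `qty` for each `unit` at daily granularity (ts_unit='D'), ordered by the `ts` column.
autox = AutoX(
    target='qty',
    train_name='train.csv',
    test_name='test.csv',
    id=['unit'],
    path=path,
    time_series=True,
    ts_unit='D',
    time_col='ts',
    relations=relations,
)
sub = autox.get_submit_ts()
```

The call performs data cleaning, feature engineering, hyper-parameter tuning, and model ensembling automatically.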
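For a time-series configuration like the one above, a typical kind of automatically derived feature is a per-series lag or trailing statistic of the target. The following sketch is not AutoX's implementation. It assumes rows are dicts with the unit, ts (ISO date string), and qty fields from the example, and that each series has exactly one row per day with no gaps, so shifting by one row equals a one-day lag. The function add_lag_features is hypothetical.

```python
from collections import defaultdict


def add_lag_features(rows, id_col="unit", time_col="ts", target="qty", window=3):
    """Add lag_1 and a trailing mean of the target per series, using only past values."""
    by_id = defaultdict(list)
    for r in rows:
        by_id[r[id_col]].append(r)

    out = []
    for series in by_id.values():
        # ISO date strings sort chronologically.
        series.sort(key=lambda r: r[time_col])
        history = []  # past target values of this series only, to avoid leakage
        for r in series:
            f = dict(r)
            f["lag_1"] = history[-1] if history else None
            recent = history[-window:]
            f[f"mean_last_{window}"] = sum(recent) / len(recent) if recent else None
            history.append(r[target])
            out.append(f)
    return out


rows = [
    {"unit": "A", "ts": "2021-01-02", "qty": 4},
    {"unit": "A", "ts": "2021-01-01", "qty": 2},
    {"unit": "B", "ts": "2021-01-01", "qty": 7},
    {"unit": "A", "ts": "2021-01-03", "qty": 6},
]
for f in add_lag_features(rows):
    print(f["unit"], f["ts"], f["lag_1"], f["mean_last_3"])
```

The key design constraint is that every feature for a given day uses only earlier observations of the same series; using the current day's target would leak the label into the features.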
Performance Evaluation
Benchmarks on several Kaggle datasets, as reported by the authors, show AutoX outperforming AutoGluon and H2O on both classification and regression tasks. Detailed results are available in the project repository.
Key Technical Features
High predictive performance on diverse tabular benchmarks.
Simple scikit‑learn‑style interface for rapid prototyping.
Support for both classification and regression, including time‑series forecasting.
Fully automated pipeline: no manual data cleaning, feature engineering, or hyper‑parameter tuning required.
Modular design allows expert users to replace or augment individual components.
Repository
Source code, benchmarks, and release artifacts are hosted at:
https://github.com/4paradigm/AutoX
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.