AutoX: One-Click Tabular AutoML from Feature Engineering to Model Fusion

AutoX is a one-click AutoML solution for tabular data. It defines feature operators, constructs a searchable feature space, applies efficient feature selection, tunes hyper-parameters, and ensembles models, so that users with limited ML expertise can generate high-performing predictive models automatically, as demonstrated on multiple Kaggle datasets.

Baobao Algorithm Notes

Dec 10, 2021

Principles of Tabular AutoML

Tabular AutoML differs from neural architecture search by focusing on feature engineering and pipeline optimization. The workflow consists of:

Defining a set of preprocessing operators (e.g., groupby, value_count, bag_of_words) and feature operators.

Constructing a searchable feature space that specifies which operators can be applied to which columns.

Applying an efficient feature-selection algorithm that approximates the NP-hard subset-selection problem and retains a high-quality feature subset.

Running hyper‑parameter optimization, model training, and model ensembling to improve predictive performance.
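As a minimal sketch of the first two steps, the snippet below registers a few operators together with the column types they accept, then enumerates every valid (operator, columns) pair. The schema, the operator names, and the "num"/"cat" type tags are illustrative assumptions, not AutoX's internal representation.

```python
from itertools import product

# Hypothetical column schema: column name -> type tag ("num" or "cat").
SCHEMA = {"age": "num", "income": "num", "region": "cat", "occupation": "cat"}

# Hypothetical operator registry: operator name -> required argument types.
OPERATORS = {
    "value_count": ("cat",),         # frequency of each category
    "groupby_mean": ("cat", "num"),  # mean of a numeric column per category
    "bin": ("num",),                 # discretize a numeric column
}


def build_search_space(schema, operators):
    """Enumerate every (operator, columns) pair whose column types match."""
    space = []
    for op, arg_types in operators.items():
        # One pool of candidate columns per argument position.
        pools = [[c for c, t in schema.items() if t == a] for a in arg_types]
        for cols in product(*pools):
            space.append((op, cols))
    return space


space = build_search_space(SCHEMA, OPERATORS)
for op, cols in space:
    print(op, cols)
```

Even this toy schema yields several candidates per operator. The space grows quickly with the number of columns, which is why the selection step that follows matters.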

Key research questions include fast and reliable evaluation of feature subsets and the choice of search strategy (BFS, DFS, beam search, heuristic methods, or reinforcement learning).
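To make the search-strategy question concrete, here is a small beam search over feature subsets. It is only a sketch under stated assumptions: subsets are scored by the holdout R² of an ordinary least-squares fit (a stand-in for whatever evaluator a real system uses), the data is synthetic, and the function names are hypothetical.

```python
import numpy as np


def holdout_r2(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares on the chosen columns; return R^2 on the holdout split."""
    cols = list(cols)
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, cols], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    resid = y_va - A_va @ coef
    return 1.0 - float(np.mean(resid ** 2) / np.var(y_va))


def beam_search(X_tr, y_tr, X_va, y_va, beam_width=2, max_features=2):
    """Grow subsets one feature at a time, keeping the top `beam_width` each round."""
    n_features = X_tr.shape[1]
    beam = [((), -np.inf)]  # list of (subset, score)
    best = beam[0]
    for _ in range(max_features):
        candidates = {}
        for subset, _score in beam:
            for j in range(n_features):
                if j in subset:
                    continue
                new = tuple(sorted(subset + (j,)))
                if new not in candidates:  # avoid re-scoring duplicates
                    candidates[new] = holdout_r2(X_tr, y_tr, X_va, y_va, new)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = ranked[:beam_width]
        if beam[0][1] > best[1]:
            best = beam[0]
    return best


# Synthetic data: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3 * X[:, 1] - 2 * X[:, 4] + rng.normal(scale=0.1, size=400)
subset, score = beam_search(X[:300], y[:300], X[300:], y[300:])
print(subset, round(score, 4))
```

With beam_width=1 this reduces to greedy forward selection. A wider beam costs more evaluations but is less likely to be trapped by an early choice.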

Four‑Stage AutoML Pipeline

Feature derivation – converting continuous values to categorical bins, aggregating statistics, creating interaction features, and encoding cross‑features.

Feature selection – using a feature-selection algorithm to prune the large set of candidate features produced by the operator space.

Parameter selection – applying mature hyper‑parameter optimization techniques.

Model fusion – ensembling multiple trained models to boost accuracy.
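The last stage can be illustrated with the simplest form of fusion, a weighted average of two models' predictions. The sketch below grid-searches the blend weight on a validation set. The predictions are synthetic and the helper names are hypothetical. Real systems may use stacking or other schemes, and the source does not say which one AutoX applies.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def best_blend_weight(y_val, pred_a, pred_b, steps=21):
    """Grid-search w in [0, 1] minimizing RMSE of w*pred_a + (1-w)*pred_b."""
    weights = np.linspace(0.0, 1.0, steps)
    errors = [rmse(y_val, w * pred_a + (1 - w) * pred_b) for w in weights]
    i = int(np.argmin(errors))
    return float(weights[i]), errors[i]


# Synthetic validation targets and two noisy "model" predictions.
rng = np.random.default_rng(0)
y_val = rng.normal(size=500)
pred_a = y_val + rng.normal(scale=0.5, size=500)
pred_b = y_val + rng.normal(scale=0.5, size=500)

w, blended_rmse = best_blend_weight(y_val, pred_a, pred_b)
print("weight:", w)
print("model A:", rmse(y_val, pred_a))
print("model B:", rmse(y_val, pred_b))
print("blend:", blended_rmse)
```

Because the grid includes w=0 and w=1, the blend can never score worse than either model on the validation set. The chosen weight should still be checked on data that was not used to select it.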

Example Prediction Task

Given a table with columns such as age, income, purchase_count, total_spend, region, and occupation, the goal is to predict a binary label indicating credit-card repayment ability. The system automatically generates derived features (e.g., region-average income, spend per transaction, region-income level encoding) and selects the most predictive subset.
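The derived features named above can be written by hand in pandas, which gives a sense of what an automated system generates at scale. The rows are made-up toy values, and the new column names and the two-level income binning are illustrative assumptions.

```python
import pandas as pd

# Toy rows using the column names from the example task (values are invented).
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "income": [3000.0, 8000.0, 6500.0, 4000.0],
    "purchase_count": [3, 10, 5, 4],
    "total_spend": [150.0, 900.0, 250.0, 400.0],
    "region": ["north", "south", "north", "south"],
    "occupation": ["clerk", "engineer", "teacher", "manager"],
})

# Aggregation: mean income of each customer's region.
df["region_avg_income"] = df.groupby("region")["income"].transform("mean")

# Ratio: average spend per transaction (assumes purchase_count > 0).
df["spend_per_txn"] = df["total_spend"] / df["purchase_count"]

# Binning: continuous income -> categorical level, split at the median.
df["income_level"] = pd.qcut(df["income"], q=2, labels=["low", "high"])

# Cross feature: region combined with income level.
df["region_income_level"] = df["region"] + "_" + df["income_level"].astype(str)

print(df[["region_avg_income", "spend_per_txn", "region_income_level"]])
```

Each line corresponds to one operator type from the feature-derivation stage: aggregation, ratio, binning, and crossing.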

AutoX Library

AutoX provides a scikit-learn-compatible API that encapsulates the entire workflow. A typical usage pattern, shown here for a time-series task, is:

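from autox import AutoX

# `path` (data directory) and `relations` (links between tables) must be defined beforehand.
autox = AutoX(
    target='qty',            # column to predict
    train_name='train.csv',
    test_name='test.csv',
    id=['unit'],             # identifier column(s)
    path=path,
    time_series=True,        # use the time-series pipeline
    ts_unit='D',             # time granularity (assumed here to mean daily)
    time_col='ts',           # timestamp column
    relations=relations
)
sub = autox.get_submit_ts()  # runs the pipeline and returns the submission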

The call performs data cleaning, feature engineering, hyper‑parameter tuning, and model ensembling automatically.

Performance Evaluation

Benchmarks on several Kaggle datasets show that AutoX consistently outperforms AutoGluon and H2O in both classification and regression tasks. Detailed results are available in the project repository.

Key Technical Features

High predictive performance on diverse tabular benchmarks.

Simple scikit‑learn‑style interface for rapid prototyping.

Support for both classification and regression, including time‑series forecasting.

Fully automated pipeline: no manual data cleaning, feature engineering, or hyper‑parameter tuning required.

Modular design allows expert users to replace or augment individual components.

Repository

Source code, benchmarks, and release artifacts are hosted at:

https://github.com/4paradigm/AutoX
