Why Arithmetic Feature Interaction Is Key to Deep Tabular Learning

Researchers from Alibaba Cloud AI and Zhejiang University present AMFormer, a Transformer‑based model that incorporates arithmetic feature interaction, demonstrating superior fine‑grained modeling, sample efficiency, and generalization on synthetic and real‑world tabular datasets, establishing a new state‑of‑the‑art in deep tabular learning.


Background

Structured tabular data, stored as tables in databases or data warehouses, is widely used in finance, marketing, medical science, and recommendation systems. It typically mixes numerical and categorical features and is prone to missing values, noise, and class imbalance; moreover, deep models have lacked an effective inductive bias for this kind of data, making analysis challenging. Traditional tree-ensemble models (e.g., XGBoost, LightGBM, CatBoost) remain dominant in industry thanks to their robustness to such data-quality issues, but their performance depends heavily on the quality of manually engineered features.

With the rise of deep learning, researchers have attempted end‑to‑end modeling to reduce reliance on feature engineering. Existing deep tabular approaches can be grouped into four categories: (1) augmenting traditional models with MLP modules (e.g., Wide&Deep, DeepFM); (2) using deep networks for shape functions in generalized additive models (e.g., NAM, NBM, SIAN); (3) tree‑inspired deep models (e.g., NODE, Net‑DNF); (4) Transformer‑based models (e.g., AutoInt, DCAP, FT‑Transformer). However, deep models still lag behind tree ensembles on tabular data, and the field regards tabular learning as the last stronghold not yet conquered by deep learning.

Necessity of Arithmetic Feature Interaction

The authors argue that the limited success of existing deep tabular methods stems from the absence of an effective inductive bias. They hypothesize that arithmetic feature interaction is essential for deep tabular learning. To validate this, they create a synthetic dataset and compare models with and without arithmetic feature interaction.

Synthetic Dataset Construction

The synthetic dataset contains eight features. The response variable is generated as a sum of a small number of arithmetic terms, each a product of features raised to randomly sampled exponents and scaled by a randomly sampled coefficient; 200,000 instances are generated this way. The continuous response is then discretized into C equally sized classes, and the data are partitioned into 80% training and 20% testing.
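A minimal NumPy sketch of this construction, assuming a concrete term form (a coefficient times a product of feature powers) and sampling ranges that the summary above leaves unspecified:

```python
import numpy as np

def make_synthetic(n=200_000, n_features=8, n_terms=3, n_classes=64, seed=0):
    """Generate a synthetic tabular dataset whose response is a sum of
    arithmetic terms, then discretize it into equally sized classes.
    The term form and sampling ranges here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.5, 2.0, size=(n, n_features))  # positive features keep powers real-valued

    y = np.zeros(n)
    for _ in range(n_terms):
        coef = rng.uniform(-1.0, 1.0)               # randomly sampled coefficient
        expo = rng.integers(0, 3, size=n_features)  # randomly sampled integer exponents
        y += coef * np.prod(X ** expo, axis=1)      # one arithmetic term: c * prod_j x_j^e_j

    # Discretize the continuous response into n_classes equal-frequency bins.
    quantiles = np.quantile(y, np.linspace(0, 1, n_classes + 1)[1:-1])
    labels = np.digitize(y, quantiles)

    # 80/20 train/test split.
    idx = rng.permutation(n)
    cut = int(0.8 * n)
    return (X[idx[:cut]], labels[idx[:cut]]), (X[idx[cut:]], labels[idx[cut:]])

(train_X, train_y), (test_X, test_y) = make_synthetic(n_classes=64)
```

Equal-frequency binning keeps the C classes balanced, so raising C directly controls how fine-grained the classification task becomes.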

Figure 2 shows the performance comparison on the synthetic dataset, where "+x%" indicates the relative improvement of AMFormer over the baseline Transformer.

Experimental Results

(a) As the number of classes grows from 4 to 512 (increasingly fine-grained modeling), AMFormer achieves higher test accuracy, especially at larger class counts. Both AMFormer and the vanilla Transformer outperform XGBoost trained on raw features, reflecting neural networks' capacity for automatic feature learning, and AMFormer consistently outperforms the vanilla Transformer.

(b) With a fixed model size, varying the proportion of training data shows that AMFormer maintains an advantage over the Transformer, particularly under limited training data.

(c) For small‑sample classes, AMFormer demonstrates better generalization than the Transformer.

These results collectively confirm the significant role of arithmetic feature interaction in deep tabular learning.

Model Architecture

AMFormer builds on the classic Transformer architecture and introduces an Arithmetic Block to enhance arithmetic feature interaction. Numerical features are projected through a linear layer, while categorical features are looked up in embedding tables. The resulting feature embeddings pass through L sequential layers, each containing parallel additive and multiplicative attention mechanisms that explicitly model arithmetic interactions. Residual connections and feed-forward networks are retained for stable gradient flow. Learnable prompt tokens serve as attention queries in place of full pairwise self-attention, cutting the quadratic complexity in the number of features, reducing memory usage, and improving training efficiency.
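As a rough PyTorch sketch of the idea, assuming learnable prompt tokens act as attention queries and the multiplicative branch is realized via a log/exp trick (the paper's actual block involves further details, such as candidate selection, that are omitted here):

```python
import torch
import torch.nn as nn

class ArithmeticBlock(nn.Module):
    """Minimal sketch of one AMFormer-style layer: parallel additive and
    multiplicative attention over feature tokens, queried by a fixed set
    of learnable prompt tokens. An illustrative approximation, not the
    paper's exact module."""
    def __init__(self, dim, n_prompts=8, n_heads=4, eps=1e-6):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim))  # learnable prompt queries
        self.add_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mul_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.eps = eps

    def forward(self, tokens):                      # tokens: (batch, n_features, dim)
        B = tokens.size(0)
        q = self.prompts.unsqueeze(0).expand(B, -1, -1)   # (B, n_prompts, dim)

        # Additive branch: weighted sums of feature tokens.
        add_out, _ = self.add_attn(q, tokens, tokens)

        # Multiplicative branch: attend in log space, then exponentiate,
        # so a weighted sum of logs becomes a weighted product.
        log_tokens = torch.log(tokens.abs() + self.eps)
        mul_out, _ = self.mul_attn(q, log_tokens, log_tokens)
        mul_out = torch.exp(mul_out.clamp(max=10.0))      # clamp for numerical stability

        out = self.ffn(torch.cat([add_out, mul_out], dim=-1))
        return out + q                                     # residual connection on the prompts

x = torch.randn(32, 10, 64)        # 32 samples, 10 feature tokens, embedding dim 64
out = ArithmeticBlock(dim=64)(x)   # (32, n_prompts, 64)
```

Because the number of prompt tokens is fixed, attention cost scales with n_prompts × n_features rather than n_features², which is where the memory and efficiency gains come from.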

Further Experiments

Four real-world datasets covering binary classification, multi-class classification, and regression were evaluated. Table 1 lists the dataset statistics; Table 2 compares AMFormer (two variants, built on AutoInt and FT-Transformer) against six baselines (XGBoost, NODE, DCN-V2, DCAP, AutoInt, FT-Transformer). AMFormer consistently achieves higher accuracy or AUC (gains of up to 1.23% and 4.96%, respectively) and lower mean squared error on regression, demonstrating superior robustness and stability.

Conclusion

This work investigates effective inductive bias for deep models on tabular data. By embedding arithmetic feature interaction into a Transformer‑based architecture, AMFormer establishes a strong inductive bias that yields superior fine‑grained modeling, sample efficiency, and generalization on both synthetic and real datasets, setting a new state‑of‑the‑art for deep tabular learning.

Further Reading

Paper Title: Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

arXiv: https://arxiv.org/abs/2402.02334

Code: https://github.com/aigc-apps/AMFormer
