Can Table Modeling Scale? Rethinking the Tree Model Era Amid Compute Shifts

The article examines how a single NVIDIA H100 GPU delivers roughly 200‑fold more FP16 compute than a 96‑core CPU Hadoop node, explores the "Bitter Lesson" of scaling‑driven AI breakthroughs, and presents large‑scale pretraining experiments that show table and sequence models now exhibit clear scaling laws, challenging the dominance of traditional tree‑based approaches.


A single NVIDIA H100 GPU provides about 200× the FP16 compute of a 96‑core CPU Hadoop instance, highlighting the massive compute advantage of modern GPUs.
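
As a rough sanity check on that figure, the back-of-the-envelope comparison below uses round-number peak throughputs that are assumptions rather than measurements: roughly 1,000 TFLOPS of dense FP16 tensor-core compute for one H100 and roughly 5 TFLOPS for a 96-core CPU node.

```python
# Back-of-the-envelope check of the ~200x claim. The peak-throughput numbers
# below are round-figure assumptions, not measured values: ~1000 TFLOPS dense
# FP16 tensor-core throughput for one H100, and ~5 TFLOPS for a 96-core CPU
# node (96 cores x ~2.5 GHz x ~20 FLOPs/cycle effective).
H100_FP16_TFLOPS = 1000.0   # assumed peak dense FP16 (tensor cores)
CPU_NODE_TFLOPS = 5.0       # assumed peak for a 96-core CPU Hadoop node

ratio = H100_FP16_TFLOPS / CPU_NODE_TFLOPS
print(f"GPU-to-CPU compute ratio: ~{ratio:.0f}x")  # -> ~200x
```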

This disparity creates a striking contrast: while large language models drive AI productivity across many sectors, high‑value industries such as finance, healthcare, e‑commerce, logistics, and manufacturing still rely heavily on tree‑based models like XGBoost and Random Forest for structured data tasks.

The article invokes Richard Sutton’s “Bitter Lesson” to question whether the balance point between compute, data, and algorithms should be redefined as compute power continues to surge.

It then details the AIforData team’s large‑scale pretraining effort, which leverages a thousand‑GPU cluster to pretrain on billions of structured samples and systematically evaluates the resulting models on downstream tasks.

The key findings are: (1) pretrained models consistently and significantly outperform traditional tree models on industrial-grade table datasets; (2) table data pretraining exhibits a clear scaling law; (3) behavior-sequence pretraining also shows a scaling law.
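
To make the "scaling law" claim concrete: such a law means downstream error falls as a power of pretraining data size, which appears as a straight line in log-log space. The sketch below fits that form to synthetic numbers chosen purely for illustration; they are not the AIforData team's measurements.

```python
import numpy as np

# Illustrative only: the points below are synthetic, not the team's results.
# A scaling law of the form error(N) ~ a * N^(-b) is a straight line in
# log-log space, so a linear fit recovers the exponent b.
n_samples = np.array([1e6, 1e7, 1e8, 1e9])        # pretraining set sizes (assumed)
test_error = np.array([0.30, 0.25, 0.21, 0.175])  # hypothetical downstream errors

slope, intercept = np.polyfit(np.log(n_samples), np.log(test_error), 1)
b, a = -slope, np.exp(intercept)
print(f"error(N) ~ {a:.2f} * N^(-{b:.3f})")
print(f"extrapolated error at 1e10 samples: {a * 1e10 ** (-b):.3f}")
```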

Work 1 introduces KMLP (Kolmogorov‑Arnold Network with gated MLP), a hybrid deep architecture that combines a shallow KAN front‑end feature constructor with a gMLP backbone. On a real‑world credit‑scoring dataset containing 2 billion samples, KMLP’s performance advantage over GBDT grows as data scale increases, confirming its scalability.

KMLP addresses two bottlenecks of traditional GBDT, namely inefficient distributed training on massive datasets and reliance on handcrafted feature engineering, by learning interactions across heterogeneous features in a single end-to-end model.
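
A minimal sketch of this architectural idea follows, assuming a shallow KAN-style front end (learnable per-feature univariate functions, approximated here with an RBF basis) feeding gMLP blocks that mix information across feature tokens. Layer sizes, the basis choice, and the tokenisation of tabular features are illustrative assumptions, not the team's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANFeatureLayer(nn.Module):
    """Approximates KAN's learnable univariate edge functions with a
    learnable combination of radial-basis functions per input feature."""
    def __init__(self, num_features: int, num_basis: int = 8, dim: int = 32):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, num_basis).repeat(num_features, 1))
        self.log_width = nn.Parameter(torch.zeros(num_features, num_basis))
        self.proj = nn.Linear(num_basis, dim)      # per-feature embedding

    def forward(self, x):                          # x: (batch, num_features)
        diff = x.unsqueeze(-1) - self.centers      # (batch, features, basis)
        phi = torch.exp(-(diff ** 2) * torch.exp(self.log_width))
        return self.proj(phi)                      # (batch, features, dim)

class SpatialGatingUnit(nn.Module):
    """gMLP's spatial gating: one half of the channels gates the other
    after a linear mix across the feature-token axis."""
    def __init__(self, dim_ffn: int, num_tokens: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim_ffn // 2)
        self.spatial = nn.Linear(num_tokens, num_tokens)

    def forward(self, x):                          # x: (batch, tokens, dim_ffn)
        u, v = x.chunk(2, dim=-1)
        v = self.spatial(self.norm(v).transpose(1, 2)).transpose(1, 2)
        return u * v

class GMLPBlock(nn.Module):
    def __init__(self, dim: int, num_tokens: int, ffn_mult: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc_in = nn.Linear(dim, dim * ffn_mult)
        self.sgu = SpatialGatingUnit(dim * ffn_mult, num_tokens)
        self.fc_out = nn.Linear(dim * ffn_mult // 2, dim)

    def forward(self, x):
        y = F.gelu(self.fc_in(self.norm(x)))
        return x + self.fc_out(self.sgu(y))

class KMLPSketch(nn.Module):
    """Shallow KAN-style front end + gMLP backbone, as a hedged sketch."""
    def __init__(self, num_features: int, dim: int = 32, depth: int = 4):
        super().__init__()
        self.kan = KANFeatureLayer(num_features, dim=dim)
        self.blocks = nn.ModuleList([GMLPBlock(dim, num_features) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)              # e.g. a credit-default logit

    def forward(self, x):
        h = self.kan(x)
        for blk in self.blocks:
            h = blk(h)
        return self.head(h.mean(dim=1)).squeeze(-1)

model = KMLPSketch(num_features=64)
logits = model(torch.randn(16, 64))                # (16,) scores
```

The design point this sketch tries to capture is that the front end learns per-feature transformations directly from data, standing in for handcrafted binning, while the gating blocks learn cross-feature interactions end to end.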

Work 2 presents the FOUND (Transferable and Forecastable User Targeting Foundation Model) framework for user behavior sequence modeling. It tackles (i) weak cross-domain transferability and generalization and (ii) limited predictive power by integrating multi-scene user data and using contrastive pretraining to align sequence embeddings with semantically derived textual descriptions. The resulting user representations improve benchmark results and deliver gains across more than 50 business scenarios.
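
The contrastive alignment step can be sketched as a symmetric InfoNCE objective between a user's behavior-sequence embedding and the embedding of a text description of that user. The symmetric form and the temperature value below are common defaults assumed for illustration, not details confirmed by the article.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(seq_emb, text_emb, temperature=0.07):
    """Pull each sequence embedding toward its own text description and
    push it away from mismatched descriptions (and vice versa)."""
    seq_emb = F.normalize(seq_emb, dim=-1)       # (batch, dim)
    text_emb = F.normalize(text_emb, dim=-1)     # (batch, dim)
    logits = seq_emb @ text_emb.t() / temperature
    targets = torch.arange(seq_emb.size(0), device=seq_emb.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_alignment_loss(torch.randn(32, 128), torch.randn(32, 128))
```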

Investigating the scaling law for input sequence length and user count, the authors observe near‑linear performance gains when these dimensions are modest, but a clear bottleneck emerges as they grow large. To mitigate this, they compress user sequences with an RQ‑VAE scheme, observing a “Densing Law” where increased information density delays the scaling bottleneck.
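
Residual quantization, the mechanism behind RQ-VAE style compression, can be sketched as follows: each stage quantizes whatever residual the previous stage left behind, so a long behavior sequence collapses to a handful of discrete codes. The codebook size and depth below are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ResidualQuantizer(nn.Module):
    """Multi-stage residual quantization: stage k quantizes the residual
    left by stages 1..k-1, yielding a short list of code indices."""
    def __init__(self, dim=128, codebook_size=256, num_stages=4):
        super().__init__()
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_stages)])

    def forward(self, z):                          # z: (batch, dim) sequence embedding
        residual, quantized, codes = z, torch.zeros_like(z), []
        for book in self.codebooks:
            dists = torch.cdist(residual, book.weight)   # (batch, codebook_size)
            idx = dists.argmin(dim=-1)                   # nearest codeword per sample
            q = book(idx)
            quantized = quantized + q
            residual = residual - q
            codes.append(idx)
        return quantized, torch.stack(codes, dim=-1)     # codes: (batch, num_stages)

rq = ResidualQuantizer()
recon, codes = rq(torch.randn(8, 128))   # 8 users -> 8 x 4 discrete codes each
```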

The core conclusion is that scaling phenomena are evident for both table and sequence data at smaller scales, but bottlenecks appear at larger scales; compressing inputs to raise information density can break these limits, demonstrating a Densing Law.

Overall, the AIforData team’s results affirm that scaling laws are extending from NLP and CV into structured data modeling, suggesting the era of hand‑crafted features and tightly tuned tree models may be ending.

Tags: Scaling Law, pretraining, Structured Data, tree models, FOUND, KMLP
Written by Machine Heart, a professional AI media and industry service platform.
