AAAI‑2024 Highlights: Alibaba Cloud’s Deep Tabular Learning & Multi‑Modal Fusion

Alibaba Cloud’s AI platform PAI had four papers accepted at AAAI‑2024: AMFormer for deep tabular learning via arithmetic feature interaction, MuLTI for efficient video‑language understanding, M2SD for few‑shot class‑incremental learning, and M2Doc for multi‑modal document layout analysis. Together they reflect the platform’s growing contribution to artificial‑intelligence research.

Alibaba Cloud Big Data AI Platform

Recent papers from Alibaba Cloud’s AI platform PAI were accepted at AAAI‑2024, one of the most prestigious international conferences in artificial intelligence, highlighting the platform’s advances in fundamental and applied AI research.

Unlocking Deep Tabular Learning (AMFormer)

The authors identify arithmetic feature interaction as a crucial inductive bias for deep tabular learning. By embedding this bias into a Transformer‑based architecture called AMFormer, they achieve superior modeling accuracy, data‑efficiency, and generalization on both synthetic and real‑world tabular datasets.
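
To make "arithmetic feature interaction" concrete, here is a minimal sketch, not the authors' implementation. It assumes positive-valued numeric features and uses a log/exp transform (an assumption of this example, not something the summary above states) so that a weighted sum in log space behaves like a weighted product. The function names are hypothetical.

```python
import math

def additive_interaction(features, weights):
    """Weighted sum of features: sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, features))

def multiplicative_interaction(features, weights, eps=1e-8):
    """Weighted product prod_i |x_i| ** w_i, computed as exp(sum_i w_i * log|x_i|).

    eps keeps log() finite when a feature is exactly zero (illustrative choice).
    """
    log_sum = sum(w * math.log(abs(x) + eps) for w, x in zip(weights, features))
    return math.exp(log_sum)

row = [2.0, 3.0, 4.0]  # one tabular row with three numeric columns

# Weights (1, 1, 0) select x0 + x1.
print(additive_interaction(row, [1.0, 1.0, 0.0]))        # 5.0

# Weights (1, -1, 0) select x0 / x1, a ratio no plain weighted sum can express.
print(multiplicative_interaction(row, [1.0, -1.0, 0.0]))  # close to 2/3
```

In a model the weights would be learned (for example as attention scores), so the same layer can discover sums, differences, products, or ratios of columns as needed.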

[Figure: AMFormer architecture]

MuLTI: Efficient Video‑Language Understanding

MuLTI addresses the high computational cost of multimodal video‑language models by introducing a Text‑Guided MultiWay‑Sampler and a multiple‑choice modeling pre‑training task. These innovations reduce GPU memory usage while preserving performance, achieving state‑of‑the‑art results on several video‑question‑answering and retrieval benchmarks.
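
The idea of condensing a long sequence of video features under text guidance can be illustrated with plain attention pooling. This is a rough sketch under stated assumptions, not MuLTI's actual sampler: it assumes text embeddings act as queries over per-frame features, and `text_guided_pool` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def text_guided_pool(text_queries, frame_features):
    """Condense len(frame_features) vectors into len(text_queries) vectors.

    Each text query attends over all frames (scaled dot-product attention)
    and returns the attention-weighted average of the frame features.
    """
    dim = len(frame_features[0])
    scale = math.sqrt(dim)
    pooled = []
    for q in text_queries:
        weights = softmax([dot(q, f) / scale for f in frame_features])
        pooled.append([
            sum(w * f[d] for w, f in zip(weights, frame_features))
            for d in range(dim)
        ])
    return pooled

# Four frame features condensed to two text-conditioned tokens.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
queries = [[10.0, 0.0], [0.0, 10.0]]
condensed = text_guided_pool(queries, frames)
print(len(frames), "->", len(condensed))
```

Because later fusion layers see only the condensed tokens, their cost depends on the number of queries rather than the number of frames, which is the kind of saving the paragraph above describes.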

[Figure: MuLTI model diagram]

M2SD: Multiple Mixing Self‑Distillation for Few‑Shot Class‑Incremental Learning

M2SD proposes a dual‑branch architecture with virtual classes and a multiple‑mixing self‑distillation strategy to expand the feature space for new categories while preserving knowledge of old ones. Extensive experiments on few‑shot class‑incremental benchmarks demonstrate significant improvements in accuracy and robustness.
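
The two ingredients named above, mixing samples to form virtual classes and distilling a model's own predictions, can be sketched as follows. This is an illustration only: it assumes a mixup-style linear interpolation and a temperature-softened KL divergence, neither of which is confirmed by the summary as the exact operations M2SD uses, and all function names are hypothetical.

```python
import math

def mixup(x_a, x_b, y_a, y_b, lam):
    """Linearly interpolate two samples and their label vectors (0 <= lam <= 1)."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A "virtual" sample halfway between two classes.
x_mix, y_mix = mixup([0.0, 2.0], [2.0, 0.0], [1.0, 0.0], [0.0, 1.0], lam=0.5)
print(x_mix, y_mix)

# The loss is zero when the student matches the teacher and positive otherwise.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In self-distillation the teacher and student are the same network under different inputs or branches, so the loss pushes predictions on mixed samples to stay consistent with predictions on the originals.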

[Figure: M2SD framework]

M2Doc: Plug‑in Multi‑Modal Fusion for Document Layout Analysis

M2Doc introduces early‑fusion and late‑fusion modules that combine visual and textual features within existing object detectors. This plug‑in design yields consistent performance gains on document layout benchmarks such as DocLayNet and M6Doc, achieving state‑of‑the‑art results when combined with detectors like DINO.
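
The difference between fusing early and fusing late can be shown on toy feature vectors. The sketch below is an assumption-laden illustration rather than M2Doc's modules: it assumes early fusion is a gated element-wise mix of visual and textual features, and late fusion adds the mean embedding of the words inside a detected box to that box's visual feature. The gate form and all names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def early_fuse(visual, textual, gate_weight=1.0):
    """Gated element-wise mix of two aligned feature vectors.

    The gate here is a fixed function of the inputs for illustration;
    in a real model it would be a learned layer.
    """
    fused = []
    for v, t in zip(visual, textual):
        g = sigmoid(gate_weight * (v + t))
        fused.append(g * v + (1.0 - g) * t)
    return fused

def late_fuse(box_feature, word_embeddings):
    """Add the mean embedding of the words inside a box to the box feature."""
    if not word_embeddings:
        return list(box_feature)  # no text in this box: keep the visual feature
    n = len(word_embeddings)
    mean_text = [sum(w[d] for w in word_embeddings) / n
                 for d in range(len(box_feature))]
    return [b + m for b, m in zip(box_feature, mean_text)]

# Early fusion operates on backbone-level features before detection.
print(early_fuse([2.0, 0.0], [0.0, 1.0]))

# Late fusion operates per detected box, after region features exist.
print(late_fuse([1.0, 2.0], [[1.0, 0.0], [3.0, 2.0]]))  # [3.0, 3.0]
```

Both functions take and return plain feature vectors of unchanged size, which is what lets a fusion step of this kind be dropped into an existing detector without altering the rest of the pipeline.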

[Figure: M2Doc fusion architecture]