
AAAI‑2024 Highlights: Alibaba Cloud’s Deep Tabular Learning & Multi‑Modal Fusion

Alibaba Cloud’s AI platform PAI presented four papers at AAAI‑2024: AMFormer for deep tabular learning via arithmetic feature interaction, MuLTI for efficient video‑language understanding, M2SD for few‑shot class‑incremental learning, and M2Doc for multi‑modal document layout analysis. Together they demonstrate the platform’s growing impact on artificial‑intelligence research.

Alibaba Cloud Big Data AI Platform

Mar 12, 2024

Four recent papers from Alibaba Cloud’s AI platform PAI were accepted at AAAI‑2024, one of the most prestigious international conferences in artificial intelligence. They highlight the platform’s advances in both fundamental and applied AI research.

AMFormer: Unlocking Deep Tabular Learning via Arithmetic Feature Interaction

The authors identify arithmetic feature interaction, that is, combining features through operations such as addition and multiplication, as a crucial inductive bias for deep tabular learning. By embedding this bias into a Transformer‑based architecture called AMFormer, they achieve superior modeling accuracy, data efficiency, and generalization on both synthetic and real‑world tabular datasets.
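To make the idea of additive versus multiplicative interaction concrete, here is a minimal Python sketch. It is not AMFormer's implementation: the scalar features, the hypothetical `scores` attention logits, and the function names are illustrative assumptions. The multiplicative path is computed in log space, since exp(Σ wⱼ·log|xⱼ|) equals Π |xⱼ|^wⱼ, which lets an ordinary softmax-weighted sum express a product of features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def arithmetic_interactions(features, scores, eps=1e-6):
    """Return (additive, multiplicative) interactions for one target feature.

    features: scalar feature values x_j (a simplification; real models use embeddings)
    scores:   hypothetical attention logits saying how much each x_j is used
    """
    weights = softmax(scores)
    # Additive interaction: a weighted sum of the features.
    additive = sum(w * x for w, x in zip(weights, features))
    # Multiplicative interaction in log space:
    # exp(sum_j w_j * log|x_j|) == prod_j |x_j| ** w_j.
    # This simplified sketch discards the sign of x_j; eps avoids log(0).
    log_mix = sum(w * math.log(abs(x) + eps) for w, x in zip(weights, features))
    multiplicative = math.exp(log_mix)
    return additive, multiplicative
```

With equal logits, `arithmetic_interactions([2.0, 8.0], [0.0, 0.0])` returns approximately `(5.0, 4.0)`: the additive path yields the arithmetic mean and the multiplicative path the geometric mean, showing how the same attention weights can drive two different arithmetic operations.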

MuLTI: Efficient Video‑Language Understanding

MuLTI addresses the high computational cost of multimodal video‑language models by introducing a Text‑Guided MultiWay‑Sampler and a multiple‑choice modeling pre‑training task. These innovations reduce GPU memory usage while preserving performance, achieving state‑of‑the‑art results on several video‑question‑answering and retrieval benchmarks.
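Sampling saves memory because self-attention cost grows quadratically with the number of tokens. The sketch below shows one generic form of text-guided token reduction: score each frame feature by cosine similarity to a text embedding and keep the top `k` frames in their original temporal order. This is an illustrative assumption, not MuLTI's MultiWay-Sampler; the function names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def text_guided_select(frame_feats, text_feat, k):
    """Return indices of the k frames most similar to the text, in temporal order."""
    ranked = sorted(
        range(len(frame_feats)),
        key=lambda i: cosine(frame_feats[i], text_feat),
        reverse=True,
    )
    # Re-sort the kept indices so downstream layers still see frames in time order.
    return sorted(ranked[:k])
```

For frames `[[1, 0], [0, 1], [1, 1], [-1, 0]]` and text embedding `[1, 0]`, `text_guided_select(frames, [1, 0], 2)` keeps frames `[0, 2]`. Halving the frame count in this way shrinks the pairwise attention matrix over frames from 16 entries to 4.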

M2SD: Multiple Mixing Self‑Distillation for Few‑Shot Class‑Incremental Learning

M2SD proposes a dual‑branch architecture with virtual classes and a multiple‑mixing self‑distillation strategy to expand the feature space for new categories while preserving knowledge of old ones. Extensive experiments on few‑shot class‑incremental benchmarks demonstrate significant improvements in accuracy and robustness.
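Two generic building blocks behind this kind of method are mixup-style feature mixing and a self-distillation loss that compares softened predictions. The sketch below shows both in plain Python. It is a simplified illustration, not M2SD's training objective: the function names, the single mixing coefficient `lam`, and the default temperature are hypothetical choices.

```python
import math


def mix(a, b, lam):
    """Mixup-style convex combination of two feature vectors (0 <= lam <= 1)."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def softmax(logits, temperature=1.0):
    """Temperature-softened, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Knowledge-distillation setups often multiply this by temperature**2;
    that scaling is omitted here for simplicity.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A student whose logits match the teacher's exactly incurs zero loss, and the loss grows as the two softened distributions diverge. In a self-distillation setting, the "teacher" predictions would come from the same network (for example, on unmixed inputs) rather than from a separate model.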

M2Doc: Plug‑in Multi‑Modal Fusion for Document Layout Analysis

M2Doc introduces early‑fusion and late‑fusion modules that combine visual and textual features within existing object detectors. This plug‑in design yields consistent performance gains on document layout benchmarks such as DocLayNet and M6Doc, achieving state‑of‑the‑art results when combined with detectors like DINO.
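The difference between the two fusion points can be shown with a toy example: early fusion blends a region's visual and textual feature vectors before classification, while late fusion blends the class scores that each modality produces on its own. This is a minimal sketch assuming fixed blending weights; M2Doc's actual modules are more elaborate, and the function names, weights, and labels here are hypothetical.

```python
from typing import Dict, List


def early_fusion(visual_feat: List[float], text_feat: List[float],
                 alpha: float = 0.5) -> List[float]:
    """Blend a region's visual and textual feature vectors before the detector head."""
    return [alpha * v + (1.0 - alpha) * t for v, t in zip(visual_feat, text_feat)]


def late_fusion(visual_scores: Dict[str, float], text_scores: Dict[str, float],
                beta: float = 0.5) -> Dict[str, float]:
    """Blend per-class scores that each modality produced independently."""
    return {
        label: beta * score + (1.0 - beta) * text_scores.get(label, 0.0)
        for label, score in visual_scores.items()
    }


if __name__ == "__main__":
    # Vision alone slightly favours "table"; the text branch strongly favours "text".
    fused = late_fusion({"table": 0.6, "text": 0.4}, {"table": 0.2, "text": 0.8})
    print(max(fused, key=fused.get))  # text
```

Because both functions operate on the detector's existing intermediate outputs, this shape of design can be bolted onto an off-the-shelf detector without changing its backbone, which is the sense in which such fusion is "plug-in".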

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

multimodal AI Deep Learning Few-shot Learning document-analysis tabular data video-language

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
