Top AI Stories of 2021: Large‑Scale Pretrained Models, Transformers, Multimodal AI, and Emerging Challenges
The article reviews the 2021 AI landscape, highlighting the race for ever‑larger pretrained models, the dominance of Transformers across modalities, the promise and limits of large models, the rise of multimodal systems, regulatory considerations, and the still‑nascent progress in reinforcement learning.
Before the New Year, I read Andrew Ng's "Top AI Stories of 2021"[0], which summarized the key advances and issues in AI during 2021. After sharing it on social media, I felt compelled to add my own industry-focused observations.
General Pretrained Foundations Achieve Miraculous Results
Data and Model Parameter Scale Competition
Since BERT's breakthrough with the hundred-million-parameter pre-training + fine-tuning paradigm, well-funded companies have been racing to scale up. GPT‑3 pushed parameter counts to 175 billion, demonstrating impressive task transfer and few-shot learning. Baidu released its 260-billion-parameter knowledge-enhanced ERNIE model, and Google's Switch Transformer reached the trillion-parameter mark. In China, the government-backed Beijing Academy of Artificial Intelligence (BAAI, "Zhiyuan") launched WuDao 2.0 with 1.75 trillion parameters. Leaderboards such as CLUE and SuperGLUE have become battlegrounds for large-model supremacy.
Why
Scale yields miracles: massive data, huge parameter counts, and advanced training techniques together raise the performance ceiling.
Hope for AI industrialization: the multi-task transfer ability of large models means simple fine-tuning on small datasets can suffice, making AI usable without deep expertise, much as an internal combustion engine powers many different applications.
Only well-resourced players can build such foundations: a universal base model requires vast data, compute, and research investment, so competition concentrates among a few large firms.
Large Models Are Great, But…
Models above a hundred billion parameters mainly dominate leaderboards: in production, inference latency and cost-effectiveness rule out deploying such massive models directly.
Distillation causes noticeable quality loss: distilling large models into smaller ones is common, but the drop, especially on generative tasks, remains significant (e.g., a 32-layer Transformer distilled to 12 layers loses 3-4 perplexity points).
Few-shot fine-tuning is not universally sufficient: simple fine-tuning works for easy scenarios, but complex domains with abundant proprietary data often still require domain-specific pre-training and custom task design.
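The distillation mentioned above typically trains the small model to match the large model's temperature-softened output distribution (the "soft targets" idea). A minimal pure-Python sketch of that loss term, with illustrative function names not taken from any specific library:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    In practice this term is combined with the usual cross-entropy
    on gold labels when training the student.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give (near-)zero loss; disagreement increases it.
assert distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) < 1e-9
```

The quality gap described above arises because the soft targets carry only part of what the large model "knows"; matching them does not recover the teacher's full behavior, particularly on open-ended generation.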
Transformer Is All You Need
Originating in NLP, the Transformer (via "Attention Is All You Need" and BERT) displaced RNNs across language tasks. In computer vision, Swin Transformer achieved state‑of‑the‑art results on detection and segmentation, while speech research has produced Transformer‑Transducer, Speech Transformer, and Transformer‑TTS, indicating a move toward a unified architecture.
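The operation shared by all of these architectures is scaled dot-product attention, softmax(QKᵀ/√d)·V. A toy pure-Python sketch (real implementations are batched tensor code with multiple heads):

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """softmax(QK^T / sqrt(d)) V over lists of equal-length float vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query that matches the first key more strongly than the second,
# so the output leans toward the first value vector.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 1.0], [0.0, 0.0]])
```

Because this same operation applies to any sequence of vectors, whether token embeddings, image patches, or audio frames, it explains why one architecture now spans NLP, vision, and speech.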
The Dawn of Multimodal Intelligence
With Transformers excelling in language, vision, and speech, and with massive pre‑training data, multimodal models have made breakthroughs—most notably OpenAI’s DALL‑E, which generates images from natural‑language prompts. Industry trends such as TikTok’s global dominance and the surge of short‑form video in China make multimodal content a primary output, though truly seamless human‑machine multimodal interaction remains limited.
Time to Put Appropriate Constraints on AI
Governments initially encourage innovation with minimal regulation, but as technologies mature, policy interventions become necessary. In 2021, many jurisdictions tightened AI governance, exemplified by China’s Personal Information Protection Law, which restricts the use of facial, voice, and behavioral data. Mobile platforms also increased data‑collection controls, prompting the AI community to explore privacy‑preserving techniques such as federated learning and edge‑cloud inference.
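At its core, federated learning keeps raw data on the device and shares only model parameters, which a server aggregates, classically by a size-weighted average in the style of FedAvg. A deliberately simplified sketch (flat weight lists standing in for real model tensors):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: size-weighted average of client weights.

    Each client trains locally on its private data and uploads only its
    weight vector, so raw user data never leaves the device.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients; the one with more data (size 3 vs 1) pulls the
# global model toward its weights.
global_w = federated_average([[0.0, 0.0], [1.0, 1.0]], client_sizes=[1, 3])
# → [0.75, 0.75]
```

In deployed systems this loop repeats over many rounds, often with secure aggregation on top so the server never sees any individual client's update in the clear.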
AI + Science Shows Great Potential
AI made significant strides across scientific disciplines in 2021. DeepMind’s AlphaFold solved protein‑structure prediction, marking a milestone for biology. Numerous Chinese startups are applying AI to drug discovery and biotech, reflecting the view that the 21st century is the century of life sciences.
Reinforcement Learning Is Still Training Its Core Skills
Deep learning (connectionism) has driven rapid AI progress, while reinforcement learning, which learns from experience through trial and error, remains a key avenue toward general intelligence. Despite successes in games, RL still faces cold-start problems, sparse rewards over long horizons, and unstable convergence, limiting breakthroughs beyond research benchmarks.
References
"Top AI Stories of 2021" – Andrew Ng, The Batch: https://read.deeplearning.ai/the-batch/issue-123/
GPT‑3: Language Models are Few‑Shot Learners – https://arxiv.org/abs/2005.14165
ERNIE 3.0: Large‑scale Knowledge‑Enhanced Pre‑training – https://arxiv.org/abs/2107.02137
Switch Transformers: Scaling to Trillion Parameter Models – https://arxiv.org/abs/2101.03961
WuDao 2.0 – https://wudaoai.cn/
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows – https://arxiv.org/abs/2103.14030
DALL‑E – https://openai.com/blog/dall-e/
Highly accurate protein structure prediction with AlphaFold – https://www.nature.com/articles/s41586-021-03819-2
ICLR‑2021 Reinforcement Learning research overview – https://zhuanlan.zhihu.com/p/412666507
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.