Top AI Stories of 2021: Large‑Scale Pretrained Models, Transformers, Multimodal AI, and Emerging Challenges
The article reviews the 2021 AI landscape, highlighting the race for ever‑larger pretrained models, the dominance of Transformers across modalities, the promise and limits of large models, the rise of multimodal systems, regulatory considerations, and the still‑nascent progress in reinforcement learning.
Before the New Year, I read Andrew Ng's "Top AI Stories of 2021"[0], which summarized the key advances and issues in AI during 2021. After sharing it on social media, I felt compelled to add my own industry-focused observations.
General Pretrained Foundations Achieve Miraculous Results
Data and Model Parameter Scale Competition
Since BERT's breakthrough with the hundred-million-parameter pre-training + fine-tuning paradigm, well-funded companies have been racing to scale up. GPT‑3 pushed parameter counts to 175 billion, demonstrating impressive task transfer and few-shot learning. Baidu released its 260-billion-parameter knowledge-enhanced ERNIE model, and Google's Switch Transformer reached the trillion-parameter mark. In China, the government-backed Beijing Academy of Artificial Intelligence (BAAI, "Zhiyuan") launched WuDao 2.0 with 1.75 trillion parameters. Leaderboards such as CLUE and SuperGLUE have become battlegrounds for large-model supremacy.
Why
Scale yields miracles: massive data, huge parameter counts, and advanced training techniques together raise the performance ceiling.
Hope for AI industrialization: the multi-task transfer ability of large models means simple fine-tuning on small datasets can suffice, making AI usable without deep expertise, much as an internal combustion engine powers many different applications.
Only well-resourced players can build such foundations: a universal base model requires vast data, compute, and research investment, so competition concentrates among a few large firms.
Large Models Are Great, But…
Models above a hundred billion parameters mainly dominate leaderboards: in production, inference latency and cost-effectiveness rule out deploying such massive models directly.
Distillation causes noticeable quality loss: distilling large models into smaller ones is common, but the drop, especially on generative tasks, remains significant (e.g., a 32-layer Transformer distilled to 12 layers loses 3-4 perplexity points).
Few-shot fine-tuning is not universally sufficient: simple fine-tuning works for easy scenarios, but complex domains with abundant proprietary data often still require domain-specific pre-training and custom task design.
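The distillation mentioned above typically trains the small model to match the large model's temperature-softened output distribution (the "soft targets" idea). A minimal pure-Python sketch of that loss term, with illustrative function names not taken from any specific library:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    In practice this term is combined with the usual cross-entropy
    on gold labels when training the student.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give (near-)zero loss; disagreement increases it.
assert distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) < 1e-9
```

The quality gap described above arises because the soft targets carry only part of what the large model "knows"; matching them does not recover the teacher's full behavior, particularly on open-ended generation.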
Transformer Is All You Need
Originating in NLP, the Transformer (via "Attention Is All You Need" and BERT) displaced RNNs across language tasks. In computer vision, Swin Transformer achieved state‑of‑the‑art results on detection and segmentation, while speech research has produced Transformer‑Transducer, Speech Transformer, and Transformer‑TTS, indicating a move toward a unified architecture.
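The operation shared by all of these architectures is scaled dot-product attention, softmax(QKᵀ/√d)·V. A toy pure-Python sketch (real implementations are batched tensor code with multiple heads):

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """softmax(QK^T / sqrt(d)) V over lists of equal-length float vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query that matches the first key more strongly than the second,
# so the output leans toward the first value vector.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 1.0], [0.0, 0.0]])
```

Because this same operation applies to any sequence of vectors, whether token embeddings, image patches, or audio frames, it explains why one architecture now spans NLP, vision, and speech.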
The Dawn of Multimodal Intelligence
With Transformers excelling in language, vision, and speech, and with massive pre‑training data, multimodal models have made breakthroughs—most notably OpenAI’s DALL‑E, which generates images from natural‑language prompts. Industry trends such as TikTok’s global dominance and the surge of short‑form video in China make multimodal content a primary output, though truly seamless human‑machine multimodal interaction remains limited.
Time to Put Appropriate Constraints on AI
Governments initially encourage innovation with minimal regulation, but as technologies mature, policy interventions become necessary. In 2021, many jurisdictions tightened AI governance, exemplified by China’s Personal Information Protection Law, which restricts the use of facial, voice, and behavioral data. Mobile platforms also increased data‑collection controls, prompting the AI community to explore privacy‑preserving techniques such as federated learning and edge‑cloud inference.
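At its core, federated learning keeps raw data on the device and shares only model parameters, which a server aggregates, classically by a size-weighted average in the style of FedAvg. A deliberately simplified sketch (flat weight lists standing in for real model tensors):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: size-weighted average of client weights.

    Each client trains locally on its private data and uploads only its
    weight vector, so raw user data never leaves the device.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients; the one with more data (size 3 vs 1) pulls the
# global model toward its weights.
global_w = federated_average([[0.0, 0.0], [1.0, 1.0]], client_sizes=[1, 3])
# → [0.75, 0.75]
```

In deployed systems this loop repeats over many rounds, often with secure aggregation on top so the server never sees any individual client's update in the clear.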
AI + Science Shows Great Potential
AI made significant strides across scientific disciplines in 2021. DeepMind’s AlphaFold solved protein‑structure prediction, marking a milestone for biology. Numerous Chinese startups are applying AI to drug discovery and biotech, reflecting the view that the 21st century is the century of life sciences.
Reinforcement Learning Is Still Training Its Core Skills
Deep learning (connectionism) has driven rapid AI progress, while reinforcement learning, which learns from experience through trial and error, remains a key avenue toward general intelligence. Despite successes in games, RL still faces cold-start problems, sparse rewards over long horizons, and unstable convergence, limiting breakthroughs beyond research benchmarks.
References
"Top AI Stories of 2021" – Andrew Ng, The Batch: https://read.deeplearning.ai/the-batch/issue-123/
GPT‑3: Language Models are Few‑Shot Learners – https://arxiv.org/abs/2005.14165
ERNIE 3.0: Large‑scale Knowledge‑Enhanced Pre‑training – https://arxiv.org/abs/2107.02137
Switch Transformers: Scaling to Trillion Parameter Models – https://arxiv.org/abs/2101.03961
WuDao 2.0 – https://wudaoai.cn/
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows – https://arxiv.org/abs/2103.14030
DALL‑E – https://openai.com/blog/dall-e/
Highly accurate protein structure prediction with AlphaFold – https://www.nature.com/articles/s41586-021-03819-2
ICLR‑2021 Reinforcement Learning research overview – https://zhuanlan.zhihu.com/p/412666507
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.