Tagged articles
6 articles
Data Party THU
Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Vision and vision-language models excel on IID benchmarks but often fail on out-of-distribution (OOD) data because of shortcut learning. This article examines the problem, explains its causes, and proposes data-level and model-level interventions, including StillMix, FLASH, and SPARCL, to improve OOD robustness.

AI research · Model Design · OOD generalization
0 likes · 7 min read
AI Info Trend
Jan 14, 2026 · Industry Insights

2026 AI Model Leaderboards: Google Dominates, Anthropic Surprises, OpenAI’s New Champion

The 2026 AI model leaderboards across Text, Web Development, Vision, and Text-to-Image arenas reveal Google’s Gemini series leading in text and vision, Anthropic’s Claude Opus unexpectedly topping web‑dev rankings, and OpenAI’s GPT‑Image‑1.5 clinching the top spot in creative image generation, highlighting an increasingly competitive AI landscape.

AI · Anthropic · Google
0 likes · 8 min read
AI Frontier Lectures
Dec 15, 2025 · Artificial Intelligence

How UnityVideo Unifies Multimodal Training to Boost Video Generation

UnityVideo, a new vision framework from HKUST, CUHK, Tsinghua and Kuaishou, unifies training across depth, flow, pose, segmentation and RGB modalities, achieving faster convergence, higher video quality, zero‑shot generalization and stronger physical reasoning compared with existing single‑modality video generators.

AI research · UnityVideo · multimodal video generation
0 likes · 15 min read
HyperAI Super Neural
Dec 12, 2025 · Artificial Intelligence

AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language

The AI Open‑Source Forum brought together researchers from Peking University, Tsinghua, Zilliz and MoonBit to share open‑source advances in audio‑synchronized video generation, vector database architecture, lightweight vision backbones, and an AI‑native programming language, highlighting datasets, system designs, and future collaborative directions.

AI · AI‑Native Programming · Video Generation
0 likes · 12 min read
DataFunSummit
Jan 14, 2023 · Artificial Intelligence

Key Transformer Model Papers Across Language, Vision, Speech, and Time‑Series Domains

This article surveys the most influential Transformer‑based research papers, from the original "Attention Is All You Need" to later models such as Autoformer and FEDformer. It covers breakthroughs in natural language processing, computer vision, speech recognition, and long‑term time‑series forecasting, and provides download links for each paper.

AI · Time-Series Forecasting · Transformer
0 likes · 17 min read