Tagged articles

Vision models

9 articles

May 28, 2026 · Artificial Intelligence

UNSL: A Unified Multivariate Scaling Law for Predicting Large Model Performance

The article explains that traditional neural scaling laws account only for parameters, data, and compute, whereas real training involves many more variables. It introduces the Unified Neural Scaling Law (UNSL) from Mila and DeepMind, which models multivariate interactions, bottlenecks, hyperbreaks, overfitting, and hyperparameter effects, and shows better extrapolation than prior laws on vision and language benchmarks.

DeepMind · Language Models · Mila

9 min read

Machine Heart

May 8, 2026 · Artificial Intelligence

How an Agentic Loop Turns Text‑to‑3D Scene Generation into an Iterative Planning Process

Scenethesis, a new ICLR 2026 framework from NVIDIA and Purdue, combines language, vision, and physics in a closed‑loop agent to turn one‑shot text‑to‑3D generation into a repeatable plan‑check‑repair workflow, dramatically improving spatial realism and physical plausibility.

Language Models · Multimodal Generation · Vision models

9 min read

ZhongAn Tech Team

Apr 27, 2026 · Artificial Intelligence

The Single‑Agent Era Ends – Kimi K2.6 Scales to 300 Agents for Complex Tasks

This week’s tech roundup covers the launch of Kimi K2.6, with a 300‑agent swarm capability and major performance gains; DeepSeek V4’s new sparse‑attention architecture and pricing; Meshy’s AI‑3D partnership; a $4.55B AI‑brain funding round; Honor’s record‑breaking robot; M‑Flow’s cone‑graph memory engine; and Vision Banana’s unified visual model, all backed by benchmark data and industry commentary.

3D generation · AI agents · AI industry

32 min read

Data Party THU

Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Visual and vision-language models excel on IID benchmarks but often fail on out-of-distribution (OOD) data because of shortcut learning. This article examines the problem, explains its causes, and proposes data-level and model-level interventions, including StillMix, FLASH, and SPARCL, to improve OOD robustness.

AI research · Model Design · OOD generalization

7 min read

Software Engineering 3.0 Era

Mar 15, 2026 · Artificial Intelligence

When AI ‘Crayfish’ Takes Over Testing, Where Do 80% of Testers Go?

The article demonstrates how an LLM‑powered agent (nicknamed “crayfish”), equipped with OpenClaw and Playwright MCP, can autonomously perform web‑testing tasks, including environment setup, visual OCR, error recovery, and reporting. It argues that testing is shifting from fragile scripted automation to intent‑driven testing and warns that traditional test engineers have little time left to adapt.

AI testing · LLM Agents · Playwright

11 min read

AI Info Trend

Jan 14, 2026 · Industry Insights

2026 AI Model Leaderboards: Google Dominates, Anthropic Surprises, OpenAI’s New Champion

The 2026 AI model leaderboards across Text, Web Development, Vision, and Text-to-Image arenas reveal Google’s Gemini series leading in text and vision, Anthropic’s Claude Opus unexpectedly topping web‑dev rankings, and OpenAI’s GPT‑Image‑1.5 clinching the top spot in creative image generation, highlighting an increasingly competitive AI landscape.

AI · Anthropic · Google

8 min read

AI Frontier Lectures

Dec 15, 2025 · Artificial Intelligence

How UnityVideo Unifies Multimodal Training to Boost Video Generation

UnityVideo, a new vision framework from HKUST, CUHK, Tsinghua and Kuaishou, unifies training across depth, flow, pose, segmentation and RGB modalities, achieving faster convergence, higher video quality, zero‑shot generalization and stronger physical reasoning compared with existing single‑modality video generators.

AI research · UnityVideo · Vision models

15 min read

HyperAI Super Neural

Dec 12, 2025 · Artificial Intelligence

AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language

The AI Open‑Source Forum brought together researchers from Peking University, Tsinghua, Zilliz and MoonBit to share open‑source advances in audio‑synchronized video generation, vector database architecture, lightweight vision backbones, and an AI‑native programming language, highlighting datasets, system designs, and future collaborative directions.

AI · AI‑Native Programming · Open-source

12 min read

DataFunSummit

Jan 14, 2023 · Artificial Intelligence

Key Transformer Model Papers Across Language, Vision, Speech, and Time‑Series Domains

This article surveys the most influential Transformer‑based research papers, from the original “Attention Is All You Need” paper to recent models such as Autoformer and FEDformer. It covers breakthroughs in natural language processing, computer vision, speech recognition, and long‑term time‑series forecasting, and provides download links for each paper.

AI · Language Models · Time-Series Forecasting

17 min read
