Tagged articles
13 articles
Machine Heart
May 17, 2026 · Artificial Intelligence

ViT³: Vision Test‑Time Training Architecture Breaking Transformer Complexity (CVPR 2026 Oral)

The paper systematically studies Test‑Time Training (TTT) for vision, derives six design principles, and introduces ViT³—a pure TTT architecture that uses full‑batch internal training, a learning rate of 1.0, and lightweight SwiGLU‑Depthwise convolution modules, achieving state‑of‑the‑art linear‑complexity performance across classification, detection, segmentation and generation tasks.

Computer Vision · Linear Complexity · Sequence Modeling
0 likes · 14 min read
Machine Heart
May 7, 2026 · Artificial Intelligence

OrthoReg: Simple Orthogonal Regularization to Eliminate Model Merging Conflicts

The paper introduces OrthoReg, a lightweight orthogonal regularization added during fine‑tuning that provably enforces weight orthogonality, thereby resolving conflicts in model merging and providing a theoretical explanation for the success of task arithmetic.

Deep Learning · OrthoReg · Orthogonal Regularization
0 likes · 12 min read
AIWalker
Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua’s Huang‑Gao Team Redesigns Vision‑Transformer Attention via Fourier Transforms

The AAAI 2026 paper by Tsinghua’s Huang‑Gao team shows that modeling Vision‑Transformer attention as a Block‑Circulant matrix and computing it with FFT reduces the quadratic complexity to O(N log N), delivering up to seven‑fold real‑world speedups without sacrificing accuracy.

AAAI 2026 · Circulant Matrices · Computer Vision
0 likes · 15 min read
SuanNi
Feb 26, 2026 · Artificial Intelligence

How BitDance’s 2.6B‑Parameter Model Beats 14B Counterparts with 8.7× Speedup

BitDance’s new multimodal AI model achieves an 8.7‑fold inference acceleration using only 2.6 billion parameters, surpasses 14‑billion‑parameter state‑of‑the‑art architectures in image generation quality, and introduces binary visual tokens, a binary diffusion head, and next‑block diffusion for efficient parallel autoregressive prediction.

Binary Tokenization · Vision Transformers · AI
0 likes · 11 min read
AI Frontier Lectures
Jul 8, 2025 · Artificial Intelligence

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

The LaVin-DiT paper presents a large vision diffusion transformer that integrates a spatio‑temporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient multi‑task generation for images and videos. The article also covers the model's flow‑matching training and its experimental results.

3D RoPE · Computer Vision · Generative Modeling
0 likes · 12 min read
AIWalker
Jun 24, 2025 · Artificial Intelligence

Mamba-Adaptor Merges Adaptor‑T and Adaptor‑S to Revolutionize Vision Tasks with State‑of‑the‑Art Benchmarks

The paper introduces Mamba-Adaptor, a plug‑and‑play module combining Adaptor‑T and Adaptor‑S to overcome causal computation, long‑range forgetting, and spatial modeling limits of visual Mamba models, delivering top‑ranked results on ImageNet and COCO across multiple downstream tasks.

Adaptor · Mamba · State Space Model
0 likes · 25 min read
AI Frontier Lectures
Jun 14, 2025 · Industry Insights

CVPR 2025 Awards Unveiled: Breakthrough Papers and Rising Stars

The CVPR 2025 awards spotlight groundbreaking research, honoring young scholars and top papers such as VGGT, Neural Inverse Rendering, and several honorable mentions, while summarizing each work's core contributions, methodologies, and potential impact on computer vision and related fields.

2025 · CVPR · Computer Vision
0 likes · 13 min read
AI Frontier Lectures
Mar 14, 2025 · Artificial Intelligence

Do Vision Models Really Need Mamba? A Deep Dive into MambaOut

This article critically examines the MambaOut paper, analyzing whether state‑space‑based Mamba token mixers are necessary for vision tasks, presenting two hypotheses, describing the construction of MambaOut models without SSM, and reporting extensive ImageNet, COCO and ADE20K experiments that reveal when Mamba is beneficial.

Deep Learning · Mamba · State Space Model
0 likes · 17 min read
AIWalker
Feb 26, 2025 · Artificial Intelligence

Why Linear Attention Lags Behind Softmax and How Two Simple Tweaks Close the Gap

The paper analytically identifies injectivity and local modeling as the two key factors causing the performance gap between linear and Softmax attention, proposes the InLine attention modifications to restore these properties, and demonstrates through extensive Vision Transformer experiments that the enhanced linear attention matches or surpasses Softmax while retaining linear computational cost.

Attention Mechanism · Efficient Transformers · Linear Attention
0 likes · 24 min read
Laiye Technology Team
Apr 1, 2022 · Artificial Intelligence

Self‑Supervised Learning: Contrastive Methods and the MoCo Series (V1‑V3)

This article introduces the four types of machine learning, explains self‑supervised learning, details generative and contrastive approaches, and provides an in‑depth overview of the MoCo series (V1‑V3), including their architectures, training strategies, and experimental results on document image classification and text‑line detection tasks.

MoCo · Vision Transformers · Contrastive Learning
0 likes · 12 min read
Meituan Technology Team
Mar 24, 2022 · Artificial Intelligence

Twins: Efficient Visual Attention Models for Vision Transformers

The Twins series, a collaboration between Meituan and the University of Adelaide, introduces conditional positional encoding and spatially separable self‑attention to improve efficiency and performance of vision transformers, achieving state‑of‑the‑art results on ImageNet, ADE20K, COCO and high‑precision map segmentation.

ADE20K · COCO · Conditional Positional Encoding
0 likes · 20 min read
Baobao Algorithm Notes
Jan 28, 2022 · Artificial Intelligence

How Masked Autoencoders Revolutionize Vision Pre‑Training: A Deep Dive

This article provides a detailed technical walkthrough of Masked Autoencoders (MAE) for computer vision, covering its BERT‑inspired masking strategy, asymmetric encoder‑decoder design, implementation specifics, experimental findings on mask ratios and decoder depth, and the resulting performance gains over supervised ViT models.

Computer Vision · MAE · Masked Modeling
0 likes · 11 min read
Code DAO
Dec 8, 2021 · Artificial Intelligence

Understanding Compact Transformers: Build and Train Vision & NLP Models on a Personal PC

This article walks through the design of Compact Transformers, explaining scaled dot‑product self‑attention, positional embeddings, multi‑head attention, and Vision Transformer architecture, and provides full PyTorch code so readers can train lightweight CV and NLP classifiers on a single PC.

Compact Transformers · Patch Embedding · Positional Embedding
0 likes · 19 min read