Tagged articles

ViT³

10 articles · Page 1 of 1

Jun 12, 2026 · Artificial Intelligence

ViT³ Reaches CVPR 2026 Best‑Paper Finalist Using Test‑Time Training to Break Transformer Complexity

The ViT³ paper, a CVPR 2026 best‑paper finalist, introduces test‑time training to compress visual context, achieving 4.6× faster inference and 90% lower GPU memory use on 1248×1248 images. It also outlines six design principles and demonstrates the method's adaptability to classification, detection, segmentation, and generation tasks.

CVPR 2026 · Efficient Attention · High-Resolution Vision

0 likes · 16 min read

Baidu Geek Talk

May 25, 2026 · Artificial Intelligence

Accelerating Multimodal Model Training: LoongForge's DP Load‑Balancing Optimization Explained

The article analyzes how data‑parallel (DP) load imbalance hampers large‑scale multimodal model training, details LoongForge's two‑stage adaptive data‑reallocation method that builds a precise compute‑cost model and dynamically redistributes samples, and presents experimental results showing up to 10% throughput gains on massive DP clusters.

DP load balancing · Data Parallel · LoongForge

0 likes · 16 min read

AIWalker

May 19, 2026 · Artificial Intelligence

How EUPE’s Three‑Stage Distillation Lets an 86M Model Run Classification, Segmentation and VLM on iPhone in 62 ms (SOTA)

EUPE introduces a three‑stage “scale‑then‑shrink” distillation pipeline that first trains a large proxy model to absorb heterogeneous expert knowledge and then compresses it into an 86M encoder, achieving state‑of‑the‑art performance on image classification, dense prediction and vision‑language tasks on an iPhone with only 62 ms latency.

EUPE · Knowledge Distillation · ViT³

0 likes · 16 min read

xkx's Tech General Store

Apr 16, 2026 · Artificial Intelligence

Understanding Vision Transformers: Core ViT Principles and Multimodal Applications

This article explains the Vision Transformer (ViT) architecture, compares it with CNNs and traditional NLP Transformers, details its encoding process and attention mechanisms, and demonstrates a practical leaf‑disease classification project that showcases ViT’s role in multimodal AI systems.

AI Fundamentals · Deep Learning · Multimodal AI

0 likes · 10 min read

Data Party THU

Mar 25, 2026 · Artificial Intelligence

How Knowledge‑Guided Context Optimization Boosts Zero‑Shot Vision‑Language Models

The article analyzes the Base‑to‑New generalization problem of CLIP‑based vision‑language models, explains why standard prompt tuning (CoOp) forgets base knowledge, and presents the KgCoOp framework, which adds a knowledge‑guided loss to keep learned prompts close to hand‑crafted ones, dramatically improving unseen‑class performance while preserving efficiency.

CLIP · Knowledge-guided Optimization · Prompt Tuning

0 likes · 12 min read

DeepHub IMBA

Mar 23, 2026 · Artificial Intelligence

How KgCoOp Uses Knowledge‑Guided Context Optimization to Prevent Prompt Tuning Forgetting

The article analyzes why standard prompt tuning (CoOp) causes catastrophic forgetting in vision‑language models, introduces the KgCoOp framework that adds a knowledge‑guided loss to regularize prompts, and shows through extensive experiments on 11 benchmarks that KgCoOp improves unseen‑class accuracy, harmonic mean, and efficiency. It also discusses trade‑offs and limitations.

Catastrophic Forgetting · Knowledge-guided Optimization · Prompt Tuning

0 likes · 11 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

Diffusion Models · VAE · ViT³

0 likes · 13 min read

Rare Earth Juejin Tech Community

Jul 12, 2023 · Artificial Intelligence

Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance

This article provides an in‑depth, English‑language overview of Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.

Deep Learning · Fine‑tuning · Patch Embedding

0 likes · 25 min read

DataFunSummit

Apr 21, 2023 · Artificial Intelligence

Fine‑Tuning a ViT Image Classification Model on a Small Flower Dataset Using ModelScope

This tutorial walks through the complete process of fine‑tuning a Vision Transformer (ViT) model for 14‑class flower image classification on ModelScope, covering dataset preparation, model loading, training configuration, evaluation, and inference with practical code examples.

Deep Learning · ModelScope · Python

0 likes · 14 min read

Rare Earth Juejin Tech Community

Oct 18, 2022 · Artificial Intelligence

Practical Implementation of Vision Transformer (ViT) for Image Classification in PyTorch

This article walks readers through building, training, and evaluating a Vision Transformer (ViT) model for a five‑class flower classification task, providing detailed code snippets, model architecture explanations, training script adjustments, and experimental results that highlight the importance of pre‑trained weights.

Deep Learning · Pretrained Models · PyTorch

0 likes · 13 min read
