Tagged articles

computer vision

667 articles

Jul 4, 2026 · Artificial Intelligence

ICRDrag: The First In‑Context Region Drag Model for Precise, Controllable Image Editing

ICRDrag, presented at ECCV 2026, introduces an in‑context region‑dragging framework that uses mask‑based attention and bidirectional source‑target constraints to achieve precise, natural image edits while overcoming the deformation and boundary issues of earlier point‑ and region‑drag methods.

DiT · ICRDrag · computer vision

0 likes · 6 min read

Machine Heart

Jun 25, 2026 · Artificial Intelligence

No‑Training Camera Redirection: From One Monocular Video to Arbitrary Angles and Bullet‑Time

FreeOrbit4D achieves training‑free arbitrary camera redirection for a single monocular video by reconstructing a foreground‑complete 4D geometry, delivering stable large‑angle shots, beating baselines on VBench and user studies, and exposing an editable 4D point cloud for many downstream applications.

4D reconstruction · FreeOrbit4D · camera redirection

0 likes · 11 min read

Kuaishou Tech

Jun 18, 2026 · Artificial Intelligence

Kuaishou Tech Team Highlights Multiple ICML 2026 Papers Across AI Domains

The Kuaishou technology team reports that several of its papers were accepted at the prestigious ICML 2026 conference—including a spotlight paper on metaphor video understanding, works on causal discovery for irregular time series, image super‑resolution, large‑scale notification dispatch, full‑order ranking, phase‑aware MoE for RL, end‑to‑end e‑commerce search, spatial‑reasoning rewards, a unified SWE benchmark, video temporal grounding, and interpretable transformers—while also inviting attendees to visit their booth B101 in Seoul.

Agentic AI · ICML 2026 · Kuaishou

0 likes · 18 min read

HyperAI Super Neural

Jun 17, 2026 · Artificial Intelligence

Deterministic Video Depth (DVD): Open‑Source Framework Achieves Zero‑Shot SOTA

The DVD framework converts a pretrained video diffusion model into a deterministic, single‑pass video depth estimator, eliminating random sampling artifacts, preserving geometric and semantic priors, and reaching zero‑shot state‑of‑the‑art performance with 163× less training data.

HKUST · computer vision · deterministic inference

0 likes · 5 min read

Data Party THU

Jun 16, 2026 · Artificial Intelligence

How a T‑Shaped Outfit Evades Both Visible‑Light and Thermal Detectors – Tsinghua’s New Multimodal Adversarial Method

Tsinghua researchers propose non‑overlapping RGB‑T adversarial clothing that uses printable fabric for visible‑light patterns and aluminum film for thermal patterns, achieving over 90% attack success in digital simulations and about 60% in real‑world tests across multiple fusion detectors.

3D modeling · RGB-T · adversarial attack

0 likes · 9 min read

Machine Heart

Jun 13, 2026 · Artificial Intelligence

World Labs Unveils Three 3D Generation Papers While Co‑Founder Announces Departure

World Labs released three technically detailed papers—World Tracing, Modality Forcing, and Flex4DHuman—each extending 2D diffusion models to 3D generation, while co‑founder Christoph Lassner announced his departure due to injury, marking a notable milestone for the spatial‑AI startup.

3D generation · Diffusion Models · World Labs

0 likes · 14 min read

Machine Heart

Jun 10, 2026 · Artificial Intelligence

DRDD: Turning Diffusion Noise into a Domain Harmonizer for Image Translation

The paper introduces Decoupled Residual Denoising Diffusion (DRDD), which reinterprets Gaussian noise as a domain harmonizer and separates residual removal from denoising, enabling more data‑efficient, multi‑task image‑to‑image translation and achieving state‑of‑the‑art results on benchmarks such as All‑in‑One‑5 with limited paired data.

DRDD · Data Efficiency · Diffusion Models

0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing

Jun 6, 2026 · Artificial Intelligence

Two Undergraduates Earn Best Student Paper Nomination at CVPR 2026

At CVPR 2026, two undergraduate researchers from Guangdong University of Technology secured a Best Student Paper nomination for their ChordEdit work, which introduces a low‑energy optimal‑transport framework for one‑step image editing and outperforms existing methods in speed, memory usage, and user preference.

Best Student Paper · CVPR 2026 · ChordEdit

0 likes · 13 min read

Machine Heart

Jun 6, 2026 · Artificial Intelligence

Undergrad Wins CVPR Best Student Paper Nomination Using an Old NVIDIA Titan GPU

The CVPR 2026 award list highlighted a paper titled “ChordEdit: One-Step Low-Energy Transport for Image Editing,” authored primarily by a third‑year undergraduate who used an older NVIDIA Titan GPU to achieve model‑agnostic, training‑free, high‑fidelity one‑step image editing with minimal compute, earning an oral presentation slot and a Best Student Paper nomination.

CVPR 2026 · computer vision · image editing

0 likes · 7 min read

Machine Heart

Jun 5, 2026 · Industry Insights

ResNet and YOLO Win Time-Tested Awards at CVPR 2026 – Full Award Breakdown

The article reports that CVPR 2026 received 16,092 submissions, accepted 25.3% of them, and announced a record‑high paper count. It then breaks down the awards: the Longuet‑Higgins Prize for ResNet and YOLO, best papers on dynamic 4D reconstruction, 3D object generation, and generalist gaming agents, and the student and young researcher honors.

Award Analysis · CVPR 2026 · Longuet-Higgins Prize

0 likes · 12 min read

Huolala Tech

Jun 3, 2026 · Artificial Intelligence

Three Breakthroughs Driving the Rapid Rise of Computer Vision

The article reviews three major recent breakthroughs in computer vision—self‑supervised visual foundation models, feed‑forward 3D reconstruction, and unified multimodal models—detailing their underlying methods, key papers, performance characteristics, and practical implications for real‑world AI applications.

3D reconstruction · computer vision · multimodal models

0 likes · 22 min read

Machine Heart

May 31, 2026 · Artificial Intelligence

How a Child’s Finger‑Drawn Moustache Fooled AI Age Verification (and Made Engineers Speechless)

When Discord switched to a teen‑by‑default policy, users discovered that a simple thumb sketch with two eyes and a mouth could trick the on‑device AI age estimator into granting adult access, exposing the limits of lightweight facial analysis models.

AI age verification · Discord · Meta

0 likes · 6 min read

Machine Heart

May 17, 2026 · Artificial Intelligence

ViT³: Vision Test‑Time Training Architecture Breaking Transformer Complexity (CVPR 2026 Oral)

The paper systematically studies Test‑Time Training (TTT) for vision, derives six design principles, and introduces ViT³—a pure TTT architecture that uses full‑batch internal training, a learning rate of 1.0, and lightweight SwiGLU‑Depthwise convolution modules, achieving state‑of‑the‑art linear‑complexity performance across classification, detection, segmentation and generation tasks.

Linear Complexity · Test-Time Training · Vision Transformers

0 likes · 14 min read

Data Party THU

May 15, 2026 · Artificial Intelligence

94% Precision: YOLO11‑Based Detection of Near‑Earth Object and Satellite Streaks

The StreakMind system built by the Spanish Royal Navy Academy uses a YOLO11‑OBB detector trained on over 2,000 real astronomical images and 280 synthetic streaks to automatically identify satellite and asteroid streaks with 94% precision and 97% recall, delivering standardized database entries and robust frame‑to‑frame tracking.

StreakMind · YOLO11 · astronomical imaging

0 likes · 10 min read

Machine Heart

May 15, 2026 · Artificial Intelligence

How X2SAM Empowers Multimodal Models to Segment Images and Videos at Pixel Level

X2SAM is a unified multimodal large model that combines image and video segmentation with language and visual prompts, introduces a Mask Memory for temporal consistency, defines a new V‑VGD task, and achieves state‑of‑the‑art results while cutting training cost by over 30%.

Large Language Model · V-VGD · X2SAM

0 likes · 9 min read

HyperAI Super Neural

May 14, 2026 · Artificial Intelligence

YOLO‑11 Enables 94% Detection of Near‑Earth Object and Satellite Streaks

StreakMind, developed by the Spanish Royal Navy Academy’s observatory, combines real and synthetic astronomical images to train a YOLO‑11 oriented‑bounding‑box detector that robustly identifies satellite and asteroid streaks, achieving 94% precision and 97% recall on an independent test set of 273 images, and automatically integrates results into a standardized MPC database.

AI · StreakMind · YOLO-11

0 likes · 8 min read

Machine Heart

May 14, 2026 · Artificial Intelligence

Breaking the 3D Perception Bottleneck: VGGT Series Enables Dynamic High‑Fidelity Reconstruction

The VGGT series from KOKONI 3D and collaborators tackles three core 3D perception limits—unbounded sequence memory, dynamic‑static entanglement, and compute‑precision trade‑offs—by introducing StreamCacheVGGT, progressive decoupling, and HD‑VGGT, achieving O(1) memory streaming, 15%+ accuracy gains on dynamic benchmarks, and record‑high AUC on RealEstate10K.

3D reconstruction · VGGT · computer vision

0 likes · 10 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

Scal3R Enables Stable Kilometer-Scale 3D Reconstruction of Long Videos

Scal3R introduces test‑time training with a global‑context memory and synchronization mechanism that lets models train on and infer over ultra‑long video sequences, achieving accurate camera poses and dense point clouds for kilometer‑scale scenes while outperforming prior SLAM, SfM and streaming baselines on multiple benchmarks.

3D reconstruction · Scal3R · Test-Time Training

0 likes · 11 min read

Machine Heart

May 3, 2026 · Artificial Intelligence

How LEADER Beats Traditional LiDAR Relocalization in Accuracy and Speed

The LEADER framework achieves ten‑millisecond "eye‑open" LiDAR relocalization while surpassing the decimeter‑level accuracy of classic retrieval‑registration pipelines, using cylindrical projection, sparse convolution, and a Truncated Relative Reliability loss, as demonstrated on the NCLT benchmark.

LEADER · LiDAR · Relocalization

0 likes · 9 min read

AI Explorer

May 2, 2026 · Artificial Intelligence

How DeepSeek’s “Cyber Finger” Gives AI a Physical Sense of the World

DeepSeek introduces a “cyber finger” that lets AI not only recognize objects but also infer their spatial relationships, orientations, and manipulability, turning visual perception into a digital simulation of touch and enabling more realistic interaction in robotics, AR, and assistive technologies.

AI · DeepSeek · augmented reality

0 likes · 6 min read

Geek Labs

Apr 30, 2026 · Artificial Intelligence

Why the 14-Year-Old ccv Library Remains a Top Choice for Modern Computer Vision

The ccv library, created in 2010 and still actively maintained, offers a highly portable C‑based computer‑vision toolkit with minimal dependencies, a built‑in cache for preprocessing, a full libnnc neural‑network runtime, and easy builds via Bazel, Make, or Swift Package Manager.

C library · Cross-Platform · Embedded

0 likes · 5 min read

Machine Heart

Apr 27, 2026 · Artificial Intelligence

Google DeepMind Open‑Sources TIPSv2: State‑of‑the‑Art Patch‑Text Alignment at CVPR 2026

The DeepMind team unveils TIPSv2, a vision‑language pre‑training model that dramatically improves patch‑level image‑text alignment through iBOT++, Head‑only EMA, and multi‑granularity captions, achieving record‑breaking results on nine tasks across twenty datasets while remaining fully open‑source.

DeepMind · Multimodal Pretraining · Patch-Text Alignment

0 likes · 12 min read

Java Tech Enthusiast

Apr 25, 2026 · Fundamentals

Turn a MacBook into a Touchscreen in 16 Hours for Under $7 Using Only a Mirror

A team led by Anish Athalye built a $1‑$7 hardware add‑on that lets a MacBook’s built‑in webcam detect finger touches via a tiny mirror, using classic computer‑vision techniques to map those touches to screen coordinates and generate mouse events.

Low‑cost hardware · MacBook · Touchscreen hack

0 likes · 8 min read

AI Explorer

Apr 24, 2026 · Artificial Intelligence

Google’s ‘Banana’ Model Redefines Visual Transformers with Dynamic Sparse Attention

Google’s newly unveiled “Banana” visual Transformer introduces dynamic sparse attention that cuts inference cost 3‑5×, reduces memory by 70%, and improves ImageNet accuracy, while demonstrating real‑world gains in autonomous driving, medical imaging, and satellite analysis.

Dynamic Sparse Attention · Google · ImageNet

0 likes · 6 min read

Xiaomi Tech

Apr 22, 2026 · Artificial Intelligence

SVOR Wins CVPR 2026 Video Object Removal Challenge – Xiaomi’s Open‑Source Solution for Three Tough Problems

The article introduces SVOR, a Xiaomi‑developed video object removal framework that tackles shadow residues, motion jitter, and mask defects with MUSE, DA‑Seg, and a two‑stage training pipeline, achieves new SOTA on multiple benchmarks, and clinches first place in the CVPR 2026 video removal contest, with all code and models released publicly.

DA‑Seg · MUSE · SVOR

0 likes · 8 min read

Machine Heart

Apr 19, 2026 · Artificial Intelligence

How Google Turns Your CAPTCHA Clicks into Training Data for the Next Generation of AI

The article explains how YouTube’s AI‑video rating and Google’s reCAPTCHA system covertly collect billions of user interactions each day, converting them into labeled data that fuels Google’s computer‑vision models such as Veo, Maps and Waymo, effectively turning routine security checks into a massive, unpaid AI training workforce.

AI training · Google · Waymo

0 likes · 7 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

AI Tech Daily: Top AI Research and Industry Updates on April 16 2026

This roundup highlights recent AI breakthroughs such as NVIDIA‑MIT’s Sol‑RL framework for faster diffusion model training, Peking University’s CPL++ visual localization improvement, DeepMind’s TIPSv2 for image recognition, Boston Dynamics Spot’s AI upgrade, Anthropic’s safety paper, a major MCP protocol vulnerability, OpenAI’s GPT‑5.4 release, and the shifting AI video landscape.

AI · AI safety · Diffusion Models

0 likes · 5 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

CPL++: A Self‑Aware, Self‑Correcting Framework for Weakly Supervised Visual Grounding

The CPL++ framework equips weakly supervised visual grounding models with confidence‑aware pseudo‑label learning, self‑supervised association correction, and dynamic validation, enabling the model to detect and amend erroneous region‑query links during training, which yields absolute performance gains of 1–6 % across five benchmark datasets.

Visual Grounding · Weak Supervision · computer vision

0 likes · 9 min read

AIWalker

Apr 10, 2026 · Artificial Intelligence

How RealRestorer Bridges the Gap in Real‑World Image Restoration

RealRestorer leverages large‑scale image‑editing models, a hybrid synthetic‑and‑real degradation pipeline, and a two‑stage training strategy to deliver state‑of‑the‑art open‑source restoration that generalizes across nine real‑world degradation types while preserving content consistency.

benchmark · computer vision · deep learning

0 likes · 13 min read

HyperAI Super Neural

Apr 9, 2026 · Artificial Intelligence

Cornell’s EMSeek Generates Insights from EM Images in 2–5 Minutes, 50× Faster Than Experts

EMSeek, a modular multi‑agent platform from Cornell, integrates perception, structural reconstruction, property prediction, and literature reasoning to automate electron microscopy analysis across 20 material systems and five tasks, achieving up to twice the speed of Segment Anything, over 90% structural similarity, and a 50‑fold reduction in processing time compared with expert workflows, while requiring only about 2 % labeled data for calibration.

EMSeek · Materials Discovery · computer vision

0 likes · 16 min read

JD Cloud Developers

Apr 8, 2026 · Artificial Intelligence

How JoyAI-Image-Edit Brings Spatial Intelligence to Open‑Source Image Editing

JoyAI-Image-Edit, an open‑source multimodal foundation model from JD Research Institute, integrates text‑to‑image generation, image understanding, and instruction‑driven spatial editing, achieving world‑leading spatial perception and editing capabilities that unlock new applications across e‑commerce, robotics, 3D reconstruction, and design.

Multimodal AI · computer vision · generative models

0 likes · 7 min read

AIWalker

Apr 6, 2026 · Artificial Intelligence

BIPNet: Adaptive Progressive Upsampling Drives a Leap in Burst Image Restoration (TPAMI 2025)

The TPAMI 2025 paper introduces BIPNet, a unified burst‑image framework that tackles alignment, fusion, and upsampling challenges with edge‑enhanced alignment, pseudo‑burst feature fusion, and adaptive group upsampling, achieving state‑of‑the‑art results across super‑resolution, low‑light enhancement, and denoising while offering lightweight mobile variants.

BIPNet · Burst Image Processing · Denoising

0 likes · 13 min read

AIWalker

Apr 6, 2026 · Artificial Intelligence

How TIR‑Agent Turns Image‑Restoration Tools into a Learnable Decision‑Making Agent

The paper introduces TIR‑Agent, an image‑restoration agent that learns a tool‑calling policy via supervised fine‑tuning and reinforcement learning, addressing exploration stagnation and multi‑objective reward imbalance, and demonstrates over 2.5× faster inference and superior multi‑metric performance on synthetic and real degradation datasets.

Tool Scheduling · agent-based AI · computer vision

0 likes · 18 min read

Data Party THU

Apr 1, 2026 · Artificial Intelligence

How SwiftTailor Accelerates Realistic 3D Garment Generation

SwiftTailor introduces a two‑stage, geometry‑centric framework that unifies pattern inference and mesh synthesis, dramatically cutting inference time to seconds while achieving state‑of‑the‑art accuracy and visual realism on the Multimodal GarmentCodeData benchmark for digital fashion.

3D garment generation · AI · SwiftTailor

0 likes · 4 min read

Amazon Cloud Developers

Apr 1, 2026 · Artificial Intelligence

Achieving Pro‑Level Vision Detection with Minimal Cost: Fine‑Tuning Amazon Nova Lite

By fine‑tuning Amazon Nova Lite 1.0 on Amazon Bedrock, the study shows that a small training dataset can markedly improve instruction following and cut the number of detection boxes by up to 92%, reaching Pro‑grade accuracy in aerial group detection and low‑light monitoring at a fraction of the cost.

Amazon Bedrock · Amazon Nova Lite · computer vision

0 likes · 20 min read

Machine Heart

Mar 30, 2026 · Artificial Intelligence

InfoTok: Information-Theoretic Adaptive Video Tokenizer Redefines Efficient Tokenization (ICLR 2026 Oral)

InfoTok, a collaborative effort by Stanford, NVIDIA Cosmos, and NUS, leverages information theory and an ELBO‑based router to allocate tokens adaptively, achieving 2.3× higher compression, 11× faster inference, and superior reconstruction quality on benchmarks such as TokenBench and DAVIS.

ELBO · ICLR 2026 · InfoTok

0 likes · 11 min read

Data Party THU

Mar 29, 2026 · Artificial Intelligence

How LoGeR Enables Minute‑Long 3D Reconstruction with Hybrid Memory

The article presents LoGeR, a long‑context geometric reconstruction framework that combines test‑time‑training memory and sliding‑window attention to achieve minute‑scale, fully‑feedforward 3D reconstruction with superior accuracy on benchmarks such as KITTI and VBR.

3D reconstruction · Hybrid Memory · LoGeR

0 likes · 11 min read

AIWalker

Mar 23, 2026 · Artificial Intelligence

Dynamic Dense Computing and Minimal End‑to‑End Design: YOLO-Master & YOLO26

By introducing a dynamic mixture‑of‑experts routing scheme and an end‑to‑end architecture that eliminates NMS and DFL, YOLO‑Master and YOLO26 dramatically cut compute waste and latency on edge devices, achieving up to 43% faster CPU inference while keeping model accuracy, with all code openly released.

Dynamic Routing · Mixture of Experts · Model Optimization

0 likes · 7 min read

AI Frontier Lectures

Mar 19, 2026 · Artificial Intelligence

Can Circulant Attention Reduce Vision Transformer Cost by 7×?

The article reviews the AAAI 2026 paper "Vision Transformers are Circulant Attention Learners", explaining how modeling self‑attention as a Block‑Circulant matrix enables FFT‑based multiplication that cuts the quadratic complexity of standard attention, achieving up to seven‑fold inference speed‑up while preserving accuracy across ImageNet, COCO and ADE20K benchmarks.

BCCB Matrix · Circulant Attention · Efficient Attention

0 likes · 15 min read

AI Frontier Lectures

Mar 19, 2026 · Artificial Intelligence

Why Sharing Parameters in Vision Transformers Hurts Performance—and How Layer Specialization Fixes It

The article analyzes the hidden conflict between [CLS] and patch tokens in Vision Transformers, reveals how shared normalization and linear layers cause computational friction, and demonstrates that layer‑specific parameters dramatically improve dense prediction tasks without increasing inference FLOPs.

Dense Prediction · Layer Specialization · Self-Attention

0 likes · 9 min read

AIWalker

Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua’s Huang‑Gao Team Redesigns Vision‑Transformer Attention via Fourier Transforms

The AAAI 2026 paper by Tsinghua’s Huang‑Gao team shows that modeling Vision‑Transformer attention as a Block‑Circulant matrix and computing it with FFT reduces the quadratic complexity to O(N log N), delivering up to seven‑fold real‑world speedups without sacrificing accuracy.

AAAI 2026 · Circulant Matrices · Efficiency

0 likes · 15 min read

SuanNi

Mar 16, 2026 · Artificial Intelligence

How NaLaFormer Revives Linear Attention with Query‑Norm Awareness

NaLaFormer introduces a norm‑aware linear attention mechanism that restores the query‑norm‑driven sharpness of softmax attention, achieving up to 7.5% higher ImageNet accuracy and 92% memory reduction in super‑resolution, while delivering strong results across classification, detection, segmentation, and language modeling tasks.

AI · Linear Attention · NaLaFormer

0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing

Mar 15, 2026 · Artificial Intelligence

A 17‑Year‑Old High‑Schooler Becomes First‑Author on a CVPR Paper

A 17‑year‑old student from Anhui Ansheng School is first author of the CVPR 2026 paper "CraftMesh," a 3D mesh editing framework that combines image editing, mesh generation, and Poisson seamless fusion. The paper reports superior quantitative metrics, and the article presents it as a sign of the growing impact of young researchers at top AI conferences.

3D mesh generation · CVPR · CraftMesh

0 likes · 7 min read

AI Explorer

Mar 8, 2026 · Artificial Intelligence

Can a Pure‑Vision Model Redefine AI Perception? Inside ByteDance’s VideoWorld 2

ByteDance and Beijing Jiaotong University unveil VideoWorld 2, a visual‑only AI model that learns from massive video data without language mediation, promising richer detail retention, reduced bias, and a potential paradigm shift in how artificial intelligence perceives the world.

AI perception · ByteDance · Multimodal AI

0 likes · 7 min read

AIWalker

Mar 7, 2026 · Artificial Intelligence

YOLO-Master v2026.02 Unveils Four Innovations for SOTA Object Detection

Tencent’s YOLO-Master v2026.02 adds a Mixture‑of‑Experts architecture, zero‑overhead LoRA fine‑tuning, Sparse SAHI inference for large images, and Cluster‑Weighted NMS, delivering 3‑5× faster inference, up to 70% reduced training resources, and markedly higher detection accuracy across diverse benchmarks.

LoRA · Mixture of Experts · Model Optimization

0 likes · 15 min read

Code Mala Tang

Mar 5, 2026 · Artificial Intelligence

Master YOLOv12: A Step‑by‑Step Guide to Build, Train, and Deploy Custom Models

This tutorial walks readers through the fundamentals of YOLOv12, covering model variants, dataset preparation with Roboflow, optional FlashAttention acceleration, installation, model selection, training commands, post‑training tasks such as tracking, validation, inference, exporting to ONNX, and benchmarking, all with concrete code snippets and practical tips.

FlashAttention · Model Training · Python

0 likes · 8 min read

Code Mala Tang

Mar 1, 2026 · Artificial Intelligence

Why YOLO Dominates Real-Time Object Detection: A Complete Guide

This article provides a comprehensive overview of the YOLO (You Only Look Once) algorithm, explaining its core principles, architecture, version history, training workflow, real‑world applications, strengths, and current limitations for modern computer‑vision tasks.

Real-time · YOLO · computer vision

0 likes · 9 min read

AIWalker

Feb 26, 2026 · Artificial Intelligence

Overcoming Vision Transformer Bottlenecks: The Plug‑and‑Play Upgrade of ViT‑5

ViT‑5 systematically revisits five years of Transformer architecture advances, introducing seven plug‑and‑play components—LayerScale, RMSNorm, GeLU, dual positional encodings, high‑frequency RoPE for register tokens, QK‑Norm, and bias‑free projections—that together raise ImageNet‑1k Top‑1 accuracy to 84.2% (Base) and achieve superior performance across classification, generation, and segmentation tasks.

ViT-5 · Vision Transformer · computer vision

0 likes · 14 min read

HyperAI Super Neural

Feb 22, 2026 · Artificial Intelligence

OCR Models Guide: DeepSeek, PaddlePaddle, Others for High Accuracy & Local Deployment

This article surveys the latest open‑source OCR models—including GLM‑OCR, PaddleOCR‑VL‑1.5, LightOnOCR‑2‑1B, DeepSeek‑OCR 2, and MonkeyOCR—detailing their architectures, benchmark scores on OmniDocBench, hardware requirements, and how to run them via online demos.

Model Benchmark · OCR · computer vision

0 likes · 8 min read

Data Party THU

Feb 19, 2026 · Artificial Intelligence

How Data Priors and Scene Parameterization Boost 3D Indoor Reconstruction

This thesis investigates the two core challenges of data prior utilization and scene parameterization in multi‑view RGB‑based 3D indoor reconstruction, proposing novel representations and learning‑based methods to improve reconstruction quality, generalization, and applicability across AR, robotics, and autonomous navigation.

3D reconstruction · computer vision · data priors

0 likes · 8 min read

AI Algorithm Path

Feb 18, 2026 · Artificial Intelligence

Using Autoencoders for Industrial Defect Detection

This article explains how to train a simple fully‑connected autoencoder on defect‑free images, use reconstruction error to highlight anomalies in industrial parts, and convert the error into a single metric that cleanly separates good from defective components.

Anomaly Detection · Autoencoder · Keras

0 likes · 7 min read

AI Cyberspace

Feb 13, 2026 · Artificial Intelligence

How Attention Mechanisms Revolutionized Computer Vision and Machine Translation

This article traces the evolution of attention mechanisms from their inaugural application in computer vision and machine translation to their central role in modern Transformer models, detailing the underlying RNN‑Attention designs, the breakthrough in sequence alignment, and the innovations that enabled high‑performance, parallelizable deep learning architectures.

Attention Mechanism · Machine Translation · Transformer

0 likes · 14 min read

xkx's Tech General Store

Jan 27, 2026 · Artificial Intelligence

AI Era Survival: Using YOLOv3 for Accurate Pig Detection

The article explains how YOLOv3’s architectural upgrades—Darknet‑53 backbone, three‑scale feature fusion, refined anchors and multi‑label classification, plus dynamic input sizing—enable a pig‑recognition model trained on 2,456 images to achieve up to 20% higher detection rates and AP scores of 0.673–0.981.

Model Training · Pig Detection · YOLOv3

0 likes · 8 min read

AI Era Survival: Using YOLOv3 for Accurate Pig Detection

xkx's Tech General Store

Dec 30, 2025 · Artificial Intelligence

From Theory to Practice: Reproducing YOLOv1 – A Step‑by‑Step Guide for Traditional Programmers

This article provides a comprehensive, hands‑on walkthrough of YOLOv1—from its single‑stage detection principles and core architectural questions to a full PyTorch implementation, training pipeline, common pitfalls, and a live camera demo—targeted at developers transitioning into AI.

PyTorch · ResNet · SPP

0 likes · 10 min read

From Theory to Practice: Reproducing YOLOv1 – A Step‑by‑Step Guide for Traditional Programmers

php Courses

Dec 9, 2025 · Artificial Intelligence

How to Supercharge Your PHP Apps with AI: A Practical Guide

This guide explains why PHP applications need AI, outlines core AI use cases such as intelligent content processing, computer vision, personalization, and chatbots, and provides step‑by‑step implementation paths, tools, best‑practice recommendations, real‑world case studies, and future trends for developers.

AI integration · NLP · PHP

0 likes · 10 min read

How to Supercharge Your PHP Apps with AI: A Practical Guide

Kuaishou Tech

Dec 4, 2025 · Artificial Intelligence

Can a Tree‑Reasoned Model Master Video Emotion Understanding?

The paper introduces VidEmo, a multimodal video foundation model that uses a two‑stage emotion‑clue‑guided reasoning framework and a large emotion‑centric dataset (Emo‑CFG) to achieve state‑of‑the‑art performance on facial attribute, expression, and fine‑grained emotion tasks, surpassing Gemini 2.0.

AI · Multimodal · computer vision

0 likes · 15 min read

Can a Tree‑Reasoned Model Master Video Emotion Understanding?

Tencent Advertising Technology

Dec 4, 2025 · Artificial Intelligence

How POPEN Boosts LVLM Reasoning Segmentation with Preference Optimization and Ensemble

The paper introduces POPEN, a new framework that uses preference‑based optimization and ensemble methods to reduce hallucinations and improve segmentation accuracy in large visual language models, achieving state‑of‑the‑art results on multiple benchmarks.

LVLM · Preference Optimization · Segmentation

0 likes · 14 min read

How POPEN Boosts LVLM Reasoning Segmentation with Preference Optimization and Ensemble

Sohu Smart Platform Tech Team

Nov 20, 2025 · Artificial Intelligence

How Hooop Turns HarmonyOS into an Offline AI Basketball Coach

Hooop leverages HarmonyOS's on‑device AI and custom vision algorithms to provide real‑time, offline basketball training by detecting shots, analyzing trajectories, automatically clipping scoring clips, and tracking performance metrics without an internet connection.

AI · HarmonyOS · Video Processing

0 likes · 12 min read

How Hooop Turns HarmonyOS into an Offline AI Basketball Coach

Tencent Technical Engineering

Nov 5, 2025 · Artificial Intelligence

iDetex: The Winning AI Model Transforming Image Quality Assessment

iDetex, the champion solution of the ICCV 2025 MIPI Detailed Image Quality Assessment Challenge, introduces a novel multimodal LLM-driven framework that precisely locates, describes, and grades image distortions, outperforming traditional IQA models and enabling practical deployments across video, live streaming, e‑commerce, and image‑processing pipelines.

AI · ICCV 2025 · computer vision

0 likes · 18 min read

iDetex: The Winning AI Model Transforming Image Quality Assessment

JD Tech Talk

Nov 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try-On Transforms Fashion E‑Commerce

The article explains how JD.com's AI virtual try‑on system Oxygen Tryon uses advanced computer‑vision and generative models to let shoppers instantly preview clothing on their own photos, dramatically improving purchase decisions, reducing return rates, and outlining technical challenges, innovations, and future development plans.

AI · Fashion E‑commerce · computer vision

0 likes · 7 min read

How AI-Powered Virtual Try-On Transforms Fashion E‑Commerce

JD Cloud Developers

Nov 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try‑On Is Revolutionizing Fashion E‑Commerce

The article explains how JD.com's AI try‑on system Oxygen Tryon uses advanced computer‑vision models to let shoppers instantly preview garments on their own photos, dramatically improving fit perception, reducing return rates, and outlining future technical and business expansions.

AI · Fashion E‑commerce · computer vision

0 likes · 6 min read

How AI-Powered Virtual Try‑On Is Revolutionizing Fashion E‑Commerce

AsiaInfo Technology: New Tech Exploration

Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Multimodal · computer vision

0 likes · 20 min read

How Multimodal Large Models Are Revolutionizing Video Analysis

AI Algorithm Path

Nov 1, 2025 · Artificial Intelligence

Deep Dive into Vision Transformer Patch Embedding Mechanisms

This article explains how Vision Transformers convert images into patch embeddings, compares flattening versus convolutional approaches, discusses position and CLS tokens, analyzes the effect of patch size, explores pixel‑level tokens, and contrasts ViT’s inductive bias with CNNs.

Convolution · Inductive Bias · Patch Embedding

0 likes · 10 min read

Deep Dive into Vision Transformer Patch Embedding Mechanisms

JD Retail Technology

Oct 31, 2025 · Artificial Intelligence

How JD’s AI Try‑On “Oxygen Tryon” Revolutionizes Online Fashion Shopping

JD’s Oxygen Tryon leverages advanced AI, keypoint detection, and real‑time rendering to let shoppers virtually try on clothing, dramatically cutting return rates, boosting conversion, and outlining technical challenges, innovations, and future plans for broader fashion applications.

AI try-on · Fashion E‑commerce · computer vision

0 likes · 6 min read

How JD’s AI Try‑On “Oxygen Tryon” Revolutionizes Online Fashion Shopping

Liangxu Linux

Oct 29, 2025 · Artificial Intelligence

7 Must‑Try Open‑Source Tools for Remote Jobs, AI, and Dev Productivity

This article curates seven open‑source projects—including a remote‑work company list, a versatile file‑conversion platform, a personal finance manager, an AI‑powered resume optimizer, Claude Code resources, a computer‑vision toolbox, and a lightweight AI assistant—each with key features and GitHub links for easy adoption.

AI tools · Remote Work · computer vision

0 likes · 7 min read

7 Must‑Try Open‑Source Tools for Remote Jobs, AI, and Dev Productivity

Network Intelligence Research Center (NIRC)

Oct 24, 2025 · Artificial Intelligence

Next‑Gen VR Interaction via Micro‑Gesture Recognition: The “MiaoKong Virtual Realm” Demo

At Beijing University of Posts and Telecommunications' 70th anniversary, the Network Intelligence Research Center showcased a micro‑gesture‑driven VR system that captures millimeter‑scale finger motions with high‑precision, low‑latency hand tracking, delivering efficient, fatigue‑reducing interactions and earning strong audience approval.

VR interaction · XR · computer vision

0 likes · 8 min read

Next‑Gen VR Interaction via Micro‑Gesture Recognition: The “MiaoKong Virtual Realm” Demo

Alimama Tech

Oct 22, 2025 · Artificial Intelligence

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

This article details Alibaba’s Taobao Star fashion AIGC model, explaining its data pipeline, captioning strategy, multi‑stage training, and impressive virtual try‑on results for users and merchants, while showcasing model‑based and model‑free generation and pose‑transfer capabilities.

AI · AIGC · Model Training

0 likes · 11 min read

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

Amap Tech

Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Diffusion Models · Multimodal AI

0 likes · 11 min read

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

HyperAI Super Neural

Sep 29, 2025 · Artificial Intelligence

8 Popular Remote Sensing Object Detection Datasets with One-Click Downloads

This article presents a curated list of eight widely used remote sensing object detection datasets covering indoor scenes, landslides, drone imagery, crop diseases, safety vests, human fractures, urban issues, and plant diseases, each with size estimates and direct download links for researchers.

AI · computer vision · datasets

0 likes · 10 min read

8 Popular Remote Sensing Object Detection Datasets with One-Click Downloads

Data Party THU

Sep 27, 2025 · Artificial Intelligence

How Depth-Guided Texture Diffusion Boosts Image Semantic Segmentation

This article reviews the depth‑guided texture diffusion method, detailing its texture extraction, diffusion, structural consistency optimization, and integration into segmentation networks, and shows how it narrows the depth‑RGB gap to achieve state‑of‑the‑art performance on various semantic segmentation tasks.

Semantic Segmentation · computer vision · depth-guided diffusion

0 likes · 13 min read

How Depth-Guided Texture Diffusion Boosts Image Semantic Segmentation

AntTech

Sep 25, 2025 · Artificial Intelligence

ICCV Spotlight: Pixel Tracing for Copy Detection and Skip-Vision Model Acceleration

The ICCV 2025 live session will deep‑dive into two cutting‑edge papers—PixTrace with CopyNCE for precise image copy detection and Skip‑Vision for dramatically faster training and inference of vision‑language models—showcasing their methods, results, and real‑world impact.

ICCV 2025 · computer vision · copy detection

0 likes · 5 min read

ICCV Spotlight: Pixel Tracing for Copy Detection and Skip-Vision Model Acceleration

Data Party THU

Sep 16, 2025 · Artificial Intelligence

How Dynamic Snake Convolution Boosts Tubular Segmentation and Infrared Small Target Detection

This article reviews two recent AI papers that introduce dynamic convolution kernels guided by geometric or statistical priors and adaptive loss mechanisms, demonstrating significant improvements in tubular structure segmentation and infrared small‑target detection across multiple 2D and 3D datasets.

computer vision · dynamic convolution · infrared small target detection

0 likes · 6 min read

How Dynamic Snake Convolution Boosts Tubular Segmentation and Infrared Small Target Detection

AIWalker

Sep 2, 2025 · Artificial Intelligence

BEVANet’s Triple Boost for Real-Time Segmentation: Field, Edge, Speed

BEVANet tackles the efficiency‑accuracy trade‑off in real‑time semantic segmentation by integrating large‑kernel attention, an efficient visual attention (EVA) module, a bilateral architecture, and boundary‑guided adaptive fusion, delivering up to 81% mIoU on Cityscapes at 33 FPS and surpassing prior state‑of‑the‑art models on both accuracy and speed.

Efficiency · Real-time · Semantic Segmentation

0 likes · 19 min read

BEVANet’s Triple Boost for Real-Time Segmentation: Field, Edge, Speed

AntTech

Aug 21, 2025 · Artificial Intelligence

How the Mixture-of-Queries Transformer Tackles Camouflaged Instance Segmentation

The IJCAI 2025 paper showcase introduces the Mixture‑of‑Queries Transformer, a novel model that combines frequency‑domain feature enhancement with collaborative query decoding to achieve state‑of‑the‑art camouflaged instance segmentation across multiple datasets.

IJCAI 2025 · Transformer · camouflaged segmentation

0 likes · 4 min read

How the Mixture-of-Queries Transformer Tackles Camouflaged Instance Segmentation

AIWalker

Aug 18, 2025 · Artificial Intelligence

UniConvNet: Expanding Effective Receptive Field for a SOTA CNN Vision Backbone (ICCV 2025)

UniConvNet introduces a three‑layer receptive‑field aggregator that combines small kernels to enlarge the effective receptive field while preserving its Gaussian distribution, achieving state‑of‑the‑art results on ImageNet‑1K, COCO2017 and ADE20K with only 30M parameters and 5.1G FLOPs.

CNN · Effective Receptive Field · ICCV2025

0 likes · 6 min read

UniConvNet: Expanding Effective Receptive Field for a SOTA CNN Vision Backbone (ICCV 2025)

AI Algorithm Path

Aug 16, 2025 · Artificial Intelligence

Meta Unveils DINOv3: A Universal Self‑Supervised Visual AI for All Image Tasks

Meta's DINOv3 is a 7‑billion‑parameter self‑supervised visual foundation model trained on 1.7 billion Instagram images without any labels, introducing dense feature extraction, Gram‑Anchoring to prevent feature collapse, high‑resolution adaptation, and multi‑student distillation that together enable out‑of‑the‑box performance on segmentation, depth estimation, 3D matching, and tracking while surpassing prior models such as DINOv2, CLIP, and SAM.

DINOv3 · Gram Anchoring · Large‑Scale Training

0 likes · 8 min read

Meta Unveils DINOv3: A Universal Self‑Supervised Visual AI for All Image Tasks

AIWalker

Aug 13, 2025 · Artificial Intelligence

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

The paper presents OMFA, a diffusion‑based unified framework for virtual try‑on and try‑off that removes the need for garment templates, segmentation masks, and fixed poses by leveraging a novel partial‑diffusion mechanism and SMPL‑X pose conditioning, achieving state‑of‑the‑art results on VITON‑HD and DeepFashion‑MultiModal datasets.

AI try-on · SMPL-X · computer vision

0 likes · 15 min read

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

AIWalker

Aug 3, 2025 · Artificial Intelligence

Tree-Guided CNN Boosts Image Super-Resolution in Joint University Study

A collaborative team from five universities proposes a tree-structured convolutional neural network that leverages binary‑tree guidance, cosine cross‑domain extraction, and an adaptive Nesterov momentum optimizer to markedly improve image super‑resolution performance.

adaptive optimizer · computer vision · deep learning

0 likes · 5 min read

Tree-Guided CNN Boosts Image Super-Resolution in Joint University Study

Data Party THU

Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPE · Generative AI · Vision Transformer

0 likes · 11 min read

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

Sohu Tech Products

Jul 30, 2025 · Artificial Intelligence

How 3D Gaussian Splatting Enables Low‑Cost 3D Reconstruction from Simple Videos

This article explains how 3D Gaussian Splatting transforms ordinary video footage into high‑quality 3D reconstructions with minimal equipment, outlines the low‑cost workflow using ffmpeg and COLMAP, and discusses practical challenges and future possibilities for the technology.

3D reconstruction · COLMAP · Gaussian splatting

0 likes · 5 min read

How 3D Gaussian Splatting Enables Low‑Cost 3D Reconstruction from Simple Videos

AI Frontier Lectures

Jul 26, 2025 · Artificial Intelligence

Training-Free Universal Virtual Try-On: OmniVTON’s Multi-Person Breakthrough

OmniVTON introduces a training‑free universal virtual try‑on framework that decouples garment texture and human pose, achieving high‑fidelity results across both in‑shop and in‑the‑wild scenarios, and uniquely supporting multi‑person virtual dressing, as demonstrated by extensive quantitative and qualitative experiments.

Multi-Person · artificial-intelligence · computer vision

0 likes · 9 min read

Training-Free Universal Virtual Try-On: OmniVTON’s Multi-Person Breakthrough

AI Frontier Lectures

Jul 17, 2025 · Artificial Intelligence

Top 8 Tencent Youtu Papers Accepted at ICCV 2025: Innovations in AI and Vision

The 20th ICCV conference announced 8 papers from Tencent Youtu Lab covering stylized face recognition, AI‑generated image detection, heterogeneous knowledge distillation, multi‑conditional diffusion, multimodal LLM distillation, palmprint recognition, low‑light vision, and oracle bone script decipherment, each pushing the frontier of computer vision and AI research.

ICCV 2025 · Low‑light Vision · artificial-intelligence

0 likes · 17 min read

Top 8 Tencent Youtu Papers Accepted at ICCV 2025: Innovations in AI and Vision

AIWalker

Jul 15, 2025 · Artificial Intelligence

Dynamic Vision Mamba: Re‑ordering Pruning and Adaptive Block Selection Cut FLOPs by 35.2%

This article presents Dynamic Vision Mamba (DyVM), a method that tackles token and block redundancy in Mamba‑based visual models through a novel re‑ordering pruning strategy and dynamic block selection, achieving a 35.2% FLOPs reduction with only a 1.7% accuracy loss while demonstrating strong generalization across tasks and architectures.

Dynamic Block Selection · FLOPs Reduction · Model Efficiency

0 likes · 22 min read

Dynamic Vision Mamba: Re‑ordering Pruning and Adaptive Block Selection Cut FLOPs by 35.2%

Amap Tech

Jul 14, 2025 · Artificial Intelligence

How UPRE Achieves Zero-Shot Domain Adaptation for Object Detection with Unified Prompts

The UPRE paper, presented at ICCV, introduces a multi‑view domain prompt and a unified representation enhancement to enable zero‑shot domain adaptation for object detection, achieving state‑of‑the‑art performance across diverse weather, geographic, and synthetic‑to‑real scenarios.

Prompt Engineering · computer vision · object detection

0 likes · 10 min read

How UPRE Achieves Zero-Shot Domain Adaptation for Object Detection with Unified Prompts

Baidu Geek Talk

Jul 9, 2025 · Artificial Intelligence

PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

PaddleOCR 3.1 introduces three major upgrades—a multilingual PP‑OCRv5 model supporting 37 languages with over 30% accuracy gain, a PP‑DocTranslation pipeline for high‑quality multi‑language document translation, and MCP server support for flexible AI application integration—accompanied by detailed CLI usage, demo scenarios, and open‑source resources.

AI · MCP · OCR

0 likes · 11 min read

PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

AI Frontier Lectures

Jul 8, 2025 · Artificial Intelligence

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

The LaVin-DiT paper presents a large vision diffusion transformer that integrates a spatio‑temporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient multi‑task generation for images and videos, and details its training via flow‑matching and experimental results.

3D RoPE · Joint Diffusion Transformer · ST-VAE

0 likes · 12 min read

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

Zhuanzhuan Tech

Jul 2, 2025 · Artificial Intelligence

How to Build an Image Similarity Search System with ResNet, Milvus, and YOLO

This article walks through the end‑to‑end process of building an image similarity solution—from vectorizing images with ResNet, storing high‑dimensional vectors in Milvus, using HNSW for fast ANN search, to applying YOLO for object detection and practical training tips.

HNSW · Milvus · ResNet

0 likes · 15 min read

How to Build an Image Similarity Search System with ResNet, Milvus, and YOLO

Huolala Tech

Jul 2, 2025 · Artificial Intelligence

Can Diffusion Models Revolutionize Salient Object Detection?

This article introduces a diffusion‑based framework for salient object detection, discusses its background, challenges, and motivations, details the model architecture and training, presents extensive experiments and ablation studies, and outlines limitations and future research directions.

computer vision · deep learning · diffusion model

0 likes · 11 min read

Can Diffusion Models Revolutionize Salient Object Detection?

Qborfy AI

Jul 1, 2025 · Artificial Intelligence

Why CNNs Outperform Fully Connected Networks: A Deep Dive into Architecture and Applications

This article explains the fundamentals of convolutional neural networks (CNNs), detailing their definition, advantages over fully connected networks, architectural components such as input, hidden, and output layers, key operations like convolution, pooling, and activation, and showcases practical applications and notable insights.

CNN · artificial-intelligence · computer vision

0 likes · 5 min read

Why CNNs Outperform Fully Connected Networks: A Deep Dive into Architecture and Applications

Amap Tech

Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Transformer · autonomous driving · computer vision

0 likes · 10 min read

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

Rare Earth Juejin Tech Community

Jun 27, 2025 · Artificial Intelligence

Image Encryption, Watermarking, Detection & Green Screen Removal in Python

This tutorial walks through Python-based computer‑vision techniques—including XOR‑based image encryption, mask and ROI methods, digital watermark embedding via bit‑plane and LSB, sensitivity‑driven object detection, and HSV‑based green‑screen removal—providing complete code snippets and practical guidance for rapid AI‑assisted learning.

Python · computer vision · green screen removal

0 likes · 17 min read

Image Encryption, Watermarking, Detection & Green Screen Removal in Python

AntTech

Jun 25, 2025 · Artificial Intelligence

CVPR 2025: Semi-Body Digital Humans, Video Upscaling, Mobile Super‑Res

In this CVPR 2025 showcase, Ant Group presents three cutting‑edge papers—EchoMimicV2 introducing an open‑source semi‑body digital human generation framework, RivuletMLP offering an efficient MLP‑based architecture for compressed video quality enhancement, and a quantized super‑resolution model that achieves real‑time 3× upscaling on mobile NPUs.

AI · CVPR · computer vision

0 likes · 6 min read

CVPR 2025: Semi-Body Digital Humans, Video Upscaling, Mobile Super‑Res

AIWalker

Jun 24, 2025 · Artificial Intelligence

How Multimodal Fusion Accelerates Paper Publication: Key Insights and Resources

The article surveys 117 recent multimodal‑fusion papers, classifies them into improvement‑based and combination‑based approaches, highlights representative works such as TimeXL, OGP‑Net, MMR‑Mamba and FusionSight, and provides a free collection of papers, classic models and code repositories for researchers.

AI research · computer vision · deep learning

0 likes · 8 min read

How Multimodal Fusion Accelerates Paper Publication: Key Insights and Resources

AI Algorithm Path

Jun 20, 2025 · Artificial Intelligence

Beginner’s Guide to Visual Language Models – Day 1: What They Are and Why They Matter

This article introduces visual‑language models (VLMs), explaining how they combine large language models with visual encoders, why they overcome the rigidity of traditional computer‑vision systems, their key advantages, modular architecture, training methods, and practical applications such as image captioning and visual question answering.

AI Applications · Multimodal AI · computer vision

0 likes · 8 min read

Beginner’s Guide to Visual Language Models – Day 1: What They Are and Why They Matter

AntTech

Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Generative AI · computer vision

0 likes · 20 min read

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

AI Frontier Lectures

Jun 14, 2025 · Industry Insights

CVPR 2025 Awards Unveiled: Breakthrough Papers and Rising Stars

The CVPR 2025 awards spotlight groundbreaking research, honoring young scholars and top papers such as VGGT, Neural Inverse Rendering, and several honorable mentions, while summarizing each work's core contributions, methodologies, and potential impact on computer vision and related fields.

2025 · CVPR · Paper Awards

0 likes · 13 min read

CVPR 2025 Awards Unveiled: Breakthrough Papers and Rising Stars

AI Frontier Lectures

Jun 14, 2025 · Industry Insights

CVPR 2025 Awards Unveiled: Best Papers, Young Researchers, and Industry Highlights

The CVPR 2025 conference announced record‑breaking submission numbers, awarded a Best Paper to VGGT, honored young researchers and student papers, listed multiple honorary nominations, and highlighted the strong presence of Chinese institutions across the award candidates.

Award Summary · Best Paper · CVPR 2025

0 likes · 12 min read

CVPR 2025 Awards Unveiled: Best Papers, Young Researchers, and Industry Highlights

Kuaishou Tech

Jun 10, 2025 · Artificial Intelligence

Can MaIR’s Locality‑Preserving Mamba Boost Image Restoration?

The article presents MaIR, a locality‑ and continuity‑preserving Mamba‑based model for image restoration, detailing its three‑stage architecture, novel scanning strategy, loss functions, experimental results on super‑resolution and denoising, and ablation studies, with links to the arXiv paper and source code.

Denoising · Mamba · computer vision

0 likes · 5 min read

Can MaIR’s Locality‑Preserving Mamba Boost Image Restoration?

AI Frontier Lectures

Jun 3, 2025 · Artificial Intelligence

How MaIR Advances Image Restoration with a Locality‑Preserving Mamba Architecture

The article presents MaIR, a Mamba‑based image restoration model that preserves locality and continuity, detailing its architecture, scanning strategies, loss functions, experimental results on super‑resolution and denoising, and an ablation study, while providing links to the arXiv paper and GitHub source code.

Denoising · Mamba · computer vision

0 likes · 5 min read

How MaIR Advances Image Restoration with a Locality‑Preserving Mamba Architecture