Tagged articles

Visual Grounding

5 articles

Jun 2, 2026 · Artificial Intelligence

How Nvidia’s Open‑Source LocateAnything‑3B Enables Image & Video Target Pointing and Open‑Vocabulary Grounding

The article introduces Nvidia’s open‑source LocateAnything‑3B vision‑language model, explains how its Parallel Box Decoding technique improves grounding speed and accuracy, describes the 138M‑sample training dataset, reports benchmark gains, and provides a step‑by‑step HyperAI notebook tutorial for running the model.

LocateAnything-3B · Multimodal AI · NVIDIA

5 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

CPL++: A Self‑Aware, Self‑Correcting Framework for Weakly Supervised Visual Grounding

The CPL++ framework equips weakly supervised visual grounding models with confidence‑aware pseudo‑label learning, self‑supervised association correction, and dynamic validation, enabling them to detect and amend erroneous region‑query links during training, which yields absolute performance gains of 1–6% across five benchmark datasets.

Visual Grounding · Weak Supervision · Computer Vision

9 min read

Machine Heart

Mar 31, 2026 · Artificial Intelligence

Point‑VLA: Overcoming Embodied AI’s Language Bottleneck with Visual Grounding

The Point‑VLA method introduced by Qianxun AI’s Gaoyang team tackles the fundamental limits of language‑only instruction in vision‑language‑action models by adding visual grounding via bounding‑box cues, boosting real‑robot success rates from 32.4% to 92.5% across six challenging tasks.

Multimodal Learning · Point-VLA · Visual Grounding

13 min read

DaTaobao Tech

Nov 25, 2024 · Artificial Intelligence

Open‑Set Object Detection and Visual Grounding: Analysis of YOLO‑World, Grounding DINO, and YOLO11

The article surveys state‑of‑the‑art open‑set object detection and visual grounding models (Grounding DINO, YOLO‑World, and the recent YOLO11), detailing their architectures, training strategies, and experimental results on home‑decoration datasets. The results show that open‑set detectors recognize unseen objects while YOLO11 excels on known categories, and that integrating both approaches yields superior performance, which points to broader potential for detectors in real‑world applications.

Deep Learning · Grounding DINO · Visual Grounding

15 min read

Youku Technology

Aug 6, 2020 · Artificial Intelligence

Recent ACM MM Papers Accepted by Alibaba Entertainment Group

Alibaba Entertainment Group secured four ACM MM paper acceptances, presenting a probabilistic graphical model for crowdsourced visual quality assessment, an attention‑driven Siamese network with reinforcement learning for robust object tracking, a scene‑aware context‑graph method for unsupervised video anomaly detection, and a cross‑modal graph‑matching approach for visual grounding.

Graph Neural Networks · Object Tracking · Visual Grounding

6 min read
