Tagged articles
4 articles
Machine Heart
Apr 16, 2026 · Artificial Intelligence

CPL++: A Self‑Aware, Self‑Correcting Framework for Weakly Supervised Visual Grounding

The CPL++ framework equips weakly supervised visual grounding models with confidence‑aware pseudo‑label learning, self‑supervised association correction, and dynamic validation. Together these let the model detect and amend erroneous region‑query links during training, yielding absolute performance gains of 1–6% across five benchmark datasets.

Computer Vision · Visual Grounding · Weak Supervision
9 min read
Machine Heart
Mar 31, 2026 · Artificial Intelligence

Point‑VLA: Overcoming Embodied AI’s Language Bottleneck with Visual Grounding

The Point‑VLA method introduced by Qianxun AI’s Gaoyang team tackles the fundamental limits of language‑only instruction in vision‑language‑action models by adding visual grounding via bounding‑box cues, boosting real‑robot success rates from 32.4% to 92.5% across six challenging tasks.

Multimodal Learning · Point-VLA · Robotics
13 min read
DaTaobao Tech
Nov 25, 2024 · Artificial Intelligence

Open‑Set Object Detection and Visual Grounding: Analysis of YOLO‑World, Grounding DINO, and YOLO11

The article surveys the state‑of‑the‑art open‑set object detection and visual‑grounding models Grounding DINO and YOLO‑World alongside the latest YOLO11, detailing their architectures, training strategies, and experimental results on home‑decoration datasets. The results show that open‑set detectors recognize unseen objects while YOLO11 excels on known categories, and that integrating both approaches yields superior performance, which highlights the broader potential of detectors in real‑world applications.

Computer Vision · Deep Learning · Grounding DINO
15 min read
Youku Technology
Aug 6, 2020 · Artificial Intelligence

Recent ACM MM Papers Accepted by Alibaba Entertainment Group

Alibaba Entertainment Group secured four ACM MM paper acceptances, presenting a probabilistic graphical model for crowdsourced visual quality assessment, an attention‑driven Siamese network with reinforcement learning for robust object tracking, a scene‑aware context‑graph method for unsupervised video anomaly detection, and a cross‑modal graph‑matching approach for visual grounding.

Object Tracking · Visual Grounding · Crowdsourcing
6 min read