CPL++: A Self‑Aware, Self‑Correcting Framework for Weakly Supervised Visual Grounding

The CPL++ framework equips weakly supervised visual grounding models with confidence‑aware pseudo‑label learning, self‑supervised association correction, and dynamic validation, enabling the model to detect and amend erroneous region‑query links during training, which yields absolute performance gains of 1–6% across five benchmark datasets.

Machine Heart

Background and Motivation

Visual grounding aims to locate image regions from natural‑language queries. Fully supervised methods need dense image‑text‑box annotations, which are costly. Weakly supervised visual grounding uses only image‑text pairs but suffers from unreliable cross‑modal matching and error propagation.

Limitations of Existing Weak Supervision

Prior weakly supervised approaches treat grounding as a retrieval problem, relying on cross‑modal similarity scores or reconstruction losses. The gap between high‑level language concepts and pixel‑level visual features leads to many false pseudo‑associations. Earlier unsupervised methods generate rigid pseudo‑queries lacking diversity and still ignore the impact of erroneous associations.

Proposed Framework: CPL and CPL++

Confidence‑aware Pseudo‑label Learning (CPL) introduces three complementary pseudo‑query generation pipelines—Heuristic+, Object‑Centric, and Relation‑Aware—to produce descriptive, realistic, and diverse pseudo‑queries for each candidate region. Similarity between the real query and pseudo‑queries is computed in the text feature space; the region with highest similarity becomes the initial pseudo‑label, avoiding direct cross‑modal alignment.
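The key trick above is that matching happens entirely in the text feature space: each candidate region is represented by its pseudo‑queries, and the region whose pseudo‑query best matches the real query is taken as the initial pseudo‑label. A minimal sketch of this selection step, using cosine similarity over precomputed text embeddings (the embeddings themselves and the function name are illustrative, not from the paper):

```python
import numpy as np

def select_pseudo_label(query_emb, pseudo_query_embs):
    """Pick the candidate region whose pseudo-query embedding is most
    similar to the real query embedding, comparing purely in text space.

    query_emb:         (d,) embedding of the real natural-language query
    pseudo_query_embs: (num_regions, d) one pseudo-query embedding per
                       candidate region
    Returns the index of the selected region and the per-region scores.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = pseudo_query_embs / np.linalg.norm(pseudo_query_embs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity per region
    return int(np.argmax(sims)), sims

# Toy example: region 1's pseudo-query is closest to the real query.
query = np.array([1.0, 0.0, 0.0])
pseudo = np.array([[0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 0.0, 1.0]])
idx, sims = select_pseudo_label(query, pseudo)
```

Because both sides of the comparison are text embeddings, the noisy cross‑modal similarity that plagues earlier retrieval‑style methods is sidestepped at this stage.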

Static Cross‑Modal Verification

A frozen pre‑trained vision‑language model evaluates each region‑query pair before training and outputs a confidence score. Pairs with scores below a threshold are filtered, reducing the influence of false associations.
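The filtering step itself is simple: score each region‑query pair with the frozen verifier and discard anything below a threshold. A hedged sketch, where `score_fn` stands in for the pre‑trained vision‑language model and the 0.5 threshold is purely illustrative:

```python
def filter_by_confidence(pairs, score_fn, threshold=0.5):
    """Drop region-query pairs whose verifier confidence falls below the
    threshold, so likely-false associations never enter training."""
    kept = []
    for pair in pairs:
        if score_fn(pair) >= threshold:
            kept.append(pair)
    return kept

# Toy verifier: pretend the confidence is stored alongside each pair.
pairs = [("region_a", "the red cup", 0.92),
         ("region_b", "the red cup", 0.31)]
kept = filter_by_confidence(pairs, score_fn=lambda p: p[2], threshold=0.5)
```

Because the verifier is frozen and runs before training, this is a one‑time static prior; CPL++ later replaces it with a dynamic mechanism.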

CPL++: Self‑Supervised Association Correction

CPL++ builds a semantic‑aware candidate pool using category, attribute, and spatial relation information extracted from the query. A composite scoring function combines query‑region matching and detector confidence (shown in the figure). During training, if the IoU between the model’s predicted box and the best candidate falls below a threshold, the association is treated as erroneous, re‑weighted, and a refined pseudo‑label is generated.
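The error‑detection trigger can be sketched as an IoU test between the model's predicted box and the best pool candidate, with a composite score ranking the candidates. The equal weighting `alpha` and the 0.5 thresholds below are illustrative assumptions, not values from the paper:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def composite_score(match_score, det_conf, alpha=0.5):
    """Combine query-region matching with detector confidence;
    alpha is a hypothetical mixing weight."""
    return alpha * match_score + (1 - alpha) * det_conf

def needs_correction(pred_box, best_candidate_box, iou_thresh=0.5):
    """Flag the association as erroneous when the predicted box drifts
    too far from the top-scoring candidate in the semantic pool."""
    return iou(pred_box, best_candidate_box) < iou_thresh
```

When `needs_correction` fires, the framework re‑weights the sample and generates a refined pseudo‑label from the candidate pool.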

Dynamic Self‑Supervised Verification

CPL++ upgrades the static verifier to a dynamic mechanism. Training loss of each sample is monitored; samples with higher loss receive larger weights via a dynamic selective localization loss, allowing the model to focus on correcting noisy labels while still leveraging the static prior.
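One simple way to realize "higher loss gets larger weight" is a softmax over per‑sample losses; the temperature parameter and the softmax form are assumptions for this sketch, not the paper's exact dynamic selective localization loss:

```python
import numpy as np

def dynamic_weights(losses, temperature=1.0):
    """Assign each sample a weight that grows with its training loss,
    so harder (noisier) samples get more attention. Weights sum to 1."""
    losses = np.asarray(losses, dtype=float)
    z = losses / temperature
    e = np.exp(z - z.max())           # subtract max for stability
    return e / e.sum()

weights = dynamic_weights([0.1, 2.0, 0.5])
```

Lowering `temperature` sharpens the focus on the highest‑loss samples; raising it approaches uniform weighting, recovering ordinary training.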

Experimental Evaluation

The method is evaluated on five weakly supervised visual grounding benchmarks: RefCOCO, RefCOCO+, RefCOCOg, ReferItGame, and Flickr30K Entities. CPL outperforms existing weakly supervised and unsupervised methods, and CPL++ adds further absolute improvements of 2.78% (RefCOCO), 5.81% (RefCOCO+), 1.08% (RefCOCOg), 2.03% (ReferItGame), and 2.55% (Flickr30K Entities), narrowing the gap to fully supervised approaches.

Qualitative Analysis

Visualizations show that CPL generates diverse, accurate pseudo‑queries, and CPL++’s correction module progressively refines erroneous associations, ultimately aligning predicted boxes with the true target regions.

Paper: https://ieeexplore.ieee.org/document/11433810/

Code: https://github.com/oceanflowlab/CPL
