Why the AAAI22 Re‑ID Paper Leaks Data and a Simpler Alternative Beats It

The author examines the AAAI 2022 paper “Mind Your Clever Neighbours,” argues that it exploits a data leak in unsupervised person re‑identification, criticizes the Graph Correlation Learning step as unnecessary, and shows that a much simpler averaging method yields better results.

Baobao Algorithm Notes

Background

The AAAI 2022 paper “Mind Your Clever Neighbours: Unsupervised Person Re‑identification via Adaptive Clustering Relationship Modeling” sparked a discussion about whether its reported performance relies on a data leak, that is, on label information reaching a method that is supposed to be unsupervised.

Leak Mechanism in Person‑ReID Datasets

Many person re‑identification (re‑ID) datasets encode identity information in the file names. Consecutive video frames also preserve their ordering, so the sorted sequence of file names implicitly reveals which images belong to the same person. By exploiting this ordering, a practitioner can generate pseudo‑labels without any visual learning, which constitutes a leak.
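
The mechanism can be illustrated with a minimal sketch. The file names and the "<personID>_<camera>_<frame>.jpg" pattern below are hypothetical, chosen only for illustration; real datasets use their own naming schemes, and the helper names are not taken from the paper or from the author's code. The point is only that grouping sorted names by a prefix yields pseudo‑labels without looking at a single pixel.

```python
from itertools import groupby

# Hypothetical file names in a "<personID>_<camera>_<frame>.jpg" style.
# The exact naming scheme is an assumption and differs between datasets.
file_names = [
    "0002_c1_000451.jpg",
    "0007_c2_000101.jpg",
    "0002_c1_000452.jpg",
    "0007_c2_000102.jpg",
    "0002_c3_000900.jpg",
]


def identity_token(name: str) -> str:
    """Return the part of the file name before the first underscore."""
    return name.split("_", 1)[0]


def pseudo_labels_from_names(names):
    """Assign one integer label per distinct prefix, using only the sorted names."""
    labels = {}
    ordered = sorted(names)  # sorting places images with the same prefix next to each other
    for label, (_, group) in enumerate(groupby(ordered, key=identity_token)):
        for name in group:
            labels[name] = label
    return labels


labels = pseudo_labels_from_names(file_names)
for name in file_names:
    print(name, "->", labels[name])
```

No feature extractor appears anywhere in this sketch, which is why labels obtained this way cannot count as unsupervised.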

Original Unsupervised Re‑ID Pipeline

Extract raw visual features for each image using a ResNet backbone (the “original representation”).

Apply a Graph Correlation Learning (GCL) module inside each training batch:

GCL builds a fully‑connected graph of the batch samples.

Attention‑based aggregation propagates information across the graph, producing an enhanced feature vector.

The enhanced vector is concatenated with the original ResNet feature, yielding Enhanced Representation 1.

Cluster the enhanced features (e.g., with DBSCAN or k‑means) to obtain pseudo‑labels for the images.

Train the network further with Self‑Contrastive Learning (SCL) using the pseudo‑labels, producing Enhanced Representation 2.

Compute pairwise similarity scores on the final representations for re‑ID matching.
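
To make the in‑batch enhancement step of this pipeline concrete, here is a rough sketch of attention‑based aggregation over a fully connected batch graph, followed by concatenation with the original feature. It is not the paper's GCL module: the scaled dot‑product attention used here has no learned parameters, the function names are made up for this example, and plain Python lists stand in for ResNet features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_enhance(features):
    """Aggregate over a fully connected batch graph, then concatenate.

    features: list of B vectors, each a list of D floats.
    Returns B vectors of length 2 * D: [original, attention-weighted aggregate].
    """
    dim = len(features[0])
    enhanced = []
    for query in features:
        # Edge weights from this sample to every sample in the batch (itself included).
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in features])
        aggregate = [
            sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)
        ]
        enhanced.append(list(query) + aggregate)
    return enhanced


# Toy batch of three 2-dimensional "features" (illustrative values only).
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in attention_enhance(batch):
    print([round(v, 3) for v in row])
```

Whatever form the attention takes, the second half of every output vector is a weighted mix of the other samples in the batch, so the composition of the batch directly shapes each sample's representation.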

Problem with the GCL Step

The GCL operation injects global batch information into each sample by concatenating a batch‑level aggregated vector. In typical re‑ID training batches, most samples share the same identity, so the aggregated vector mainly reflects that dominant identity. This introduces a bias that masks the true contribution of the visual features and adds unnecessary computational complexity.
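
A small numeric sketch shows the bias. It assumes a toy batch in which six of eight samples belong to one identity; the 2‑dimensional vectors, the identity names "A" and "B", and the helper functions are illustrative assumptions, not data or code from the paper. Under that assumption, a batch‑level aggregate (here a plain mean) points much more strongly toward the dominant identity than toward the minority one.

```python
import math

# Toy batch: identity "A" dominates (6 samples), identity "B" is the minority (2 samples).
# Values are made up for illustration.
batch = [
    ("A", [1.0, 0.0]), ("A", [0.9, 0.1]), ("A", [1.0, 0.1]),
    ("A", [0.8, 0.0]), ("A", [0.9, 0.0]), ("A", [1.0, 0.2]),
    ("B", [0.0, 1.0]), ("B", [0.1, 0.9]),
]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))


batch_aggregate = mean_vector([vec for _, vec in batch])
centroid_a = mean_vector([vec for pid, vec in batch if pid == "A"])
centroid_b = mean_vector([vec for pid, vec in batch if pid == "B"])

sim_a = cosine(batch_aggregate, centroid_a)
sim_b = cosine(batch_aggregate, centroid_b)
print("similarity of batch aggregate to dominant identity A:", round(sim_a, 3))
print("similarity of batch aggregate to minority identity B:", round(sim_b, 3))
```

Concatenating such an aggregate to every sample therefore attaches a signature of the batch's dominant identity to each feature, independent of what the image itself shows.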

Proposed Simplified Alternative

Replace the GCL module with a straightforward batch‑mean operation:

import torch

# Assume `features` is a tensor of shape (B, D): B is the batch size, D the feature dimension
B, D = features.shape
batch_mean = features.mean(dim=0, keepdim=True)  # shape (1, D)
# Broadcast the mean to every sample and concatenate along the feature axis
enhanced = torch.cat([features, batch_mean.expand(B, -1)], dim=1)  # shape (B, 2D)

Steps:

For each training batch, compute the average ResNet feature across all samples.

Concatenate this average vector to the original ResNet feature of every sample.

Proceed with clustering, pseudo‑label generation, and SCL exactly as in the original pipeline.

This replacement requires no attention mechanism, no additional parameters, and no fine‑tuning of a separate module.

Empirical Observation

Experiments reported by the author show that the simple averaging strategy consistently outperforms the original GCL‑based enhancement on standard re‑ID benchmarks (higher mAP and rank‑1 scores), while reducing training time and implementation complexity.

Conclusion

The primary source of the reported performance gain in the paper is the exploitation of the dataset leak rather than the sophisticated Graph Correlation Learning. Transparent reporting of such leaks is essential for fair evaluation of unsupervised re‑ID methods.
