How Multi-Level Similarity‑Aware CNN Boosts Person Re‑Identification

This paper introduces a novel multi‑level similarity‑aware CNN (MSP‑CNN) for person re‑identification, applying distinct similarity constraints to low‑ and high‑level feature maps, integrating classification and similarity losses in a multitask framework, and demonstrating superior performance on CUHK03, CUHK01 and Market‑1501 benchmarks.

Alibaba Cloud Developer

Abstract

Person re‑identification (person re‑ID) aims to match images of the same pedestrian across disjoint camera views. This work proposes a novel multi‑level similarity‑aware convolutional neural network (MSP‑CNN) that applies different similarity constraints to low‑level (Pool1) and high‑level (FC7) feature maps during training. The network jointly optimizes classification (softmax) and similarity losses in a multitask architecture, enabling discriminative feature learning and offline indexing for large‑scale deployment.

Figure 1: Complexity of person re‑ID (CUHK03 dataset)

1. Introduction

Person re‑ID is challenging because low resolution, occlusion, and viewpoint variation make face recognition unreliable, so matching has to rely on full‑body appearance. Deep CNNs have become the standard tool for this, yet most existing methods apply the same similarity constraint regardless of the level of the feature map.

2. Proposed Method

2.1 Overview

We design a Siamese‑style CNN backbone with small 3×3 filters and Inception modules, processing images resized to 160×64 (randomly cropped to 144×56). A multitask loss combines a softmax classification term with similarity constraints applied to specific layers.
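The crop step described above can be sketched in a few lines. This is a minimal illustration, not the authors' Caffe pipeline: it assumes 160×64 means height×width, represents an image as a plain list of rows, and the helper name `random_crop` is hypothetical.

```python
import random

def random_crop(image, crop_h=144, crop_w=56, rng=random):
    """Return a random crop_h x crop_w window from a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# Stand-in for an image already resized to 160 x 64 (height x width, assumed).
resized = [[(r * 64 + c) % 256 for c in range(64)] for r in range(160)]
crop = random_crop(resized, rng=random.Random(0))
print(len(crop), len(crop[0]))  # 144 56
```

With these sizes there are 17 possible vertical offsets and 9 horizontal ones, so each training image yields many slightly shifted views.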

2.2 Multi‑Level Similarity Awareness

Low‑level similarity is enforced on the Pool1 feature map using normalized cross‑correlation to match local patches between positive pairs while suppressing matches for negative pairs. High‑level similarity is enforced on the FC7 features using Euclidean distance, encouraging close embeddings for positives and distant ones for negatives.
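To make the two similarity measures concrete, here is a small sketch in plain Python. The normalized cross‑correlation follows the standard textbook definition on flattened patches. The high‑level term is written as a margin‑based contrastive loss, which is an assumption: the summary only says Euclidean distance is used, so the margin and the exact form are illustrative, as are the function names.

```python
import math

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation of two equal-length flattened patches."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [x - mean_a for x in patch_a]
    db = [y - mean_b for y in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return num / (den + eps)  # close to +1 for matching patches, -1 for inverted

def contrastive_loss(feat_a, feat_b, same_identity, margin=1.0):
    """Assumed form: pull positives together, push negatives beyond a margin."""
    d = math.dist(feat_a, feat_b)  # Euclidean distance (Python 3.8+)
    if same_identity:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2

print(ncc([1, 2, 3], [2, 4, 6]))   # near +1: same pattern, different contrast
print(ncc([1, 2, 3], [3, 2, 1]))   # near -1: inverted pattern
print(contrastive_loss([0, 0], [3, 4], same_identity=True))
```

Because NCC subtracts the mean and divides by the patch norms, it is insensitive to brightness and contrast shifts between cameras, which is one plausible reason to use it on low‑level maps.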

2.3 Multi‑Task Architecture

The network jointly learns classification and similarity objectives. During training, we first train the CNN with softmax and Euclidean losses, then add the low‑level cross‑correlation loss for a few epochs. At test time, each gallery image’s feature is extracted offline, and queries are ranked by Euclidean distance, allowing efficient indexing.
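The test‑time procedure amounts to a nearest‑neighbor search over precomputed features. The sketch below assumes features are short lists of floats and uses a brute‑force scan; `build_index` and `rank_gallery` are hypothetical names, and a real deployment would likely use a dedicated index structure.

```python
import math

def build_index(gallery_features):
    """Offline step: store (image_id, feature) pairs once, ahead of any query."""
    return list(gallery_features.items())

def rank_gallery(query_feature, index):
    """Return gallery ids sorted by ascending Euclidean distance to the query."""
    scored = [(math.dist(query_feature, feat), gid) for gid, feat in index]
    scored.sort()
    return [gid for _, gid in scored]

# Toy 2-D features standing in for FC7 embeddings.
gallery = {"g1": [0.0, 1.0], "g2": [1.0, 0.0], "g3": [0.9, 0.1]}
index = build_index(gallery)
print(rank_gallery([1.0, 0.0], index))  # ['g2', 'g3', 'g1']
```

Since only the query feature is computed online, the cost per query is one forward pass plus the distance scan, independent of how the gallery features were produced.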

Figure 2: Training stage architecture
Figure 3: Test stage architecture

3. Experiments

3.1 Datasets and Protocols

We evaluate on CUHK03, CUHK01, and Market‑1501. Performance is reported as CMC top‑k accuracy, with mean average precision (mAP) additionally reported for Market‑1501.
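For readers unfamiliar with these metrics, the following sketch computes them from ranked match lists. It assumes each query comes with a full ranking of the gallery, encoded as booleans (True where the gallery image shares the query identity), so every true match appears somewhere in the list. The function names are illustrative and not taken from any benchmark toolkit.

```python
def cmc_top_k(rankings, k):
    """Fraction of queries with at least one correct match in the top k."""
    hits = sum(1 for ranked in rankings if any(ranked[:k]))
    return hits / len(rankings)

def average_precision(ranked_matches):
    """Mean of precision values taken at each correct match position."""
    hits, precisions = 0, []
    for pos, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

rankings = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
print(cmc_top_k(rankings, 1))  # 0.5
print(cmc_top_k(rankings, 2))  # 1.0
print(round(mean_average_precision(rankings), 4))  # 0.6667
```

CMC only asks whether the first correct match appears early, while mAP rewards ranking all correct matches highly, which matters when an identity has several gallery images, as in Market‑1501.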

3.2 Implementation Details

The backbone consists of three CONV modules, six Inception modules, and one fully‑connected module, implemented in Caffe on an NVIDIA M40 GPU.

Table 1: Basic network structure

3.3 Training Strategy

Negative and positive pairs are sampled in a 2:1 ratio, that is, two negative pairs for every positive pair. Data augmentation follows AlexNet practices.
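One simple way to realize a 2:1 negative‑to‑positive ratio is sketched below. It is an illustration under stated assumptions, not the sampling procedure of Figure 5: identities are drawn uniformly at random, the file names are made up, and `sample_pairs` is a hypothetical helper.

```python
import random

def sample_pairs(images_by_id, num_positive, rng):
    """Return num_positive same-identity pairs and twice as many negative pairs."""
    ids = list(images_by_id)
    multi = [pid for pid in ids if len(images_by_id[pid]) >= 2]
    pairs = []
    for _ in range(num_positive):
        pid = rng.choice(multi)
        a, b = rng.sample(images_by_id[pid], 2)  # two distinct images, same person
        pairs.append((a, b, 1))
    for _ in range(2 * num_positive):
        id_a, id_b = rng.sample(ids, 2)          # two distinct identities
        pairs.append((rng.choice(images_by_id[id_a]),
                      rng.choice(images_by_id[id_b]), 0))
    rng.shuffle(pairs)
    return pairs

# Hypothetical identities and image names, for illustration only.
images_by_id = {
    "p1": ["p1_a.jpg", "p1_b.jpg"],
    "p2": ["p2_a.jpg", "p2_b.jpg"],
    "p3": ["p3_a.jpg", "p3_b.jpg"],
}
pairs = sample_pairs(images_by_id, num_positive=4, rng=random.Random(0))
print(len(pairs), sum(label for _, _, label in pairs))  # 12 4
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs.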

Figure 5: Sampling process illustration

4. Results and Discussion

4.1 Comparison with State‑of‑the‑Art

MSP‑CNN outperforms existing methods on all three datasets, achieving higher CMC top‑k accuracy on each and higher mAP on Market‑1501.

Table 2: CUHK03 results
Table 3: CUHK01 results
Table 4: Market‑1501 results

4.2 Component Analysis

Ablation studies on CUHK03 show the contributions of the base CNN, high‑level similarity, and low‑level similarity constraints.

Table 5: Component effectiveness

5. Conclusion and Future Work

Future work will explore suitable objectives for middle‑level layers and incorporate additional feature‑map levels to further improve person re‑identification.
