
How Multi-Level Similarity‑Aware CNN Boosts Person Re‑Identification

This paper introduces a novel multi‑level similarity‑aware CNN (MSP‑CNN) for person re‑identification, applying distinct similarity constraints to low‑ and high‑level feature maps, integrating classification and similarity losses in a multitask framework, and demonstrating superior performance on CUHK03, CUHK01 and Market‑1501 benchmarks.

Alibaba Cloud Developer

Jul 31, 2018


Abstract

Person re‑identification (person re‑ID) aims to match images of the same pedestrian across disjoint camera views. This work proposes a novel multi‑level similarity‑aware convolutional neural network (MSP‑CNN) that applies different similarity constraints to low‑level (Pool1) and high‑level (FC7) feature maps during training. The network jointly optimizes classification (softmax) and similarity losses in a multitask architecture, enabling discriminative feature learning and offline indexing for large‑scale deployment.

Figure 1: Complexity of person re‑ID (CUHK03 dataset)

1. Introduction

Person re‑ID is challenging because low resolution, occlusion, and viewpoint variations make face recognition unreliable, so matching has to rely on full‑body appearance. Deep CNNs have become the standard way to learn such appearance features, yet most existing methods treat all feature‑map levels uniformly instead of applying a similarity constraint suited to each level.

2. Proposed Method

2.1 Overview

We design a Siamese‑style CNN backbone with small 3×3 filters and Inception modules, processing images resized to 160×64 (randomly cropped to 144×56). A multitask loss combines a softmax classification term with similarity constraints applied to specific layers.
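The cropping step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' Caffe pipeline: the helper name `random_crop`, the nested-list image representation, and the reading of 160×64 as height×width are assumptions made for the example.

```python
import random

def random_crop(image, crop_h=144, crop_w=56, rng=random):
    """Return a random crop_h x crop_w window from a 2-D list of pixel rows.

    `image` is assumed to be already resized (here: 160 rows x 64 columns).
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - crop_h)   # randint bounds are inclusive
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# Dummy 160x64 "image" whose pixel value records its (row, col) position.
image = [[(r, c) for c in range(64)] for r in range(160)]
crop = random_crop(image, rng=random.Random(0))
print(len(crop), len(crop[0]))
```

Passing a seeded `random.Random` instance keeps the crop reproducible, which is convenient when debugging an augmentation pipeline.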

2.2 Multi‑Level Similarity Awareness

Low‑level similarity is enforced on the Pool1 feature map using normalized cross‑correlation to match local patches between positive pairs while suppressing matches for negative pairs. High‑level similarity is enforced on the FC7 features using Euclidean distance, encouraging close embeddings for positives and distant ones for negatives.
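To make the two similarity measures concrete, here is a small sketch of each. The summary does not give the exact loss formulas, so the code below uses the textbook definition of normalized cross‑correlation and the standard contrastive loss as stand‑ins; the function names, the `margin` value, and the flat‑list patch format are assumptions for illustration only.

```python
import math

def normalized_cross_correlation(patch_a, patch_b, eps=1e-8):
    """NCC of two equally sized patches given as flat lists of floats.

    Values near 1 mean the patches match up to brightness and contrast.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    centered_a = [x - mean_a for x in patch_a]
    centered_b = [x - mean_b for x in patch_b]
    numerator = sum(a * b for a, b in zip(centered_a, centered_b))
    norm_a = math.sqrt(sum(a * a for a in centered_a))
    norm_b = math.sqrt(sum(b * b for b in centered_b))
    return numerator / (norm_a * norm_b + eps)

def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, is_positive, margin=1.0):
    """Standard contrastive loss (assumed form, not taken from the paper).

    Positives are pulled together; negatives are pushed beyond `margin`.
    """
    d = euclidean_distance(u, v)
    if is_positive:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2

same = normalized_cross_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
flipped = normalized_cross_correlation([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```

Because NCC subtracts the mean and divides by the norm, it is insensitive to illumination changes between cameras, which is one plausible reason to prefer it for low‑level patches, while a plain distance is enough for the more abstract FC7 embedding.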

2.3 Multi‑Task Architecture

The network jointly learns classification and similarity objectives. Training proceeds in two stages: the CNN is first trained with the softmax and Euclidean losses, and the low‑level cross‑correlation loss is then added for a few epochs. At test time, the feature of each gallery image is extracted once, offline, and queries are ranked by Euclidean distance to these stored features, which allows efficient indexing.
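The test‑time procedure amounts to a nearest‑neighbour lookup over precomputed features. The sketch below assumes the gallery features are held in a plain dictionary; `rank_gallery`, `gallery_index`, and the toy 2‑D vectors are hypothetical names and values, not part of the original system.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_gallery(query_feature, gallery_index):
    """Return gallery ids sorted from nearest to farthest from the query.

    `gallery_index` maps gallery id -> feature vector extracted offline.
    """
    scored = [
        (euclidean_distance(query_feature, feature), gallery_id)
        for gallery_id, feature in gallery_index.items()
    ]
    scored.sort()  # ascending distance
    return [gallery_id for _, gallery_id in scored]

# Toy index; in practice each vector would be an FC7 feature.
gallery_index = {"g1": [0.0, 1.0], "g2": [1.0, 0.0], "g3": [0.9, 0.1]}
ranking = rank_gallery([1.0, 0.0], gallery_index)
print(ranking)
```

Since only the query image passes through the network at search time, the gallery side can be stored in any vector index; the linear scan here is just the simplest choice.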

3. Experiments

3.1 Datasets and Protocols

We evaluate on CUHK03, Market‑1501, and CUHK01. Results are reported as CMC top‑k accuracy, with mean average precision (mAP) additionally reported for Market‑1501.
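For readers unfamiliar with the two metrics, the sketch below computes CMC top‑k accuracy and mAP from ranked lists of identity labels. It is a simplified illustration: the official Market‑1501 protocol also filters out same‑camera and junk images, which is omitted here, and the function names and toy rankings are assumptions.

```python
def cmc_hit(ranked_ids, query_id, k):
    """1 if the query identity appears among the top-k ranked results, else 0."""
    return int(query_id in ranked_ids[:k])

def average_precision(ranked_ids, query_id):
    """Average of precision values at each rank where a correct match occurs."""
    hits = 0
    precision_sum = 0.0
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0

def evaluate(rankings, k=1):
    """rankings: list of (query identity, ranked gallery identity labels)."""
    n = len(rankings)
    cmc = sum(cmc_hit(ranked, query, k) for query, ranked in rankings) / n
    mean_ap = sum(average_precision(ranked, query) for query, ranked in rankings) / n
    return cmc, mean_ap

# Two toy queries: identity "a" has two gallery images, "b" has one.
rankings = [("a", ["a", "b", "a"]), ("b", ["a", "b", "c"])]
cmc_top1, mean_ap = evaluate(rankings, k=1)
cmc_top2, _ = evaluate(rankings, k=2)
```

CMC only asks whether the first correct match falls within the top k, whereas mAP rewards ranking every correct match highly, which matters on Market‑1501 where a query can have several gallery matches.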

3.2 Implementation Details

The backbone consists of three CONV modules, six Inception modules, and one fully‑connected module, implemented in Caffe on an NVIDIA M40 GPU.

3.3 Training Strategy

Negative and positive training pairs are sampled in a 2:1 ratio, that is, two negative pairs for every positive pair. Data augmentation follows AlexNet practices.
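One possible way to realize the 2:1 sampling is shown below. How the authors actually draw pairs is not described in this summary, so the strategy here (pick a positive pair, then attach two negatives that share its anchor image) and the name `sample_pairs` are purely illustrative assumptions.

```python
import random

def sample_pairs(labels, num_positive, neg_per_pos=2, rng=random):
    """Return (index_a, index_b, is_positive) triples with a fixed neg:pos ratio.

    `labels[i]` is the person identity of training image i.
    """
    by_id = {}
    for index, person_id in enumerate(labels):
        by_id.setdefault(person_id, []).append(index)
    # Only identities with at least two images can form a positive pair.
    multi_image_ids = [pid for pid, idxs in by_id.items() if len(idxs) >= 2]
    if not multi_image_ids or len(by_id) < 2:
        raise ValueError("need a repeated identity and at least two identities")

    pairs = []
    for _ in range(num_positive):
        person_id = rng.choice(multi_image_ids)
        anchor, positive = rng.sample(by_id[person_id], 2)
        pairs.append((anchor, positive, 1))
        for _ in range(neg_per_pos):
            other_id = rng.choice([pid for pid in by_id if pid != person_id])
            pairs.append((anchor, rng.choice(by_id[other_id]), 0))
    return pairs

labels = ["a", "a", "b", "b", "c"]
pairs = sample_pairs(labels, num_positive=4, rng=random.Random(0))
num_pos = sum(1 for _, _, flag in pairs if flag == 1)
num_neg = sum(1 for _, _, flag in pairs if flag == 0)
```

Oversampling negatives reflects the fact that different‑identity pairs vastly outnumber same‑identity pairs in any re‑ID dataset, while the fixed ratio keeps positives from being drowned out.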

4. Results and Discussion

4.1 Comparison with State‑of‑the‑Art

Our MSP‑CNN outperforms the compared methods on all three datasets, achieving higher CMC top‑k accuracy and, on Market‑1501, higher mAP.

4.2 Component Analysis

Ablation studies on CUHK03 show the contributions of the base CNN, high‑level similarity, and low‑level similarity constraints.

5. Conclusion and Future Work

Future work will explore suitable objectives for middle‑level layers and incorporate additional feature‑map levels to further improve person re‑identification.

