How Collaborative Learning Boosts Co‑Saliency Object Detection
This article presents a collaborative-learning based co-saliency object detection algorithm that injects class-conditioned information during training, markedly improving the model's ability to distinguish and detect the objects common to an image group. Its effectiveness is demonstrated through extensive experiments and real-world applications.
Background
Human vision can quickly locate the most attractive object in a single image and can also pick out the objects that co-occur across a set of images. In computer vision, the former task is called salient object detection, while the latter is known as co-saliency object detection. Existing co-saliency methods, however, struggle to distinguish objects belonging to different categories.
Proposed Approach
We propose a collaborative‑learning based co‑saliency detection algorithm that injects class‑conditioned information during training, enabling the network to detect objects according to given class conditions and significantly enhancing discrimination and overall performance.
Application Scenarios
In multi‑user settings, the algorithm can discover common objects among images uploaded by different users, facilitating interest‑based social networking. For single‑user scenarios, it enables efficient batch processing such as bulk object cut‑out, improving image handling efficiency.
Algorithm Overview
The model consists of an encoder, a decoder, a global relation learning module, a semantic classification learning module, and a global collaborative learning module (used only during training). The encoder extracts feature maps, the global relation learning module derives a representation shared by an image group, and depthwise-separable filtering combines this representation with each image's feature maps before decoding.
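The fusion step above can be sketched in plain Python. This is a minimal illustration of depthwise (per-channel) filtering in its simplest 1x1 case, where each channel of a feature map is scaled by the matching channel of the group representation; the function name and list-based "tensors" are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: fuse the group consensus vector with one image's feature map
# via per-channel (depthwise) filtering before decoding. The 1x1 case
# reduces to scaling each channel by the matching consensus entry.
# All names here are hypothetical stand-ins for the real layers.

def depthwise_fuse(feature_map, consensus):
    """feature_map: C lists of pixel values (one list per channel);
    consensus: length-C group representation.
    Returns the attended feature map passed to the decoder."""
    return [[consensus[c] * v for v in channel]
            for c, channel in enumerate(feature_map)]
```

In the full model the filter would be a learned spatial kernel rather than a single scalar per channel, but the depthwise structure (one filter per channel, no cross-channel mixing) is the same.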
Global Relation Learning Module
This module captures common features across the entire image set. It processes feature maps with a convolution, computes a pairwise relation matrix, aggregates via max and mean operations to form a global relation map, multiplies it with the original feature map, and averages to obtain the group‑wise common representation.
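The steps above can be sketched with plain Python lists standing in for feature tensors. This is a simplified, hypothetical rendering of the module (the 1x1 convolution is omitted and max/mean aggregation is collapsed into a single attention weight per position), not the released implementation.

```python
# Sketch of global relation learning: pairwise relations between feature
# positions are aggregated into attention weights, which re-weight the
# features before averaging into the group-wise common representation.
# Function and variable names are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def global_relation(features):
    """features: list of P position vectors (each length C) pooled from
    the whole image group. Returns the common representation (length C)."""
    P = len(features)
    # Pairwise relation matrix: similarity between every pair of positions.
    relation = [[dot(features[i], features[j]) for j in range(P)]
                for i in range(P)]
    # Aggregate each row with max and mean to form a global relation map
    # (one attention weight per position).
    attn = [max(row) + sum(row) / P for row in relation]
    # Normalize so the weights sum to 1.
    total = sum(attn) or 1.0
    attn = [a / total for a in attn]
    # Weight the original features and average across positions.
    C = len(features[0])
    return [sum(attn[p] * features[p][c] for p in range(P)) for c in range(C)]
```

Positions that relate strongly to the rest of the group dominate the average, so the output emphasizes features shared across the image set.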
Semantic Classification Learning Module
Using image‑level class labels, this module assists the global relation learning module by supervising the extracted group representation through a convolutional network followed by a fully‑connected layer.
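The supervision signal can be sketched as a standard cross-entropy loss on class logits computed from the group representation. The fixed weight matrix below is a hypothetical stand-in for the module's convolutional network and fully connected layer.

```python
import math

# Sketch of the auxiliary semantic classification loss: project the group
# representation to class logits and supervise with cross-entropy against
# the image-level label. The linear projection here is a simplified,
# assumed stand-in for the real conv + fully connected layers.

def cross_entropy(logits, label):
    # Numerically stable -log softmax of the labeled class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def classification_loss(representation, weights, label):
    """representation: length-C group vector; weights: K rows of length C
    (one per class); label: ground-truth class index."""
    logits = [sum(w * r for w, r in zip(row, representation))
              for row in weights]
    return cross_entropy(logits, label)
```

Minimizing this loss pushes the group representation toward a class-discriminative subspace, which is exactly the assistance the global relation learning module receives.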
Global Collaborative Learning Module
Through contrastive learning, this module ensures that the extracted group representation is unique and correct for its image set while being absent for other sets, thereby enlarging inter‑class distances and improving discrimination.
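An InfoNCE-style loss captures this idea: the group representation should score high against its own group and low against other groups. The sketch below over dot-product similarities is an assumed simplification; the actual module operates on feature maps, and the temperature value is illustrative.

```python
import math

# Sketch of the global collaborative (contrastive) objective: pull a
# group's consensus toward its own set and push it away from other sets,
# enlarging inter-group distances. Names and the temperature are
# hypothetical, not taken from the released code.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """anchor, positive: representations from the same image group;
    negatives: representations drawn from other image groups."""
    sims = [dot(anchor, positive) / temperature]
    sims += [dot(anchor, n) / temperature for n in negatives]
    # -log softmax of the positive similarity (numerically stable).
    m = max(sims)
    return m + math.log(sum(math.exp(s - m) for s in sims)) - sims[0]
```

The loss is small when the anchor matches its own group far better than any other group, which is the uniqueness property the module enforces.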
Experimental Results
Extensive experiments on three datasets show that each of the three modules contributes positively, and their combination yields substantial performance gains. Compared with prior methods, our approach achieves superior results, especially on the challenging CoCA dataset.
Related Links
Paper: https://arxiv.org/pdf/2104.01108.pdf
Code: https://github.com/fanq15/GCoNet
Kuaishou Audio & Video Technology