Artificial Intelligence 10 min read

How Collaborative Learning Boosts Co‑Saliency Object Detection

This article introduces a collaborative‑learning based co‑saliency object detection algorithm that incorporates class‑conditioned information during training. The approach markedly improves the model's ability to distinguish and detect common objects across image groups, and its effectiveness is demonstrated through extensive experiments and real‑world applications.

Kuaishou Audio & Video Technology

Background

Human vision can detect the most attractive object in a single image and extract co‑occurring objects from a set of images. In computer vision, the former is called salient object detection, while the latter is known as collaborative (co‑) saliency object detection. Existing co‑saliency methods struggle to differentiate objects of different categories.

Proposed Approach

We propose a collaborative‑learning based co‑saliency detection algorithm that injects class‑conditioned information during training, enabling the network to detect objects according to given class conditions and significantly enhancing discrimination and overall performance.

Application Scenarios

In multi‑user settings, the algorithm can discover common objects among images uploaded by different users, facilitating interest‑based social networking. For single‑user scenarios, it enables efficient batch processing such as bulk object cut‑out, improving image handling efficiency.

Algorithm Overview

The model consists of an encoder, a decoder, a global relation learning module, a semantic classification learning module, and a global collaborative learning module (used only during training). The encoder extracts feature maps, the global relation learning module derives a shared representation for an image group, and depthwise‑separable filtering combines this representation with the feature maps before decoding.
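The pipeline above can be sketched roughly as follows. This is a minimal illustrative skeleton, not the paper's implementation: the class name, layer sizes, and the tiny encoder/decoder are assumptions standing in for a real backbone, and the group consensus is reduced to a simple mean for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoSaliencyNet(nn.Module):
    """Toy sketch: encoder -> group consensus -> depthwise filtering -> decoder."""

    def __init__(self, channels=64):
        super().__init__()
        # Stand-ins for a real backbone encoder and saliency decoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1),
        )

    def forward(self, images):                  # images: (N, 3, H, W), one group
        feats = self.encoder(images)            # per-image feature maps (N, C, H, W)
        # Group-wise shared representation (simplified here to a global mean).
        consensus = feats.mean(dim=(0, 2, 3))   # (C,)
        # Depthwise 1x1 filtering: each channel is re-weighted by the
        # group-wise importance encoded in the consensus vector.
        kernel = consensus.view(-1, 1, 1, 1)    # (C, 1, 1, 1)
        fused = F.conv2d(feats, kernel, groups=feats.size(1))
        return self.decoder(fused)              # saliency logits (N, 1, H, W)

net = CoSaliencyNet()
out = net(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 1, 32, 32])
```

The key structural point is that the consensus is computed once per image group and then injected into every image's features, so each prediction is conditioned on what the whole group has in common.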

Global Relation Learning Module

This module captures common features across the entire image set. It processes feature maps with a convolution, computes a pairwise relation matrix, aggregates via max and mean operations to form a global relation map, multiplies it with the original feature map, and averages to obtain the group‑wise common representation.
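The relation computation described above can be sketched as below. This is a hedged approximation, not the paper's exact module: the softmax normalization and the way max and mean are fused are assumptions made for a compact, runnable example.

```python
import torch

def global_relation(feats: torch.Tensor) -> torch.Tensor:
    """feats: (N, C, H, W) for one image group -> (C,) group representation."""
    n, c, h, w = feats.shape
    # Flatten all pixels of all images in the group into one set of C-dim vectors.
    x = feats.flatten(2).permute(0, 2, 1).reshape(n * h * w, c)   # (P, C)
    affinity = x @ x.t()                                          # pairwise relation matrix (P, P)
    # Aggregate each pixel's relations to all other pixels via max and mean.
    rel = affinity.max(dim=1).values + affinity.mean(dim=1)       # (P,)
    # Normalize into a global relation map and weight the original features.
    attn = torch.softmax(rel, dim=0).view(n, 1, h, w)
    weighted = feats * attn
    # Average (here: attention-weighted sum) into one group-wise representation.
    return weighted.sum(dim=(0, 2, 3))                            # (C,)

consensus = global_relation(torch.randn(4, 64, 16, 16))
print(consensus.shape)  # torch.Size([64])
```

Because the affinity matrix relates every pixel to every pixel across the whole group, pixels belonging to the co-occurring object tend to receive high aggregated scores and dominate the final representation.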

Semantic Classification Learning Module

Using image‑level class labels, this module assists the global relation learning module by supervising the extracted group representation through a convolutional network followed by a fully‑connected layer.
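A minimal sketch of such an auxiliary head is shown below; the layer sizes and class count are assumptions, and the 1x1 convolution stands in for whatever convolutional stack the actual module uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticHead(nn.Module):
    """Auxiliary classifier supervising the group representation (illustrative)."""

    def __init__(self, channels=64, num_classes=80):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 1)  # small conv on the representation
        self.fc = nn.Linear(channels, num_classes)    # fully-connected class logits

    def forward(self, consensus):                     # consensus: (B, C) group vectors
        x = self.conv(consensus.view(-1, consensus.size(1), 1, 1))
        return self.fc(x.flatten(1))                  # (B, num_classes)

head = SemanticHead()
logits = head(torch.randn(8, 64))
labels = torch.randint(0, 80, (8,))
# Cross-entropy against image-level class labels; added to the main
# saliency loss so the consensus stays semantically meaningful.
aux_loss = F.cross_entropy(logits, labels)
print(logits.shape)  # torch.Size([8, 80])
```

The head is only a training-time aid: the classification loss forces the group representation to encode the category of the common object, which in turn sharpens the relation module that produced it.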

Global Collaborative Learning Module

Through contrastive learning, this module ensures that the extracted group representation is unique and correct for its image set while being absent for other sets, thereby enlarging inter‑class distances and improving discrimination.

Experimental Results

Extensive experiments on three datasets show that each of the three modules contributes positively, and their combination yields substantial performance gains. Compared with prior methods, our approach achieves superior results, especially on the challenging CoCA dataset.

Related Links

Paper: https://arxiv.org/pdf/2104.01108.pdf
Code: https://github.com/fanq15/GCoNet

Tags: computer vision, deep learning, image segmentation, collaborative learning, co-saliency detection