
BCNet: A Bilayer Instance Segmentation Network for Occlusion‑Aware Object Detection

The paper proposes BCNet, a lightweight bilayer instance segmentation network that explicitly models occluder and occludee relationships by treating each region of interest as two overlapping layers, achieving significant performance gains on COCO, COCOA and KINS datasets under heavy occlusion.


Abstract

Object occlusion is common in everyday scenes and severely degrades the performance of existing detection and segmentation algorithms. By modeling an image as two overlapping layers, BCNet introduces occluder–occludee relationships into the network, offering a lightweight solution for occlusion-aware instance segmentation.

Background

Instance segmentation combines object detection and semantic segmentation and is essential for applications such as video editing, video conferencing, medical imaging, and autonomous driving. Heavy overlap between objects creates ambiguous boundaries that challenge conventional methods.

Problem

Most recent instance segmentation methods follow a detect-then-segment pipeline (e.g., Mask R-CNN) and focus on backbone improvements while neglecting the mask regression head. Overlapping objects within the same RoI cause confusion, especially when they belong to the same class or share similar textures.
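To make the ambiguity concrete, consider a toy example: two same-class objects overlap inside one RoI, and a conventional single-layer mask target covers only the occludee's visible pixels, so the boundary inside the overlap region is lost. This numpy sketch uses made-up shapes purely for illustration:

```python
import numpy as np

# Toy 6x6 RoI containing two overlapping objects of the same class.
occluder = np.zeros((6, 6), dtype=bool)
occluder[1:4, 1:5] = True                # object in front
occludee = np.zeros((6, 6), dtype=bool)
occludee[2:6, 2:6] = True                # object behind, partially hidden

# A conventional mask target keeps only the occludee's visible pixels,
# so its contour inside the overlap region is lost entirely.
visible = occludee & ~occluder
overlap = (occluder & occludee).sum()    # pixels where the two masks collide
```

Because the visible target contains fewer pixels than the full occludee mask, a head trained on it alone has no supervision for the hidden boundary, which is exactly what the bilayer design restores.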

Method

BCNet treats each RoI as two overlapping layers: the top layer (occluder) models the shape and appearance of the occluding object, while the bottom layer (occludee) infers the partially hidden target. Both layers are implemented as graph convolutional networks (GCNs) in a cascade structure. A non-local operator aggregates spatially disjoint features while keeping the parameter count low, enabling robust mask prediction under occlusion.
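The non-local operator can be sketched as self-attention over spatial positions: every position aggregates features from all others, so two visible fragments of one occluded object can reinforce each other. This is a minimal numpy sketch that omits the learned embedding transforms (1×1 convolutions) a real implementation would use:

```python
import numpy as np

def non_local_block(x):
    """Simplified non-local operator over a (C, H, W) feature map.

    Each spatial position attends to every other position, so features
    from spatially disjoint regions are aggregated into one response.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                      # (C, N), N = H*W
    affinity = flat.T @ flat                        # (N, N) pairwise similarity
    affinity -= affinity.max(axis=-1, keepdims=True)
    weights = np.exp(affinity)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out = flat @ weights.T                          # weighted aggregation
    return x + out.reshape(c, h, w)                 # residual connection

feat = np.random.rand(8, 14, 14)
refined = non_local_block(feat)                     # same shape as the input
```

The residual connection means the block can fall back to the identity when global context is unhelpful, a common design choice in attention modules.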

Network Architecture

The system consists of an object detector (Faster R-CNN or FCOS) that provides RoI proposals, followed by RoI Align to extract RoI features. These features feed into the first GCN branch (four convolution blocks plus a non-local block) to predict the occluder mask; the RoI features are then fused with the first branch's output and passed to the second GCN branch to predict the occludee mask.
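The cascade's data flow can be sketched in a few lines, with simple linear maps standing in for the conv + non-local branches (the dimensions and the element-wise fusion below are illustrative assumptions, not the paper's exact operations):

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_branch(x, w):
    """Stand-in for one GCN mask head: a linear map plus ReLU.
    The real branch stacks convolutions with a non-local block."""
    return np.maximum(x @ w, 0.0)

d = 16                                   # hypothetical flattened RoI feature size
roi_feat = rng.random(d)                 # features from RoI Align
w_occluder = rng.random((d, d))          # stand-in weights, top layer
w_occludee = rng.random((d, d))          # stand-in weights, bottom layer

# Layer 1: the occluder branch models the occluding object's features.
occluder_feat = gcn_branch(roi_feat, w_occluder)

# Cascade: fuse the occluder output back into the RoI features
# (element-wise addition here) before the second branch runs.
fused = roi_feat + occluder_feat

# Layer 2: the occludee branch infers the partially hidden target.
occludee_feat = gcn_branch(fused, w_occludee)
```

The key point the sketch captures is the ordering: the occludee branch never sees raw RoI features alone, only features already conditioned on the occluder prediction.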

Experiments

BCNet was evaluated on the COCO, COCOA, and KINS datasets using both one-stage and two-stage detectors. It consistently outperformed CenterMask, BlendMask, HTC, and other state-of-the-art methods, especially on heavily occluded objects, while adding minimal overhead in parameters and inference time. Visual comparisons show clearer boundary delineation and higher interpretability.

Significance

The bilayer design effectively decouples occluder and occludee boundaries, improving segmentation accuracy in real-world scenarios such as short-video platforms and autonomous driving, where precise object delineation directly affects user experience and safety.

References

The paper cites works on CenterMask, Mask Scoring R-CNN, amodal instance segmentation, and related graph-based approaches.

Tags: computer vision, deep learning, instance segmentation, graph convolutional network, bilayer network, occlusion handling
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
