Street Scene Understanding: Segmentation Technology, Research Progress, and Business Applications
Meituan’s Street‑Scene Understanding team has built a high‑precision, efficient segmentation system. It aligns dynamic and static semantics across video frames, mines hard examples automatically, iterates models through a data‑model closed loop, and works towards unified open‑world segmentation. The system won multiple CVPR 2023 awards and powers map production, autonomous delivery, and store‑scene reconstruction.
Visual segmentation plays a crucial role in street‑scene understanding, yet it faces many challenges. The Meituan Street‑Scene Understanding team has built a comprehensive segmentation system that balances accuracy and efficiency, achieving notable results in both research and production. Their work has earned two championship titles and one third‑place finish at CVPR 2023 competitions.
1. Problem Background
Street‑scene data are captured by various devices (cameras, LiDAR, etc.). Camera video is low‑cost and widely available, making it the primary source for many computer‑vision tasks such as scene reconstruction, autonomous driving, and robot navigation. Understanding street scenes requires multi‑level perception (point‑level, line‑level, surface‑level, volume‑level) as well as reasoning about the logical relationships among these elements.
2. Research Status
The field has evolved from basic image segmentation to a wide range of tasks: semantic, instance, panoptic, and video segmentation. Classic models (FCN, U‑Net, DeepLab, OCRNet, SegFormer, Mask R‑CNN, etc.) are widely used, while recent works focus on efficiency (BiSeNet and STDCNet, along with lightweight backbones such as ShuffleNet and MobileNet) and on handling long‑tail distributions (CANet, PADing). Video segmentation methods such as OSVOS and MATNet, as well as interactive approaches (ScribbleSup, FocalClick), have also been explored.
3. Core Technologies
3.1 High‑Precision Segmentation in Complex Scenes
A Motion‑State Alignment Framework (MSAF) aligns dynamic and static semantics across frames to improve pixel‑level accuracy, achieving state‑of‑the‑art results on Cityscapes and CamVid.
3.2 Automatic Hard‑Example Mining
The Perceive‑Excavate‑Purify (PEP) pipeline uses multi‑branch feature extraction (instance perception, description, and learning) to discover and refine difficult targets, outperforming baselines on COCO.
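The article does not describe PEP at the code level, so the sketch below shows a different, generic technique instead: online hard example mining (OHEM), in which only the highest-loss pixels contribute to the training loss. It illustrates the general idea of focusing learning on difficult targets and is not the PEP method. The function names, the `keep_ratio` parameter, and the toy probabilities are all assumptions.

```python
import math


def pixel_cross_entropy(probs, label):
    """Cross-entropy of one pixel given its predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def ohem_loss(pixel_probs, labels, keep_ratio=0.5):
    """Average the loss over only the hardest (highest-loss) pixels."""
    losses = [pixel_cross_entropy(p, y) for p, y in zip(pixel_probs, labels)]
    keep = max(1, int(len(losses) * keep_ratio))
    hardest = sorted(losses, reverse=True)[:keep]
    return sum(hardest) / keep, hardest


# Four hypothetical pixels with two-class predictions and their true labels.
pixel_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
labels = [0, 0, 0, 1]

loss, hardest = ohem_loss(pixel_probs, labels, keep_ratio=0.5)
print(len(hardest), round(loss, 3))
```

With `keep_ratio=0.5`, the two easiest pixels are dropped and the loss is driven by the badly misclassified third pixel and the uncertain second one.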
3.3 Efficient Model Iteration
A data‑model closed loop collects massive unlabeled data, generates pseudo‑labels, and applies active learning to select high‑value samples, enabling rapid model updates across lightweight, medium, and heavyweight model families.
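One common way to realize such a loop, shown here as a minimal sketch under stated assumptions, is to accept confident predictions as pseudo‑labels and to send the most uncertain remaining samples to human annotators, ranked by prediction entropy. The confidence threshold, the annotation budget, the sample ids, and the per‑sample probability vectors are all hypothetical; a segmentation system would typically aggregate per‑pixel scores into a per‑image score first, and the team's actual selection criteria are not given in the article.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route_unlabeled(predictions, pseudo_threshold=0.9, annotate_budget=1):
    """Split unlabeled samples into pseudo-labeled ones and an annotation queue."""
    pseudo_labeled = {}
    candidates = []
    for sample_id, probs in predictions.items():
        confidence = max(probs)
        if confidence >= pseudo_threshold:
            # Confident prediction: keep the argmax class as a pseudo-label.
            pseudo_labeled[sample_id] = probs.index(confidence)
        else:
            candidates.append((entropy(probs), sample_id))
    # Active learning step: most uncertain samples go to annotators first.
    candidates.sort(reverse=True)
    to_annotate = [sample_id for _, sample_id in candidates[:annotate_budget]]
    return pseudo_labeled, to_annotate


# Hypothetical model outputs for three unlabeled frames.
predictions = {
    "frame_001": [0.97, 0.02, 0.01],
    "frame_002": [0.40, 0.35, 0.25],
    "frame_003": [0.70, 0.20, 0.10],
}

pseudo, queue = route_unlabeled(predictions)
print(pseudo)  # {'frame_001': 0}
print(queue)   # ['frame_002']
```

Both outputs feed the next training round: pseudo‑labels enlarge the training set at no labeling cost, while the annotation queue spends the labeling budget where the model is least certain.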
3.4 Towards Unified Open Segmentation
The team investigates multi‑task unified models (semantic, instance, panoptic, edge detection) for both images and videos, and explores cross‑modal extensions (text‑guided segmentation, open‑world segmentation) to build more generalizable systems.
4. CVPR 2023 Achievements
Two workshop papers were accepted: “Motion‑State Alignment for Video Semantic Segmentation” and “Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation.” The team also won the semantic and panoptic segmentation tracks of the ACDC adverse‑weather challenge and secured third place in the video panoptic segmentation track.
5. Business Applications
The segmentation technology is deployed in Meituan’s map production (road element extraction, obstacle filtering), autonomous delivery (high‑definition map generation, traffic sign detection), and store‑scene reconstruction (geometry, semantics, object counting). It also supports intelligent annotation and data generation pipelines.
6. Summary and Outlook
While segmentation remains central to street‑scene understanding, challenges such as extreme weather, scale variance, and computational constraints persist. Future work will focus on higher precision, unified models, multimodal integration, and leveraging large language and vision models to achieve open‑world “everything‑segmentation” and higher‑level semantic reasoning.
7. Authors
Jin Ming, Wang Wang, Yi Ting, Xing Yue, Jun Feng, and others from Meituan’s Basic R&D Platform / Visual Intelligence Department.
Meituan Technology Team
