Highlights of Six Meituan Papers Accepted at CVPR 2022
Meituan’s six CVPR 2022 papers advance computer vision by introducing a few‑sample model compression method, a language‑bridged video object segmentation approach, a single‑stage 3D visual grounding technique, a dynamic early‑exit image captioning system, a boosted black‑box adversarial attack, and a semi‑supervised video paragraph grounding framework.
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 was held in New Orleans, USA. Meituan's technical team had six papers accepted, covering model compression, video object segmentation, 3D visual grounding, image captioning, model security, and cross‑modal video retrieval.
Paper 01 | Compressing Models with Few Samples: Mimicking then Replacing
Authors: Wang Huanyu (Meituan intern & Nanjing University), Liu Junjie (Meituan), Ma Xin (Meituan), Yong Yang (Meituan intern & Xi’an Jiaotong University), Chai Zhenhua (Meituan), Wu Jianxin (Nanjing University)
Paper Type: CVPR 2022 Main Conference – Long Paper
The work proposes MiR (Mimicking then Replacing), a method that transfers knowledge only at the penultimate layer, removing the need for the posterior distribution alignment (matching output probabilities) used in traditional knowledge distillation. Because the original model's classification/detection head is grafted onto the compressed model, the compressed model can be fine‑tuned quickly with only a few samples. Experiments show significant improvements over baselines, and the method has been validated in Meituan's image security audit scenarios.
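A minimal sketch of the mimic-then-replace idea, assuming a toy two-layer MLP backbone for both models (the "compressed" student simply has a narrower hidden layer) and a plain linear head. The names `mimic_penultimate` and `graft_head`, all sizes, and the learning rate are hypothetical; only the two steps (match the teacher's penultimate features, then reuse the teacher's head) come from the description above.

```python
import numpy as np

# Hypothetical toy setup, not the paper's architecture.
rng = np.random.default_rng(0)
IN_DIM, FEAT_DIM, NUM_CLASSES = 10, 6, 3
TEACHER_HIDDEN, STUDENT_HIDDEN = 64, 8      # the student is the compressed model

def relu(z):
    return np.maximum(z, 0.0)

# Teacher: backbone (W1, W2) producing penultimate features, plus a linear head.
W1 = rng.normal(size=(TEACHER_HIDDEN, IN_DIM)) / np.sqrt(IN_DIM)
W2 = rng.normal(size=(FEAT_DIM, TEACHER_HIDDEN)) / np.sqrt(TEACHER_HIDDEN)
teacher_head = rng.normal(size=(NUM_CLASSES, FEAT_DIM))

def teacher_features(x):
    return relu(x @ W1.T) @ W2.T

def mimic_penultimate(x, steps=500, lr=0.05):
    """Step 1 (mimic): fit the student's penultimate features to the teacher's."""
    V1 = rng.normal(size=(STUDENT_HIDDEN, IN_DIM)) / np.sqrt(IN_DIM)
    V2 = rng.normal(size=(FEAT_DIM, STUDENT_HIDDEN)) / np.sqrt(STUDENT_HIDDEN)
    target = teacher_features(x)
    n = len(x)
    losses = []
    for _ in range(steps):
        pre = x @ V1.T                  # (n, STUDENT_HIDDEN)
        hidden = relu(pre)
        feats = hidden @ V2.T           # (n, FEAT_DIM)
        diff = feats - target
        # Half the squared feature error, averaged over samples.
        losses.append(0.5 * np.sum(diff ** 2) / n)
        # Manual backprop of that loss through both student layers.
        d_feats = diff / n
        d_V2 = d_feats.T @ hidden
        d_pre = (d_feats @ V2) * (pre > 0)
        d_V1 = d_pre.T @ x
        V1 -= lr * d_V1
        V2 -= lr * d_V2
    return V1, V2, losses

def graft_head(V1, V2):
    """Step 2 (replace): put the teacher's head on top of the student's backbone."""
    return lambda x: (relu(x @ V1.T) @ V2.T) @ teacher_head.T

few_samples = rng.normal(size=(16, IN_DIM))  # only a handful of unlabeled inputs
V1, V2, losses = mimic_penultimate(few_samples)
student = graft_head(V1, V2)
print(f"feature loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(student(few_samples).shape)
```

Because only features are matched, the few samples need no labels in this sketch; the one hard constraint is that the student's feature size equals the teacher's so the grafted head fits.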
Paper 02 | Language‑Bridged Spatial‑Temporal Interaction for Referring Video Object Segmentation
Authors: Ding Zihan (Meituan), Hui Tianrui (University of Chinese Academy of Sciences), Huang Junshi (Meituan), Wei Xiaoming (Meituan), Han Jizhong (University of Chinese Academy of Sciences), Liu Si (Beihang University)
Paper Type: CVPR 2022 Main Conference – Poster
The authors introduce the LBDT (Language‑Bridged Duplex Transfer) module, which uses language as an intermediate bridge to enable explicit and adaptive spatio‑temporal interaction early in the encoder. In the decoder, a Bilateral Channel Activation (BCA) module further denoises the features and highlights those that are spatio‑temporally consistent. The method achieves state‑of‑the‑art performance on four public datasets without requiring pre‑training on image referring segmentation. Code: LBDT.
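The language-bridged transfer can be sketched with plain scaled dot-product cross-attention, assuming flattened spatial (appearance) and temporal (motion) feature maps and word embeddings that share one feature size. The function names, the residual update, and the sizes are illustrative assumptions, not the paper's exact layer design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context):
    """Scaled dot-product attention: each query row reads a weighted mix of context rows."""
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d))
    return weights @ context, weights

def language_bridged_transfer(spatial, temporal, words):
    """Hypothetical duplex transfer with the referring words as the bridge.

    spatial:  (HW, d) appearance features from the spatial encoder
    temporal: (HW, d) motion features from the temporal encoder
    words:    (L, d) embeddings of the referring expression
    """
    # 1) The words gather language-relevant information from each branch.
    motion_via_words, _ = cross_attend(words, temporal)       # (L, d)
    appearance_via_words, _ = cross_attend(words, spatial)    # (L, d)
    # 2) Each branch reads the other branch's information through the words.
    spatial_update, _ = cross_attend(spatial, motion_via_words)
    temporal_update, _ = cross_attend(temporal, appearance_via_words)
    return spatial + spatial_update, temporal + temporal_update

rng = np.random.default_rng(0)
spatial = rng.normal(size=(49, 32))   # e.g. a 7x7 feature map with 32 channels
temporal = rng.normal(size=(49, 32))
words = rng.normal(size=(5, 32))      # a five-word expression
new_spatial, new_temporal = language_bridged_transfer(spatial, temporal, words)
print(new_spatial.shape, new_temporal.shape)
```

Routing the exchange through the L word tokens means each branch attends over L entries rather than all HW positions of the other branch, and only content the words found relevant is passed across.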
Paper 03 | 3D‑SPS: Single‑Stage 3D Visual Grounding via Referred Point Progressive Selection
Authors: Luo Junyu (Meituan intern & Beihang University), Fu Jiahui (Meituan intern & Beihang University), Kong Xianghao (Meituan), Gao Chen (Beihang University), Ren Haibing (Meituan), Shen Hao (Meituan), Xia Huaxia (Meituan), Liu Si (Beihang University)
Paper Type: CVPR 2022 Main Conference – Oral
The paper tackles 3D visual grounding by proposing a single‑stage approach (3D‑SPS) that progressively selects key points guided by language. It introduces a description‑aware key‑point sampling (DKS) module and a target‑oriented progressive relationship mining (TPM) module, eliminating the need for a separate detection‑matching pipeline and directly grounding objects in point clouds.
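A rough sketch of language-guided progressive point selection, assuming per-point features and word embeddings of the same size. Relevance is scored as a dot product with the mean word embedding, and between stages the surviving points are refined by attending to the words; these scoring and refinement rules, the stage sizes, and the function names are hypothetical simplifications, not the actual DKS and TPM modules.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def refine_with_words(point_feats, words):
    """Hypothetical refinement: each surviving point attends to the description (residual)."""
    weights = softmax(point_feats @ words.T / np.sqrt(words.shape[1]))
    return point_feats + weights @ words

def progressive_select(point_feats, words, keep_sizes):
    """Shrink the candidate set stage by stage, keeping the most description-relevant points."""
    sentence = words.mean(axis=0)                   # crude sentence embedding
    idx = np.arange(len(point_feats))
    feats = point_feats
    for k in keep_sizes:
        scores = feats @ sentence                   # language relevance per point
        top = np.argsort(-scores)[:k]               # indices into the current subset
        idx, feats = idx[top], feats[top]
        feats = refine_with_words(feats, words)     # updated features are re-scored next stage
    return idx                                      # indices into the original point cloud

rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 16))    # per-point features of a toy scene
words = rng.normal(size=(6, 16))
selected = progressive_select(points, words, keep_sizes=[256, 64, 1])
print(selected)   # the single surviving point, taken as the referred target
```

In this sketch, each refinement step only runs on the points that survived the previous cut, so the per-stage cost falls as the candidate set shrinks.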
Paper 04 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning
Authors: Fei Zhengcong (Meituan), Yan Xu (Institute of Computing Technology, CAS), Wang Shuhui (Institute of Computing Technology, CAS), Tian Qi (Huawei)
Paper Type: CVPR 2022 Main Conference – Poster
DeeCap introduces a dynamic early‑exit mechanism for Transformer‑based image captioning. By employing a mimicry learning module that predicts deep‑layer features from shallow ones, the model can exit early with minimal accuracy loss, achieving up to 4× speed‑up on MS‑COCO and Flickr30K while retaining competitive performance. Code: DeeCap.
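A minimal sketch of per-token early exiting, assuming each non-final layer has a lightweight imitation network that predicts deep-layer features from the current shallow ones, with a shared word classifier on top. The exit rule here (stop once the top softmax probability reaches a threshold) is a common confidence-based criterion chosen for illustration and is not necessarily DeeCap's exact exit policy; all names and sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy decoder: random weights, no real captioning model.
rng = np.random.default_rng(0)
D, VOCAB, NUM_LAYERS = 16, 50, 6

layers = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(NUM_LAYERS)]
# One imitation network per non-final layer: shallow features -> predicted deep features.
imitators = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(NUM_LAYERS - 1)]
classifier = rng.normal(size=(VOCAB, D))   # shared word classifier

def decode_token(hidden, threshold):
    """Run layers until the prediction is confident enough; return (token, depth used)."""
    for depth in range(1, NUM_LAYERS + 1):
        hidden = np.tanh(layers[depth - 1] @ hidden)
        # Below the last layer, classify the imitated deep features instead of the shallow ones.
        if depth < NUM_LAYERS:
            features = imitators[depth - 1] @ hidden
        else:
            features = hidden
        probs = softmax(classifier @ features)
        if probs.max() >= threshold:
            break                              # exit early, skipping the remaining layers
    return int(probs.argmax()), depth

h0 = rng.normal(size=D)
print(decode_token(h0, threshold=0.5))
```

Lowering `threshold` trades accuracy for speed: more tokens leave after shallow layers, while a threshold above 1 forces every token through the full stack.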
Paper 05 | Boosting Black‑Box Attack with Partially Transferred Conditional Adversarial Distribution
Authors: Feng Yan (Meituan), Wu Baoyuan (The Chinese University of Hong Kong), Fan Yanbo (Tencent), Liu Li (The Chinese University of Hong Kong), Li Zhifeng (Tencent), Xia Shutao (Tsinghua University)
Paper Type: CVPR 2022 Main Conference – Poster
The study targets black‑box adversarial attacks, a core problem in model security. It proposes a robust adversarial transfer mechanism that transfers only part of the conditional adversarial distribution's parameters from a surrogate model and learns the remaining parameters from queries to the target model. This mitigates the surrogate model's bias and improves attack success.
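The split between transferred and query-learned parameters can be illustrated with a toy Gaussian search distribution over perturbations. Its mean is the sum of a frozen part copied from a (biased) surrogate and a learnable offset that is updated only from black-box loss values, using antithetic natural evolution strategies (NES) gradient estimates. The paper uses a richer conditional distribution model; the quadratic toy loss, NES, the function names, and the step sizes are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem, not a real target network.
rng = np.random.default_rng(0)
DIM = 20

# Unknown to the attacker: the perturbation that minimizes the target model's loss.
_true_optimum = rng.normal(size=DIM)

def query_target_loss(delta):
    """Black-box oracle: the attacker sees only this scalar, never gradients."""
    return float(np.sum((delta - _true_optimum) ** 2))

# Transferred part: a prior from the surrogate, close to but biased away from the optimum.
surrogate_mean = _true_optimum + rng.normal(scale=0.5, size=DIM)   # frozen

def attack(steps=30, pairs=50, sigma=0.1, lr=0.1):
    offset = np.zeros(DIM)             # learned part, fitted from target-model queries
    queries = 0
    history = []
    for _ in range(steps):
        mean = surrogate_mean + offset
        history.append(query_target_loss(mean))   # monitoring only, not counted below
        grad = np.zeros(DIM)
        for _ in range(pairs):
            eps = rng.normal(size=DIM)
            # Antithetic NES estimate of the gradient of the expected loss.
            diff = query_target_loss(mean + sigma * eps) - query_target_loss(mean - sigma * eps)
            grad += diff * eps
            queries += 2
        grad /= 2 * pairs * sigma
        offset -= lr * grad            # only the offset moves; the surrogate part stays fixed
    return history, queries

history, queries = attack()
print(f"loss {history[0]:.3f} -> {history[-1]:.3f} using {queries} queries")
```

Because the frozen surrogate mean already starts near the optimum, the queries in this toy only have to correct the surrogate's bias rather than search from scratch.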
Paper 06 | Semi‑supervised Video Paragraph Grounding with Contrastive Encoder
Authors: Jiang Xun (University of Electronic Science and Technology of China, UESTC), Xu Xing (UESTC), Zhang Jingran (UESTC), Shen Fumin (UESTC), Cao Zuo (Meituan), Shen Hengtao (UESTC)
Paper Type: CVPR 2022 Main Conference – Poster
The authors present a semi‑supervised VPG (Video Paragraph Grounding) framework that combines a Transformer‑based base model with a contrastive encoder to align videos with paragraph text at both coarse and fine granularity. This reduces reliance on densely annotated timestamps while achieving state‑of‑the‑art performance.
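Video-text alignment in a contrastive encoder is commonly trained with an InfoNCE-style loss. A minimal symmetric version over a batch of matched video and paragraph embeddings is sketched here, assuming L2-normalized embeddings and a temperature `tau`; the paper's actual loss terms, granularities, and encoder may differ, and all names and values are illustrative.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def info_nce(video_emb, text_emb, tau=0.1):
    """Symmetric InfoNCE: row i of each matrix is a matched video/paragraph pair."""
    v, t = l2_normalize(video_emb), l2_normalize(text_emb)
    logits = v @ t.T / tau                      # (B, B) cosine similarities / tau
    diag = np.arange(len(v))
    video_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    text_to_video = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (video_to_text + text_to_video)

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 32))
video = text + 0.1 * rng.normal(size=(8, 32))            # videos close to their own paragraphs
aligned = info_nce(video, text)
mismatched = info_nce(np.roll(video, 1, axis=0), text)   # every pair is now wrong
print(f"aligned {aligned:.3f}, mismatched {mismatched:.3f}")
```

The loss is low only when each video is closer to its own paragraph than to every other paragraph in the batch, which is the property a contrastive encoder is trained to enforce.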
These six papers demonstrate Meituan's contributions to model compression, video object segmentation, 3D visual grounding, image captioning, adversarial robustness, and cross‑modal video retrieval, reflecting strong collaboration with academia and industry.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.