Highlights of Six Meituan Papers Accepted at CVPR 2022

Meituan’s six CVPR 2022 papers advance computer vision by introducing a few‑sample model compression method, a language‑bridged video object segmentation approach, a single‑stage 3D visual grounding technique, a dynamic early‑exit image captioning system, a boosted black‑box adversarial attack, and a semi‑supervised video paragraph grounding framework.

Meituan Technology Team

Jun 23, 2022

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 was held in New Orleans, USA. Meituan's technical team had six papers accepted, covering model compression, video object segmentation, 3D visual grounding, image captioning, model security, and cross-modal video retrieval.

Paper 01 | Compressing Models with Few Samples: Mimicking then Replacing

Authors: Wang Huanyu (Meituan intern & Nanjing University), Liu Junjie (Meituan), Ma Xin (Meituan), Yong Yang (Meituan intern & Xi’an Jiaotong University), Chai Zhenhua (Meituan), Wu Jianxin (Nanjing University)

Paper Type: CVPR 2022 Main Conference – Long Paper

The work proposes MiR (Mimicking then Replacing), which transfers knowledge only at the penultimate layer and therefore drops the posterior distribution alignment used in traditional knowledge distillation. The compressed model first learns to mimic the original model's penultimate-layer features; the original model's classification or detection head is then grafted onto the compressed model, which allows rapid fine-tuning with only a few samples. Experiments show significant improvements over baselines, and the method has been validated in Meituan's image security audit scenarios.
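The two steps can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the teacher and student are small linear maps, and the names `mimic`, `replace_head`, `T1`, `T2`, and `teacher_head` are hypothetical. It only shows the idea of matching penultimate-layer features on a handful of unlabeled inputs and then reusing the teacher's head.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mimic_loss(W_student, teacher_features, samples):
    """Mean squared error between student and teacher penultimate features."""
    total, count = 0.0, 0
    for x in samples:
        s, t = matvec(W_student, x), teacher_features(x)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
        count += len(t)
    return total / count

def mimic(W_student, teacher_features, samples, lr=0.1, epochs=200):
    """Step 1 (mimicking): fit student features to teacher features with SGD."""
    for _ in range(epochs):
        for x in samples:
            s, t = matvec(W_student, x), teacher_features(x)
            for i, row in enumerate(W_student):
                grad = 2.0 * (s[i] - t[i]) / len(t)  # d(loss)/d(s_i)
                for j in range(len(row)):
                    row[j] -= lr * grad * x[j]
    return W_student

def replace_head(W_student, teacher_head):
    """Step 2 (replacing): graft the teacher's head onto the student backbone."""
    return lambda x: matvec(teacher_head, matvec(W_student, x))

rng = random.Random(0)
# Hypothetical teacher: a two-layer linear backbone (4 -> 8 -> 3) plus a head (3 -> 2).
T1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
T2 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
teacher_head = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
teacher_features = lambda x: matvec(T2, matvec(T1, x))

# Few-sample setting (assumed): only 8 unlabeled inputs are available.
samples = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
W_student = [[0.0] * 4 for _ in range(3)]  # smaller one-layer student backbone

loss_before = mimic_loss(W_student, teacher_features, samples)
mimic(W_student, teacher_features, samples)
loss_after = mimic_loss(W_student, teacher_features, samples)
compressed_model = replace_head(W_student, teacher_head)
```

No labels are used in the mimicking step, which is why a handful of samples can be enough in this toy setting; a real compressed network would be nonlinear and trained with a deep learning framework.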

Paper 02 | Language‑Bridged Spatial‑Temporal Interaction for Referring Video Object Segmentation

Authors: Ding Zihan (Meituan), Hui Tianrui (University of Chinese Academy of Sciences), Huang Junshi (Meituan), Wei Xiaoming (Meituan), Han Jizhong (University of Chinese Academy of Sciences), Liu Si (Beihang University)

Paper Type: CVPR 2022 Main Conference – Poster

The authors introduce the LBDT (Language-Bridged Duplex Transfer) module, which uses language as an intermediate bridge to enable explicit and adaptive spatio-temporal interaction early in the encoder. A bilateral channel activation (BCA) module is added in the decoder to denoise the features and highlight those that are spatio-temporally consistent. The method achieves state-of-the-art performance on four public datasets without requiring pre-training on image referring segmentation. Code: LBDT.

Paper 03 | 3D‑SPS: Single‑Stage 3D Visual Grounding via Referred Point Progressive Selection

Authors: Luo Junyu (Meituan intern & Beihang University), Fu Jiahui (Meituan intern & Beihang University), Kong Xianghao (Meituan), Gao Chen (Beihang University), Ren Haibing (Meituan), Shen Hao (Meituan), Xia Huaxia (Meituan), Liu Si (Beihang University)

Paper Type: CVPR 2022 Main Conference – Oral

The paper tackles 3D visual grounding with a single-stage approach (3D-SPS) that progressively selects key points under the guidance of language. It introduces a description-aware keypoint sampling (DKS) module, which coarsely focuses on points of language-relevant objects, and a target-oriented progressive mining (TPM) module, which narrows the selection down to the points of the target. This removes the need for a separate detect-then-match pipeline and grounds objects directly in point clouds.
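The idea of narrowing a point set stage by stage can be sketched as follows. This is a toy illustration under stated assumptions, not the 3D-SPS architecture: points and text are plain vectors, relevance is a dot product, and `progressive_select`, `keep_schedule`, and the optional `refine` callback are hypothetical names. In a real model the features would be updated by learned cross-modal layers between stages.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def progressive_select(point_features, text_embedding, keep_schedule, refine=None):
    """Keep the points most relevant to the text, shrinking the set stage by stage.

    keep_schedule lists how many points survive each stage, e.g. [3, 1].
    refine, if given, maps {index: feature} and the text embedding to updated
    {index: feature}; it stands in for learned relation modeling between stages.
    Returns the indices of the surviving points, most relevant first.
    """
    feats = {i: list(f) for i, f in enumerate(point_features)}
    kept = list(feats)
    for k in keep_schedule:
        # Rank surviving points by similarity to the language embedding.
        ranked = sorted(kept, key=lambda i: dot(feats[i], text_embedding), reverse=True)
        kept = ranked[:k]
        if refine is not None:
            feats = refine({i: feats[i] for i in kept}, text_embedding)
    return kept

# Five toy points in a 2-D feature space and a text embedding pointing along axis 0.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text = [1.0, 0.0]
coarse = progressive_select(points, text, keep_schedule=[3])
final = progressive_select(points, text, keep_schedule=[3, 1])
```

With static features the staged selection reduces to a single top-k, so the `refine` hook is where a real system would make the later stages informative.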

Paper 04 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning

Authors: Fei Zhengcong (Meituan), Yan Xu (Institute of Computing Technology, CAS), Wang Shuhui (Institute of Computing Technology, CAS), Tian Qi (Huawei)

Paper Type: CVPR 2022 Main Conference – Poster

DeeCap introduces a dynamic early-exit mechanism for Transformer-based image captioning. An imitation learning module predicts deep-layer features from shallow ones, so the model can exit early with minimal accuracy loss. It achieves up to a 4× speed-up on MS-COCO and Flickr30K while retaining competitive performance. Code: DeeCap.
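A generic early-exit loop looks roughly like the sketch below. It assumes a simple confidence threshold on the softmax output as the exit criterion and leaves out the imitation learning module entirely; `decode_token_with_early_exit`, `layers`, and `identity_head` are hypothetical names, not the DeeCap API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_token_with_early_exit(hidden, layers, classifier, threshold=0.9):
    """Run decoder layers one by one and stop once the prediction is confident.

    Returns (predicted token index, number of layers actually executed).
    """
    if not layers:
        raise ValueError("at least one decoder layer is required")
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = softmax(classifier(hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers
    return probs.index(max(probs)), depth

# Toy setup: each "layer" doubles the hidden state, the head is the identity.
double = lambda h: [2.0 * v for v in h]
layers = [double, double, double, double]
identity_head = lambda h: h
token, depth_used = decode_token_with_early_exit([1.0, 0.0], layers, identity_head)
```

In this toy run the confidence crosses 0.9 after the second of four layers, so half of the computation is skipped for that token. The threshold trades speed against accuracy: a higher value exits later.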

Paper 05 | Boosting Black‑Box Attack with Partially Transferred Conditional Adversarial Distribution

Authors: Feng Yan (Meituan), Wu Baoyuan (The Chinese University of Hong Kong), Fan Yanbo (Tencent), Liu Li (The Chinese University of Hong Kong), Li Zhifeng (Tencent), Xia Shutao (Tsinghua University)

Paper Type: CVPR 2022 Main Conference – Poster

The study addresses black-box model security with a robust adversarial transfer mechanism. It transfers only part of the parameters of the conditional adversarial distribution from a surrogate model and learns the remaining parameters from queries to the target model, which mitigates surrogate bias and improves attack success.
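The split between transferred and query-learned parameters can be illustrated with a toy search distribution. This sketch makes several assumptions that are not from the paper: the distribution is a plain Gaussian, the per-dimension scale is the part taken from a surrogate and frozen, the mean is the part learned from target queries with a simple evolution-strategy update, and `target_loss` is a hypothetical stand-in for a black-box model.

```python
import random

def target_loss(perturbation):
    """Hypothetical black-box target: only loss values are observable, not gradients."""
    optimum = [1.0, -2.0, 0.5]
    return sum((p - o) ** 2 for p, o in zip(perturbation, optimum))

def attack(query, transferred_scale, steps=100, population=20,
           sigma=0.1, lr=0.05, seed=0):
    """Learn the mean of a Gaussian over perturbations using target queries only.

    transferred_scale is frozen (assumed to come from a surrogate model);
    the mean is estimated with antithetic evolution-strategy samples.
    Returns (learned mean, number of target queries spent).
    """
    rng = random.Random(seed)
    dim = len(transferred_scale)
    mean = [0.0] * dim
    queries = 0
    for _ in range(steps):
        grad = [0.0] * dim
        for _ in range(population):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            step = [sigma * s * e for s, e in zip(transferred_scale, eps)]
            plus = query([m + d for m, d in zip(mean, step)])
            minus = query([m - d for m, d in zip(mean, step)])
            queries += 2
            for i in range(dim):
                # Finite-difference estimate of the loss gradient w.r.t. mean[i].
                grad[i] += (plus - minus) / 2.0 * eps[i] / (sigma * transferred_scale[i])
        mean = [m - lr * g / population for m, g in zip(mean, grad)]
    return mean, queries

surrogate_scale = [1.0, 2.0, 0.5]  # assumed to be estimated offline on a surrogate
learned_mean, queries_used = attack(target_loss, surrogate_scale)
loss_start = target_loss([0.0, 0.0, 0.0])
loss_end = target_loss(learned_mean)
```

Freezing part of the distribution reduces how much must be estimated from costly queries, while learning the rest on the target keeps a mismatched surrogate from dictating the result.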

Paper 06 | Semi‑supervised Video Paragraph Grounding with Contrastive Encoder

Authors: Jiang Xun (University of Electronic Science and Technology of China, UESTC), Xu Xing (UESTC), Zhang Jingran (UESTC), Shen Fumin (UESTC), Cao Zuo (Meituan), Shen Hengtao (UESTC)

Paper Type: CVPR 2022 Main Conference – Poster

The authors present a semi-supervised VPG (Video Paragraph Grounding) framework that combines a Transformer-based base model with a contrastive encoder to align videos and paragraph text at both coarse and fine granularity. This reduces reliance on densely annotated timestamps while achieving state-of-the-art performance.
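Contrastive alignment between two modalities is commonly written as an InfoNCE-style loss. The sketch below is that generic formulation, offered as an assumption about what such an objective can look like rather than the loss used in the paper; `info_nce`, `clips`, and the temperature value are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def info_nce(video_embs, text_embs, temperature=0.1):
    """Average InfoNCE loss; video_embs[i] and text_embs[i] form the matched pair."""
    losses = []
    for i, v in enumerate(video_embs):
        logits = [cosine(v, t) / temperature for t in text_embs]
        peak = max(logits)
        # log-sum-exp over all candidate sentences (the matched one plus negatives)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)

# Two toy video-segment embeddings with matched and mismatched sentence embeddings.
clips = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(clips, [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce(clips, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each segment is closest to its own sentence and grows when the pairing is wrong, which is the signal a contrastive encoder uses when timestamp annotations are scarce.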

These six papers demonstrate Meituan's contributions to model compression, video object segmentation, 3D visual grounding, image captioning, model security, and cross-modal video retrieval, and reflect close collaboration with academia and industry.
