How UPRE Achieves Zero-Shot Domain Adaptation for Object Detection with Unified Prompts

The UPRE paper, presented at ICCV, introduces a multi‑view domain prompt and a unified representation enhancement to enable zero‑shot domain adaptation for object detection, achieving state‑of‑the‑art performance across diverse weather, geographic, and synthetic‑to‑real scenarios.


ICCV (International Conference on Computer Vision) is a top‑tier computer‑vision conference; this year it is being held in Hawaii, USA. Of the 11,239 submissions, five papers from the Gaode (Amap) technology team were accepted, including the work described below.

Paper Title: UPRE: Zero‑Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

Paper Link: https://arxiv.org/pdf/2507.00721

UPRE jointly optimizes textual prompts and visual representations by designing a Multi‑view Domain Prompt (MDP) and a Unified Representation Enhancement (URE) module. MDP supplies language‑modal priors for the target domain and captures diverse adaptation knowledge, while URE generates target‑domain representations from source data to reduce domain bias. Two enhancement strategies—Relative Domain Distance (RDD) and Positive‑Negative Separation (PNS)—form a multi‑level training framework that markedly improves adaptation and detection performance. Experiments show UPRE excels on three domain‑adaptation tasks, significantly boosting the detector’s ability on unseen domains.

Research Background

Domain‑adaptation methods have attracted attention for improving model generalization, but acquiring even unlabeled image priors is difficult in practice, limiting their use. Zero‑Shot Domain Adaptation (ZSDA) aims to adapt to a target domain without any image priors, addressing the reliance on target data in conventional methods. With the rise of vision‑language models (VLMs), ZSDA has progressed by using textual prompts to describe unseen target domains, leveraging their inherent zero‑shot capability.

Existing VLM‑based approaches face two challenges: domain bias (distribution shift between source and target) and detection bias (loss of instance‑level details due to emphasis on global image representation). Manually crafted prompts often fail to capture contextual attributes of foreground and background objects.

Paper Highlights

Multi‑view Domain Prompt (MDP): Combines static and learnable dynamic prompts to provide language‑modal priors and capture diverse adaptation knowledge, preserving human‑defined prompt completeness while focusing on cross‑domain knowledge and object localization.
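The static-plus-dynamic prompt idea can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the embedding width, token counts, and the `build_domain_prompt` helper are all hypothetical stand-ins, and the "static" template embedding is a random placeholder for a frozen CLIP text embedding of a human-written sentence.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8   # toy embedding width (CLIP text embeddings are much wider)
N_CTX = 4       # number of learnable dynamic context tokens

# Static prompt: frozen embedding of a human-written template such as
# "a photo of a {class} in foggy weather" (random stand-in here).
static_prompt = rng.normal(size=(3, EMBED_DIM))   # 3 template tokens

# Dynamic prompt: learnable context vectors; in the real method these are
# optimized by gradient descent, here they are just randomly initialized.
dynamic_ctx = rng.normal(scale=0.02, size=(N_CTX, EMBED_DIM))

def build_domain_prompt(class_embed):
    """Concatenate [dynamic context | static template | class token]
    into one prompt sequence fed to the text encoder."""
    return np.concatenate([dynamic_ctx, static_prompt, class_embed[None, :]], axis=0)

car_embed = rng.normal(size=EMBED_DIM)   # stand-in class-name embedding
prompt = build_domain_prompt(car_embed)
print(prompt.shape)  # (4 + 3 + 1, 8)
```

The point of the split is that the static part keeps the completeness of a human-defined description of the target domain, while the dynamic part is free to absorb cross-domain knowledge during training.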

Unified Representation Enhancement (URE): Generates target‑domain representations across visual and language modalities, mitigating domain bias. It introduces learnable mean and bias enhancement modules that fine‑tune source image features to produce pseudo‑target features, improving adaptability to varying style changes.
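The mean-and-bias enhancement can be illustrated with a feature-statistics perturbation in the style of AdaIN. This is a sketch under assumed shapes, not the paper's module: `delta_mean` and `delta_bias` stand in for the learnable parameters, and in training they would be optimized rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 16   # toy channel count for a feature map

# Learnable per-channel enhancement parameters (random stand-ins here).
delta_mean = rng.normal(scale=0.1, size=C)   # shifts channel means
delta_bias = rng.normal(scale=0.1, size=C)   # rescales channel spread

def enhance(feat):
    """Turn source features into pseudo-target features by perturbing
    per-channel statistics, leaving the normalized content intact."""
    mu = feat.mean(axis=(1, 2), keepdims=True)           # (C, 1, 1)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-5
    normalized = (feat - mu) / sigma                     # style-free content
    new_mu = mu + delta_mean[:, None, None]              # shifted style mean
    new_sigma = sigma * (1.0 + delta_bias[:, None, None])
    return normalized * new_sigma + new_mu

source_feat = rng.normal(size=(C, 7, 7))   # stand-in backbone feature map
pseudo_target = enhance(source_feat)
print(pseudo_target.shape)  # (16, 7, 7)
```

Because only the first- and second-order statistics change, the semantic content of the source feature survives while its "style" drifts toward the (unseen) target domain.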

Multi‑level Enhancement Strategies: Includes Relative Domain Distance (RDD) at the image level and Positive‑Negative Separation (PNS) at the instance level. RDD balances semantic integrity and style diversity via representation regularization, while PNS narrows the search space for foreground objects and better distinguishes background, enhancing detection performance.
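The two objectives can be sketched as toy losses on embedding vectors. Both functions are hypothetical simplifications, not the paper's formulas: `rdd_loss` assumes the image-feature shift (source to pseudo-target) should mirror the text-feature shift (source prompt to target prompt), and `pns_loss` assumes a margin-based contrastive separation of foreground and background embeddings relative to a class prompt.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rdd_loss(src_img, pseudo_img, tgt_text, src_text):
    """Relative-distance sketch: penalize mismatch between the image-space
    shift and the text-space shift across domains."""
    img_shift = 1.0 - cosine(src_img, pseudo_img)
    txt_shift = 1.0 - cosine(src_text, tgt_text)
    return abs(img_shift - txt_shift)

def pns_loss(fg, bg, class_text, margin=0.2):
    """Separation sketch: a foreground instance should sit closer to its
    class prompt than a background crop, by at least `margin`."""
    return max(0.0, margin - (cosine(fg, class_text) - cosine(bg, class_text)))

# Toy 2-D embeddings: the image shift and text shift match exactly,
# and the foreground aligns with the class prompt while the background
# is orthogonal, so both losses vanish.
rdd = rdd_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
               np.array([0.0, 1.0]), np.array([1.0, 0.0]))
pns = pns_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0]))
print(rdd, pns)  # 0.0 0.0
```

Swapping the foreground and background vectors makes `pns_loss` positive, which is the gradient signal that pushes foreground embeddings toward the class prompt and background embeddings away from it.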

Experimental Results

Qualitative Analysis & Generalization

t‑SNE visualizations of image embeddings across five domains show that CLIP provides coarse generalization, whereas UPRE achieves superior adaptation for each target domain.

Main Comparison Experiments

We evaluated UPRE on nine open‑source datasets across three challenging domain‑adaptation scenarios: adverse weather conditions, cross‑city geographic shifts, and synthetic‑to‑real transfer. Results (tables and figures) demonstrate that UPRE consistently outperforms baselines such as CLIP‑GAP and OA‑DG in both quantitative metrics and visual detection quality.

Conclusion and Outlook

UPRE demonstrates strong potential for enhancing domain adaptation and detection in complex environments. Future work should focus on improving robustness across more diverse real‑world scenarios, efficiently leveraging cross‑domain data, and integrating UPRE with other machine‑learning techniques to broaden its applicability.

Tags: computer vision, object detection, prompt engineering, representation learning, zero-shot domain adaptation
Written by Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.