Boost Black-Box VLMs Without Training: Class-Aware Prompt Reweighting (CARPRT)

The article analyzes the prompt‑sensitivity problem of zero‑shot classification in vision‑language models, critiques class‑agnostic prompt weighting, and presents CARPRT—a training‑free, black‑box compatible method that reweights prompts per class using similarity scores and pseudo‑labels, achieving consistent gains across datasets and model architectures.

Machine Heart

Background

Vision‑Language Models (VLMs) such as CLIP have made zero‑shot image classification possible by matching image embeddings with textual prompts. However, classification performance is highly sensitive to the wording of prompts, and existing prompt‑ensemble methods (e.g., Mean Prompt Ensembling, Weighted Prompt Ensembling) use a single, class‑agnostic weight vector that assumes all prompts are equally important for every category.
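The class-agnostic baselines mentioned above can be sketched in a few lines. This is a minimal numpy illustration, assuming a precomputed similarity tensor `sims[i, p, c]` (image `i`, prompt template `p`, class `c`); the tensor values and the weight vector here are synthetic, not from the paper.

```python
import numpy as np

# Hypothetical similarity tensor: sims[i, p, c] is the VLM similarity of
# image i to prompt template p rendered with class name c (synthetic values).
rng = np.random.default_rng(0)
n_images, n_prompts, n_classes = 4, 3, 5
sims = rng.standard_normal((n_images, n_prompts, n_classes))

# Mean Prompt Ensembling (MPE): average over prompts with equal weight.
mpe_logits = sims.mean(axis=1)           # shape (n_images, n_classes)
mpe_preds = mpe_logits.argmax(axis=1)

# Weighted Prompt Ensembling (WPE): one class-agnostic weight per prompt,
# shared across every class -- the design CARPRT argues against.
w = np.array([0.5, 0.3, 0.2])
wpe_logits = np.einsum("ipc,p->ic", sims, w)
wpe_preds = wpe_logits.argmax(axis=1)
```

Note that in both baselines the prompt weights are independent of the class index `c`, which is exactly the assumption the next section challenges.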

Limitations of Class‑Agnostic Weighting

Empirical observations (Fig. 1 in the original paper) show that optimal prompt‑weight distributions differ markedly across categories. The class‑agnostic design introduces two drawbacks: it ignores semantic mismatches between prompts and specific classes, leading to systematic bias, and it relies on manually crafted prompts that may not generalize to new datasets or tasks.

CARPRT: Class‑Aware Prompt Reweighting

CARPRT addresses these issues without any training or access to model parameters. It treats zero‑shot classification as a conditional probability estimation problem and derives a Bayesian formulation where the class‑specific prompt weights appear as posterior distributions. Pseudo‑labels are generated by selecting the highest‑scoring class for each (image, prompt) pair, and these pseudo‑labels are aggregated to compute the average similarity of each prompt for each class. The resulting normalized values constitute the class‑aware weight vector.

Two‑Step Inference Procedure

Score Calculation: For every image, prompt, and class combination, the VLM’s forward pass yields a similarity score (image × prompt × class → score). This builds a complete semantic association space using only the model’s inference API.
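The score-calculation step can be sketched against a black-box scoring API. The `score(image, text) -> float` callable below is a hypothetical stand-in for whatever similarity endpoint a hosted VLM exposes; only inference calls are used, matching the black-box setting.

```python
import numpy as np

def build_score_tensor(images, templates, class_names, score):
    """sims[i, p, c] = score of image i against template p filled with class c."""
    sims = np.empty((len(images), len(templates), len(class_names)))
    for i, img in enumerate(images):
        for p, tmpl in enumerate(templates):
            for c, name in enumerate(class_names):
                sims[i, p, c] = score(img, tmpl.format(name))
    return sims

# Toy usage with a deterministic stub scorer (illustrative only).
templates = ["a photo of a {}.", "a sketch of a {}."]
classes = ["cat", "dog"]
stub = lambda img, text: (len(img) * 7 + len(text) * 3) % 10 / 10.0
sims = build_score_tensor(["img0", "img1"], templates, classes, stub)
```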

Weight Calculation: Using the similarity scores, CARPRT creates pseudo‑labels, aggregates them per class, and normalizes the average prompt similarities to obtain class‑specific weights. During inference, these weights re‑weight the predictions of each prompt before the final class decision.
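Putting the two steps together, the weight calculation and re-weighted inference can be sketched as follows. This is a minimal numpy reading of the procedure described above (pseudo-label each (image, prompt) pair, average similarities per class, normalize over prompts); the tensor values are synthetic and the exact normalization in the paper may differ.

```python
import numpy as np

# Precomputed similarity tensor sims[i, p, c] (synthetic stand-in values).
rng = np.random.default_rng(1)
n_images, n_prompts, n_classes = 8, 3, 4
sims = rng.random((n_images, n_prompts, n_classes))

# Step 1: pseudo-label each (image, prompt) pair with its top-scoring class.
pseudo = sims.argmax(axis=2)                  # shape (n_images, n_prompts)

# Step 2: for each class c and prompt p, average the similarity over pairs
# pseudo-labeled as c, then normalize across prompts per class.
weights = np.zeros((n_classes, n_prompts))
for c in range(n_classes):
    for p in range(n_prompts):
        mask = pseudo[:, p] == c
        if mask.any():
            weights[c, p] = sims[mask, p, c].mean()
weights /= weights.sum(axis=1, keepdims=True) + 1e-12

# Inference: each prompt's score for class c is scaled by that class's weight.
logits = np.einsum("ipc,cp->ic", sims, weights)
preds = logits.argmax(axis=1)
```

The key difference from WPE is the `cp` index pattern in the final `einsum`: the weight applied to prompt `p` now depends on the candidate class `c`.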

Experimental Validation

Extensive evaluations on multiple zero‑shot benchmarks (including fine‑grained datasets) and across different VLM architectures (CLIP ViT‑B/16, ResNet‑50, DeCLIP) show that CARPRT consistently outperforms MPE, Majority Vote, and WPE. The gains are attributed to the more appropriate modeling of prompt‑class relationships rather than any model‑specific advantage.

An ablation where class‑specific weights are averaged to a global weight (CARPRT‑Uniform) results in a significant performance drop, confirming that the class‑aware component is the key driver of improvement.
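The ablation amounts to collapsing the class-specific weight matrix back into a single vector. A minimal sketch, using an illustrative weight matrix rather than values from the paper:

```python
import numpy as np

# CARPRT-Uniform ablation: average the (n_classes, n_prompts) weight matrix
# over classes, recovering one class-agnostic weight vector as in WPE.
# `class_weights` is illustrative, not taken from the paper.
class_weights = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.5, 0.3],
                          [0.1, 0.2, 0.7]])   # (n_classes, n_prompts)

global_w = class_weights.mean(axis=0)          # class awareness is discarded
```

Since each row of `class_weights` sums to 1, the averaged vector is still a valid weight distribution, so any performance drop is attributable to losing the per-class structure rather than to mis-normalization.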

Implications and Generality

Because CARPRT requires only similarity scores, it works with both open‑source and closed‑source (black‑box) VLMs and can be inserted as a plug‑and‑play module in existing pipelines, including test‑time adaptation, prompt‑tuning, or data‑augmentation workflows. The method demonstrates that, in the era of large, immutable models, performance gains can stem from refined problem modeling rather than larger models or additional training data.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Vision-Language Models, Black-Box Optimization, Zero-Shot Classification, Class-Aware Modeling, Prompt Reweighting
Written by Machine Heart, a professional AI media and industry service platform.