Can a Text‑to‑Image Model Replace Traditional Vision Tools? Nano Banana Pro Zero‑Shot Test
This article evaluates Nano Banana Pro, a text-to-image model built on Gemini 3 Pro, across fourteen low-level vision tasks and forty public datasets using prompts alone, with no fine-tuning. The results reveal strong perceptual quality but weak pixel-level metrics, highlighting both the model's generative strengths and its failure modes, such as hallucinations and color shifts.
Background
Since 2023, text-to-image models have shown strong generative ability. Nano Banana Pro, built on Gemini 3 Pro, claims broad world knowledge and high-precision generation without any fine-tuning. The study asks whether a single natural-language prompt can solve classic low-level vision tasks such as dehazing, super-resolution, deraining, shadow removal, and deblurring.
Experiment Design
Fourteen low-level vision tasks covering 40 public datasets were evaluated in a zero-shot setting. The model receives only a fixed prompt describing the desired operation; no gradients or task-specific training are used. Prompt templates:
Image restoration: "Remove the haze/rain/shadow/blur/noise while keeping other elements unchanged."
Image enhancement: "Upscale/low-light enhance/underwater enhance/HDR this image."
Image fusion: "Fuse the multi-focus/IR-visible images."
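The protocol above can be sketched as a small lookup of fixed, task-specific prompts. This is a hypothetical illustration of the zero-shot setup, not the paper's actual harness; the template wording follows the article, while the task keys and `build_request` helper are invented names standing in for whatever image-editing API serves the model.

```python
# Hypothetical sketch of the zero-shot protocol: one fixed prompt per task
# family, identical for every image, with no fine-tuning or per-image tuning.
PROMPT_TEMPLATES = {
    "dehazing":   "Remove the haze while keeping other elements unchanged.",
    "deraining":  "Remove the rain while keeping other elements unchanged.",
    "shadow":     "Remove the shadow while keeping other elements unchanged.",
    "deblurring": "Remove the blur while keeping other elements unchanged.",
    "denoising":  "Remove the noise while keeping other elements unchanged.",
    "super_res":  "Upscale this image.",
    "low_light":  "Low-light enhance this image.",
    "fusion_ir":  "Fuse the IR-visible images.",
}

def build_request(task: str, image_path: str) -> dict:
    """Pair an input image with its fixed, task-specific prompt."""
    if task not in PROMPT_TEMPLATES:
        raise KeyError(f"no template for task: {task}")
    return {"image": image_path, "prompt": PROMPT_TEMPLATES[task]}
```

The point of the fixed template is that nothing task-specific is learned: the same dictionary drives all datasets, so any per-task gap reflects the model itself rather than prompt engineering.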
Results Overview
Visually, Nano Banana Pro produces appealing outputs, but its pixel-level metrics (PSNR, SSIM) are substantially lower than those of specialized models. No-reference perceptual scores are often better (lower NIQE, higher NIMA), reflecting realistic textures and low noise.
Dehazing: clear skies, but often an artificial blue-sky bias; NIMA 5.44 (highest among compared methods).
Super-resolution: strong texture hallucination and field-of-view expansion; PSNR drops by more than 4 dB, yet NIQE reaches 3.52 (best).
Deraining: visually cleaner images, but PSNR ≈ 21 dB on Rain200H, with occasional confusion between rain and fog.
Shadow removal: hard shadows are removed successfully, yet an extra hand is occasionally hallucinated; PSNR 20.67 dB.
Motion deblur: text becomes readable, but faces may be swapped and color shifts appear; PSNR 21.41 dB on GoPro.
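The gap between "looks good" and "scores well" follows directly from how PSNR is defined: it penalizes any pixel-level deviation from the reference, including global shifts a viewer would barely notice. A minimal sketch (standard PSNR formula, synthetic data for illustration only):

```python
import numpy as np

def psnr(reference: np.ndarray, output: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - output.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Even a perceptually subtle global brightness/color shift costs PSNR:
ref = np.full((64, 64), 128.0)
shifted = ref + 8.0  # uniform +8 shift, invisible texture-wise
print(round(psnr(ref, shifted), 2))  # → 30.07
```

This is why a generative model that repaints colors or hallucinates plausible texture can score around 20–21 dB while still looking clean: its pixels drift from the ground truth even when the global structure is right, whereas no-reference metrics like NIQE and NIMA never compare against the reference at all.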
Detailed Case Studies
1 Dehazing – “blue‑sky illusion”
Success on heavy haze (RTTS) where distant building details are recovered.
Failure on sunny scenes where the model forces a saturated blue sky.
2 Super‑Resolution – Field‑of‑View expansion
Low NIQE (3.52) and natural denoising.
Failure modes include unintended expansion of the scene beyond ground‑truth and severe text hallucination.
3 Deraining – Rain‑fog ambiguity
Preserves bridge‑cable structures and global semantics.
Sometimes treats fog as rain, leading to pixel‑level deviations.
4 Shadow removal – Unexpected hand
Hard shadows are removed with consistent tone.
Model may hallucinate an extra hand where the shadow was removed.
5 Motion deblur – Identity swap
Low‑light text becomes clear.
Faces can be swapped and color shifts introduced.
Core Conclusion
Generative models act as a “double‑edged sword” for low‑level vision:
Perceptual quality: realistic textures and low noise yield favorable NIQE/NIMA scores, but pixel-level drift reduces PSNR/SSIM.
Semantic consistency: globally reasonable structures, yet identity and text hallucinations are common.
Physical fidelity: no guarantee; color, scale, and illumination are often altered.
Zero-shot generality: the same prompt style works across all 14 tasks, but per-task performance remains below that of dedicated models.
Nano Banana Pro is therefore better viewed as an image "repaint" engine than as a conventional restoration model. Future work needs to constrain its generative freedom to meet strict visual-fidelity requirements.
Resources
https://arxiv.org/pdf/2512.15110
https://github.com/zplusdragon/LowLevelBanana
