A Survey of Image Quality Evaluation Metrics for Text-to-Image Generation
This survey traces the evolution of image-quality evaluation for text-to-image generation: from early handcrafted edge and color cues, through GAN-era distribution-based scores such as IS, FID, and KID, to modern perceptual and CLIP-based metrics such as LPIPS, CLIPScore, TRIQ, IQT, and human-preference models. It highlights a shift toward semantic, aesthetic, and text-image alignment measures, and it forecasts domain-specific metrics for future diffusion models.