
Reliable Feedback Network (RFNet) for Improving Usable Advertising Image Generation

The paper proposes a multimodal Reliable Feedback Network (RFNet) and a consistency‑regularized fine‑tuning method (RFFT) that dramatically increase the proportion of usable advertising images generated by diffusion models while preserving visual appeal, and introduces the large‑scale RF1M dataset for training and evaluation.

JD Tech

Abstract

In e‑commerce, attractive advertising images are crucial, but automatically generated images often fail to meet advertising standards, requiring costly manual review. This work introduces a multimodal Reliable Feedback Network (RFNet) that simulates human reviewers, integrates its feedback into a cyclic generation loop, and applies consistency‑regularized fine‑tuning (RFFT) to diffusion models, achieving a significant boost in the usable image rate without sacrificing visual quality. A new RF1M dataset containing over one million human‑annotated generated ad images is also released.

Background and Motivation

Manual design of ad images is labor‑intensive, prompting interest in generative models such as Stable Diffusion combined with ControlNet. However, generated images frequently exhibit spatial mismatches, low relevance, or hallucinated shapes, leading to poor user experience and high manual verification costs.

Reliable Feedback Network (RFNet)

RFNet acts as an automated human reviewer by jointly processing image, text, and layout modalities to assess image usability. Its architecture incorporates multiple auxiliary modalities to capture critical cues unavailable from the image alone. By feeding RFNet's judgments back into the generation loop (cyclic generation), the system resamples until a usable image is produced.
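The cyclic generation loop can be sketched as a simple resample‑and‑score procedure. This is a minimal illustration, not the paper's implementation: `generate_image` and `rfnet_usability` are hypothetical stand‑ins for the diffusion pipeline and the RFNet scorer, and the threshold value is an assumption.

```python
import random

def generate_image(prompt, layout, rng):
    # Placeholder: a real system would call a diffusion + ControlNet pipeline.
    return {"prompt": prompt, "layout": layout, "seed": rng.random()}

def rfnet_usability(image, prompt, layout):
    # Placeholder score in [0, 1]; RFNet would fuse image, text, and layout features.
    return image["seed"]

def cyclic_generate(prompt, layout, threshold=0.5, max_attempts=10, seed=0):
    """Resample until the feedback model deems an image usable, or give up."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        image = generate_image(prompt, layout, rng)
        if rfnet_usability(image, prompt, layout) >= threshold:
            return image, attempt
    return None, max_attempts
```

The loop makes the cost of rejection explicit: each failed sample is another full generation pass, which is exactly the expense RFFT aims to reduce by raising the per‑sample usability rate.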

Human‑Feedback‑Guided Fine‑Tuning (RFFT)

To reduce the number of generation attempts, the authors fine‑tune the diffusion model using gradients derived from RFNet's output, analogous to RLHF. A consistency constraint (L_CC) is introduced to preserve the textual condition while encouraging higher usability, avoiding the trade‑off between usability and aesthetics observed with naïve KL‑based regularization.
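A loss of this shape can be sketched as a reward term driven by the feedback model's usability logit plus a consistency penalty against a frozen reference. This is an illustrative composition under stated assumptions, not the paper's exact objective: the function name, the use of a sigmoid cross‑entropy reward, the mean‑squared consistency term, and the weight `lambda_cc` are all assumptions.

```python
import math

def rfft_style_loss(usability_logits, cond_feats, ref_feats, lambda_cc=1.0):
    """Combine a usability reward with a consistency constraint (L_CC).

    usability_logits: RFNet logits for the generated batch (higher = more usable).
    cond_feats / ref_feats: conditioned features from the fine-tuned model and
    a frozen reference model; keeping them close preserves the text condition.
    """
    # Reward term: push predicted usability probability toward 1.
    reward = -sum(math.log(1.0 / (1.0 + math.exp(-z)))
                  for z in usability_logits) / len(usability_logits)
    # Consistency term: penalize drift from the frozen reference features.
    cc = sum((a - b) ** 2 for a, b in zip(cond_feats, ref_feats)) / len(cond_feats)
    return reward + lambda_cc * cc
```

The design intent mirrors the paper's argument: the reward alone would drift the model toward whatever the scorer likes (hurting aesthetics or prompt fidelity), while the consistency term anchors the conditioned output, which a plain KL penalty reportedly balances less well.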

Experiments

Extensive evaluations on the RF1M dataset show that RFNet outperforms baselines on all metrics (AP, recall, etc.). The RFFT method achieves higher usable‑image rates than competing approaches, with comparable aesthetic scores thanks to the consistency constraint. Ablation studies confirm the importance of each RFNet component.

Generalization

The fine‑tuned ControlNet demonstrates strong transferability when combined with various LoRA adapters and diffusion model checkpoints, consistently improving usable‑image rates across different model configurations.

Conclusion

By integrating a multimodal feedback network and a consistency‑preserving fine‑tuning strategy, the proposed system offers a reliable and efficient pipeline for advertising image generation, reducing manual effort while maintaining visual quality.

Tags: image generation, diffusion models, generative AI, advertising images, multimodal feedback, RFNet
Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
