Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)
The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together raise the usable rate of automatically generated advertisement images while preserving visual quality. The work is supported by a new annotated dataset of over one million images and by extensive experiments.
In e‑commerce, attractive advertisement images are crucial, but generated images often fail to meet advertising standards, leading to costly manual review.
This work, accepted at ECCV 2024, proposes a Multimodal Reliable Feedback Network (RFNet) that automatically evaluates generated ad images and integrates its feedback into a cyclic generation process, markedly increasing the proportion of usable images without sacrificing visual appeal.
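One simple reading of the cyclic process is a generate, check, retry loop. The sketch below is illustrative only: `generate_until_usable`, `toy_generate`, and `toy_checker` are hypothetical names, and the checker is a stand-in for RFNet rather than the authors' model.

```python
from typing import Callable, Optional, Tuple


def generate_until_usable(
    generate: Callable[[int], str],
    is_usable: Callable[[str], bool],
    max_attempts: int = 10,
) -> Tuple[Optional[str], int]:
    """Regenerate until the checker accepts an image or attempts run out.

    Returns the accepted image (or None) and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(attempt)      # assumption: the attempt index acts as a seed
        if is_usable(image):           # assumption: stands in for an RFNet verdict
            return image, attempt
    return None, max_attempts


# Toy stand-ins so the loop can run without any model.
def toy_generate(seed: int) -> str:
    return f"image-{seed}"


def toy_checker(image: str) -> bool:
    return image.endswith("3")


image, attempts = generate_until_usable(toy_generate, toy_checker)
print(image, attempts)  # image-3 3
```

In a loop like this, the cost of each ad is driven by how many attempts pass before the checker accepts an image, which is why raising the usable rate of the generator matters.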
RFNet fuses multiple auxiliary modalities (e.g., product semantics, background context) to assess whether a generated image is suitable for advertising. Its output is then used to fine‑tune a diffusion model through a novel Reliable Feedback Fine‑Tuning (RFFT) method, which employs a consistency‑condition loss (L_CC) to keep text‑condition gradients stable while steering the model toward higher usability.
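To make the shape of such an objective concrete, here is a toy sketch that combines a feedback term with a consistency term. It rests on assumptions not stated in this summary: the feedback term is taken as the negative log of a usability probability, the consistency term as a mean squared error between the fine‑tuned model's prediction and a frozen reference under the same condition, and `weight` is a hypothetical balancing factor. The paper's exact formulation of L_CC may differ.

```python
import math
from typing import Sequence


def feedback_loss(p_usable: float) -> float:
    """Assumed feedback term: penalize a low usability probability."""
    eps = 1e-8  # avoid log(0)
    return -math.log(max(p_usable, eps))


def consistency_loss(pred_tuned: Sequence[float], pred_frozen: Sequence[float]) -> float:
    """Assumed consistency term: MSE between tuned and frozen predictions."""
    if len(pred_tuned) != len(pred_frozen):
        raise ValueError("predictions must have the same length")
    return sum((a - b) ** 2 for a, b in zip(pred_tuned, pred_frozen)) / len(pred_tuned)


def total_loss(
    p_usable: float,
    pred_tuned: Sequence[float],
    pred_frozen: Sequence[float],
    weight: float = 1.0,  # hypothetical balancing factor
) -> float:
    """Feedback term plus weighted consistency term."""
    return feedback_loss(p_usable) + weight * consistency_loss(pred_tuned, pred_frozen)


# A usable image whose prediction has not drifted from the reference costs nothing.
print(total_loss(1.0, [0.1, 0.2], [0.1, 0.2]))
# Drift from the reference is penalized even when the image is judged usable.
print(total_loss(1.0, [1.0, 3.0], [0.0, 1.0]))
```

The design intent illustrated here is the trade-off: the feedback term pushes the generator toward images the checker accepts, while the consistency term discourages it from drifting away from its original behavior.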
To train RFNet, the authors constructed the RF1M dataset, containing over one million human‑annotated generated advertisement images, providing reliable feedback for model supervision.
Extensive experiments demonstrate that RFNet outperforms baselines on all evaluation metrics, and that RFFT achieves higher usable‑image rates while maintaining aesthetic quality, which in turn reduces the number of generation attempts and the overall production time.
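The link between usable rate and generation attempts can be illustrated with simple arithmetic. If each attempt is treated as independent with the same probability p of yielding a usable image (a simplifying assumption), the expected number of attempts per usable image is 1/p. The rates below are hypothetical and are not results from the paper.

```python
def expected_attempts(usable_rate: float) -> float:
    """Expected attempts until the first usable image (geometric distribution)."""
    if not 0.0 < usable_rate <= 1.0:
        raise ValueError("usable_rate must be in (0, 1]")
    return 1.0 / usable_rate


# Hypothetical usable rates, chosen only for illustration.
for rate in (0.1, 0.5):
    print(rate, expected_attempts(rate))
```

Under this assumption, a higher usable rate translates directly into fewer generations, and therefore less time, per accepted image.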
The fine‑tuned ControlNet also shows strong generalization when combined with various LoRA and diffusion model weights, further confirming the robustness of the proposed approach.
Paper: https://arxiv.org/abs/2408.00418
Code: https://github.com/ZhenbangDu/Reliable_AD
JD Retail Technology