Boosting Advertising Image Generation Reliability with Human Feedback

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to raise the share of usable automatically generated e‑commerce advertising images, preserve their visual quality, and reduce manual inspection costs.

JD Cloud Developers

Nov 14, 2024

Background and Current Situation

Attractive advertising images are crucial for e‑commerce success, but manual design is costly. This has prompted interest in automatic generation with diffusion models such as Stable Diffusion combined with ControlNet. Existing generators, however, often produce defective images (misaligned, low‑visibility, or with shape hallucinations) that mislead customers and require extensive human review.

Reliable Feedback Network (RFNet)

To replace manual inspection, we propose a Reliable Feedback Network (RFNet) that acts as an automated reviewer. A single generated image carries little product‑specific knowledge, so RFNet integrates multiple auxiliary modalities alongside the image to assess whether a generated advertising image is usable. Its architecture is shown below.

By feeding each generated image to RFNet, we can keep resampling until a usable image is found, a process we call cyclic generation (abbreviated RG in the results). Pseudocode for this loop is illustrated in the following figure.
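The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` and `usable_prob` are hypothetical callables standing in for the diffusion generator and RFNet, and the acceptance `threshold` and `max_attempts` budget are illustrative parameters that the article does not specify.

```python
import random
from typing import Callable, Optional, Tuple


def cyclic_generation(
    generate: Callable[[int], str],        # hypothetical: noise seed -> image
    usable_prob: Callable[[str], float],   # hypothetical: image -> P(usable), RFNet's role
    threshold: float = 0.5,                # assumed acceptance threshold
    max_attempts: int = 10,                # assumed sampling budget
    seed: int = 0,
) -> Tuple[Optional[str], int]:
    """Resample until the reviewer accepts an image or the budget runs out.

    Returns (image, attempts); image is None if no sample passed review.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        image = generate(rng.randrange(2**31))  # fresh noise seed on every try
        if usable_prob(image) >= threshold:
            return image, attempt
    return None, max_attempts


# Toy stand-ins so the sketch runs without a real model.
def fake_generate(noise_seed: int) -> str:
    return f"image-{noise_seed % 4}"


def fake_usable_prob(image: str) -> float:
    return 0.9 if image.endswith("-0") else 0.1


image, attempts = cyclic_generation(fake_generate, fake_usable_prob, max_attempts=50)
```

The number of attempts returned by the loop is a direct measure of its cost: the lower the generator's usable rate, the more samples are discarded before one passes review.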

Reliable Feedback Fine‑Tuning (RFFT)

Because cyclic generation can be time‑consuming, we also use human feedback to fine‑tune the diffusion model itself, in the spirit of RLHF. Once RFNet is trained, its output serves as a differentiable proxy for human evaluation: gradients of its usability prediction are back‑propagated into the generator to increase the proportion of usable images without sacrificing visual quality. This fine‑tuning stage is referred to as RFFT in the results.

The feedback is expressed as a loss that encourages the generator to produce images that RFNet is more likely to classify as usable. Only the ControlNet branch is updated; the remaining weights stay fixed.
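One plausible way to write such a loss is the negative log-likelihood of the "usable" class. The notation here is an assumption introduced for illustration and is not taken from the paper: $G$ is the generator, $\theta_0$ the frozen base diffusion weights, $\theta_c$ the trainable ControlNet weights, $R_\phi$ the trained RFNet with frozen parameters $\phi$, $z$ the sampled noise, $c$ the generation conditions, and $\eta$ the learning rate.

```latex
\mathcal{L}_{\mathrm{RF}}(\theta_c)
  = -\,\mathbb{E}_{z,\,c}\Big[\log R_\phi\big(\text{usable} \mid G_{\theta_0,\theta_c}(z, c)\big)\Big],
\qquad
\theta_c \leftarrow \theta_c - \eta\,\nabla_{\theta_c}\,\mathcal{L}_{\mathrm{RF}}(\theta_c)
```

In this form the gradient flows through RFNet into the generated image and from there into $\theta_c$ only, which matches the description of updating just the ControlNet branch.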

Loss Functions and Conditional Consistency

Directly optimizing for usability can conflict with aesthetic quality, leading to “visual collapse” when the model over‑optimizes for usable cases. To mitigate this, we add a KL‑divergence constraint that keeps the fine‑tuned model’s output distribution close to the original. Additionally, we propose a conditional consistency loss (L_CC) that preserves the text condition while allowing the image to become more usable.
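The effect of a KL penalty can be shown with a toy numeric sketch. Everything here is an illustrative assumption: the two-outcome distributions stand in for the original and fine-tuned models' output distributions, `beta` is a hypothetical penalty weight, and the feedback term is a simple negative log-probability. It is not the paper's L_CC formulation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(
    usable_prob: float,          # reviewer's P(usable) for the tuned model's output
    p_tuned: Sequence[float],    # toy output distribution after fine-tuning
    p_orig: Sequence[float],     # toy output distribution of the original model
    beta: float,                 # hypothetical weight on the KL penalty
) -> float:
    feedback = -math.log(usable_prob)   # lower when the output looks more usable
    return feedback + beta * kl_divergence(p_tuned, p_orig)


orig = [0.5, 0.5]
mild = [0.6, 0.4]         # small shift away from the original model
collapsed = [0.99, 0.01]  # almost all mass on one output: the "collapse" case

# Without the penalty, the collapsed model wins on usability alone.
no_penalty = (regularized_loss(0.9, collapsed, orig, beta=0.0),
              regularized_loss(0.7, mild, orig, beta=0.0))
# With the penalty, drifting far from the original model becomes costly.
with_penalty = (regularized_loss(0.9, collapsed, orig, beta=1.0),
                regularized_loss(0.7, mild, orig, beta=1.0))
```

With `beta=0` the collapsed configuration has the lower loss; with `beta=1` the mildly shifted one does, which is the trade-off the constraint is meant to enforce.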

Experimental Results

(1) Advertising Image Review Performance – Table 1 shows RFNet outperforming baselines on all metrics, confirming the benefit of multimodal integration. Component ablation (Table 2) demonstrates that each part of RFNet contributes significantly to average precision.
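For readers unfamiliar with the metric, the following sketch computes average precision for a binary usable/unusable reviewer. The article does not say which AP variant the paper reports, so this assumes the common non-interpolated definition (mean of precision at each rank where a positive item appears); the labels and scores are made up for illustration.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean precision@k over ranks k holding a positive item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / hits if hits else 0.0


# Made-up example: 1 = usable, 0 = unusable, with reviewer confidence scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)   # positives at ranks 1 and 3 -> (1 + 2/3) / 2
```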

(2) Reliability Performance – Table 3 indicates that our RFFT method achieves higher usability rates than competing approaches. The “Ava” (usable rate as judged by RFNet) and “Human Ava” (usable rate as judged by human reviewers) columns follow the same trend, which supports RFNet’s alignment with human judgments. Cyclic generation (RG) markedly raises usable image ratios while reducing production time.

Qualitative evaluation shows that our method maintains aesthetic quality comparable to the original model, thanks to the conditional consistency constraint.

(3) Qualitative Comparison – Visual examples illustrate the increased usability and production efficiency while preserving stable visual performance.

(4) Generalization – After fine‑tuning, ControlNet shows strong adaptability when combined with various LoRA and diffusion model weights, significantly improving usability across different configurations (Table 4).
