Tagged articles
13 articles
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Understanding Preference Alignment: Why Voice Output Needs an Extra Layer

The article explains that task alignment is enough to produce functional demos, but real competitiveness requires preference alignment: optimizing for human comfort along dimensions such as brevity, tone, and safety. It discusses how RLHF and DPO address this, and the additional challenges of generating natural, responsive voice output.

AI Alignment · DPO · Human Feedback
0 likes · 7 min read
JD Tech
Jan 27, 2026 · Artificial Intelligence

How Uni-Layout Unifies Cross‑Task Layout Generation with Human‑Like Evaluation

Uni-Layout introduces a unified framework that integrates a universal layout generator, a human‑feedback‑simulating evaluator, and a dynamic margin preference optimization technique to align generation and evaluation across diverse e‑commerce design tasks, backed by a new 100k human‑annotated dataset.

Human Feedback · Multimodal LLM · dynamic margin optimization
0 likes · 11 min read
JD Cloud Developers
Jan 15, 2026 · Artificial Intelligence

Uni-Layout: Unifying Layout Generation with Human Feedback and Dynamic Alignment

Uni-Layout introduces a unified framework that combines a multimodal large language model‑based generator, a human‑like evaluator trained on the large Layout‑HF100k dataset, and a Dynamic Margin Preference Optimization (DMPO) method to align generation and evaluation, achieving state‑of‑the‑art results across diverse layout tasks.

DMPO · Human Feedback · Multimodal LLM
0 likes · 11 min read
JD Tech Talk
Jan 15, 2026 · Artificial Intelligence

Uni-Layout: Harnessing Human Feedback for Unified Layout Generation and Evaluation

Uni-Layout introduces a unified framework that generates layouts across diverse tasks, simulates human evaluation with a novel feedback dataset, and aligns generation and assessment through dynamic margin preference optimization, achieving state‑of‑the‑art performance on multiple benchmarks.

AI design · Human Feedback · Multimodal LLM
0 likes · 11 min read
JD Retail Technology
Jan 8, 2026 · Artificial Intelligence

Uni-Layout: Unified Cross-Task Layout Generation with Human-Aligned Evaluation

Uni-Layout introduces a unified layout generation framework that consolidates diverse design tasks, leverages multimodal large language models for flexible generation, and aligns outputs with human perception through a human‑like evaluator trained on a new human‑feedback dataset (Layout‑HF100k) and a Dynamic Margin Preference Optimization (DMPO) method.

ACM Multimedia · Human Feedback · Multimodal LLM
0 likes · 11 min read
Kuaishou Tech
Nov 24, 2025 · Artificial Intelligence

How Human Feedback Supercharges Video Generation – The VideoAlign Pipeline Explained

This article details a new research pipeline that leverages large‑scale human preference data, a multi‑dimensional video reward model, and specialized alignment algorithms to dramatically improve video generation quality, motion fidelity, and text‑video consistency, with open‑source code and benchmarks for reproducibility.

AI Alignment · Human Feedback · RLHF
0 likes · 10 min read
AI Algorithm Path
Jul 27, 2025 · Artificial Intelligence

Understanding RLHF: How Human Feedback Trains Modern LLMs

This article explains the RLHF (Reinforcement Learning from Human Feedback) pipeline that powers ChatGPT and other large language models, covering the limitations of traditional fine‑tuning, the creation of human‑feedback datasets, reward‑model training, loss design, and the final PPO‑based fine‑tuning step.

ChatGPT · Human Feedback · Large Language Models
0 likes · 8 min read
Bilibili Tech
May 20, 2025 · Artificial Intelligence

How AnimeReward and GAPO Transform Anime Video Generation with Human Feedback

Researchers at Bilibili present Index‑Anisora, an open‑source anime video generation framework that builds a 30k‑sample reward dataset, introduces the multi‑dimensional AnimeReward model and a Gap‑Aware Preference Optimization (GAPO) method, and demonstrate through extensive automatic and human evaluations that their approach significantly outperforms baseline video generators.

AI · Alignment · GAPO
0 likes · 20 min read
AntTech
Jan 13, 2025 · Artificial Intelligence

Two Ant Group Papers Selected for AAAI 2025: Human‑Feedback Evaluation Framework for Product Image Background Inpainting and Bagging‑Expert Network for Multi‑Task Learning

Two Ant Group papers were accepted at AAAI 2025: one presents a human‑feedback‑driven evaluation framework for product image background inpainting using EfficientSAM and a new HFPC‑44k dataset, and the other proposes a Bagging‑Expert Network to mitigate expert polarization in multi‑gate mixture‑of‑experts models for multi‑task learning.

AAAI 2025 · Ant Group · Bagging-Expert Network
0 likes · 4 min read
JD Tech Talk
Nov 14, 2024 · Artificial Intelligence

Can Human Feedback Make Advertising Image Generation Reliable? Introducing RFNet

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to automatically evaluate and fine‑tune diffusion models, dramatically increasing the usable rate of e‑commerce advertising images while preserving visual quality.

Computer Vision · Diffusion Models · Human Feedback
0 likes · 8 min read
JD Cloud Developers
Nov 14, 2024 · Artificial Intelligence

Boosting Advertising Image Generation Reliability with Human Feedback

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to improve the usability and visual quality of automatically generated e‑commerce advertising images while reducing manual inspection costs.

AI · Diffusion Models · Human Feedback
0 likes · 9 min read
Python Crawling & Data Mining
Aug 20, 2023 · Artificial Intelligence

What Is RLHF? Benefits, Limits, and Design Tips for Human‑Feedback Reinforcement Learning

This article explains Reinforcement Learning from Human Feedback (RLHF), outlining its definition, suitable tasks, advantages over other reward‑model methods, types of algorithms, challenges of human feedback, and practical strategies to mitigate its limitations for building robust AI systems.

AI Alignment · Human Feedback · Reinforcement Learning
0 likes · 14 min read