Tag

DPO

1 view collected around this technical thread.

DaTaobao Tech
Jun 4, 2025 · Artificial Intelligence

Understanding Large Language Model Architecture, Parameters, Memory, Storage, and Fine‑Tuning Techniques

This article provides a comprehensive overview of large language models (LLMs), covering their transformer architecture, parameter counts, GPU memory and storage requirements, and detailed fine‑tuning methods such as prompt engineering, data construction, LoRA, PEFT, RLHF, and DPO, along with practical deployment and inference acceleration strategies.

DPO · Fine-tuning · LLM
0 likes · 17 min read
Data Thinking Notes
Mar 16, 2025 · Artificial Intelligence

Why DeepSeek R1 Swaps PPO for GRPO: A Deep Dive into RLHF Alternatives

DeepSeek‑R1 replaces the traditional PPO‑based RLHF approach with GRPO, reducing reliance on human‑labeled data by using pure reinforcement learning environments and carefully designed reward mechanisms; the article explains reinforcement learning fundamentals, compares PPO, DPO and GRPO, and offers practical application recommendations.

AI alignment · DPO · GRPO
0 likes · 14 min read
JD Retail Technology
Feb 28, 2025 · Artificial Intelligence

Generative Recommendation with DPO Alignment for JD Alliance Advertising: Multi‑Objective Optimization and Online Results

The paper presents a generative recommendation framework for JD Alliance advertising that combines semantic‑ID modeling, large‑model pre‑training and fine‑tuning, and Direct Preference Optimization (including Softmax‑DPO and β‑DPO) to jointly boost click‑through and conversion rates, achieving +0.6% UCTR and +8% UCVR in online tests while outlining future multi‑objective extensions.

DPO · advertising · generative recommendation
0 likes · 12 min read
Bilibili Tech
Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords. It employs fluency, style, and quality classifiers, mixed‑domain data cleaning, and alignment methods such as SFT, DPO, and KTO. The resulting product now generates about ten percent of daily titles and drives significant ad spend.

AI alignment · Ad Title Generation · Bilibili
0 likes · 24 min read