Tagged articles
3 articles
Data Thinking Notes
Oct 19, 2025 · Artificial Intelligence

How GSPO Improves Stability in Large Language Model Training

GSPO (Group Sequence Policy Optimization) is a reinforcement‑learning algorithm for LLMs that replaces GRPO's token‑level optimization with sequence‑level optimization, so that the granularity of the update matches the granularity of the reward. This reduces variance and addresses training instability in ultra‑large models, especially with long sequences and MoE architectures.

GRPO · GSPO · Large Language Models
0 likes · 11 min read
DataFunSummit
Mar 3, 2022 · Artificial Intelligence

Sequence Optimization, Context-Aware CTR Re-Estimation, and Session-Level Auction for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering technology selection for recommendation ad ranking, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and a session‑level auction mechanism, which together improve monetization efficiency and long‑term user value.

CTR · Reinforcement Learning · auction
0 likes · 18 min read
DataFunTalk
Feb 24, 2022 · Artificial Intelligence

Sequence Optimization and Context-Aware CTR Re-Estimation for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering recommendation ad ranking, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and session‑level auction mechanisms. It also includes a Q&A that highlights practical gains and implementation challenges.

Advertising · CTR prediction · Context-Aware
0 likes · 14 min read