Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning
This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.
In JD Retail's technology team, many algorithm engineers born after 1995 have quickly solved hard problems by committing to work that is difficult but right, which they argue is the fastest way for technologists to grow.
Technical Challenge: Advertising Image Evaluation
Assessing whether an ad image meets quality standards is highly subjective; existing reward models often fail to guide AI adjustments precisely. The proposed solution replaces a single large reward model with a collection of specialized small reward models that evaluate specific aspects such as product shape, placement, and color matching, improving granularity and allowing flexible business rule integration.
The team built a trustworthy ad‑image generation framework based on human feedback. In the training stage, generated images are judged by multiple small reward models and the generator is refined via reinforcement learning. In the inference stage, the same reward ensemble decides whether an image can be launched without human review. The team reports a 98% usable image rate and a 30% recall improvement.
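The inference-stage gate described above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `RewardModel` class, the `evaluate` function, the thresholds, and the feature-dictionary stand-in for an image are all assumptions made for the example, and real reward models would be trained networks rather than lambdas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical image representation: precomputed aspect features in [0, 1].
Image = Dict[str, float]


@dataclass(frozen=True)
class RewardModel:
    name: str
    score: Callable[[Image], float]  # returns a score in [0, 1]
    threshold: float                 # minimum score for this aspect to pass


def evaluate(image: Image, models: List[RewardModel]) -> Tuple[bool, Dict[str, float]]:
    """Return (launchable, per-aspect scores); launchable only if every aspect passes."""
    scores = {m.name: m.score(image) for m in models}
    launchable = all(scores[m.name] >= m.threshold for m in models)
    return launchable, scores


# Stand-in scorers and thresholds (assumed values, for illustration only).
ENSEMBLE = [
    RewardModel("product_shape", lambda img: img["shape"], 0.8),
    RewardModel("placement", lambda img: img["placement"], 0.7),
    RewardModel("color_matching", lambda img: img["color"], 0.6),
]

good = {"shape": 0.9, "placement": 0.75, "color": 0.8}
bad = {"shape": 0.9, "placement": 0.4, "color": 0.8}

ok_good, _ = evaluate(good, ENSEMBLE)
ok_bad, scores_bad = evaluate(bad, ENSEMBLE)

# Per-aspect scores show which specific check rejected the image.
failed = [m.name for m in ENSEMBLE if scores_bad[m.name] < m.threshold]
```

Keeping one score per aspect is what makes the feedback granular: a rejected image reports which check failed, and a business rule can be added as one more entry in the ensemble.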
Technical Challenge: Query Expansion for E‑commerce Search
Traditional neural machine translation models struggle with novel user intents, leading to poor query expansion and low product recall. Inspired by large‑model capabilities and the InstructGPT paper, the team adopted an LLM‑plus‑PPO reinforcement learning pipeline.
The resulting query‑expansion framework includes three training phases: e‑commerce domain pre‑training, task‑driven fine‑tuning, and search‑engine‑based reinforcement learning. Rewards for the reinforcement learning phase were computed in an offline simulation using multi‑granular reward functions, and the resulting model boosted conversion rates in online experiments.
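The article does not specify the reward functions, so the following is only a sketch of how a search-engine-based reward for an expanded query might be computed offline. The toy `INDEX`, the `expansion_reward` function, its two signals (precision of the expanded query's results and the relevant products it adds beyond the original query), and the weights are all assumptions for illustration.

```python
from typing import Dict, Set

# Hypothetical offline search index: query -> product IDs it retrieves.
INDEX: Dict[str, Set[str]] = {
    "phone case": {"p1", "p2"},
    "iphone protective cover": {"p2", "p3", "p4"},
    "phone charger": {"p9"},
}


def expansion_reward(original: str, expanded: str, relevant: Set[str],
                     w_precision: float = 0.5, w_gain: float = 0.5) -> float:
    """Weighted sum of two assumed signals for an expanded query:
    precision of its results, and the share of relevant products
    it retrieves that the original query did not."""
    base = INDEX.get(original, set())
    new = INDEX.get(expanded, set())
    if not new or not relevant:
        return 0.0
    precision = len(new & relevant) / len(new)
    gain = len((new - base) & relevant) / len(relevant)
    return w_precision * precision + w_gain * gain


relevant = {"p1", "p2", "p3", "p4"}
r_good = expansion_reward("phone case", "iphone protective cover", relevant)
r_bad = expansion_reward("phone case", "phone charger", relevant)
```

In a PPO setup, a scalar like this would be returned for each generated expansion; an on-topic expansion that surfaces new relevant products scores higher than one that drifts to a different intent.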
Technical Challenge: Model Pruning for Large Diffusion Models
To reduce the computational cost of large text‑to‑image models, the team applied Fast Fourier Transform (FFT) for frequency‑domain analysis to locate redundant components, then used the Ramer‑Douglas‑Peucker (RDP) algorithm to pinpoint critical points in the spectrum.
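The frequency-domain step can be illustrated with a short sketch. The article does not say which signal the team analyzed, so the 1‑D `signal` here is a hypothetical stand-in (for example, some per-block statistic), and `magnitude_spectrum` is a helper name invented for this example; only the NumPy FFT calls are real library functions.

```python
import numpy as np


def magnitude_spectrum(signal):
    """Return (frequencies, magnitudes) of a real 1-D signal with its mean removed."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # drop the constant offset (DC term)
    mags = np.abs(np.fft.rfft(x))             # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0)    # frequencies in cycles per sample
    return freqs, mags


# Hypothetical signal: a constant offset plus one slow oscillation
# (4 full cycles over 64 samples).
n = 64
t = np.arange(n)
signal = 1.0 + np.sin(2 * np.pi * 4 * t / n)

freqs, mags = magnitude_spectrum(signal)
peak_bin = int(np.argmax(mags))

# Share of spectral energy in the lowest 9 frequency bins.
low_energy_share = float((mags[:9] ** 2).sum() / (mags ** 2).sum())
```

The resulting `(freqs, mags)` pairs form a curve, which is the kind of input a curve-simplification routine can reduce to a handful of critical points.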
Combining FFT and RDP removed unnecessary transformer blocks, increasing training throughput by 40% without sacrificing performance.
    import numpy as np

    def rdp(points, epsilon):
        """Ramer-Douglas-Peucker algorithm for curve simplification.

        points: sequence of 2-D points on the curve
        epsilon: tolerance; larger values yield more simplification
        """
        # Convert once so the vector arithmetic below also works for lists and tuples.
        points = np.asarray(points, dtype=float)

        def perpendicular_distance(pt, line_start, line_end):
            # Degenerate chord: fall back to the point-to-point distance.
            if np.array_equal(line_start, line_end):
                return np.linalg.norm(pt - line_start)
            # 2-D cross product written out explicitly
            # (np.cross on 2-D vectors is deprecated since NumPy 2.0).
            dx, dy = line_end - line_start
            ex, ey = line_start - pt
            return abs(dx * ey - dy * ex) / np.hypot(dx, dy)

        def rdp_recursion(points, epsilon):
            # Find the interior point farthest from the chord between the endpoints.
            dmax = 0.0
            index = 0
            for i in range(1, len(points) - 1):
                d = perpendicular_distance(points[i], points[0], points[-1])
                if d > dmax:
                    index = i
                    dmax = d
            if dmax > epsilon:
                # Keep that point and simplify both halves;
                # drop the split point duplicated at the join.
                results1 = rdp_recursion(points[:index + 1], epsilon)
                results2 = rdp_recursion(points[index:], epsilon)
                return results1[:-1] + results2
            return [points[0], points[-1]]

        return rdp_recursion(points, epsilon)

Technical Challenge: Agent‑Based Full‑Chain Evaluation
To move beyond supervised fine‑tuning, the team compared implicit reward (DPO‑style) and explicit reward (RLHF‑style) approaches, ultimately designing an Agent evaluation system that provides both local and end‑to‑end scores, distinguishing model reasoning errors from execution failures.
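One way to picture an evaluation that reports both local and end‑to‑end scores is the sketch below. It is an assumption-laden illustration rather than the team's system: the `Step` fields, the label names, and the scoring rules in `evaluate_trajectory` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    planned_action: str    # what the model decided to do
    expected_action: str   # reference action for this step
    executed_ok: bool      # whether the tool or environment call succeeded


def evaluate_trajectory(steps: List[Step], task_solved: bool) -> Dict[str, object]:
    """Label each step locally and score the whole trajectory end to end."""
    labels = []
    for s in steps:
        if s.planned_action != s.expected_action:
            labels.append("reasoning_error")      # the model chose the wrong action
        elif not s.executed_ok:
            labels.append("execution_failure")    # right action, but the call failed
        else:
            labels.append("ok")
    local = labels.count("ok") / len(labels) if labels else 0.0
    return {
        "local_score": local,
        "end_to_end_score": 1.0 if task_solved else 0.0,
        "labels": labels,
    }


trajectory = [
    Step("search_product", "search_product", True),
    Step("apply_coupon", "check_stock", True),
    Step("place_order", "place_order", False),
]
report = evaluate_trajectory(trajectory, task_solved=False)
```

Separating the two failure labels matters for training: a reasoning error points at the model's policy, while an execution failure may point at the tool or environment instead.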
Growth Reflections
The engineers emphasize continuous reflection, case‑by‑case analysis, and staying updated with top‑conference papers and open‑source developments. They note that cross‑domain knowledge transfer—from image segmentation to generation—often sparks innovation.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.