Solving Real-World AI Challenges at JD Retail: Reward Model Ensembles, Query Expansion, and Model Pruning

This article recounts how JD Retail's young algorithm engineers tackled three AI problems: optimizing reward-model ensembles for ad image generation, building query expansion on top of a large language model, and pruning diffusion models with FFT and RDP. It also shares their technical approaches, code snippets, and reflections on growth.

JD Tech Talk

JD Retail's technology team, composed largely of algorithm engineers born after 1995, has grown quickly by confronting hard AI problems such as evaluating advertising images, expanding user search queries, and reducing the size of large diffusion models.

Technical challenge 1: Determining whether an ad image meets quality standards is highly subjective, and existing reward models cannot guide AI to precise improvements. The proposed solution combines multiple small, specialized reward models—each focusing on aspects like shape, placement, or color—to replace a single large model, improving granularity and allowing flexible business rule integration.
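The idea of combining several small reward models with business rules can be sketched as follows. This is a minimal illustration, not the team's implementation: the `RewardModel` class, the `ensemble_reward` function, the weighted-average combination, and the dictionary standing in for an image are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for an image: a dict of precomputed per-aspect scores.
Image = Dict[str, float]


@dataclass
class RewardModel:
    name: str                        # aspect this model judges, e.g. "shape"
    score: Callable[[Image], float]  # returns a score in [0, 1]
    weight: float = 1.0              # relative importance (assumed, tunable)


def ensemble_reward(
    image: Image,
    models: List[RewardModel],
    rules: Iterable[Callable[[Image], bool]] = (),
) -> Tuple[float, Dict[str, float]]:
    """Return (combined reward, per-aspect scores) for one image."""
    # Business rules act as hard vetoes: any failed rule zeroes the reward.
    for rule in rules:
        if not rule(image):
            return 0.0, {}
    per_aspect = {m.name: m.score(image) for m in models}
    total_weight = sum(m.weight for m in models)
    combined = sum(m.weight * per_aspect[m.name] for m in models) / total_weight
    return combined, per_aspect


models = [
    RewardModel("shape", lambda img: img["shape"]),
    RewardModel("placement", lambda img: img["placement"]),
    RewardModel("color", lambda img: img["color"], weight=2.0),
]
image = {"shape": 0.8, "placement": 0.6, "color": 1.0}
reward, details = ensemble_reward(image, models)
vetoed, _ = ensemble_reward(image, models, rules=[lambda img: img["shape"] > 0.9])
```

Returning the per-aspect scores alongside the combined value keeps the signal multidimensional, so a downstream trainer can see which aspect pulled the reward down.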

The team built a training‑inference framework where the generator creates ad images, the ensemble of reward models provides multidimensional signals, and reinforcement learning fine‑tunes the generator. This pipeline achieved a 98% usable image rate and a 30% recall increase.
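The generate, score, and update loop can be illustrated with a toy policy-gradient (REINFORCE) example. The article does not say which reinforcement learning algorithm the team used, so this is only a sketch under simplifying assumptions: the "generator" is a softmax policy over a few discrete candidates, and `reward_fn` is a hypothetical stand-in for the reward ensemble.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce_finetune(reward_fn, n_actions, steps=200, batch_size=32, lr=0.5, seed=0):
    """Toy REINFORCE: raise the probability of candidates that earn high reward."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_actions)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # 1. The "generator" samples a batch of candidates.
        actions = rng.choice(n_actions, size=batch_size, p=probs)
        # 2. The reward function scores each candidate.
        rewards = np.array([reward_fn(a) for a in actions])
        # 3. The batch mean serves as a baseline to reduce variance.
        advantages = rewards - rewards.mean()
        grad = np.zeros(n_actions)
        for a, adv in zip(actions, advantages):
            one_hot = np.zeros(n_actions)
            one_hot[a] = 1.0
            # Gradient of log pi(a) w.r.t. the logits is (one_hot - probs).
            grad += adv * (one_hot - probs)
        logits += lr * grad / batch_size
    return softmax(logits)


# Hypothetical rewards for three candidate layouts; the second is the best.
candidate_rewards = [0.2, 0.9, 0.4]
final_probs = reinforce_finetune(lambda a: candidate_rewards[a], n_actions=3)
```

In a real image generator the policy is the model itself and the gradient flows through its parameters, but the sample, score, and reinforce structure is the same.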

Technical challenge 2: Traditional query‑expansion models struggle with novel user intents, leading to poor product recall. The engineers adopted a large‑language‑model (LLM) approach enhanced with reinforcement learning from human feedback (RLHF) to create a three‑stage training pipeline: e‑commerce pre‑training, task‑specific fine‑tuning, and RL‑based alignment with a search‑engine simulator. The result was a significant boost in conversion rates.
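To make the alignment stage more concrete, here is a minimal sketch of how a search-engine simulator might turn a set of query rewrites into a reward. The article does not describe the simulator or the reward definition, so the term-overlap retrieval, the recall-based reward, and the names `simulate_search` and `recall_reward` are assumptions for illustration only.

```python
from typing import Dict, Iterable, Set


def simulate_search(query: str, catalog: Dict[int, str]) -> Set[int]:
    """Toy retrieval: a product matches if it shares any term with the query."""
    terms = set(query.lower().split())
    return {pid for pid, title in catalog.items() if terms & set(title.lower().split())}


def recall_reward(
    original: str,
    rewrites: Iterable[str],
    catalog: Dict[int, str],
    relevant: Set[int],
) -> float:
    """Fraction of relevant products retrieved by the query plus its rewrites."""
    retrieved = simulate_search(original, catalog)
    for rewrite in rewrites:
        retrieved |= simulate_search(rewrite, catalog)
    return len(retrieved & relevant) / len(relevant)


# Hypothetical catalog and relevance labels.
catalog = {
    1: "wireless bluetooth earbuds",
    2: "noise cancelling headphones",
    3: "usb charging cable",
}
relevant = {1, 2}

baseline = recall_reward("earphones", [], catalog, relevant)
expanded = recall_reward("earphones", ["bluetooth earbuds", "headphones"], catalog, relevant)
```

A reward of this kind could then be fed to the reinforcement learning step, so that rewrites which surface more relevant products are favored.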

Technical challenge 3: High‑capacity text‑to‑image models consume excessive compute in e‑commerce settings. By applying frequency‑domain analysis (FFT) to detect redundant components and the Ramer‑Douglas‑Peucker (RDP) algorithm to locate critical points in the spectrum, the team pruned unnecessary blocks, increasing training throughput by 40% without harming performance.
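The frequency-domain step can be illustrated with NumPy's real FFT. The article does not say which signal the team transforms or how redundancy is scored, so this sketch only shows the generic building block: computing an amplitude spectrum and measuring how much energy sits above a cutoff frequency. The function names and the energy-ratio criterion are assumptions.

```python
import numpy as np


def amplitude_spectrum(signal, sample_spacing=1.0):
    """Return (frequencies, amplitudes) of a real 1-D signal with its mean removed."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(signal.size, d=sample_spacing)
    return freqs, np.abs(spectrum) / signal.size


def high_freq_energy_ratio(signal, cutoff):
    """Share of spectral energy at or above `cutoff` (cycles per sample)."""
    freqs, amplitudes = amplitude_spectrum(signal)
    energy = amplitudes ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[freqs >= cutoff].sum() / total)


t = np.arange(64)
low_freq_signal = np.sin(2 * np.pi * 2 * t / 64)    # 2 cycles over 64 samples
high_freq_signal = np.sin(2 * np.pi * 20 * t / 64)  # 20 cycles over 64 samples

low_ratio = high_freq_energy_ratio(low_freq_signal, cutoff=0.25)
high_ratio = high_freq_energy_ratio(high_freq_signal, cutoff=0.25)
```

A pruning criterion could compare such ratios across blocks, but which components count as redundant is a design decision the article leaves unspecified.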

Key code snippets illustrating these methods are shown below:

import numpy as np
import torch
import torch.nn.functional as F


def rdp(points, epsilon):
    """
    Ramer-Douglas-Peucker algorithm for curve simplification.
    points: sequence of points on the curve (NumPy arrays)
    epsilon: tolerance, larger values simplify more
    """
    def perpendicular_distance(pt, line_start, line_end):
        # distance from pt to the line through line_start and line_end;
        # falls back to point-to-point distance when the endpoints coincide
        if np.array_equal(line_start, line_end):
            return np.linalg.norm(pt - line_start)
        else:
            return np.abs(np.cross(line_end - line_start, line_start - pt)) / np.linalg.norm(line_end - line_start)

    def rdp_recursion(points, epsilon):
        # recursive RDP, find farthest point
        ...

    return rdp_recursion(points, epsilon)


def get_token_prob(prompt, target_token):
    # `tokenizer` and `model` are assumed to be loaded elsewhere (not shown in the source)
    # encode the prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    # token ids of the target; only its first sub-token is scored below
    target_ids = tokenizer.encode(target_token, add_special_tokens=False)
    # obtain model logits without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # logits for the position that follows the last prompt token
    next_token_logits = outputs.logits[:, -1, :]
    # convert to probability distribution
    probs = F.softmax(next_token_logits, dim=-1)
    # get probability of target token
    return probs[0, target_ids[0]].item()
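The recursive step of `rdp` is elided in the snippet above. For reference, a self-contained version of the textbook Ramer-Douglas-Peucker algorithm for 2-D points might look like the following. It is the standard formulation, not necessarily what the team used, and the names `point_line_distance` and `rdp_simplify` are chosen here for illustration.

```python
import numpy as np


def point_line_distance(pt, start, end):
    """Distance from pt to the infinite line through start and end (2-D)."""
    if np.array_equal(start, end):
        return float(np.linalg.norm(pt - start))
    dx, dy = end - start
    # |2-D cross product| divided by the length of the base segment
    cross = dx * (start[1] - pt[1]) - dy * (start[0] - pt[0])
    return float(abs(cross) / np.hypot(dx, dy))


def rdp_simplify(points, epsilon):
    """Keep only the points that deviate more than epsilon from the chord."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    distances = [point_line_distance(p, start, end) for p in points[1:-1]]
    farthest = int(np.argmax(distances)) + 1  # index into `points`
    if distances[farthest - 1] > epsilon:
        # Split at the farthest point and simplify both halves.
        left = rdp_simplify(points[: farthest + 1], epsilon)
        right = rdp_simplify(points[farthest:], epsilon)
        return np.vstack([left[:-1], right])  # drop the duplicated split point
    # Every interior point is within tolerance: keep only the endpoints.
    return np.vstack([start, end])


curve = [(0, 0), (1, 0.05), (2, 0), (3, 5), (4, 0)]
simplified = rdp_simplify(curve, epsilon=0.5)
```

On this small curve the near-collinear point `(1, 0.05)` is dropped while the sharp peak at `(3, 5)` survives, which is the behavior that makes RDP useful for locating critical points on a curve.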

Across all projects, the engineers emphasize systematic problem framing, iterative experimentation, continuous learning from top‑conference papers, and building reusable methodologies that accelerate both personal and team growth.
