Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

JD Tech
In JD Retail's technology team, many algorithm engineers born after 1995 have grown rapidly by tackling hard problems head‑on. They argue that focusing on difficult but correct work is the fastest growth path for technologists.

Technical Challenge: Advertising Image Evaluation

Assessing whether an ad image meets quality standards is highly subjective; existing reward models often fail to guide AI adjustments precisely. The proposed solution replaces a single large reward model with a collection of specialized small reward models that evaluate specific aspects such as product shape, placement, and color matching, improving granularity and allowing flexible business rule integration.
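The article does not publish the ensemble itself, but the idea of replacing one large reward model with specialized small ones can be sketched as follows. The aspect names, weights, and score ranges here are illustrative assumptions, not JD Retail's actual implementation:

```python
from typing import Callable, Dict

RewardModel = Callable[[dict], float]  # maps image features to a score in [0, 1]

def make_ensemble(models: Dict[str, RewardModel],
                  weights: Dict[str, float]):
    """Build a scorer returning per-aspect scores plus a weighted aggregate."""
    total = sum(weights.values())

    def score(image: dict) -> Dict[str, float]:
        # Each small reward model judges one aspect independently.
        per_aspect = {name: model(image) for name, model in models.items()}
        per_aspect["aggregate"] = sum(weights[n] * per_aspect[n] for n in weights) / total
        return per_aspect

    return score

# Toy stand-ins for the specialized reward models (shape, placement, color).
models = {
    "shape":     lambda img: img["shape_score"],
    "placement": lambda img: img["placement_score"],
    "color":     lambda img: img["color_score"],
}
weights = {"shape": 0.4, "placement": 0.3, "color": 0.3}
scorer = make_ensemble(models, weights)
```

Because each aspect is scored separately, a business rule (say, stricter color matching for fashion categories) can be added by swapping one small model or reweighting it, without retraining a monolithic reward model.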

The team built a trustworthy ad‑image generation framework based on human feedback. In the training stage, generated images are judged by multiple small reward models and refined via reinforcement learning; in the inference stage, the same reward ensemble decides whether an image can launch without human review. The approach achieved a 98% usable‑image rate and a 30% improvement in recall.
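A minimal sketch of the inference‑stage gate, assuming one threshold per aspect (the threshold values and aspect names are hypothetical): an image launches without human review only if every specialized reward model clears its bar.

```python
def can_launch(aspect_scores: dict, thresholds: dict) -> bool:
    """Launch without review only if every gated aspect clears its threshold."""
    # Missing aspects default to 0.0 and therefore fail the gate.
    return all(aspect_scores.get(aspect, 0.0) >= t for aspect, t in thresholds.items())

# Illustrative thresholds, not JD Retail's production values.
thresholds = {"shape": 0.8, "placement": 0.7, "color": 0.75}
```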

Technical Challenge: Query Expansion for E‑commerce Search

Traditional neural machine translation models struggle with novel user intents, leading to poor query expansion and low product recall. Inspired by large‑model capabilities and the InstructGPT paper, the team adopted an LLM‑plus‑PPO reinforcement learning pipeline.

The resulting query‑expansion framework includes three training phases: e‑commerce domain pre‑training, task‑driven fine‑tuning, and search‑engine‑based reinforcement learning. Offline simulations with multi‑granular reward functions boosted conversion rates in online experiments.
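The article does not specify the reward function, but a multi‑granular reward of the kind described might combine a query‑level retrieval signal with a click signal and a length penalty. Every weight and input name below is an assumption for illustration:

```python
def multi_granular_reward(expanded_query: str,
                          recall_rate: float,   # fraction of relevant items recalled offline
                          click_rate: float,    # simulated CTR from offline logs
                          length_penalty: float = 0.01) -> float:
    """Higher recall and CTR raise the reward; overlong expansions are penalized."""
    n_terms = len(expanded_query.split())
    return 0.6 * recall_rate + 0.4 * click_rate - length_penalty * n_terms
```

In a PPO loop, this scalar would score each sampled expansion during the search‑engine‑based reinforcement learning phase, so the policy learns expansions that actually improve retrieval rather than merely fluent paraphrases.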

Technical Challenge: Model Pruning for Large Diffusion Models

To reduce the computational cost of large text‑to‑image models, the team applied Fast Fourier Transform (FFT) for frequency‑domain analysis to locate redundant components, then used the Ramer‑Douglas‑Peucker (RDP) algorithm to pinpoint critical points in the spectrum.

Combining FFT and RDP removed unnecessary transformer blocks, increasing training throughput by 40% without sacrificing performance.

import numpy as np

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker algorithm for curve simplification.
    points: sequence of 2-D points (converted to numpy arrays internally)
    epsilon: tolerance; larger values yield more simplification
    """
    points = [np.asarray(p, dtype=float) for p in points]

    def perpendicular_distance(pt, line_start, line_end):
        # Distance from pt to the line through line_start and line_end.
        if np.array_equal(line_start, line_end):
            return np.linalg.norm(pt - line_start)
        return np.abs(np.cross(line_end - line_start, line_start - pt)) / np.linalg.norm(line_end - line_start)

    def rdp_recursion(points):
        # Find the point farthest from the chord joining the endpoints.
        dmax, index = 0.0, 0
        for i in range(1, len(points) - 1):
            d = perpendicular_distance(points[i], points[0], points[-1])
            if d > dmax:
                index, dmax = i, d
        if dmax > epsilon:
            # Keep the farthest point and recurse on both halves.
            left = rdp_recursion(points[:index + 1])
            right = rdp_recursion(points[index:])
            return left[:-1] + right
        # All intermediate points are within tolerance; keep only the endpoints.
        return [points[0], points[-1]]

    return rdp_recursion(points)
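One way the FFT step might feed the RDP step above, sketched under the assumption that redundancy is measured by a per‑layer importance signal (the article does not show the pipeline's actual inputs): take the magnitude spectrum of that signal, then simplify the spectrum so the surviving points mark the spectral "knees" indicating which components carry real signal.

```python
import numpy as np

def critical_frequencies(importance, epsilon, simplify):
    """FFT the importance signal, then keep only the spectral points that
    survive curve simplification (pass the rdp() function above as `simplify`)."""
    # One-sided magnitude spectrum of the real-valued importance signal.
    spectrum = np.abs(np.fft.rfft(np.asarray(importance, dtype=float)))
    # Treat the spectrum as a 2-D curve of (frequency index, magnitude).
    pts = [np.array([i, m]) for i, m in enumerate(spectrum)]
    return simplify(pts, epsilon)
```

Passing the simplifier as an argument keeps the frequency analysis decoupled from the choice of curve‑simplification algorithm.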

Technical Challenge: Agent‑Based Full‑Chain Evaluation

To move beyond supervised fine‑tuning, the team compared implicit reward (DPO‑style) and explicit reward (RLHF‑style) approaches, ultimately designing an Agent evaluation system that provides both local and end‑to‑end scores, distinguishing model reasoning errors from execution failures.
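A hypothetical sketch of the evaluation record such a system might produce: each step gets a local score, the trajectory gets an end‑to‑end score, and flags separate model reasoning errors from tool/execution failures. All field names are assumptions, not the team's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StepEval:
    local_score: float             # quality of this step's reasoning/action
    reasoning_error: bool = False  # the model planned the wrong thing
    execution_error: bool = False  # the plan was fine but the tool call failed

@dataclass
class TrajectoryEval:
    steps: List[StepEval] = field(default_factory=list)

    @property
    def end_to_end_score(self) -> float:
        # Simple mean of local scores as the end-to-end signal.
        if not self.steps:
            return 0.0
        return sum(s.local_score for s in self.steps) / len(self.steps)

    def failure_type(self) -> str:
        # Reasoning errors take precedence: they implicate the model itself.
        if any(s.reasoning_error for s in self.steps):
            return "reasoning"
        if any(s.execution_error for s in self.steps):
            return "execution"
        return "none"
```

Separating the two failure modes matters for training: a reasoning error is a learning signal for the policy, while an execution failure points at the tooling and should not penalize the model.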

Growth Reflections

The engineers emphasize continuous reflection, case‑by‑case analysis, and staying updated with top‑conference papers and open‑source developments. They note that cross‑domain knowledge transfer—from image segmentation to generation—often sparks innovation.

Tags: computer vision, model optimization, AI, large language models, reinforcement learning, query expansion