How Causal Inference Meets Large Language Models to Revolutionize E‑commerce Pricing
This article presents a comprehensive approach that combines causal inference, large language models, and retrieval‑augmented generation to automate e‑commerce price recommendation, detailing the three‑step workflow, challenges across product categories, the RAG architecture, process‑reward‑guided tree search, reinforcement learning refinements, and experimental results showing significant accuracy and speed improvements.
At QCon Global Software Development Conference organized by InfoQ, the author delivered a talk titled “Causal Inference and Large Model Fusion: Transformative Practices for E‑commerce Pricing”. The presentation explains how to use large‑model methods to tackle e‑commerce pricing challenges, optimize product pricing strategies, and improve decision‑making precision.
Introduction
With rapid e‑commerce growth and increasing price transparency, consumers compare multiple products before purchase. To emulate this behavior, an algorithm was designed that generates reasonable price suggestions for a target product based on similar‑product prices. The workflow consists of three steps:
Input the description of the product whose price needs verification.
Retrieve similar products and their prices from the database.
Provide a price suggestion for the target product along with the reasoning logic.
This capability is already applied to self‑operated new‑product price review, dramatically reducing manual audit costs.
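The three-step workflow above can be sketched end to end. This is a minimal illustration, not the production system: `retrieve_similar` stands in for the embedding-based retriever and `suggest_price` for the LLM generator, and the product data is invented for the example.

```python
# Hypothetical sketch of the three-step price-review workflow:
# 1) take a target description, 2) retrieve similar products, 3) suggest a price.

def retrieve_similar(target: dict, pool: list[dict], k: int = 3) -> list[dict]:
    """Step 2: fetch the k most similar products by naive token overlap."""
    def overlap(a: str, b: str) -> int:
        return len(set(a.split()) & set(b.split()))
    return sorted(pool, key=lambda p: -overlap(target["desc"], p["desc"]))[:k]

def suggest_price(target: dict, similar: list[dict]) -> dict:
    """Step 3: suggest a price with reasoning (here: median of peer prices)."""
    prices = sorted(p["price"] for p in similar)
    mid = prices[len(prices) // 2]
    reasons = [f'{p["desc"]} priced at {p["price"]}' for p in similar]
    return {"price": mid, "reasoning": reasons}

pool = [
    {"desc": "organic apples 500g", "price": 5.8},
    {"desc": "fresh apples 500g bag", "price": 6.2},
    {"desc": "banana bundle 1kg", "price": 4.5},
]
target = {"desc": "red apples 500g"}          # Step 1: target description
similar = retrieve_similar(target, pool, k=2)
suggestion = suggest_price(target, similar)
```

In the real pipeline the median heuristic is replaced by the trained inference model, but the input/output contract is the same: a target description in, a price plus reasoning out.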
LLM Modeling Method
During modeling, three main difficulties were encountered:
Full-category coverage: Hundreds of categories follow diverse price-comparison logic, such as unit-price conversion or material-driven price differences.
Complex product information: Gifts, bundles, and special models make comparison harder.
Explainability: The pricing process must clearly show which similar items were referenced and why.
Large language models (LLMs) address these challenges because they possess rich domain knowledge, can handle varied category logic, understand complex product information, and provide explanations beyond traditional machine‑learning models.
The proposed Retrieval‑Augmented Generation (RAG) architecture defines the pricing pipeline as follows:
Retriever: Retrieves the most similar competitor products from the product pool using text similarity and embeddings, feeding them as prompts to the generator.
Generator: Uses an inference model to derive the target product price from similar‑product prices, enhancing accuracy and interpretability.
Reward design: Constructs rewards from three aspects – pricing error, price gap among similar items, and attribute extraction accuracy.
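The retriever-to-generator handoff can be sketched as follows. This is an assumption-laden sketch: the cosine-similarity retrieval and the prompt layout are illustrative, and `retrieve`/`build_prompt` are hypothetical helper names, not the system's actual API.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec: list[float], pool_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k pool items most similar to the query."""
    ranked = sorted(range(len(pool_vecs)),
                    key=lambda i: cosine(query_vec, pool_vecs[i]),
                    reverse=True)
    return ranked[:k]

def build_prompt(target_desc: str, similar_items: list[tuple[str, float]]) -> str:
    """Feed retrieved products and prices to the generator as context."""
    lines = [f"- {d}: {p} yuan" for d, p in similar_items]
    return (f"Target product: {target_desc}\n"
            "Reference products and prices:\n" + "\n".join(lines) +
            "\nSuggest a price and explain the reasoning.")

top = retrieve([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]], k=2)
```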
Process Reward and Tree Search Optimization
To improve the inference model, a process‑reward and tree‑search mechanism is introduced. The three‑step CoT (Chain‑of‑Thought) reasoning is:
Step 1: Convert unit prices to reduce gaps between similar items; reward is based on the coefficient of variation of similar‑item prices.
Step 2: Rank prices; reward reflects the difference between model‑predicted ranking and actual ranking.
Step 3: Compute the final price; reward is the deviation between estimated and actual price.
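The talk names the three per-step reward signals but not their exact formulas. A minimal sketch, assuming simple normalized forms (lower coefficient of variation, higher rank agreement, and smaller relative price error all map to rewards near 1):

```python
import statistics

def step1_reward(unit_prices: list[float]) -> float:
    """Step 1: higher reward when similar-item unit prices vary less
    (assumed form: 1 / (1 + coefficient of variation))."""
    mean = statistics.mean(unit_prices)
    cv = statistics.pstdev(unit_prices) / mean if mean else float("inf")
    return 1.0 / (1.0 + cv)

def step2_reward(pred_order: list[str], true_order: list[str]) -> float:
    """Step 2: fraction of positions where predicted and actual rankings agree."""
    hits = sum(p == t for p, t in zip(pred_order, true_order))
    return hits / len(true_order)

def step3_reward(pred_price: float, true_price: float) -> float:
    """Step 3: reward shrinks with relative deviation from the actual price."""
    return max(0.0, 1.0 - abs(pred_price - true_price) / true_price)
```

For example, a predicted price of 0.03 against an actual 0.04 gives a step-3 reward of 0.75, while a perfect ranking gives a step-2 reward of 1.0.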
During training, a breadth‑first search (BFS) expands multiple candidate CoT paths. Process rewards are assigned at each step, and high‑quality candidates are retained for the next expansion, enabling efficient collection of valuable CoT samples.
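The expansion loop can be sketched as a beam-limited BFS over CoT paths. The beam width, depth, and the idea of scoring candidates by accumulated process reward follow the description above; everything else (the state representation, the toy `expand` and `step_reward` callbacks) is an illustrative assumption.

```python
import heapq

def bfs_collect(root_state, expand, step_reward, beam=3, depth=3):
    """Breadth-first expansion of candidate CoT paths.

    At each level, every retained path is expanded into child states,
    each child is scored by its process reward, and only the top `beam`
    paths by accumulated reward survive to the next expansion.
    """
    frontier = [(0.0, [root_state])]          # (accumulated reward, path)
    for _ in range(depth):
        candidates = []
        for score, path in frontier:
            for child in expand(path[-1]):
                candidates.append((score + step_reward(child), path + [child]))
        frontier = heapq.nlargest(beam, candidates, key=lambda c: c[0])
    return frontier

# Toy run: states are integers, each state expands to s+1 and s+2,
# and the process reward of a child is simply its value.
best = bfs_collect(0, lambda s: [s + 1, s + 2], lambda c: c)[0]
```

The surviving high-reward paths are exactly the "valuable CoT samples" collected for training.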
Pre‑training and Reinforcement Learning
In the pre‑training stage, prompts are crafted to generate strict CoT templates, and supervised fine‑tuning (SFT) aligns the base model with the desired CoT format. Reinforcement learning uses PPO with a modified reward that accumulates process rewards across steps, ensuring early critical tokens receive sufficient signal.
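One way to realize "accumulating process rewards across steps" is a return-to-go: each step is credited with its own reward plus all later ones, so tokens emitted early (whose mistakes propagate) still receive strong signal. The suffix-sum form and the discount factor here are assumptions; the talk specifies only that rewards accumulate.

```python
def returns_to_go(step_rewards: list[float], gamma: float = 1.0) -> list[float]:
    """Assign each step the (discounted) sum of its own and all later
    process rewards, giving early critical steps the largest signal."""
    out = [0.0] * len(step_rewards)
    running = 0.0
    for i in reversed(range(len(step_rewards))):
        running = step_rewards[i] + gamma * running
        out[i] = running
    return out

# Rewards from the three CoT steps: unit conversion, ranking, final price.
credits = returns_to_go([0.5, 0.8, 1.0])
```

With `gamma=1.0` the first step receives 2.3 while the last receives only 1.0, which is the intended bias toward early tokens.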
Experimental Results
Accuracy: Traditional deep‑learning models customized for a few categories achieved only 44% accuracy on a random sample of common three‑level categories. The proposed method raised overall accuracy to 74%.
Speed: Prompt engineering with high-quality inference models can achieve comparable results, but inference typically exceeds 10 minutes and can fall into infinite loops. The 7B open-source base model trained with the proposed pipeline completes inference in seconds on a single GPU.
Future Optimizations
Current similarity‑retrieval and LLM training are decoupled, limiting feedback from final pricing outcomes. Introducing adversarial learning can automatically select challenging main products for training, extending the explore‑exploit strategy to the retrieval stage.
End‑to‑end RAG + LLM joint training, process‑reward‑guided tree search, and BFS‑based path selection are further explored to enhance pricing performance.
An example of the model's three-step CoT output:

```
step 1: compute unit prices
OK, first convert every reference product's total price to a uniform "yuan per jin" basis:
...
{
  "unit": "jin",
  "unit_count": {"B7": 150, "B1": 500, ...}
}
step 2: compute the ranking
I now need to handle the user's request: insert product A into set C while keeping unit prices sorted from high to low. First, I carefully read the task requirements and input the user provided to make sure I understand them correctly.
...
{
  "order": ["B7", "B1", ...]
}
step 3: compute the price
Assuming A's estimated unit price is 0.0450 yuan per gram, it should be inserted after B4 (0.04453) and before B6 (0.03993).
...
{
  "price": 0.04
}
```

JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
