Building a Low‑Cost, Privacy‑Safe Logistics QA Bot with Hybrid Retrieval & LLM

This article describes a privacy‑preserving, low‑cost logistics QA bot that combines data cleaning, augmentation, BM25 and vector retrieval, a DNN rerank model, and LLM‑based answer rewriting, along with evaluation results and deployment considerations.

JD Cloud Developers

1. Business Background

In the logistics private‑domain system, many WeChat groups need an auto‑reply bot that answers user questions at minimal cost while protecting data privacy, keeping answers accurate, and avoiding large‑model hallucinations.

2. Technical Solution

2.1 Project Overview

More than 200 QA pairs are provided. The pipeline has three stages: recall, re‑ranking, and answer rewriting.

2.2 Data Cleaning

The original Excel data are irregular, so they are converted into {"query": "...", "answer": "..."} pairs. This uniform format also makes it easier to construct positive and negative samples for the DNN model.
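
A minimal sketch of this cleaning step, assuming the spreadsheet has already been exported to rows of raw (question, answer) cells. The function name `clean_rows` and the cleaning rules (collapsing whitespace, skipping rows with an empty cell, dropping exact duplicates) are illustrative assumptions, not the rules used in the project.

```python
import json
import re

def clean_rows(rows):
    """Turn raw (question, answer) cell pairs into {"query", "answer"} dicts.

    Hypothetical rules: collapse whitespace, skip rows with an empty cell,
    and drop exact duplicate pairs.
    """
    seen = set()
    pairs = []
    for raw_q, raw_a in rows:
        # Spreadsheet cells may be None (empty) or non-string values.
        q = re.sub(r"\s+", " ", str(raw_q or "")).strip()
        a = re.sub(r"\s+", " ", str(raw_a or "")).strip()
        if not q or not a or (q, a) in seen:
            continue
        seen.add((q, a))
        pairs.append({"query": q, "answer": a})
    return pairs

# Made-up rows that mimic irregular spreadsheet content.
rows = [
    ("  How do I track\na parcel? ", "Enter the waybill number on the tracking page."),
    ("How do I track a parcel?", "Enter the waybill number on the tracking page."),
    (None, "orphan answer"),
]
pairs = clean_rows(rows)
print(json.dumps(pairs, ensure_ascii=False))
```

Only one pair survives here: the second row duplicates the first after normalization, and the third has no question.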

2.3 Data Augmentation

To enlarge the dataset, a large language model generates new questions and answers from the existing knowledge, guided by a prompt template. The number of QA pairs requested scales with answer length: it is proportional to the answer's token count divided by 40.

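zh_prompt_template = """The text inside the three backticks below is knowledge about {product}. Based on this knowledge, automatically generate {question_num} questions and their corresponding answers.
```{knowledge}```
Be as detailed and comprehensive as possible, and follow these rules:
1. Do not go beyond the scope of the information inside the backticks
2. Each question must begin with "Question:"
3. Each answer must begin with "Answer:"
"""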

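The augmentation step could be wired up roughly as follows. This is a sketch under stated assumptions: `PROMPT_TEMPLATE` is a shortened stand-in with the same placeholders as the article's template, `question_count` assumes the rule is "one pair per 40 answer tokens, at least one", whitespace splitting stands in for a real tokenizer (Chinese text would need one), and `parse_pairs` assumes the model follows the "Question:" / "Answer:" format. The sample model output is made up.

```python
import re

FENCE = "`" * 3  # the three backticks that delimit the knowledge text

# Stand-in template with the same placeholders as the article's prompt.
PROMPT_TEMPLATE = (
    "The text inside the three backticks is knowledge about {product}. "
    "Generate {question_num} questions and answers from it.\n"
    + FENCE + "{knowledge}" + FENCE + "\n"
    'Start each question with "Question:" and each answer with "Answer:".\n'
)

def question_count(answer, tokens_per_question=40):
    """Assumed rule: one generated pair per 40 answer tokens, at least one."""
    n_tokens = len(answer.split())  # whitespace tokens as a rough proxy
    return max(1, n_tokens // tokens_per_question)

def build_prompt(product, knowledge):
    """Fill the template for one knowledge item."""
    return PROMPT_TEMPLATE.format(
        product=product,
        question_num=question_count(knowledge),
        knowledge=knowledge,
    )

def parse_pairs(llm_output):
    """Extract QA pairs from 'Question: ... Answer: ...' formatted text."""
    pattern = re.compile(
        r"Question:\s*(.*?)\s*Answer:\s*(.*?)\s*(?=Question:|\Z)", re.S
    )
    return [{"query": q, "answer": a} for q, a in pattern.findall(llm_output)]

# Made-up model output, used only to exercise the parser.
sample_output = (
    "Question: How long is the free storage period?\n"
    "Answer: Parcels are stored free of charge for 3 days.\n"
    "Question: Who do I contact for damage claims?\n"
    "Answer: Contact customer service with the waybill number.\n"
)
generated = parse_pairs(sample_output)
print(generated)
```

Parsing into the same {"query", "answer"} shape as the cleaned data lets generated pairs be appended to the dataset without further conversion.
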
2.4 Model Training

The retrieval stage combines a BM25 inverted index with vector similarity (BERT‑based embeddings). The re‑ranking stage uses a DNN rerank model that scores candidate QA pairs. The top‑ranked answer is then passed to a large language model for final rewriting.

BM25 provides fast, interpretable keyword matching, while vector retrieval captures semantic similarity. The two methods complement each other.
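
The two recall paths can be sketched in plain Python. The assumptions are labeled: `k1=1.5` and `b=0.75` are common BM25 defaults rather than values from the article, whitespace tokenization stands in for a proper segmenter, the three‑dimensional `toy_vecs` are placeholders for real BERT embeddings, and merging the two result lists by union is one simple choice, not necessarily the one used in the project.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BM25:
    """Okapi BM25 over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in doc_tokens]
        self.avg_len = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in doc_tokens]
        df = Counter(term for t in doc_tokens for term in set(t))
        n = len(docs)
        # Smoothed IDF variant that stays positive for very common terms.
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, idx):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[idx].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[idx] / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        # Keep only documents that share at least one term with the query.
        return [i for s, i in sorted(scored, reverse=True)[:k] if s > 0]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_top_k(query_vec, doc_vecs, k=3):
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]

def hybrid_recall(bm25_ids, vector_ids):
    """Union of both recall paths, first-seen order, no duplicates."""
    return list(dict.fromkeys(bm25_ids + vector_ids))

questions = [
    "how do I track my parcel",
    "what is the delivery time to Beijing",
    "how do I change the delivery address",
]
toy_vecs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.1, 1.0]]  # placeholder embeddings

bm25 = BM25(questions)
bm25_ids = bm25.top_k("track parcel", k=2)
vector_ids = vector_top_k([0.9, 0.1, 0.3], toy_vecs, k=2)
candidates = hybrid_recall(bm25_ids, vector_ids)
print(candidates)
```

In this toy run the keyword path returns only the question that shares terms with the query, while the vector path also contributes a second candidate, so the merged list handed to re‑ranking is larger than either path alone would guarantee.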

Vector retrieval uses BERT‑derived embeddings; fine‑tuning is avoided due to GPU constraints and compliance concerns.

The re‑ranking model takes the results of all recall paths together in a single input sequence. This increases the parameter count but improves ranking over the small candidate set.
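
Training a reranker of this kind needs labeled examples. One common way to build them from the cleaned {"query", "answer"} pairs, assumed here rather than taken from the article, is to treat each query with its own answer as a positive sample and the same query with randomly chosen other answers as negatives. The function name `build_training_samples` and the ratio of two negatives per query are hypothetical.

```python
import random

def build_training_samples(pairs, negatives_per_query=2, seed=0):
    """Return (query, answer, label) triples: label 1 = match, 0 = mismatch."""
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    samples = []
    for i, pair in enumerate(pairs):
        samples.append((pair["query"], pair["answer"], 1))
        # Candidate negatives: answers of other pairs with different text.
        others = [
            p["answer"]
            for j, p in enumerate(pairs)
            if j != i and p["answer"] != pair["answer"]
        ]
        k = min(negatives_per_query, len(others))
        for negative in rng.sample(others, k):
            samples.append((pair["query"], negative, 0))
    return samples

# Made-up pairs in the cleaned format.
qa_pairs = [
    {"query": "How do I track a parcel?", "answer": "Use the waybill number."},
    {"query": "How do I change the address?", "answer": "Contact the courier."},
    {"query": "What are the service hours?", "answer": "9:00 to 18:00 daily."},
]
samples = build_training_samples(qa_pairs)
print(len(samples))
```

Excluding other pairs that share the same answer text avoids labeling a correct answer as a negative when several questions map to one answer.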

2.5 Output Stage

To avoid answers that do not quite match the question or that read stiffly, the selected QA pair and the user query are sent to an LLM for a second‑stage rewrite. A refusal threshold is also set: when the highest similarity score falls below it, the system returns a “no answer” response.
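
The refusal check and the rewrite call might look like the sketch below. Everything named here is an assumption for illustration: the threshold value `0.6`, the wording of `REWRITE_TEMPLATE` and `NO_ANSWER`, and the `rewrite_fn` callable that stands in for whichever LLM endpoint is used. `fake_llm` only echoes the reference answer so the example runs without a model.

```python
NO_ANSWER = "Sorry, I could not find an answer to that question."

# Hypothetical rewrite prompt; the article does not give its wording.
REWRITE_TEMPLATE = (
    "User question: {query}\n"
    "Matched question: {matched_query}\n"
    "Reference answer: {answer}\n"
    "Rewrite the reference answer so that it directly addresses the user "
    "question. Use only facts from the reference answer."
)

def answer_or_refuse(query, ranked, rewrite_fn, threshold=0.6):
    """ranked: list of (score, qa_pair) sorted best first."""
    # Refuse when nothing was recalled or the best score is too low.
    if not ranked or ranked[0][0] < threshold:
        return NO_ANSWER
    _, best = ranked[0]
    prompt = REWRITE_TEMPLATE.format(
        query=query, matched_query=best["query"], answer=best["answer"]
    )
    return rewrite_fn(prompt)

def fake_llm(prompt):
    """Stand-in for a real LLM call: echoes the reference answer."""
    for line in prompt.splitlines():
        if line.startswith("Reference answer: "):
            return "REWRITTEN: " + line[len("Reference answer: "):]
    return ""

qa = {"query": "How do I track my parcel?",
      "answer": "Enter the waybill number on the tracking page."}

confident = answer_or_refuse("where is my package", [(0.82, qa)], fake_llm)
refused = answer_or_refuse("can I ship a piano", [(0.41, qa)], fake_llm)
print(confident)
print(refused)
```

Checking the threshold before building the prompt means a low‑confidence match never reaches the LLM, which both saves a call and removes the chance of the model embellishing an irrelevant answer.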

2.6 Evaluation

Demo results show that fine‑tuning improves the separation between positive and negative scores for both query‑to‑answer (qa) and query‑to‑query (qq) retrieval.
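
The article does not say how separation is measured. As an illustration only, two simple measures are the gap between mean positive and mean negative scores, and the fraction of positive/negative pairs in which the positive scores higher. The function names and the scores below are made up for the example and are not the reported results.

```python
from statistics import mean

def score_gap(pos_scores, neg_scores):
    """Mean positive score minus mean negative score."""
    return mean(pos_scores) - mean(neg_scores)

def pairwise_accuracy(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive ranks higher."""
    wins = sum(p > n for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Arbitrary example scores, not measurements from the project.
pos = [0.9, 0.8, 0.7]
neg = [0.75, 0.3, 0.2]

gap = score_gap(pos, neg)
accuracy = pairwise_accuracy(pos, neg)
print(round(gap, 3), round(accuracy, 3))
```

A wider gap also makes the refusal threshold easier to place, since fewer correct matches fall near the cutoff.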

3. Online Deployment

The pipeline cannot be packaged as a single TorchScript or ONNX model, but it can be containerized and deployed via the internal platform.

4. References

The solution draws on Amazon’s public algorithmic designs and the SBERT paper (https://arxiv.org/pdf/1908.10084).
