What’s New in qa_match V1.1? Lightweight Pre‑trained Model and One‑Level KB Support
The article introduces qa_match V1.1, an open‑source deep‑learning QA matching tool. This release adds one‑level knowledge‑base support and a lightweight Bi‑LSTM pre‑trained language model (SPTM). The article also describes the model architecture, training data, performance benchmarks, future plans, and contribution guidelines.
Overview
qa_match is an open‑source lightweight question‑answer matching tool (Apache License 2.0) hosted at https://github.com/wuba/qa_match. Version 1.0 was released on 2020‑03‑09; version 1.1 adds support for a one‑level knowledge base and releases a Bi‑LSTM‑based pre‑trained language model (SPTM).
Knowledge‑Base Types
A one‑level knowledge base consists of standard questions (intents) and their expanded variations. A two‑level knowledge base adds a domain layer that groups multiple intents.
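To make the distinction concrete, here is a minimal sketch of the two layouts as Python data structures. The class and field names (`Intent`, `Domain`, `standard_question`, `expansions`), the helper `expansion_to_intent`, and the example questions are hypothetical illustrations, not the file format qa_match actually reads; the repository's data_demo directory shows the real format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """One standard question plus its expanded (paraphrased) variants."""
    intent_id: int
    standard_question: str
    expansions: List[str] = field(default_factory=list)


@dataclass
class Domain:
    """Two-level KB only: a domain groups several related intents."""
    domain_id: int
    name: str
    intents: List[Intent] = field(default_factory=list)


# A one-level KB is a flat list of intents.
OneLevelKB = List[Intent]
# A two-level KB adds a domain layer on top of the intents.
TwoLevelKB = List[Domain]


def expansion_to_intent(kb: OneLevelKB) -> Dict[str, int]:
    """Hypothetical helper: map every question text (standard or expanded)
    to the ID of its standard question, i.e. (text, label) pairs."""
    mapping: Dict[str, int] = {}
    for intent in kb:
        mapping[intent.standard_question] = intent.intent_id
        for text in intent.expansions:
            mapping[text] = intent.intent_id
    return mapping


one_level: OneLevelKB = [
    Intent(0, "How do I reset my password?", ["forgot password", "password reset"]),
    Intent(1, "How do I cancel an order?", ["cancel my order"]),
]
two_level: TwoLevelKB = [
    Domain(0, "account", [one_level[0]]),
    Domain(1, "orders", [one_level[1]]),
]
```

The only structural difference is the extra `Domain` wrapper: a two‑level matcher first has to pick a domain, while a one‑level matcher chooses directly among all standard questions.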
Simple Pre‑trained Model (SPTM)
SPTM replaces BERT's Transformer encoder with a residual Bi‑LSTM network for faster inference and drops the Next Sentence Prediction task. Pre‑training follows BERT's masking strategy: 15% of characters are selected; of these, 80% are replaced with a mask token, 10% are replaced with a random token, and 10% are left unchanged.
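The selection rule can be sketched as follows. This is an illustration, not SPTM's code: `bert_style_mask` and the `[MASK]` token string are hypothetical names, and each character is selected independently with probability 0.15 (so 15% is an expected rate), whereas BERT's reference data pipeline picks a fixed number of positions per sequence.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask token string


def bert_style_mask(
    chars: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    select_prob: float = 0.15,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted input, targets). targets[i] holds the original
    character at selected positions and None where nothing is predicted."""
    inputs = list(chars)
    targets: List[Optional[str]] = [None] * len(chars)
    for i, ch in enumerate(chars):
        if rng.random() >= select_prob:
            continue  # about 85% of characters are not selected
        targets[i] = ch  # the model must recover the original character
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80% of selected: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10% of selected: random token
        # remaining 10% of selected: keep the character unchanged
    return inputs, targets


rng = random.Random(0)
inputs, targets = bert_style_mask(list("question matching"), list("abcdefgh"), rng)
```

Selected positions keep their original character as the prediction target even when the input is left unchanged, so the model is trained to reconstruct every selected character, not only the masked ones.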
Training configuration:
Dataset: 10 million unlabeled sentences
Hardware: NVIDIA P40 GPU, 12 GB memory
Steps: 500,000, batch size 256
Total training time: 215.69 hours
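A quick back‑of‑the‑envelope check of these figures, assuming every batch element is one sentence (the source does not say how sentences are batched): the run processes 128,000,000 sentences, roughly 12.8 passes over the corpus, at about 1.55 seconds per step, or about 165 sentences per second.

```python
# Figures from the training configuration above.
steps = 500_000
batch_size = 256
corpus_sentences = 10_000_000
total_hours = 215.69

# Assumption: every batch element is one sentence.
sentences_seen = steps * batch_size          # total training examples processed
epochs = sentences_seen / corpus_sentences   # approximate passes over the corpus
total_seconds = total_hours * 3600
sec_per_step = total_seconds / steps
sentences_per_sec = sentences_seen / total_seconds

print(f"sentences seen:   {sentences_seen:,}")
print(f"approx. epochs:   {epochs:.1f}")
print(f"seconds per step: {sec_per_step:.2f}")
print(f"sentences/second: {sentences_per_sec:.0f}")
```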
Architecture: in each Bi‑LSTM layer, the layer's input is added to its output (a residual connection) and the sum is layer‑normalized before being passed to the next layer; the output of the final Bi‑LSTM layer is then combined with the output of a fully‑connected layer to produce the sentence representation.
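The residual wiring can be sketched in plain Python. This is a minimal illustration, not SPTM's implementation: `toy_layer` is a hypothetical placeholder for a real Bi‑LSTM layer (any function mapping a sequence of vectors to a sequence of the same width), the layer norm omits the learned gain and bias, and the final combination with the fully‑connected layer is left out because the source does not specify how the two outputs are combined.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# A layer maps a sequence of vectors to a sequence of vectors of the same width.
Layer = Callable[[List[Vector]], List[Vector]]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Normalize one vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_stack(seq: List[Vector], layers: Sequence[Layer]) -> List[Vector]:
    """For each layer: add the layer's input to its output, then layer-normalize."""
    h = seq
    for layer in layers:
        out = layer(h)
        h = [layer_norm([a + b for a, b in zip(x_t, y_t)]) for x_t, y_t in zip(h, out)]
    return h


def toy_layer(seq: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a Bi-LSTM layer: mixes each position with the
    mean of the whole sequence, so context flows from both directions."""
    width = len(seq[0])
    mean = [sum(x[i] for x in seq) / len(seq) for i in range(width)]
    return [[0.5 * x[i] + 0.5 * mean[i] for i in range(width)] for x in seq]


tokens = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
encoded = residual_stack(tokens, [toy_layer, toy_layer, toy_layer])
```

Because the output width must equal the input width for the addition to work, a real Bi‑LSTM in this position would use a per‑direction hidden size of half the model width, so the concatenated forward and backward states match the input.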
Question Matching
Version 1.0 (two‑level KB): combines an LSTM domain classifier with a DSSM intent matcher.
Version 1.1 (one‑level KB):
DSSM‑based matching: scores the input question and compares the score with thresholds x1 and x2 to decide the answer type (single answer, list answer, or reject).
SPTM‑based matching: fine‑tunes the pre‑trained SPTM on the one‑level KB (same architecture, no masking). The target is the ID of the matched standard question; scoring and the answer‑type decision follow the same threshold logic as DSSM.
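The answer‑type decision can be sketched as follows. The source only says the score is compared with x1 and x2; the ordering x1 > x2, the mapping of score ranges to answer types, the use of a softmax probability over standard‑question IDs as the score, and the `top_k` list size are all assumptions for illustration, not qa_match's actual code.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: List[float], x1: float, x2: float, top_k: int = 3) -> Tuple[str, List[int]]:
    """Hypothetical rule, assuming x1 > x2:
    score >= x1       -> single answer (top standard question)
    x2 <= score < x1  -> list answer (top_k candidates)
    score < x2        -> reject
    Logits are indexed by standard-question ID."""
    if not x1 > x2:
        raise ValueError("expected x1 > x2")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    score = probs[ranked[0]]
    if score >= x1:
        return "single", ranked[:1]
    if score >= x2:
        return "list", ranked[:top_k]
    return "reject", []


print(decide([5.0, 0.0, 0.0], x1=0.9, x2=0.5))
```

With a DSSM matcher the score would be a similarity value rather than a softmax probability, but the same two‑threshold comparison applies.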
Effectiveness Evaluation
Experiments use synthetic one‑level and two‑level knowledge‑base datasets from the data_demo directory of the repository. Three models were evaluated: DSSM and SPTM on the one‑level KB, and an LSTM + DSSM fusion model on the two‑level KB. Metrics are unique‑answer accuracy, recall, F1, and CPU inference latency.
One‑level KB:
DSSM – Accuracy 0.8398, Recall 0.8326, F1 0.8362, Latency 3 ms
SPTM – Accuracy 0.8841, Recall 0.9002, F1 0.8921, Latency 16 ms
Two‑level KB (LSTM + DSSM fusion): Accuracy 0.8957, Recall 0.9027, F1 0.8992, Latency 18 ms
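The reported F1 values are consistent with F1 being the harmonic mean of unique‑answer accuracy and recall, which suggests the accuracy figure plays the role of precision here. The short check below recomputes them from the reported numbers; the dictionary keys are only labels for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (accuracy used as precision, recall, reported F1)
results = {
    "DSSM (one-level KB)": (0.8398, 0.8326, 0.8362),
    "SPTM (one-level KB)": (0.8841, 0.9002, 0.8921),
    "LSTM + DSSM (two-level KB)": (0.8957, 0.9027, 0.8992),
}

for name, (acc, rec, reported) in results.items():
    computed = round(f1(acc, rec), 4)
    print(f"{name}: computed F1 {computed:.4f}, reported {reported:.4f}")
```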
Future Work
Open‑source a semi‑automatic knowledge‑base mining pipeline that combines human and machine extraction.
Provide TensorFlow 2.x or PyTorch versions of qa_match.
Contribution
Developers can submit pull requests or issues to https://github.com/wuba/qa_match.git.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
