What’s New in qa_match V1.1? Lightweight Pre‑trained Model and One‑Level KB Support

The article introduces qa_match V1.1, an open‑source deep‑learning QA matching tool. The release adds one‑level knowledge‑base support and a lightweight Bi‑LSTM pre‑trained language model (SPTM); the article also covers the model architecture, training data, performance benchmarks, future plans, and contribution guidelines.

ITPUB

Overview

qa_match is an open‑source lightweight question‑answer matching tool (Apache License 2.0) hosted at https://github.com/wuba/qa_match. Version 1.0 was released on 2020‑03‑09; version 1.1 adds support for a one‑level knowledge base and releases a Bi‑LSTM‑based pre‑trained language model (SPTM).

Knowledge‑Base Types

A one‑level knowledge base consists of standard questions (intents) and their expanded variations. A two‑level knowledge base adds a domain layer that groups multiple intents.
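To make the distinction concrete, here is a minimal sketch of the two layouts as Python data structures. The class names (`Intent`, `Domain`), the `flatten` helper, and the example questions are all hypothetical; they do not reflect qa_match's actual data‑file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Intent:
    """One standard question (intent) plus its expanded variations."""
    standard_question: str
    expansions: list[str] = field(default_factory=list)


@dataclass
class Domain:
    """Extra grouping layer used only by a two-level knowledge base."""
    name: str
    intents: list[Intent] = field(default_factory=list)


# One-level KB: a flat list of intents (hypothetical example content).
one_level_kb = [
    Intent("How do I reset my password?", ["forgot password", "need a new password"]),
    Intent("How do I cancel an order?", ["cancel my order", "undo a purchase"]),
]

# Two-level KB: the same intents grouped under domains.
two_level_kb = [
    Domain("account", [one_level_kb[0]]),
    Domain("orders", [one_level_kb[1]]),
]


def flatten(domains: list[Domain]) -> list[Intent]:
    """Drop the domain layer, yielding the equivalent one-level KB."""
    return [intent for domain in domains for intent in domain.intents]
```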

Simple Pre‑trained Model (SPTM)

SPTM replaces BERT's Transformer encoder with a residual Bi‑LSTM network for faster inference and drops the Next Sentence Prediction task. Pre‑training follows a BERT‑style masking strategy: 15% of characters are selected, and of those, 80% are replaced with a mask token, 10% are replaced with a random token, and 10% are left unchanged.
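The masking rule can be sketched as follows. This is a minimal illustration, not SPTM's code: the function name `bert_style_mask` and the `[MASK]` token string are assumptions, and each character is selected independently with probability 0.15, so about 15% are selected on average rather than exactly 15%.

```python
import random

MASK = "[MASK]"  # assumed mask-token string


def bert_style_mask(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted tokens, {position: original token}) for the positions to predict."""
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: this position is excluded from the loss
        targets[i] = tok  # the model must recover the original character here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80% of selected positions: mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random token from the vocabulary
        # remaining 10%: keep the original character unchanged
    return corrupted, targets
```

The unchanged 10% matters because at fine‑tuning time there is no mask token; keeping some selected characters intact forces the model to build useful representations for every input position.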

Training configuration:

Dataset: 10 million sentences (unlabeled)

Hardware: Nvidia P40, 12 GB memory

Steps: 500,000 (batch size 256)

Total time: 215.69 hours
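A quick back‑of‑the‑envelope check on these figures, assuming one sentence per training example (the article does not say how sentences are packed into batches):

```python
steps = 500_000
batch_size = 256
corpus_sentences = 10_000_000
total_hours = 215.69

sentences_seen = steps * batch_size              # 128,000,000 examples processed
epochs = sentences_seen / corpus_sentences       # 12.8 passes over the corpus
seconds_per_step = total_hours * 3600 / steps    # about 1.55 s per training step

print(sentences_seen, epochs, round(seconds_per_step, 2))
```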

Architecture: the output of each Bi‑LSTM layer is added to that layer's input (a residual connection) and passed through layer normalization before feeding the next layer; the final Bi‑LSTM output is then combined with the output of a fully‑connected layer to produce the sentence representation.
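The residual stacking can be sketched in plain Python. This is an illustrative toy, not SPTM's implementation: the LSTM has random weights and no biases, each direction is assumed to use half the model width so the concatenated output can be added back to the input, and the final fusion with the fully‑connected layer is omitted because the article does not specify how the two outputs are combined. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTM:
    """Single-direction LSTM without biases, for illustration only."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in ("input", "forget", "output", "cell")
        }

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in seq:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.W["input"], z)]
            f = [sigmoid(a) for a in matvec(self.W["forget"], z)]
            o = [sigmoid(a) for a in matvec(self.W["output"], z)]
            g = [math.tanh(a) for a in matvec(self.W["cell"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    def __init__(self, dim, rng):
        # dim // 2 units per direction keeps the concatenated width equal to dim.
        assert dim % 2 == 0
        self.fwd = LSTM(dim, dim // 2, rng)
        self.bwd = LSTM(dim, dim // 2, rng)

    def __call__(self, seq):
        forward_out = self.fwd.run(seq)
        backward_out = self.bwd.run(seq[::-1])[::-1]  # re-align to original order
        return [f + b for f, b in zip(forward_out, backward_out)]  # concatenate


def layer_norm(v, eps=1e-6):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def residual_bilstm_encoder(seq, layers):
    """Each layer computes x <- LayerNorm(x + BiLSTM(x)) at every position."""
    for layer in layers:
        out = layer(seq)
        seq = [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(seq, out)]
    return seq
```

The residual path lets each layer learn a correction to its input rather than a full transformation, which is what makes stacking several recurrent layers trainable.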

Question Matching

Version 1.0 (two‑level KB): combines an LSTM domain classifier with a DSSM intent matcher.

Version 1.1 (one‑level KB):

DSSM‑based matching: scores the input question and compares the score with two thresholds, x1 and x2, to decide the answer type (single answer, list answer, or reject).

SPTM‑based matching: fine‑tunes the pre‑trained SPTM on the one‑level KB (same architecture, no masking). The target is the ID of the matched standard question; scoring and the answer‑type decision follow the same threshold logic as DSSM.
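A sketch of SPTM‑style scoring and the answer‑type decision, under stated assumptions: the score is taken to be the highest softmax probability over standard‑question IDs, x1 is assumed to be the upper threshold and x2 the lower one, and a list answer is assumed to return the top three candidates. The article gives none of these details, and `decide_answer` is a hypothetical helper.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_answer(logits, x1, x2, list_size=3):
    """Map classifier logits over standard-question IDs to an answer decision.

    Hypothetical rule (assumes x1 > x2):
      score >= x1       -> single answer (the top standard question)
      x2 <= score < x1  -> list answer (the top `list_size` candidates)
      score < x2        -> reject
    """
    if x1 <= x2:
        raise ValueError("expected x1 > x2")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    score = probs[ranked[0]]
    if score >= x1:
        return "single", ranked[:1]
    if score >= x2:
        return "list", ranked[:list_size]
    return "reject", []


print(decide_answer([5.0, 0.0, 0.0], x1=0.8, x2=0.4))
```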

Effectiveness Evaluation

Experiments use the synthetic level‑1 and level‑2 knowledge‑base datasets in the repository's data_demo directory. Three models were evaluated: DSSM, SPTM, and an LSTM + DSSM fusion model. Metrics are unique‑answer accuracy, recall, and F1, plus CPU inference latency.

Level‑1 KB:

DSSM – Accuracy 0.8398, Recall 0.8326, F1 0.8362, Latency 3 ms

SPTM – Accuracy 0.8841, Recall 0.9002, F1 0.8921, Latency 16 ms

Level‑2 KB (LSTM + DSSM fusion): Accuracy 0.8957, Recall 0.9027, F1 0.8992, Latency 18 ms.
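The reported F1 values are consistent with F1 being the harmonic mean of unique‑answer accuracy and recall, so "accuracy" here plays the role of precision. The check below recomputes F1 from the figures above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = {
    "DSSM (level-1)": (0.8398, 0.8326, 0.8362),
    "SPTM (level-1)": (0.8841, 0.9002, 0.8921),
    "LSTM + DSSM (level-2)": (0.8957, 0.9027, 0.8992),
}

for name, (acc, rec, reported_f1) in reported.items():
    print(f"{name}: computed F1 = {f1(acc, rec):.4f}, reported = {reported_f1}")
```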

Future Work

Open‑source a semi‑automatic knowledge‑base mining pipeline that combines human and machine extraction.

Provide TensorFlow 2.x or PyTorch versions of qa_match.

Contribution

Developers can submit pull requests or issues to https://github.com/wuba/qa_match.git.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.
