
Fine‑tuning BERT for Sentence Pair Similarity in an Online Education Platform

This article describes how a BERT‑based model is fine‑tuned to compute sentence‑pair similarity for improving recommendation accuracy in an online school, detailing the architecture, training mechanisms, code implementation, experimental results, and future extensions such as sentiment analysis.

Xueersi Online School Tech Team

Background: The industry typically uses two categories of sentence-similarity algorithms: statistical methods and deep-learning methods. Because statistical approaches cannot capture semantic features well, a deep-learning solution is adopted for the online school.

Current Situation: The business involves many superficially similar sentences that belong to different categories, making traditional models inadequate; a smarter algorithm is needed.

Business Scenario: Historical data is mined and real-time chat topics are analyzed to construct contexts, recognize intents, and quickly recommend responses, thereby improving teacher-user communication. The overall framework is illustrated below:

Current Effect: Although coverage across categories is high, the citation rate is low due to insufficient precision. To improve accuracy, a sentence-pair text similarity algorithm is introduced.

Improvement Plan: A BERT model is fine-tuned for the online school's text similarity task. The framework is shown below:

The model takes two different queries, passes them through the pre-trained BERT network while fine-tuning its weights, and produces embeddings that carry strong prior knowledge and domain-specific representation; a fully connected layer is added on top for training.
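At shape level, the added classification head can be sketched as follows. This is a self-contained numpy toy: the pooled [CLS] vector is random rather than a real BERT output, and the 312-dimensional hidden size (ALBERT-tiny's) is an assumption, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the fine-tuned encoder: in the real system the pooled [CLS]
# vector comes from BERT; here it is random so the sketch is self-contained.
hidden_size, num_labels = 312, 2
pooled_cls = rng.standard_normal(hidden_size)

# The fully connected layer added on top for the similarity task.
W = rng.standard_normal((hidden_size, num_labels)) * 0.02
b = np.zeros(num_labels)

logits = pooled_cls @ W + b
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()            # softmax over the two similarity labels
print(probs)
```

During fine-tuning both W, b and the encoder weights are updated; only the head is new, which is why so little labeled data is needed.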

BERT Overview: BERT uses a Transformer encoder whose input is the element-wise sum of token, segment, and position embeddings: the token embedding represents character vectors, the segment embedding distinguishes the two sentences, and the position embedding encodes token order.
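A minimal numpy sketch of that input representation (the lookup tables are random stand-ins for learned parameters, and the 21128-entry vocabulary size is assumed to match the Chinese vocab.txt; the ids are taken from the log excerpt later in the article):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, hidden = 21128, 16, 8   # toy hidden size; vocab size assumed

# Lookup tables: learned parameters in the real model, random here.
token_table = rng.standard_normal((vocab_size, hidden))
segment_table = rng.standard_normal((2, hidden))      # 0 = sentence A, 1 = sentence B
position_table = rng.standard_normal((max_len, hidden))

input_ids = np.array([101, 679, 2802, 5050, 2845, 1399, 102])  # [CLS] 不 打 算 报 名 [SEP]
segment_ids = np.zeros(len(input_ids), dtype=int)
positions = np.arange(len(input_ids))

# BERT's input is the element-wise sum of the three embeddings.
embeddings = token_table[input_ids] + segment_table[segment_ids] + position_table[positions]
print(embeddings.shape)
```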

Training Mechanisms:

Mask LM: Randomly masks 15% of tokens (80% with [MASK], 10% with random token, 10% unchanged) to improve contextual understanding. Loss function shown below:
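The masking step can be sketched as follows. This is a toy implementation, not the authors' code; the loss is then the cross-entropy of predicting the original token at each selected position.

```python
import random

def mask_tokens(tokens, vocab, rng, mask_rate=0.15):
    """BERT-style masking: select ~15% of positions; of those, 80% become
    [MASK], 10% a random vocabulary token, and 10% are left unchanged."""
    masked = list(tokens)
    targets = {}                       # position -> original token to predict
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_rate:
            continue
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # else: keep the original token (the model must still predict it)
    return masked, targets

rng = random.Random(0)
tokens = list("孩子不愿意报名") * 5      # repeat so some positions get selected
masked, targets = mask_tokens(tokens, vocab=list("的了是我你"), rng=rng)
print(masked, targets)
```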

Next Sentence Prediction (NSP): 50% of sentence pairs are consecutive, 50% are not, enabling the model to learn inter‑sentence relationships. Loss function shown below:
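Since NSP is a binary classification over the [CLS] representation, the loss referenced above reduces to the standard binary cross-entropy:

```latex
\mathcal{L}_{\mathrm{NSP}}
  = -\sum_{(s_1,\,s_2)} \Big[\, y \log p(\mathrm{IsNext} \mid s_1, s_2)
    + (1 - y) \log \big( 1 - p(\mathrm{IsNext} \mid s_1, s_2) \big) \Big]
```

where y = 1 when s_2 actually follows s_1 in the corpus.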

The combined loss for both tasks is illustrated here:
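Concretely, the Mask LM term is the cross-entropy over the masked positions only, and pre-training minimizes the sum of the two task losses (a standard reconstruction, written out here since the original figure is not reproduced):

```latex
\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in M} \log p\big(x_i \mid \tilde{x}\big),
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{NSP}}
```

where M is the set of masked positions and \tilde{x} is the corrupted input sequence.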

Fine-tune Overview: Because large labeled data sets are costly, the pre-trained BERT network is either fine-tuned end to end or used as a fixed feature extractor (e.g., taking the last layer's outputs as embeddings).
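The feature-extractor route can be sketched end to end: compute embeddings once with the frozen encoder, then train only a small head. Everything below is synthetic; frozen_encoder merely stands in for taking BERT's last layer as fixed features.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(n_pairs, hidden=32):
    # Stand-in for "take BERT's last layer as embeddings": the encoder is
    # frozen, so features are computed once; random vectors here.
    return rng.standard_normal((n_pairs, hidden))

X = frozen_encoder(200)
y = (X[:, 0] > 0).astype(float)        # toy labels standing in for similar/dissimilar

# Only the small classifier head is trained (logistic regression via
# plain gradient descent on the cross-entropy loss).
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
acc = (pred == y).astype(float).mean()
print(acc)
```

Fine-tuning instead updates the encoder weights too, which is what the article's setup does.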

Reasons for fine-tuning BERT:

Limited annotated data and high labeling cost.

BERT’s Mask LM and NSP provide strong performance on small samples.

Why Use Sentence Pairs:

Reduces the need for frequent model updates.

Better captures relationships between different categories.

Key Code Implementation:

Pre-trained model files (an ALBERT-tiny checkpoint):

albert_config_tiny.json
albert_model.ckpt.data-00000-of-00001
albert_model.ckpt.index
albert_model.ckpt.meta
vocab.txt

Define processor:

processors = {
    "sentence_pair": SentencePairClassificationProcessor,
}

Data processing class (excerpt):

import os

# DataProcessor, InputExample, and the tokenization module come from the
# BERT/ALBERT fine-tuning code (run_classifier.py).
class SentencePairClassificationProcessor(DataProcessor):
    """Processor for the internal sentence-pair classification data set."""

    def __init__(self):
        self.language = "zh"

    def get_train_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_test_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

    def get_labels(self):
        return ["0", "1"]

    def _create_examples(self, lines, set_type):
        """Turn TSV rows of (label, text_a, text_b) into InputExamples."""
        examples = []
        print("length of lines:", len(lines))
        for i, line in enumerate(lines):
            if i == 0:  # skip the header row
                continue
            guid = "%s-%s" % (set_type, i)
            try:
                label = tokenization.convert_to_unicode(line[0])
                text_a = tokenization.convert_to_unicode(line[1])
                text_b = tokenization.convert_to_unicode(line[2])
                examples.append(InputExample(
                    guid=guid, text_a=text_a, text_b=text_b, label=label))
            except Exception:
                print("skipping malformed line %d: %r" % (i, line))
        return examples
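_create_examples reads each row as (label, text_a, text_b) and skips the header. A hypothetical train.tsv in that layout (the header names and the convention that 1 marks a similar pair are assumptions, not from the original article):

```text
label	text_a	text_b
1	怎么给孩子报名	孩子如何报名
0	不打算报名	课程价格是多少
```

Row 1 pairs two phrasings of "how do I enroll my child"; row 2 pairs "not planning to enroll" with "how much does the course cost".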

Result Display: After training, the model achieved an evaluation accuracy of 0.8252471. Several example sentence pairs are shown with their tokenization logs and final probability vectors, demonstrating that the model can distinguish similar from dissimilar pairs according to business expectations.

Example output (excerpt):

INFO:tensorflow:tokens: [CLS] 不 打 算 报 名 [SEP] 孩 子 不 愿 意 报 名 [SEP]
INFO:tensorflow:input_ids: 101 679 2802 5050 2845 1399 102 2111 2094 679 2703 2692 2845 1399 102 0 ...
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 ...
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 ...
final result: [[0.7034333 0.2965667]]
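The final vector is a softmax over the two labels returned by get_labels(), in that order. A minimal sketch of reading it back (probabilities copied from the log above):

```python
labels = ["0", "1"]                    # order matches get_labels() in the processor
final_result = [[0.7034333, 0.2965667]]

probs = final_result[0]
pred_index = max(range(len(probs)), key=lambda i: probs[i])
print(labels[pred_index], probs[pred_index])   # -> 0 0.7034333
```

Which label index corresponds to "similar" depends on how the training data was labeled.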

Similar logs are provided for other sentence pairs, with final vectors indicating similarity probabilities that align with business intuition.

Prospects: The framework can be further expanded to address other small-sample text similarity problems and extended to tasks such as sentiment recognition.

Tags: Deep Learning, Fine-tuning, BERT, Text Classification, Chinese NLP, Sentence Similarity
Written by

Xueersi Online School Tech Team

The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.
