A Comprehensive Overview of Relation Extraction Techniques

This article surveys relation extraction: it defines the task, categorizes its five main settings, and details key approaches, including entity‑position encoding, dependency‑tree methods such as the shortest dependency path and BRCNN, and distant supervision with multi‑instance learning and selective attention.


Relation extraction aims to infer the semantic relation between two entities mentioned in a sentence. For example, in the sentence "Tim Cook is the current CEO of Apple," the entities Tim Cook and Apple are linked by the relation CEO.

Task categories

Based on different formulation styles, relation extraction can be divided into five categories:

Single‑sentence relation extraction

Distant supervision

Few‑shot learning

Open information extraction

Document‑level (paragraph) relation extraction

1. Single‑sentence relation extraction

This setting treats the problem as a specialized text‑classification task that must also consider the positions of the two target entities (denoted e1 and e2). Existing methods fall into two groups:

General text‑classification models combined with explicit entity‑position information.

Models that leverage the dependency‑syntax tree of the sentence.

How to incorporate entity‑position information?

Several concrete techniques are commonly used:

Mention pooling: apply max‑pooling separately to the word vectors of e1 and e2, then concatenate the pooled vectors as input to the classifier.

Position embedding: represent the relative distance of each word to e1 and to e2 as position embeddings, then concatenate or add them to the word embedding before feeding it to the model. Early CNN‑based classifiers added such embeddings directly.

Entity markers: surround each entity with special tokens <e1> and </e1> (similarly <e2> and </e2>) to explicitly mark its boundaries. This plugs directly into RNN‑based classifiers.

Separate encoding: encode the context and the entity tokens independently, then merge the two representations before classification (see the sketch after this list).
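To make the marker and position‑embedding techniques concrete, here is a minimal, hypothetical PyTorch sketch; the class names, dimensions, and helper functions are illustrative assumptions, not any paper's published code.

```python
import torch
import torch.nn as nn

def add_entity_markers(tokens, e1_span, e2_span):
    """Wrap the two entity spans in <e1>...</e1> and <e2>...</e2> tokens.

    Spans are (start, end) token indices with end exclusive; e1 is
    assumed to occur before e2 in the sentence.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    return (tokens[:s1] + ["<e1>"] + tokens[s1:t1] + ["</e1>"]
            + tokens[t1:s2] + ["<e2>"] + tokens[s2:t2] + ["</e2>"]
            + tokens[t2:])

class PositionAwareEncoder(nn.Module):
    """Word embeddings concatenated with two relative-position embeddings."""

    def __init__(self, vocab_size, word_dim=100, pos_dim=10, max_dist=50):
        super().__init__()
        self.max_dist = max_dist
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One table per target entity; indices cover [-max_dist, max_dist].
        self.pos1_emb = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.pos2_emb = nn.Embedding(2 * max_dist + 1, pos_dim)

    def _rel_pos(self, seq_len, entity_idx):
        dist = torch.arange(seq_len) - entity_idx          # signed distance
        dist = dist.clamp(-self.max_dist, self.max_dist)   # truncate
        return dist + self.max_dist                        # shift to >= 0

    def forward(self, token_ids, e1_idx, e2_idx):
        # token_ids: (seq_len,) LongTensor of word indices.
        seq_len = token_ids.size(0)
        w = self.word_emb(token_ids)
        p1 = self.pos1_emb(self._rel_pos(seq_len, e1_idx))
        p2 = self.pos2_emb(self._rel_pos(seq_len, e2_idx))
        return torch.cat([w, p1, p2], dim=-1)  # (seq_len, word_dim + 2*pos_dim)
```

Mention pooling would then max‑pool the encoder output over each entity's token span and concatenate the two pooled vectors before classification.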

2. Dependency‑tree based methods

A dependency parse tree captures grammatical relations between words. For example, the sentence "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB." can be parsed into a tree whose edges are labeled with relations such as subject, object, and prepositional modifier.

The shortest dependency path (SDP) between two entities is the minimal sequence of edges connecting them in the tree. The SDP often preserves the complete relational mention while discarding the rest of the sentence; a classic illustration is the causal relation between "burst" and "pressure", which lies entirely on the path between the two words.
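As a sketch of SDP extraction, assuming spaCy and networkx are installed and the en_core_web_sm model has been downloaded, the parse can be treated as an undirected graph and the path found with a standard shortest‑path search:

```python
import spacy
import networkx as nx

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(sentence, entity1, entity2):
    """Return the tokens on the SDP between two entity head words."""
    doc = nlp(sentence)
    # Treat the parse as an undirected graph over token indices,
    # skipping the root's self-loop (its head is itself).
    edges = [(tok.head.i, tok.i) for tok in doc if tok.head.i != tok.i]
    graph = nx.Graph(edges)
    src = next(tok.i for tok in doc if tok.text == entity1)
    dst = next(tok.i for tok in doc if tok.text == entity2)
    return [doc[i].text for i in nx.shortest_path(graph, src, dst)]

path = shortest_dependency_path(
    "The results demonstrated that KaiC interacts rhythmically with SasA.",
    "KaiC", "SasA")
print(path)  # e.g. ['KaiC', 'interacts', 'with', 'SasA'], parser permitting
```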

One classic SDP‑based model is the Bidirectional Recurrent Convolutional Neural Network for Relation Classification [3]. Its main ideas are:

Use only the SDP as input, discarding irrelevant words.

Treat word tokens and dependency relations as heterogeneous data: process them with separate LSTMs, then combine their outputs with a CNN (see the sketch after this list).

Model the forward and reverse directions of a relation with two identical networks, because relations are directed (e.g., "A causes B" vs. "B is caused by A").
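The following is a much simplified PyTorch sketch of the heterogeneous‑channel idea only, not the full BRCNN of [3]; every dimension and name is an assumption for illustration. Words and dependency labels on the SDP get separate LSTMs, and a convolution mixes adjacent hidden states along the path.

```python
import torch
import torch.nn as nn

class TwoChannelSDPEncoder(nn.Module):
    """Separate LSTM channels for SDP words and dependency relations,
    merged by a 1-D convolution and max-pooling (BRCNN-style sketch)."""

    def __init__(self, n_words, n_rels, n_classes,
                 emb_dim=50, hidden=64, conv_out=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, emb_dim)
        self.rel_emb = nn.Embedding(n_rels, emb_dim)
        self.word_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.rel_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Convolve over the concatenated channels along the path.
        self.conv = nn.Conv1d(2 * hidden, conv_out, kernel_size=2)
        self.classify = nn.Linear(conv_out, n_classes)

    def forward(self, word_ids, rel_ids):
        # word_ids: (batch, path_len); rel_ids: (batch, path_len - 1),
        # since a path of L words has L - 1 dependency edges.
        w, _ = self.word_lstm(self.word_emb(word_ids))
        r, _ = self.rel_lstm(self.rel_emb(rel_ids))
        # Pair each relation with the word that precedes it on the path.
        h = torch.cat([w[:, :-1, :], r], dim=-1)      # (batch, L-1, 2*hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))  # (batch, conv_out, L-2)
        pooled = c.max(dim=-1).values                 # global max-pooling
        return self.classify(pooled)
```

The actual BRCNN additionally runs the path in both directions and trains the two directional channels jointly, which is how it models relation direction.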

3. Distant supervision

Distant supervision assumes that if a knowledge‑graph triple (e1, r, e2) exists, any sentence containing e1 and e2 likely expresses relation r. This enables automatic labeling of large corpora, but introduces noisy labels. For instance, the triple (Bill Gates, founder, Microsoft) would incorrectly label the sentence "Bill Gates retired from Microsoft" as expressing the founder relation.
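As a toy sketch of this labeling heuristic, with entities matched by simple substring search and all data invented for the example:

```python
# Knowledge-graph triples: (head entity, relation, tail entity).
triples = {("Bill Gates", "founder", "Microsoft")}

corpus = [
    "Bill Gates founded Microsoft in 1975.",
    "Bill Gates retired from Microsoft.",   # noisy: not about founding
]

def distant_label(sentences, kg):
    """Label every sentence mentioning both entities with the KG relation."""
    labeled = []
    for sent in sentences:
        for head, rel, tail in kg:
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, rel))
    return labeled

for example in distant_label(corpus, triples):
    print(example)
# Both sentences get the 'founder' label -- the second one incorrectly.
```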

Typical noise‑reduction strategies include:

Selecting high‑quality training instances.

Designing special mechanisms or training strategies to mitigate noise.

Incorporating additional contextual information.

A highly cited work (887 citations), Neural Relation Extraction with Selective Attention over Instances, tackles this noise by combining multi‑instance learning (MIL) with a selective attention mechanism.

Multi‑instance learning

In MIL, the basic training unit is a bag of sentences that share the same entity pair. All sentences in a bag are assigned the same relation label, and the model learns from the bag rather than individual sentences.

Formally, let S = {x_1, x_2, …, x_n} be a set of n sentences containing the pair (head, tail). The pair (S, r) constitutes a bag for relation r.
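A minimal sketch of bag construction under this definition, reusing the toy labeled tuples from the distant‑supervision sketch above:

```python
from collections import defaultdict

def build_bags(labeled):
    """Group distantly labeled sentences into bags keyed by (head, tail, relation)."""
    bags = defaultdict(list)
    for sent, head, tail, rel in labeled:
        bags[(head, tail, rel)].append(sent)
    return bags

# Each bag, not each sentence, becomes one training instance.
bags = build_bags([
    ("Bill Gates founded Microsoft in 1975.", "Bill Gates", "Microsoft", "founder"),
    ("Bill Gates retired from Microsoft.", "Bill Gates", "Microsoft", "founder"),
])
print(bags[("Bill Gates", "Microsoft", "founder")])
```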

Selective attention over instances

Each sentence x_i in a bag is encoded by a sentence encoder (CNN, PCNN, or BERT) into a vector. An attention score for x_i is computed with a learnable diagonal matrix A and the query vector r of the candidate relation:

e_i = x_i^T A r,    α_i = exp(e_i) / Σ_k exp(e_k)

The bag representation s = Σ_i α_i x_i is the attention‑weighted sum of the sentence vectors; it is passed through a softmax layer to produce the final probability distribution over relations.
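A compact PyTorch sketch of selective attention as formulated above; the encoder that produces sent_vecs, and all dimensions, are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SelectiveAttention(nn.Module):
    """Attention over the sentence vectors of one bag: e_i = x_i^T A r."""

    def __init__(self, dim, n_relations):
        super().__init__()
        self.A = nn.Parameter(torch.ones(dim))           # diagonal of A
        self.rel_query = nn.Embedding(n_relations, dim)  # relation vectors r
        self.out = nn.Linear(dim, n_relations)           # final classifier

    def forward(self, sent_vecs, relation_id):
        # sent_vecs: (n_sentences, dim) -- encoded sentences of one bag.
        r = self.rel_query(relation_id)        # (dim,)
        scores = sent_vecs @ (self.A * r)      # e_i = x_i^T A r (diagonal A)
        alpha = torch.softmax(scores, dim=0)   # attention weights over the bag
        bag_vec = alpha @ sent_vecs            # s = sum_i alpha_i x_i
        return torch.log_softmax(self.out(bag_vec), dim=-1)
```

During training, r is the query vector of the bag's distantly supervised relation; at inference, the score can be computed for every candidate relation.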

Intuitive speculation (author’s note)

Note: The author offers informal hypotheses about how sentence vectors, relation embeddings, and attention interact during training, suggesting that correctly labeled sentences gradually align in the shared feature space while noisy sentences diverge.

References

[1] Matching the Blanks: Distributional Similarity for Relation Learning

[2] How to Extract Subject, Verb and Object by NLP (https://suttipong-kull.medium.com/how-to-extract-subject-verb-and-object-by-nlp-4149323a7d7d)

[3] Bidirectional Recurrent Convolutional Neural Network for Relation Classification

[4] Distant Supervision for Relation Extraction Without Labeled Data

[5] A Survey of Multi‑Instance Learning (多示例学习综述) (https://zhuanlan.zhihu.com/p/299819082)
