Open‑Domain Information Extraction with UIE: Code Samples, Model Details, and Performance Highlights
This article introduces PaddleNLP's UIE tool for open‑domain information extraction, explains its underlying UIE model and ERNIE 3.0 foundation, showcases concise Python code for entity and event extraction, and presents few‑shot and SOTA performance results across multiple IE benchmarks.
The AI field often measures breakthroughs by three criteria: achieving a new SOTA on academic leaderboards, providing a unified architecture for many sub‑tasks, and turning cutting‑edge research into an easy‑to‑use open‑source tool. PaddleNLP’s UIE satisfies all three in the information‑extraction domain.
Information extraction (IE) is highly valuable for industries such as finance, government, law, and healthcare, yet traditional IE solutions are costly and domain‑specific. UIE offers a universal, open‑domain IE API that can extract any user‑defined schema (entities, relations, events) with just a few lines of code.
Below is a minimal example of entity extraction using PaddleNLP’s Taskflow:
# Entity extraction
from pprint import pprint
from paddlenlp import Taskflow
schema = ['时间', '选手', '赛事名称'] # Entity types to extract: time, athlete, event name
ie = Taskflow('information_extraction', schema=schema)
pprint(ie("2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!")) # pprint gives a more readable layout
>>> [{'时间': [{'end': 6, 'probability': 0.9857378532924486, 'start': 0, 'text': '2月8日上午'}],
      '赛事名称': [{'end': 23, 'probability': 0.8503089953268272, 'start': 6, 'text': '北京冬奥会自由式滑雪女子大跳台决赛'}],
      '选手': [{'end': 31, 'probability': 0.8981548639781138, 'start': 28, 'text': '谷爱凌'}]}]
A similar three‑line snippet performs event extraction:
# Event extraction
schema = {'地震触发词': ['地震强度', '时间', '震中位置', '震源深度']} # Event schema: earthquake trigger -> magnitude, time, epicenter location, focal depth
ie.set_schema(schema) # Reset schema
ie('中国地震台网正式测定:5月16日06时08分在云南临沧市凤庆县(北纬24.34度,东经99.98度)发生3.5级地震,震源深度10千米。')
>>> [{'地震触发词': [{'end': 58, 'probability': 0.9987181623528585, 'start': 56, 'text': '地震',
      'relations': {'地震强度': [{'end': 56, 'probability': 0.9962985320905915, 'start': 52, 'text': '3.5级'}],
                    '时间': [{'end': 22, 'probability': 0.9882578028575182, 'start': 11, 'text': '5月16日06时08分'}],
                    '震中位置': [{'end': 50, 'probability': 0.8551417444021787, 'start': 23, 'text': '云南临沧市凤庆县(北纬24.34度,东经99.98度)'}],
                    '震源深度': [{'end': 67, 'probability': 0.999158304648045, 'start': 63, 'text': '10千米'}]}}]}]
Beyond the demos, UIE builds on the ACL 2022 paper “Unified Structure Generation for Universal Information Extraction” (UIE) and uses the knowledge‑rich ERNIE 3.0 model as its backbone. Two pre‑training objectives, text‑structure pairing (L_pair) and structure generation (L_record), teach the model to map arbitrary schemas to text and to generate well‑formed extraction structures.
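Returning to the demo outputs above: the result is a list with one dict per input text, each schema label maps to a list of spans, and event arguments nest under a 'relations' key. The following is a minimal sketch, not part of PaddleNLP, of one way to flatten such a result into rows and filter on the reported probability. The names flatten, _walk, and min_prob are hypothetical, and the sample data is abbreviated from the event output shown earlier.

```python
from typing import Any, Dict, Iterator, List, Tuple

# One flattened row: (path of schema labels, text, start, end, probability)
Row = Tuple[Tuple[str, ...], str, int, int, float]


def _walk(node: Dict[str, List[Dict[str, Any]]],
          path: Tuple[str, ...],
          min_prob: float) -> Iterator[Row]:
    """Yield rows for every span under `node`, descending into 'relations'."""
    for label, spans in node.items():
        here = path + (label,)
        for span in spans:
            # Each span is filtered on its own probability (an assumption of
            # this sketch; one could also drop children of a rejected parent).
            if span["probability"] >= min_prob:
                yield (here, span["text"], span["start"], span["end"],
                       span["probability"])
            # Nested arguments, if present, sit under the 'relations' key.
            yield from _walk(span.get("relations", {}), here, min_prob)


def flatten(results: List[Dict[str, Any]], min_prob: float = 0.0) -> List[List[Row]]:
    """Return one list of rows per input text."""
    return [list(_walk(result, (), min_prob)) for result in results]


# Sample data abbreviated from the event-extraction output in the article.
event_output = [{'地震触发词': [{
    'end': 58, 'probability': 0.9987181623528585, 'start': 56, 'text': '地震',
    'relations': {
        '地震强度': [{'end': 56, 'probability': 0.9962985320905915,
                      'start': 52, 'text': '3.5级'}],
        '震中位置': [{'end': 50, 'probability': 0.8551417444021787,
                      'start': 23, 'text': '云南临沧市凤庆县(北纬24.34度,东经99.98度)'}],
    }}]}]

rows = flatten(event_output, min_prob=0.9)[0]
for path, text, start, end, prob in rows:
    print(" > ".join(path), text, (start, end), round(prob, 3))
# 地震触发词 地震 (56, 58) 0.999
# 地震触发词 > 地震强度 3.5级 (52, 56) 0.996
```

With min_prob=0.9 the epicenter span (probability about 0.855) is dropped, which illustrates why a threshold is best tuned per schema label rather than fixed globally.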
Empirical results show UIE achieving state‑of‑the‑art scores on 13 classic IE benchmarks, with especially strong few‑shot learning: adding only five labeled examples in a financial IE task boosts F1 by 25 points. This makes UIE a practical solution for long‑tail industry scenarios.
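For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, and a gain of "25 points" conventionally means 25 percentage points. The article does not state the matching criterion behind its numbers, so the sketch below assumes micro-averaged exact matching on (label, start, end). The function name span_f1 and the gold and predicted spans are hypothetical illustrations, not data from the benchmarks.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (schema label, start offset, end offset)


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if true_pos == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predicted spans match the gold spans.
gold = [("时间", 0, 6), ("赛事名称", 6, 23), ("选手", 28, 31)]
pred = [("时间", 0, 6), ("赛事名称", 6, 20), ("选手", 28, 31)]

precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# P=0.667 R=0.667 F1=0.667
```

Exact matching is strict: the predicted event-name span that ends three characters early counts as both a false positive and a false negative.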
The article also provides links to the GitHub repository, the original arXiv paper, and community resources such as a technical discussion group and live‑stream sessions for deeper exploration.
DataFunTalk