Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction
The paper identifies reading‑order disorder as a critical obstacle in visually‑rich document information extraction, proposes a Token Path Prediction model with grid‑label formulation, introduces re‑annotated FUNSD‑r and CORD‑r datasets, and demonstrates SOTA performance on NER, entity linking, and reading‑order prediction tasks.
A recent collaboration between Ant Security Tianjian Lab and Fudan University highlights how prevalent reading-order disorder is in visually-rich document (VRD) applications: real-world OCR often returns tokens out of reading order, which severely degrades the performance of existing models.
To address this, the authors reformulate the NER task for VRDs as a token path prediction problem: the document's tokens form the nodes of a complete directed graph, and each named entity corresponds to a path through its tokens. A grid-label representation then converts the task into binary classification over the N×N grid of token pairs.
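To make the grid-label idea concrete, here is a minimal sketch (not the authors' code; the function and variable names are ours) that encodes entity token paths as 1-entries in an N×N matrix, where cell (i, j) = 1 means token j directly follows token i on some entity's path:

```python
def paths_to_grid(token_paths, num_tokens):
    """Encode entity token paths as an N x N binary grid.

    grid[i][j] == 1 means token j is the direct successor of
    token i on some entity's path; all other cells are 0.
    """
    grid = [[0] * num_tokens for _ in range(num_tokens)]
    for path in token_paths:
        # Mark each consecutive token pair along the path.
        for i, j in zip(path, path[1:]):
            grid[i][j] = 1
    return grid

# Two entities over a 5-token document, whose token indices
# (from OCR order) need not be contiguous or monotonic:
grid = paths_to_grid([[0, 2], [1, 4, 3]], num_tokens=5)
```

Because the labels are token pairs rather than a per-token tag sequence, the formulation is insensitive to the order in which OCR emits the tokens.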
The proposed Token Path Prediction (TPP) head can be attached to any LayoutLM-style document encoder. It scores token-pair relationships with a Global Pointer mechanism and trains with a loss designed for the extreme class imbalance among token pairs, enabling unified handling of NER, Entity Linking (EL), and Reading Order Prediction (ROP).
Newly re‑annotated datasets FUNSD‑r and CORD‑r, aligned with real‑world OCR layouts, are created to evaluate the approach. Experiments show that TPP achieves state‑of‑the‑art (SOTA) results across all tasks, outperforming baseline sequence‑labeling models, especially on the CORD‑r dataset where reading‑order issues are severe.
Additional analyses demonstrate TPP’s robustness to shuffled input orders and its ability to serve as a pre‑processing re‑ordering mechanism, improving downstream NER performance. The model has already been deployed in multiple Ant Group business scenarios involving document understanding and information extraction.
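A simple way to see how a predicted grid becomes usable output, both for extracting entities and for re-ordering tokens, is to follow the successor links it encodes. The sketch below assumes the thresholded predictions form a union of simple paths (at most one successor and one predecessor per token); the function name is ours:

```python
def decode_paths(grid):
    """Recover ordered token paths from a binary successor grid.

    Assumes each token has at most one predicted successor and
    at most one predecessor, i.e. the positives form simple paths.
    """
    n = len(grid)
    succ = {}
    has_pred = set()
    for i in range(n):
        for j in range(n):
            if grid[i][j]:
                succ[i] = j
                has_pred.add(j)
    paths = []
    # A path starts at a token with an outgoing link but no incoming one.
    for start in succ:
        if start in has_pred:
            continue
        path = [start]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths
```

Used for ROP, the recovered paths give a corrected token order that can be fed to a downstream sequence-labeling model, which is the re-ordering pre-processing role described above.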