Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction
The paper identifies reading‑order disorder as a critical obstacle in visually‑rich document information extraction, proposes a Token Path Prediction model with grid‑label formulation, introduces re‑annotated FUNSD‑r and CORD‑r datasets, and demonstrates SOTA performance on NER, entity linking, and reading‑order prediction tasks.
A recent collaboration between Ant Security Tianjian Lab and Fudan University highlights how prevalent reading-order disorder is in visually-rich document (VRD) applications: real-world OCR often returns tokens out of reading order, which severely degrades the performance of existing models.
To address this, the authors reformulate the NER task for VRDs as a token path prediction problem: the document's tokens form the nodes of a complete directed graph, and each named entity corresponds to a path through its tokens. A grid-label representation then converts the task into binary classification over the N×N grid of token pairs.
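To make the grid-label idea concrete, here is a minimal sketch (not the authors' code; the function and variable names are ours) that encodes entity token paths as 1-entries in an N×N matrix, where cell (i, j) = 1 means token j directly follows token i on some entity's path:

```python
def paths_to_grid(token_paths, num_tokens):
    """Encode entity token paths as an N x N binary grid.

    grid[i][j] == 1 means token j is the direct successor of
    token i on some entity's path; all other cells are 0.
    """
    grid = [[0] * num_tokens for _ in range(num_tokens)]
    for path in token_paths:
        # Mark each consecutive token pair along the path.
        for i, j in zip(path, path[1:]):
            grid[i][j] = 1
    return grid

# Two entities over a 5-token document, whose token indices
# (from OCR order) need not be contiguous or monotonic:
grid = paths_to_grid([[0, 2], [1, 4, 3]], num_tokens=5)
```

Because the labels are token pairs rather than a per-token tag sequence, the formulation is insensitive to the order in which OCR emits the tokens.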
The proposed Token Path Prediction (TPP) head can be attached to any LayoutLM-style document encoder. It scores token-pair relationships with a Global Pointer mechanism and trains with a loss designed for the extreme class imbalance among token pairs, enabling unified handling of NER, Entity Linking (EL), and Reading Order Prediction (ROP).
Newly re‑annotated datasets FUNSD‑r and CORD‑r, aligned with real‑world OCR layouts, are created to evaluate the approach. Experiments show that TPP achieves state‑of‑the‑art (SOTA) results across all tasks, outperforming baseline sequence‑labeling models, especially on the CORD‑r dataset where reading‑order issues are severe.
Additional analyses demonstrate TPP’s robustness to shuffled input orders and its ability to serve as a pre‑processing re‑ordering mechanism, improving downstream NER performance. The model has already been deployed in multiple Ant Group business scenarios involving document understanding and information extraction.
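A simple way to see how a predicted grid becomes usable output, both for extracting entities and for re-ordering tokens, is to follow the successor links it encodes. The sketch below assumes the thresholded predictions form a union of simple paths (at most one successor and one predecessor per token); the function name is ours:

```python
def decode_paths(grid):
    """Recover ordered token paths from a binary successor grid.

    Assumes each token has at most one predicted successor and
    at most one predecessor, i.e. the positives form simple paths.
    """
    n = len(grid)
    succ = {}
    has_pred = set()
    for i in range(n):
        for j in range(n):
            if grid[i][j]:
                succ[i] = j
                has_pred.add(j)
    paths = []
    # A path starts at a token with an outgoing link but no incoming one.
    for start in succ:
        if start in has_pred:
            continue
        path = [start]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths
```

Used for ROP, the recovered paths give a corrected token order that can be fed to a downstream sequence-labeling model, which is the re-ordering pre-processing role described above.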