Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction

The paper identifies disordered reading order as a critical obstacle to information extraction from visually‑rich documents, proposes a Token Path Prediction model with a grid‑label formulation, introduces the re‑annotated FUNSD‑r and CORD‑r datasets, and reports state‑of‑the‑art (SOTA) performance on NER, entity linking, and reading‑order prediction.

AntTech

A recent collaboration between Ant Security Tianjian Lab and Fudan University shows that disordered reading order is a prevalent problem in visually‑rich document (VRD) applications: the token sequence produced by OCR often does not follow the order in which a person would read the page, and this severely degrades the performance of existing models.

To address this, the authors redefine the NER task for VRDs as a token‑path prediction problem. The tokens of a document form a fully connected directed graph, and each named entity corresponds to a path through its tokens in that graph, regardless of where those tokens appear in the input sequence. A grid‑label representation marks the linked token pairs, which turns the task into binary classification over an N×N grid for a document of N tokens.
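The grid‑label idea can be illustrated with a minimal sketch. The function names `paths_to_grid` and `grid_to_paths` are hypothetical, the example tokens are invented, and marking a single‑token entity with a self‑loop is an assumption made here for illustration; the paper's exact labeling scheme may differ. The decoder also assumes each token has at most one successor.

```python
from typing import Dict, List, Set


def paths_to_grid(num_tokens: int, paths: List[List[int]]) -> List[List[int]]:
    """Encode entity token paths as an N x N binary grid.

    grid[i][j] == 1 means token j directly follows token i inside an entity.
    """
    grid = [[0] * num_tokens for _ in range(num_tokens)]
    for path in paths:
        if len(path) == 1:
            # Assumption for this sketch: a self-loop marks a one-token entity.
            grid[path[0]][path[0]] = 1
        for src, dst in zip(path, path[1:]):
            grid[src][dst] = 1
    return grid


def grid_to_paths(grid: List[List[int]]) -> List[List[int]]:
    """Decode token paths from a grid (at most one successor per token)."""
    successor: Dict[int, int] = {}
    has_predecessor: Set[int] = set()
    paths: List[List[int]] = []
    for i, row in enumerate(grid):
        for j, linked in enumerate(row):
            if not linked:
                continue
            if i == j:
                paths.append([i])  # one-token entity
            else:
                successor[i] = j
                has_predecessor.add(j)
    for start in sorted(successor):
        if start in has_predecessor:
            continue  # not the head of a path
        path, seen = [start], {start}
        while path[-1] in successor and successor[path[-1]] not in seen:
            nxt = successor[path[-1]]
            path.append(nxt)
            seen.add(nxt)
        paths.append(path)
    return paths


# OCR order interleaves two entities: "Total Amount" and "12.50 USD".
tokens = ["Total", "12.50", "Amount", "USD"]
entity_paths = [[0, 2], [1, 3]]
grid = paths_to_grid(len(tokens), entity_paths)
decoded = grid_to_paths(grid)
entities = [" ".join(tokens[i] for i in path) for path in decoded]
```

In this toy input the tokens of each entity are not adjacent, so a contiguous span could not cover either entity, while the path encoding recovers both.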

The proposed Token Path Prediction (TPP) head can be attached to any LayoutLM‑style document encoder. It scores every token pair with a Global Pointer mechanism and is trained with a loss designed for class imbalance, since only a small fraction of the N×N grid cells are positive. The same formulation handles NER, Entity Linking (EL), and Reading Order Prediction (ROP) in a unified way.
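A rough sketch of pairwise scoring and an imbalance‑aware loss follows. It is not the authors' implementation: the scoring is a plain dot product between hypothetical, already‑projected query and key vectors (the original Global Pointer also applies rotary position embeddings, omitted here), and the loss is the multi‑label cross‑entropy commonly paired with Global Pointer, log(1 + Σ_neg e^s) + log(1 + Σ_pos e^(-s)). Whether this matches the paper's exact loss is an assumption.

```python
import math
from typing import List


def pair_scores(queries: List[List[float]], keys: List[List[float]]) -> List[List[float]]:
    """scores[i][j] = dot(queries[i], keys[j]) for every token pair (i, j)."""
    return [[sum(q * k for q, k in zip(qi, kj)) for kj in keys] for qi in queries]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def imbalance_loss(scores: List[List[float]], grid: List[List[int]]) -> float:
    """log(1 + sum_neg exp(s)) + log(1 + sum_pos exp(-s)).

    The leading 0.0 in each list supplies the '1 +' term. The two sums are
    normalised separately, so the many negative cells do not swamp the
    few positive ones.
    """
    pos_terms, neg_terms = [0.0], [0.0]
    for score_row, label_row in zip(scores, grid):
        for s, label in zip(score_row, label_row):
            if label:
                pos_terms.append(-s)
            else:
                neg_terms.append(s)
    return logsumexp(pos_terms) + logsumexp(neg_terms)


# Toy 2-token example with one positive link (0 -> 1).
labels = [[0, 1], [0, 0]]
uninformed = [[0.0, 0.0], [0.0, 0.0]]
confident = [[-10.0, 10.0], [-10.0, -10.0]]
loss_uninformed = imbalance_loss(uninformed, labels)
loss_confident = imbalance_loss(confident, labels)
```

With all scores at zero the loss equals log 2 + log 4, and it approaches zero as positive cells score high and negative cells score low.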

To evaluate the approach, the authors create two re‑annotated datasets, FUNSD‑r and CORD‑r, whose layouts are aligned with real‑world OCR output. Experiments show that TPP achieves state‑of‑the‑art (SOTA) results across all tasks and outperforms baseline sequence‑labeling models, especially on CORD‑r, where reading‑order issues are severe.

Additional analyses show that TPP is robust to shuffled input orders and that it can serve as a pre‑processing step that re‑orders tokens, which improves downstream NER performance. The model has already been deployed in multiple Ant Group business scenarios involving document understanding and information extraction.
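The re‑ordering use case can be sketched as decoding one path over all tokens from a matrix of pairwise "next token" scores. The greedy decoder below is a deliberate simplification, not the paper's decoding procedure; the function name `greedy_reading_order`, the hand‑written scores, and the assumption that the first token is known in advance are all hypothetical.

```python
from typing import List


def greedy_reading_order(scores: List[List[float]], start: int) -> List[int]:
    """Visit every token once, always moving to the best-scoring unvisited one.

    scores[i][j] is the (hypothetical) model score that token j follows token i.
    """
    num_tokens = len(scores)
    order, visited = [start], {start}
    while len(order) < num_tokens:
        current = order[-1]
        candidates = [j for j in range(num_tokens) if j not in visited]
        nxt = max(candidates, key=lambda j: scores[current][j])
        order.append(nxt)
        visited.add(nxt)
    return order


# Tokens arrive in a shuffled order; the scores favour Total -> Amount -> Due.
shuffled_tokens = ["Amount", "Total", "Due"]
next_scores = [
    [0.0, 0.0, 4.0],  # after "Amount", "Due" scores highest
    [5.0, 0.0, 0.0],  # after "Total", "Amount" scores highest
    [0.0, 0.0, 0.0],  # "Due" ends the sequence
]
order = greedy_reading_order(next_scores, start=1)
reordered = [shuffled_tokens[i] for i in order]
```

The re‑ordered sequence could then be passed to an ordinary sequence‑labeling model, which is the pre‑processing role described above.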

