360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios
The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8-based layout analysis models covering Chinese and English academic papers, Chinese research reports, and a general document scenario, offering fast inference, paragraph-level detection, and open-source code and weights for flexible document-understanding pipelines.
360 AI Lab's Knowledge Graph & Document Understanding team has open-sourced 360LayoutAnalysis, a suite of lightweight, multi-scenario layout analysis models, available on GitHub and Hugging Face for download and use.
Main Features:
Supports three vertical domains—Chinese papers, English papers, and Chinese research reports—plus one general‑purpose model.
Lightweight inference: each scenario is served by a single YOLOv8 model of only 6.23 MB.
The Chinese paper scene includes paragraph-level labels, which earlier CDLA-based models lack.
The Chinese research-report and general models are trained on tens of thousands of high-quality annotated samples contributed by the team.
GitHub: https://github.com/360AILAB-NLP/360LayoutAnalysis
Hugging Face weights: https://huggingface.co/qihoo360/360LayoutAnalysis
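As a sketch of how one of these checkpoints can be run, assuming the `ultralytics` package is installed and a weight file has been downloaded from the Hugging Face repo above (the filename `paper-8n.pt` below is a placeholder, not necessarily the released name):

```python
# Sketch: run layout detection with a 360LayoutAnalysis checkpoint.
# Assumes `pip install ultralytics` and locally downloaded weights;
# "paper-8n.pt" is a hypothetical filename for illustration only.

def detect_layout(image_path: str, weights: str = "paper-8n.pt"):
    """Return (class_name, confidence, xyxy_box) tuples for one page image."""
    from ultralytics import YOLO  # imported lazily to keep the sketch self-contained

    model = YOLO(weights)          # load the ~6 MB YOLOv8 checkpoint
    result = model(image_path)[0]  # one image in -> one Results object out
    names = result.names           # class-id -> label-name mapping
    return [
        (names[int(box.cls)], float(box.conf), box.xyxy[0].tolist())
        for box in result.boxes
    ]
```

Each returned tuple pairs a layout label with its confidence and pixel bounding box, ready to hand off to downstream components.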
Why develop these models?
Document processing typically follows three routes:
Direct parsing of PDFs and other formats – simple and fast but cannot handle scanned documents and loses structural elements such as tables and figures.
OCR pipeline – decomposes the task into a series of OCR-centric steps (layout analysis, chart parsing, formula recognition, reading-order detection, etc.); it handles scanned files and provides fine-grained element extraction, but suffers from error propagation and high engineering effort.
OCR‑free (end‑to‑end multimodal) – leverages large multimodal models to jointly perform OCR, table, and chart understanding as a single fine‑tuned task, offering cutting‑edge performance but requiring massive data, high GPU memory, and suffering from hallucination issues.
Targeted Scenarios and Models
1. Chinese research‑report scene – a niche yet important financial document type rich in images, tables, and decision‑making content. The team annotated a dataset with nine label classes and released a dedicated model.
2. Chinese paper scene – existing datasets such as CDLA are small and lack paragraph information. The team re-annotated and expanded the data, trained a YOLOv8 model, and open-sourced it.
3. English paper scene – built on the large PubLayNet dataset (over 350k training images) but also missing paragraph data; a fine‑tuned model is released.
4. General document scene – to improve cross‑domain generalization, a model trained on mixed documents with six unified labels (Text, Title, Figure, Table, Caption, Equation) is provided.
How to use the lightweight models
The layout analysis model serves as the front‑end of a document‑understanding pipeline. After obtaining bounding boxes for each detected region, downstream modules can be attached:
Text regions → OCR component for raw text extraction.
Table regions → Table‑parsing model.
Image regions → Multimodal model for image‑text understanding and summarization.
Title and figure‑caption regions → OCR → generate a table of contents.
These region outputs can also define split boundaries for Retrieval‑Augmented Generation (RAG) document chunking.
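The routing above can be sketched as a simple dispatch table over the six general-scene labels. The handler functions here are illustrative stubs, not part of the released code; a real pipeline would call an OCR engine, a table parser, or a multimodal model instead:

```python
# Sketch: route detected regions to downstream modules by layout label.
# Handlers are placeholder stubs standing in for real components.

def run_ocr(region):         return f"ocr:{region}"
def parse_table(region):     return f"table:{region}"
def describe_figure(region): return f"figure:{region}"

# The six labels of the general-scene model; Title/Caption/Equation
# regions also go through OCR in this simplified sketch.
ROUTES = {
    "Text": run_ocr,
    "Title": run_ocr,
    "Caption": run_ocr,
    "Equation": run_ocr,
    "Table": parse_table,
    "Figure": describe_figure,
}

def route_regions(detections):
    """detections: iterable of (label, region) pairs from the layout model."""
    return [ROUTES[label](region) for label, region in detections if label in ROUTES]
```

The same (label, bounding-box) pairs can double as chunk boundaries for RAG: each routed region becomes one retrieval unit instead of splitting on raw character counts.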
Summary
Layout analysis is a critical upstream step for document understanding; improving its accuracy and generalization directly benefits downstream tasks. Future work will focus on automated fine‑grained dataset construction, expanding label granularity, and supporting more document types.
360 Tech Engineering