Tagged articles

Layout Analysis

13 articles

May 15, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article provides a comprehensive technical overview of multimodal GraphRAG, detailing document‑intelligence parsing pipelines, layout analysis, OCR‑pipeline vs OCR‑free approaches, knowledge‑graph integration for chunk relationships, multimodal indexing, retrieval‑generation workflows, and a comparative analysis of RAG, GraphRAG, and KG‑QA solutions.

GraphRAG · Layout Analysis · OCR

23 min read

DataFunTalk

Apr 24, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Document Intelligence, Knowledge Graphs, and Large‑Model Integration

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, layout‑analysis models, knowledge‑graph augmentation, multimodal indexing and retrieval, and a comparative analysis of RAG, GraphRAG, and KG‑QA approaches, with concrete examples, model sizes, benchmark scores, and research citations.

GraphRAG · Large Language Models · Layout Analysis

25 min read

Wu Shixiong's Large Model Academy

Mar 20, 2026 · Artificial Intelligence

Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing

This article examines MinerU's strengths and nine critical shortcomings (such as layout‑order errors, tables split across pages, merged‑cell failures, OCR misrecognition, and licensing issues) and provides concrete improvement strategies, interview‑ready resume bullets, and practical response frameworks for engineers.

LLM · Layout Analysis · MinerU

13 min read

360 Tech Engineering

Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8‑based layout‑analysis models for Chinese and English papers, Chinese research reports, and general documents. The models offer fast inference and paragraph‑level detection, and their code and weights are open source for use in flexible document‑understanding pipelines.

AI Model · Layout Analysis · Multimodal

9 min read

Baidu Tech Salon

Jun 7, 2024 · Artificial Intelligence

How AI Transforms Financial Report Extraction: From Layout Analysis to Table Recognition

This article examines the challenges of extracting data from complex financial reports and presents an AI‑driven solution that combines layout analysis, table recognition, OCR, and large‑language‑model integration, built with a low‑code platform from Baidu's PaddlePaddle ecosystem. It covers model selection, training, performance tuning, and deployment.

AI · Document Extraction · Layout Analysis

11 min read

AntTech

Nov 15, 2023 · Artificial Intelligence

Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction

The paper identifies disordered reading order as a critical obstacle to information extraction from visually rich documents. It proposes Token Path Prediction (TPP), a model with a grid‑label formulation, introduces the re‑annotated FUNSD‑r and CORD‑r datasets, and demonstrates state‑of‑the‑art performance on NER, entity linking, and reading‑order prediction.

Layout Analysis · NER · document AI

17 min read

Laiye Technology Team

May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucTexT, LayoutLMv3, and GraphDoc

This article reviews three representative document‑intelligence models (StrucTexT, LayoutLMv3, and GraphDoc), detailing their input features, feature‑fusion strategies, self‑supervised pre‑training tasks, and underlying architectures. It also explains how they learn embeddings for segments, words, or regions to support classification and key‑value extraction.

Graph Neural Networks · Layout Analysis · Multimodal

15 min read

58 Tech

Mar 17, 2021 · Artificial Intelligence

Practical Applications of OCR Technology in 58 Information Security Scenarios: Layout Analysis

This article describes how 58 deploys OCR in its information‑security workflows, focusing on layout‑analysis techniques for document and credential recognition. It details rule‑based, template‑matching, object‑detection, and image‑segmentation methods, covering their implementation steps, experimental results, advantages, limitations, and future directions.

Document Recognition · Layout Analysis · OCR

18 min read

Tencent Cloud Developer

Mar 4, 2021 · Artificial Intelligence

WeChat OCR: Implementing the Image Text‑Extraction Feature

WeChat’s 8.0 update introduced an OCR pipeline that first quickly checks whether an image contains text and classifies the image type. It then applies a lightweight multi‑language text‑detection network built on DBNet with a MobileNetV3 backbone, recognizes the detected text with a multi‑task CTC/Attention model, and finally merges the results with a rule‑based layout analyzer to deliver accurate, well‑formatted text across diverse languages and document types.

DBNet · Deep Learning · Layout Analysis

13 min read

Taobao Frontend Technology

Dec 5, 2019 · Frontend Development

From UI Sketch to Code: Frontend Intelligence Generates 79% of Double‑11 Modules

This article explains how Alibaba's Front‑End Intelligence project automatically converts UI design images into production‑ready code. It covers layout analysis, background and foreground processing, the fusion of traditional image algorithms with deep‑learning detection, GAN‑based extraction of complex backgrounds, experimental results, and real‑world deployment.

Automation · GAN · Image processing

21 min read

Alibaba Terminal Technology

Dec 5, 2019 · Frontend Development

How Frontend Code Is Automatically Generated: Inside Alibaba’s AI‑Powered D2C Pipeline

This article explains Alibaba's Front‑End Intelligence project, which automatically generated 79.34% of the Double‑11 UI code. It details why images are used as input, the layered image‑processing pipeline, background and foreground analysis, traditional versus deep‑learning methods, fusion techniques, evaluation results, and real‑world deployments.

Image processing · Layout Analysis · code generation

20 min read

Xianyu Technology

May 14, 2019 · Frontend Development

Structured Layout Information and Guided Line Method for UI Component Detection

The paper presents a structured layout‑information framework combined with a guided‑line (“leader‑follower”) algorithm. UI controls are represented as Connection objects and matched by attribute and vector similarity, which enables fast identification of recurring business components and duplicate GridView items without extensive retraining and thereby improves code reuse in UI2CODE projects.

Layout Analysis · Structured Data · component detection

9 min read

Xianyu Technology

Feb 27, 2019 · Artificial Intelligence

UI2CODE: Layout Analysis and Background/Foreground Extraction for UI Images

The UI2CODE system tackles UI layout analysis in three steps: it extracts backgrounds with Sobel, Laplacian, and Canny edge detection plus a flood‑fill algorithm; isolates foreground components through connected‑component analysis and Faster R‑CNN object detection; and fuses the results of both approaches, yielding superior precision, recall, and IoU on Xianyu app screenshots.

Deep Learning · Faster R-CNN · Image processing

16 min read