Tagged articles
13 articles
DataFunTalk
May 15, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article provides a comprehensive technical overview of multimodal GraphRAG, detailing document‑intelligence parsing pipelines, layout analysis, OCR‑pipeline vs OCR‑free approaches, knowledge‑graph integration for chunk relationships, multimodal indexing, retrieval‑generation workflows, and a comparative analysis of RAG, GraphRAG, and KG‑QA solutions.

Document Intelligence · GraphRAG · Layout Analysis
0 likes · 23 min read
DataFunTalk
Apr 24, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Document Intelligence, Knowledge Graphs, and Large‑Model Integration

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, layout‑analysis models, knowledge‑graph augmentation, multimodal indexing and retrieval, and a comparative analysis of RAG, GraphRAG, and KG‑QA approaches, with concrete examples, model sizes, benchmark scores, and research citations.

Document Intelligence · GraphRAG · Large Language Models
0 likes · 25 min read
Wu Shixiong's Large Model Academy
Mar 20, 2026 · Artificial Intelligence

Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing

This article examines MinerU's strengths and nine critical shortcomings—such as layout order errors, cross‑page table splits, merged‑cell failures, OCR misrecognition, and licensing issues—and provides concrete improvement strategies, interview‑ready resume bullets, and practical response frameworks for engineers.

LLM · Layout Analysis · MinerU
0 likes · 13 min read
360 Tech Engineering
Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8‑based layout analysis models covering Chinese and English papers, Chinese research reports, and a general document scenario, providing fast inference, paragraph‑level detection, and open‑source code and weights for flexible document‑understanding pipelines.

AI model · Layout Analysis · Multimodal
0 likes · 9 min read
Baidu Tech Salon
Jun 7, 2024 · Artificial Intelligence

How AI Transforms Financial Report Extraction: From Layout Analysis to Table Recognition

This article examines the challenges of extracting data from complex financial reports and presents an AI‑driven solution that combines advanced layout analysis, table recognition, OCR, and large‑language‑model integration using Baidu’s PaddlePaddle low‑code platform, detailing model selection, training, performance tuning, and deployment.

AI · Document Extraction · Layout Analysis
0 likes · 11 min read
AntTech
Nov 15, 2023 · Artificial Intelligence

Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction

The paper identifies incorrect reading order as a critical obstacle in information extraction from visually‑rich documents, proposes a Token Path Prediction model with a grid‑label formulation, introduces the re‑annotated FUNSD‑r and CORD‑r datasets, and demonstrates state‑of‑the‑art performance on NER, entity linking, and reading‑order prediction tasks.

Document AI · Layout Analysis · NER
0 likes · 17 min read
Laiye Technology Team
May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucText, LayoutLMv3, and GraphDoc

This article reviews three representative document intelligence models—StrucText, LayoutLMv3, and GraphDoc—detailing their input features, feature fusion strategies, self‑supervised tasks, and underlying architectures, and explains how they learn embeddings for segments, words, or regions to enable classification and key‑value extraction.

Document AI · Layout Analysis · Multimodal
0 likes · 15 min read
58 Tech
Mar 17, 2021 · Artificial Intelligence

Practical Applications of OCR Technology in 58 Information Security Scenarios: Layout Analysis

This article presents the practical deployment of OCR technology within 58’s information‑security workflows, focusing on layout‑analysis techniques for document and credential recognition, detailing rule‑based, template‑matching, object‑detection, and image‑segmentation methods, their implementation steps, experimental results, advantages, limitations, and future directions.

Document Recognition · Layout Analysis · OCR
0 likes · 18 min read
Tencent Cloud Developer
Mar 4, 2021 · Artificial Intelligence

WeChat OCR: Implementation of Image Text Extraction Feature

WeChat’s 8.0 update introduced an OCR pipeline that first runs a fast check for text in the image, classifies the image type, applies a lightweight MobileNetV3‑based DBNet text detector and a multi‑language recognizer built on a multi‑task CTC/Attention model, and then merges the results with a rule‑based layout analyzer to deliver accurate, well‑formatted extracted text across diverse languages and document types.

Computer Vision · DBNet · Deep Learning
0 likes · 13 min read
Taobao Frontend Technology
Dec 5, 2019 · Frontend Development

From UI Sketch to Code: Frontend Intelligence Generates 79% of Double‑11 Modules

This article explains how Alibaba's front‑end intelligence project automatically converts UI design images into production‑ready code, covering layout analysis, background and foreground processing, a fusion of traditional image algorithms with deep‑learning detection, GAN‑based complex‑background extraction, experimental results, and real‑world deployment.

GAN · Image Processing · Layout Analysis
0 likes · 21 min read
Alibaba Terminal Technology
Dec 5, 2019 · Frontend Development

How Frontend Code Is Automatically Generated: Inside Alibaba’s AI‑Powered D2C Pipeline

This article explains Alibaba's front‑end intelligence project, which automatically generated 79.34% of the Double‑11 UI code, detailing why images are used as input, the layered image‑processing pipeline, background and foreground analysis, traditional versus deep‑learning methods, fusion techniques, evaluation results, and real‑world deployments.

Image Processing · Layout Analysis · code-generation
0 likes · 20 min read
Xianyu Technology
May 14, 2019 · Frontend Development

Structured Layout Information and Guided Line Method for UI Component Detection

The paper presents a structured layout‑information framework combined with a guided‑line “leader‑follower” algorithm that represents UI controls as Connection objects and matches them via attribute and vector similarity. This enables fast identification of recurring business components and duplicate GridView items without extensive retraining, which improves code reuse in UI2CODE projects.

Layout Analysis · Structured Data · component detection
0 likes · 9 min read
Xianyu Technology
Feb 27, 2019 · Artificial Intelligence

UI2CODE: Layout Analysis and Background/Foreground Extraction for UI Images

The UI2CODE system tackles UI layout analysis by first extracting backgrounds with Sobel, Laplacian and Canny edge detection plus a flood‑fill algorithm, then isolating foreground components through connected‑component analysis and a Faster R‑CNN classifier, and finally fusing both pipelines to achieve superior precision, recall and IoU on Xianyu app screenshots.

Computer Vision · Deep Learning · Faster R-CNN
0 likes · 16 min read