Tag

Document Understanding

AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Document Understanding · Large Language Models
6 min read
DataFunSummit
Feb 21, 2025 · Artificial Intelligence

Multimodal Retrieval‑Augmented Generation (RAG): Implementation Paths and Future Prospects

This article explores multimodal Retrieval‑Augmented Generation (RAG) across five topics: semantic extraction, visual‑language models, scaling strategies, technical roadmap choices, and a Q&A. It also presents three implementation pathways, performance evaluations, and future directions for AI‑driven document understanding.

Document Understanding · RAG · Tensor Retrieval
11 min read
Baidu Geek Talk
Jan 6, 2025 · Information Security

MarkupLM-based Detection of Malicious Content Scraping

The article presents a MarkupLM‑based approach to site‑level detection of malicious content‑scraping pages that bypass traditional rule‑based filters. MarkupLM enriches BERT with XPath embeddings to model webpage text and structure jointly, and the article demonstrates the critical role of structural cues in improving spam classification accuracy.

Document Understanding · MarkupLM · Web Security
16 min read
360 Tech Engineering
Nov 15, 2024 · Artificial Intelligence

Advances in Multimodal Large Models and Document Understanding Presented at the 2024 Global Machine Learning Conference (Beijing)

At the 2024 Global Machine Learning Conference in Beijing, the 360 AI Research Institute showcased its research on multimodal large models, fine‑grained open‑world object detection, and document understanding, highlighting open‑source releases, real‑world deployments, and results in AI competitions.

AI research · Document Understanding · Large Models
7 min read
Sohu Tech Products
Nov 6, 2024 · Artificial Intelligence

RAG2.0 Engine Design Challenges and Implementation

The talk outlines RAG2.0’s design challenges (low vector recall, complex documents, and semantic gaps) and presents a two‑stage architecture that combines deep multimodal document understanding with knowledge‑graph‑enhanced retrieval. It details advanced chunking, multi‑index and multi‑path retrieval, and efficient ranking models such as ColBERT, then looks ahead to multimodal and memory‑augmented agents.

ColBERT · Late Interaction · Document Understanding
23 min read
360 Tech Engineering
Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8‑based layout analysis models covering Chinese and English papers, Chinese research reports, and a general document scenario. The models offer fast inference and paragraph‑level detection, and the code and weights are open source for use in flexible document‑understanding pipelines.

AI model · Document Understanding · Open-source
9 min read
DataFunSummit
Sep 5, 2023 · Artificial Intelligence

Document Intelligence: Background, Technology Stack, Large‑Model Advances, and Enterprise Applications

This article presents a comprehensive overview of document intelligence, covering its background, the evolution of related technologies, large‑model approaches such as multimodal pre‑training and domain‑specific models, and concrete enterprise use cases across various business functions.

Document Understanding · Large Language Models · document intelligence
14 min read
AntTech
Aug 25, 2023 · Artificial Intelligence

LayoutGCN: A Lightweight Graph Convolutional Network for Visually Rich Document Understanding

LayoutGCN is a lightweight, graph‑based framework that jointly encodes text, layout, and image features of visually rich documents, achieving competitive performance on multiple downstream tasks while drastically reducing model size and computational cost, making it suitable for edge deployment.

Document Understanding · LayoutGCN · graph neural network
24 min read
AntTech
Jul 31, 2023 · Artificial Intelligence

LayoutMask: Enhancing Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

LayoutMask introduces a novel multi-modal pre‑training model that replaces global 1D position with local 1D position and adds Whole Word Masking, Layout‑Aware Masking, and Masked Position Modeling, achieving state‑of‑the‑art results on various visually‑rich document understanding tasks.

AI · Document Understanding · Multimodal Pretraining
15 min read
DataFunSummit
Apr 7, 2023 · Artificial Intelligence

Comprehensive Overview of OCR: Types, Models, Pre‑training Techniques, and DIY Pipelines on ModelScope

This article provides a detailed introduction to OCR technology, covering its fundamental concepts, major categories (document, scene, and handwritten OCR), typical processing pipelines, a suite of open‑source models on ModelScope—including detection, recognition, and table OCR—and recent multimodal pre‑training methods such as VLDoc and VLPT.

Document Understanding · Handwritten Recognition · ModelScope
15 min read
AntTech
Jun 15, 2022 · Artificial Intelligence

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

XYLayoutLM introduces a layout‑aware multimodal network that improves visually‑rich document understanding by augmenting XY‑Cut for robust reading order generation and employing a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on XFUN and FUNSD datasets.

Document Understanding · Vision Transformer · XYCut
10 min read
Architects Research Society
Jan 9, 2022 · Artificial Intelligence

Five Key Trends in AI-Powered Search and Unstructured Data Analysis

The article outlines five major trends—neural-network-enhanced search, semantic search, document understanding, image and voice search, and knowledge graphs—that are transforming enterprise use of unstructured data by leveraging AI to deliver precise, context-aware answers and insights.

AI · Document Understanding · Search
15 min read
Architects Research Society
Aug 22, 2020 · Artificial Intelligence

Five Key Trends Shaping Enterprise Search and Unstructured Data Analysis

The article outlines how advances in neural networks, semantic search, document understanding, image and voice search, and knowledge graphs are transforming enterprise search of unstructured data, enabling more accurate, context‑aware answers and new business use cases across organizations.

AI · Document Understanding · Semantic Search
13 min read