Tag

PDF conversion


Python Programming Learning Circle
Jan 30, 2024 · Fundamentals

pdf2docx: Convert PDF to DOCX with Python – Features, Limitations, Installation, and Example

This article introduces the pdf2docx Python library for converting PDF files to DOCX, detailing its capabilities such as layout, paragraph, image, and table parsing, outlining current limitations, providing installation instructions, and showing a concise code example for practical use.

PDF conversion · Python · docx
3 min read
Baidu Geek Talk
Jul 26, 2021 · Artificial Intelligence

Document Rendering and Structured Extraction Techniques in Baidu Wenku

Baidu Wenku converts every document type to PDF and parses the PDF into a proprietary format. On PC it renders pages with absolute-position layout; for mobile devices it transforms that layout into flow-type structured data by re-typesetting the layout, extracting OOXML structures, and detecting charts. This enables adaptive layouts, accurate formula rendering, and interactive chart extraction.

OOXML parsing · PDF conversion · chart extraction
12 min read
Python Programming Learning Circle
Jul 3, 2021 · Artificial Intelligence

Automatic PDF Slide Transcription Using Deep Learning OCR

This article demonstrates how to automatically convert PDF slide decks into editable Markdown text: each page is first converted to an image, then passed through a deep-learning OCR pipeline (CTPN for text detection, CRNN for text recognition). Python code examples are included, and the article reports high transcription accuracy.

Image Processing · OCR · PDF conversion
6 min read