Tagged articles
228 articles
Page 2 of 3
Test Development Learning Exchange
Jan 21, 2024 · Fundamentals

How to Extract MP3 Files from a PDF Using Python

This guide explains step‑by‑step how to install required Python libraries, extract text and images from a PDF, perform OCR on the images, locate embedded MP3 data in the combined text, and save the audio file, providing complete sample code for each stage.

MP3 extraction · OCR · Python
0 likes · 4 min read
Open Source Tech Hub
Jan 20, 2024 · Artificial Intelligence

How to Set Up ModelScope with Anaconda and Run OCR Inference via PHP

This guide walks through installing Anaconda, creating a Python 3.10 conda environment, adding PyTorch and ModelScope libraries, installing domain-specific dependencies, verifying NLP pipelines, and using PHPY to call ModelScope's OCR model from PHP, complete with code snippets and troubleshooting tips.

AI inference · Anaconda · ModelScope
0 likes · 10 min read
Sohu Tech Products
Dec 27, 2023 · Artificial Intelligence

OCR-Based Video Review System: Technology Selection, Optimization, and Model Fine-Tuning

This OCR‑based video review system uses PaddleOCR’s DB detector and SVTR recognizer, combined with multi‑level frame deduplication, message‑queue task decoupling, Redis prioritization, and dynamic thread‑pool scheduling. The models were fine‑tuned on 5,000 samples, and the system cut the daily frame volume from 794 million to 3.6 million, automatically detects over 230 abnormal videos per day, and replaced three manual reviewers. Future plans include GPU acceleration and cross‑instance gRPC dispatch.

AI · Fine-tuning · Model Selection
0 likes · 20 min read
Tencent Tech
Oct 20, 2023 · Artificial Intelligence

Tencent OCR's AI Triumph at ICDAR 2023: Four Championship Wins

At ICDAR 2023, Tencent's OCR team leveraged self‑developed algorithms and large‑model backbones to clinch four official championship titles across the DSText and SVRD tracks, showcasing breakthroughs in dense video text detection, tracking, end‑to‑end recognition, and structured information extraction.

ICDAR 2023 · OCR · Structured Information Extraction
0 likes · 14 min read
ZhongAn Tech Team
Oct 20, 2023 · Artificial Intelligence

Document Analytics & Anti‑Fraud Support Platform for Hong Kong Virtual Banking

This article describes the design and implementation of a Document Analytics & Anti‑Fraud Support platform for Hong Kong virtual banking, detailing its OCR/NLP‑driven pipeline, dynamic rule engine, multi‑template PDF processing, model training, and the resulting improvements in fraud detection and operational efficiency.

NLP · OCR · anti-fraud
0 likes · 18 min read
Bilibili Tech
Oct 13, 2023 · Artificial Intelligence

Multimodal Video High‑Energy Segment Extraction for Dynamic Video Covers

The authors present a multimodal system that automatically extracts high‑energy video segments for dynamic covers by analyzing subtitles, audio, visual frames, and danmu (bullet comments), employing LLM prompt‑tuning, scene‑cut detection, and aesthetic scoring to reduce manual effort and boost click‑through rates.

ASR · Multimodal AI · OCR
0 likes · 14 min read
Rare Earth Juejin Tech Community
Aug 16, 2023 · Artificial Intelligence

Deep Dive into OCR – Chapter 2: Development and Classification of OCR Technology

This article provides a comprehensive overview of OCR technology, detailing the evolution from traditional hand‑crafted methods to modern deep‑learning approaches, describing image preprocessing, text detection and recognition pipelines, summarizing classic machine‑learning algorithms, and presenting a practical OpenCV implementation with Python code.

Computer Vision · Deep Learning · OCR
0 likes · 23 min read
Rare Earth Juejin Tech Community
Aug 12, 2023 · Artificial Intelligence

An Introduction to OCR: Concepts, History, Applications, Datasets, and Technical Workflow

This article provides a comprehensive overview of Optical Character Recognition (OCR), covering its definition, historical development, classification, real‑world applications, technical pipeline, common challenges, mitigation strategies, popular datasets, model performance comparisons, and leading open‑source platforms.

Computer Vision · Datasets · Deep Learning
0 likes · 16 min read
php Courses
Jun 29, 2023 · Backend Development

How to Extract Text from Images Using PHP and Tesseract OCR

This tutorial demonstrates how to install the Tesseract OCR library via Composer, set up a PHP script to load an image, create a TesseractOCR instance, run the OCR process, and output the extracted text, providing complete sample code for each step.

Backend · OCR · image-processing
0 likes · 3 min read
High Availability Architecture
Jun 15, 2023 · Artificial Intelligence

InferX Inference Framework: Challenges, Architecture, Optimizations, and Triton Integration

The article presents the background, challenges, and objectives of Bilibili's AI services, introduces the self‑developed InferX inference framework with its quantization and sparsity optimizations, details OCR‑specific enhancements, and describes how integrating InferX with Nvidia Triton dramatically improves throughput, latency, and GPU utilization.

AI Optimization · CUDA · Inference
0 likes · 10 min read
DataFunTalk
May 13, 2023 · Artificial Intelligence

Multimedia Content Understanding at Weibo: Video Summarization, Quality Assessment, OCR, Embedding, and CV‑CUDA Optimization

This article presents Weibo's comprehensive multimedia content understanding pipeline, covering video summarization techniques, quality assessment models, OCR advancements, video embedding strategies, and the performance benefits of CV‑CUDA acceleration, while highlighting real‑world applications and engineering trade‑offs.

CV-CUDA · Computer Vision · Deep Learning
0 likes · 32 min read
DataFunSummit
Apr 7, 2023 · Artificial Intelligence

Comprehensive Overview of OCR: Types, Models, Pre‑training Techniques, and DIY Pipelines on ModelScope

This article provides a detailed introduction to OCR technology, covering its fundamental concepts, major categories (document, scene, and handwritten OCR), typical processing pipelines, a suite of open‑source models on ModelScope—including detection, recognition, and table OCR—and recent multimodal pre‑training methods such as VLDoc and VLPT.

ModelScope · OCR · Table OCR
0 likes · 15 min read
Python Programming Learning Circle
Mar 8, 2023 · Artificial Intelligence

Using ddddocr SDK for Captcha Recognition in Python

This article introduces the open‑source ddddocr SDK, demonstrates how to install it and use it in Python to automatically solve three common captcha types—slider, click‑based, and alphanumeric—providing code examples and result explanations for each.

Captcha · Computer Vision · OCR
0 likes · 4 min read
ELab Team
Feb 20, 2023 · Artificial Intelligence

How MegaPortal Brings Stable Diffusion to iOS: A Hands‑On Guide

MegaPortal is an easy‑to‑use AI model loader for Apple devices that lets users configure visual‑block Snippets for tasks such as face‑filtering, Genshin Impact gacha recommendation, and Stable Diffusion image generation, with step‑by‑step tutorials, system requirements, cache clearing, model downloads, and a call for iOS‑dev help.

@snippet · AI Model Loader · MegaPortal
0 likes · 20 min read
DataFunSummit
Jan 23, 2023 · Artificial Intelligence

Intelligent Document Processing: Core Technologies, Techniques, and Practical Insights

This article explains intelligent document processing (IDP) by describing its core components—OCR, document parsing, and information extraction—detailing various OCR and text‑detection algorithms, discussing document layout reconstruction, table parsing, domain‑specific model adaptation, system optimization, and productization challenges, and outlining future research directions.

AI · Document Parsing · Information Extraction
0 likes · 27 min read
Laiye Technology Team
Dec 16, 2022 · Artificial Intelligence

Efficient Production of Scene-specific OCR Models Using an AI Platform

This article explains how a unified AI platform enables rapid, data‑driven creation, training, deployment, and evaluation of OCR models for visually distinct text regions such as seals, meter readings, license plates, and VIN codes, while minimizing hardware and annotation costs.

AI Platform · Computer Vision · Kubeflow
0 likes · 7 min read
Tencent Cloud Developer
Dec 12, 2022 · Artificial Intelligence

Performance Optimization of Tencent Cloud OCR Service: Reducing Latency and Improving Throughput

Tencent Cloud’s OCR team cut average response time from 1.8 seconds to under one second and boosted throughput by over 50 % by redesigning the model with self‑attention, accelerating inference with a Tensor‑Network accelerator, shrinking RPC payloads, enabling asynchronous logging, and optimizing multi‑region GPU memory utilization.

AI model · Cloud Services · Inference Acceleration
0 likes · 13 min read
Laiye Technology Team
Nov 23, 2022 · Artificial Intelligence

Design and Practices of a Data‑Driven OCR Testing System

The article describes Laiye's shift to a data‑driven deep‑learning workflow and presents the design, macro‑ and micro‑analysis features, visual diff tools, distributed tracing, and code examples of their OCR testing system that accelerate model evaluation and iterative optimization.

AI · Data‑Driven · MLOps
0 likes · 11 min read
Shopee Tech Team
Nov 10, 2022 · Artificial Intelligence

ShopeeVideo OCR: Multi-language Text Recognition System for E-commerce Video

ShopeeVideo OCR is a multi‑language text‑recognition system for Southeast Asian e‑commerce videos that unifies detection, Transformer‑based recognition, layout analysis, and large‑scale synthetic data generation to handle Indonesian, Filipino, English, Vietnamese, Thai and Chinese scripts, delivering industry‑leading accuracy and winning thirteen ICDAR first‑place awards.

Computer Vision · Deep Learning · Multi-language OCR
0 likes · 15 min read
DataFunTalk
Nov 10, 2022 · Artificial Intelligence

A Comprehensive Overview of OCR Technology Development and Engineering Practices

This article reviews the 40‑year evolution of Optical Character Recognition, discusses its integration with Intelligent Document Processing, outlines recent research hotspots such as scene text recognition and domain‑specific symbol detection, and shares practical engineering experiences and future directions from Datagrand.

Document Processing · Intelligent Document Processing · OCR
0 likes · 24 min read
Zhuanzhuan Tech
Nov 9, 2022 · Artificial Intelligence

Applying OCR to Game Skin Recognition: Filtering Owned Skins and Tolerant Text Matching

This article describes how OCR technology is used in a game marketplace to automatically extract skin parameters from user‑uploaded images, outlines methods for separating owned skin regions from background using color analysis, and presents a tolerant matching solution based on Rabin‑Karp hashing to handle OCR errors.

Computer Vision · Game Development · Image Processing
0 likes · 10 min read
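The tolerant matching idea in the summary above can be illustrated with a small sketch. It is not the article's implementation: it assumes that "tolerant" means scoring how many Rabin‑Karp hashes of short character windows a candidate skin name shares with the OCR output. The function names, the window size `k`, and the `threshold` are hypothetical.

```python
def rolling_hashes(text, k, base=257, mod=1_000_000_007):
    """Return the Rabin-Karp hashes of every length-k window of text."""
    if k <= 0 or len(text) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the character that leaves the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(text)):
        # Drop the oldest character, shift, then append the newest one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.add(h)
    return hashes


def tolerant_score(ocr_text, name, k=2):
    """Fraction of the name's window hashes that also occur in the OCR text.

    Hash collisions are not re-verified here, so this is an approximation.
    """
    name_hashes = rolling_hashes(name, k)
    if not name_hashes:
        return 0.0
    return len(name_hashes & rolling_hashes(ocr_text, k)) / len(name_hashes)


def best_match(ocr_text, names, threshold=0.6):
    """Return the best-scoring candidate name, or None if nothing clears the threshold."""
    scored = [(tolerant_score(ocr_text, name), name) for name in names]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None


# A single misread character ("0" for "o") still leaves most windows intact.
print(best_match("Drag0n Slayer", ["Dragon Slayer", "Ice Queen"]))
```

A single misread character only disturbs the windows that overlap it, so the correct name still shares most of its hashes with the OCR text while unrelated names share almost none.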
Baidu Geek Talk
Oct 17, 2022 · Artificial Intelligence

OCR Technology: PaddleOCR and Paddle.js Integration

The article explains OCR fundamentals and details how Baidu’s open‑source PaddleOCR suite can be converted and run in browsers via the @paddlejs‑models/ocr SDK, describing model initialization, detection and CRNN‑based recognition pipelines, and presenting benchmark results that show the newer ch_PP‑OCRv2 model achieving higher accuracy and faster inference than the mobile variant.

AI · Computer Vision · OCR
0 likes · 9 min read
HaoDF Tech Team
Oct 8, 2022 · Artificial Intelligence

Exploring Transformer Technology and Its Applications in NLP, Computer Vision, and OCR at Haodf.com

This article introduces the Transformer architecture, explains its attention mechanism, details its adaptations for natural language processing, computer vision, and OCR tasks, and presents experimental results of various models such as BERT, ELECTRA, Swin Transformer, and CRNN-BCN on large-scale medical data from Haodf.com.

Model Evaluation · NLP · OCR
0 likes · 39 min read
DataFunSummit
Sep 6, 2022 · Artificial Intelligence

Recent Advances in Self‑Supervised Learning for Text Recognition (OCR)

This article reviews recent progress in applying self‑supervised learning to OCR text recognition, covering mainstream model architectures, key considerations for self‑supervised tasks on text images, and detailed analyses of representative papers such as SeqCLR, SimAN, and DiG, highlighting their designs, experiments, and results.

Computer Vision · OCR · contrastive learning
0 likes · 20 min read
DevOps
Aug 23, 2022 · Artificial Intelligence

Intelligent Automation Testing: Self‑Healing and Machine‑Learning Techniques

This article reviews the evolution of automated testing toward intelligent solutions, explaining self‑healing mechanisms, machine‑learning‑driven object recognition, computer‑vision and OCR approaches, industry tools such as Healenium and Airtest, and future prospects for zero‑code AI‑powered test automation.

AI · Computer Vision · OCR
0 likes · 13 min read
Laiye Technology Team
Aug 15, 2022 · Artificial Intelligence

Recent Advances in Self‑Supervised Learning for Text Recognition

This article reviews recent self‑supervised learning approaches for optical character recognition, covering mainstream OCR model architectures, key factors for applying contrastive and masked image modeling methods to text images, and detailed analyses of representative works such as SeqCLR, SimAN, and DiG, including their designs and experimental results.

OCR · contrastive learning · masked image modeling
0 likes · 19 min read
Python Programming Learning Circle
Jul 21, 2022 · Artificial Intelligence

Building an Automatic Math Problem Grading System with Python and Convolutional Neural Networks

This tutorial explains how to generate synthetic digit images, train a CNN model to recognize handwritten numbers and operators, segment scanned math worksheets using projection techniques, evaluate each expression with Python's eval, and overlay the results on the original image to provide automatic grading feedback.

Automation · CNN · OCR
0 likes · 26 min read
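The projection‑based segmentation mentioned in the summary above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tutorial's code: the worksheet is assumed to be already binarized into a list of rows with 1 for ink and 0 for background, and all function names are hypothetical.

```python
def split_by_projection(profile, min_ink=1):
    """Return (start, end) pairs of consecutive positions whose ink count is >= min_ink.

    The end index is exclusive, so binary[start:end] selects the run.
    """
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of inked positions begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ended at a blank position
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_rows(binary):
    """Horizontal projection: sum the ink in each row to find text lines."""
    return split_by_projection([sum(row) for row in binary])


def segment_columns(binary, top, bottom):
    """Vertical projection inside one text line to find individual symbols."""
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return split_by_projection(profile)


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
for top, bottom in segment_rows(page):
    print((top, bottom), segment_columns(page, top, bottom))
```

Each row range corresponds to one expression line, and each column range within it to one candidate symbol that could be cropped and passed to a classifier.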
Laiye Technology Team
Jul 16, 2022 · Artificial Intelligence

Seal (Stamp) Recognition in Intelligent Document Processing: Challenges, Methods, and Experiments

This article explains how intelligent document processing uses deep‑learning‑based seal detection and OCR techniques—enhanced YOLOv5, multi‑label loss, combined NMS, and end‑to‑end models such as Mask‑TextSpotter, ABCNet, PGNet, and TrOCR—to overcome diverse stamp styles, background interference, and image quality issues, presenting experimental results that surpass commercial OCR vendors.

AI · Document Processing · OCR
0 likes · 13 min read
MaGe Linux Operations
Jul 3, 2022 · Backend Development

How to Automate 10,000 Video‑Channel Posts with Python and OCR for Massive Traffic

This guide shows how to use Python to scrape high‑quality chat screenshots, apply OCR, generate silent chat videos, batch‑download matching audio from short‑video platforms, and combine them into thousands of unique WeChat Video Channel clips, leveraging volume to outsmart recommendation algorithms and boost traffic.

Automation · OCR · Python
0 likes · 11 min read
Programmer DD
Apr 18, 2022 · Artificial Intelligence

Unlocking Captcha Secrets: How the Open‑Source ddddocr Python Library Works

This article introduces the open‑source Python library ddddocr, explains its evolution from version 1.2.0 to 1.4.3—including OCR, target detection, and slider recognition features—and shows how it leverages deep‑learning and OpenCV to simplify captcha solving for developers.

Captcha · Deep Learning · OCR
0 likes · 4 min read
DataFunTalk
Apr 5, 2022 · Artificial Intelligence

Applying AI Technologies in the Youdao Dictionary Pen: Scanning, Offline Translation, and Edge ML Library

This article presents a technical overview of the Youdao Dictionary Pen, describing its hardware design, real‑time scanning and point‑query image processing, on‑device offline translation with model compression techniques, and the high‑performance Edge ML Library (EMLL) that enables efficient AI inference on constrained edge hardware.

AI · Edge Computing · Edge ML Library
0 likes · 18 min read
NetEase LeiHuo Testing Center
Apr 1, 2022 · Artificial Intelligence

Learning OCR for Game Text Recognition: From Data Preparation to CRNN Model Training

This article documents the author’s step‑by‑step journey of building an OCR system for recognizing Chinese characters in a card‑game UI, covering game selection, technical background, data generation, deep‑learning model training with CRNN, real‑image data collection, optimization attempts, and final performance evaluation.

CRNN · Deep Learning · EasyOCR
0 likes · 15 min read
Laiye Technology Team
Mar 25, 2022 · Artificial Intelligence

Laiye OCR Error‑Correction Model: Architecture, Implementation, and Evaluation

This article describes Laiye's OCR error‑correction system, detailing the background challenges of Chinese character recognition, the analysis of three possible solutions, the chosen post‑processing approach, model architecture, training data, loss design, online inference, and experimental results showing a measurable performance boost.

Chinese text · Computer Vision · Deep Learning
0 likes · 13 min read
Python Programming Learning Circle
Mar 3, 2022 · Artificial Intelligence

Ten‑Line Python Projects: QR Code, Word Cloud, Image Segmentation, Sentiment Analysis, Mask Detection, Message Spam, OCR, and a Simple Game

This article presents a series of concise Python examples—each under ten lines—demonstrating how to generate QR codes, create word clouds, perform image segmentation, conduct sentiment analysis, detect masks, automate message sending, extract text with OCR, and build a basic number‑guessing game, showcasing the versatility of Python for quick prototyping across AI and utility tasks.

Game · OCR · QR code
0 likes · 10 min read
DataFunSummit
Jan 5, 2022 · Artificial Intelligence

Improving Financial Micro‑Business Efficiency with OCR: Challenges, Applications, and an Intelligent Platform

This article explores how optical character recognition (OCR) technology can address the financing pain points of micro‑enterprises by automating document verification, enhancing risk assessment, and enabling an end‑to‑end intelligent OCR platform built on deep‑learning models, data pipelines, and deployment automation.

Computer Vision · Document Automation · Micro Business
0 likes · 15 min read
Laiye Technology Team
Dec 31, 2021 · Artificial Intelligence

Overview of Table Recognition Techniques and Practical Implementation

This article reviews the challenges of extracting structured table data from images, compares two‑stage and end‑to‑end OCR approaches, evaluates four state‑of‑the‑art table‑recognition models (SPLERGE, CascadeTabNet, TableMASTER, UnetTable), and presents a practical deployment workflow with performance metrics.

AI · Computer Vision · Deep Learning
0 likes · 14 min read
Python Crawling & Data Mining
Dec 17, 2021 · Artificial Intelligence

Decoding Randomized Custom Fonts with Python: Glyph Matching and OCR Techniques

This article explains how to handle custom web fonts whose glyph order or shapes are randomized by extracting glyph metadata with fontTools, creating binary signatures for reliable matching, and applying image‑recognition OCR to decode characters when glyph contours also change, complete with code examples and step‑by‑step instructions.

OCR · custom fonts · fontTools
0 likes · 32 min read
Baidu App Technology
Dec 7, 2021 · Artificial Intelligence

Paddle.js OCR SDK: Text Recognition in Web Browsers

Paddle.js OCR SDK brings Baidu’s lightweight PaddleOCR models to web browsers, offering init() and recognize() APIs that load the ch_PP-OCRv2 detection (DB) and recognition (CRNN with bidirectional LSTM) models in parallel, achieving 258 ms detection, 60 ms recognition, 0.52 F‑score, and a combined size under 12 MB.

AI · OCR · Paddle.js
0 likes · 7 min read
Laiye Technology Team
Sep 24, 2021 · Artificial Intelligence

Self‑Supervised Learning and Contrastive Methods for Computer Vision and OCR Applications

This article surveys self‑supervised learning techniques for computer‑vision tasks, explains common pretext tasks and contrastive loss designs, reviews representative models such as SimCLR, MoCo, SwAV and SimSiam, and demonstrates their practical impact on a captcha‑OCR system with measurable accuracy gains.

Computer Vision · Deep Learning · OCR
0 likes · 23 min read
DataFunTalk
Sep 21, 2021 · Artificial Intelligence

Text Recognition Techniques for Content Safety: Risks, Workflow, Algorithms, and Deployment Optimization

This article explains how OCR-based text recognition is applied to content safety, detailing common risk categories, a step‑by‑step detection and recognition pipeline, mainstream detection and recognition algorithms such as regression‑based and segmentation‑based methods, and practical deployment and performance optimization strategies.

AI · Content Safety · OCR
0 likes · 15 min read
Baidu Geek Talk
Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Computer Vision · Model Optimization · OCR
0 likes · 10 min read
ByteDance SE Lab
Jul 23, 2021 · Mobile Development

How to Accurately Measure Mobile App Response Time Using Video Frame Detection and OCR

This article presents a method for precisely measuring mobile app response latency by extracting video frames, detecting start and end frames through image markers and OCR, and calculating the time difference, offering a high‑precision, customizable solution for performance evaluation across diverse app scenarios.

OCR · app latency · frame detection
0 likes · 12 min read
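The timing step in the summary above reduces to simple arithmetic once the start and end frames are known. The sketch below assumes a fixed frame rate and takes the two frame detectors as callables; `is_start` and `is_end` are hypothetical stand‑ins for the image‑marker and OCR checks, and the frames are represented here as plain strings for illustration.

```python
def response_time_ms(frames, is_start, is_end, fps):
    """Return the delay in milliseconds between the first start frame and the
    first end frame that follows it, or None if either frame is never found."""
    start_idx = None
    for idx, frame in enumerate(frames):
        if start_idx is None:
            if is_start(frame):
                start_idx = idx            # e.g. the frame where the tap marker appears
        elif is_end(frame):
            # Each frame lasts 1000 / fps milliseconds at a constant frame rate.
            return (idx - start_idx) * 1000.0 / fps
    return None


# Stand-in frames: in practice each item would be an image and the predicates
# would run marker matching or OCR on it.
frames = ["home", "home", "tap", "loading", "loading", "Order list"]
latency = response_time_ms(
    frames,
    is_start=lambda f: f == "tap",
    is_end=lambda f: "Order" in f,
    fps=30,
)
print(latency)
```

The measurement resolution is one frame interval, so a higher recording frame rate gives a finer latency estimate.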
MaGe Linux Operations
Jul 13, 2021 · Artificial Intelligence

Build a Batch Image Translation Tool with Youdao OCR API in Python

This article walks through creating a Python desktop demo that uses Youdao's OCR translation API to batch‑process cosmetic product label images, covering API credential setup, request parameters, signature generation, core code snippets, and a summary of the translation results.

API · OCR · Python
0 likes · 10 min read
Python Programming Learning Circle
Jul 3, 2021 · Artificial Intelligence

Automatic PDF Slide Transcription Using Deep Learning OCR

This article demonstrates how to automatically convert PDF slide decks into editable markdown text by first converting each page to images, then applying a deep‑learning OCR pipeline (CTPN for detection and CRNN for recognition) with Python code examples, achieving high transcription accuracy.

Deep Learning · Image Processing · OCR
0 likes · 6 min read
TiPaiPai Technical Team
Jun 28, 2021 · Artificial Intelligence

How Deep Learning Unwarps Twisted Document Images: DocUNet & DewarpNet Explained

This article reviews two end‑to‑end deep‑learning approaches—DocUNet (CVPR 2018) and DewarpNet (ICCV 2019)—for correcting warped document images, detailing their network architectures, synthetic data generation, loss functions, experimental results, and the remaining challenges in document dewarping.

Computer Vision · Deep Learning · Image Processing
0 likes · 14 min read
Python Programming Learning Circle
Jun 25, 2021 · Artificial Intelligence

Batch Image Translation Demo Using Youdao OCR API with Python

This article presents a step‑by‑step Python demo that uses Youdao's OCR translation API to batch‑process cosmetic product images, covering API key setup, request parameters, signature generation, GUI implementation with Tkinter, and code snippets for file selection, result storage, and API invocation.

AI · Batch Processing · OCR
0 likes · 10 min read
TiPaiPai Technical Team
Jun 18, 2021 · Artificial Intelligence

Mastering Text Recognition: Encoder & Decoder Strategies Explained

This article reviews modern text‑recognition systems, detailing how encoders such as CNN, CNN‑BiLSTM, and Transformer‑based models extract visual features, and how decoders like Position Attention, Transformer decoders, and RNN Seq2Seq align variable‑length text, while also discussing CTC loss and practical design choices.

CNN · CTC · Decoder
0 likes · 9 min read
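To make the CTC part of the summary above concrete, here is a minimal sketch of greedy (best‑path) CTC decoding: pick the most likely label per frame, merge consecutive repeats, then drop blanks. The input layout (one list of per‑class scores per frame, with index 0 as the blank) and the function name are assumptions for illustration, not taken from the article.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_scores: one list of per-class scores for each time step.
    alphabet: maps class index to character; alphabet[blank] is a placeholder.
    """
    # Best path: the highest-scoring class at every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded, previous = [], blank
    for label in best_path:
        # Emit a character only when it is not blank and differs from the
        # previous frame's label, which merges consecutive repeats.
        if label != blank and label != previous:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


alphabet = ["-", "a", "b"]                 # index 0 is the CTC blank
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.8, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two "a" characters
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.1, 0.8],    # b
    [0.1, 0.1, 0.8],    # b (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
]
print(ctc_greedy_decode(scores, alphabet))
```

The blank label is what lets the decoder output a doubled character: without the blank frame between them, the two runs of "a" would collapse into one.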
Xianyu Technology
Jun 3, 2021 · Mobile Development

Extending Flutter UI Automation: Analysis of Flutter Driver, Integration Test, and Xianyu's Hybrid Approach

The article explains that Flutter Driver and Integration Test struggle to locate elements in hybrid native‑Flutter apps, then describes Xianyu’s approach of extending native UI automation with OCR, image‑matching, and a layered page‑object architecture, achieving over 98% success across 500+ runs.

Flutter · Image Processing · OCR
0 likes · 9 min read
TiPaiPai Technical Team
May 21, 2021 · Artificial Intelligence

How AI Powers Automatic Homework Grading: Challenges and Solutions

Automatic homework grading leverages AI to transform captured images into graded results through preprocessing, layout analysis, OCR, answer matching, and strategy modules, while addressing three question categories—logical, text‑rich, and graphic—each presenting distinct technical challenges and future research directions.

AI · Education Technology · Image Processing
0 likes · 7 min read
iQIYI Technical Product Team
Mar 26, 2021 · Artificial Intelligence

Insights into OCR Technology at iQIYI: Development, Challenges, and Applications

iQIYI’s OCR journey, explained by researcher Harlon, covers the evolution from separate detection and recognition pipelines to end‑to‑end models, key algorithms like CTPN, DB and CRNN, large‑scale simulated training, diverse video‑text applications, and future goals such as mobile deployment and tighter NLP integration.

AI · Computer Vision · Deep Learning
0 likes · 21 min read
Huawei Cloud Developer Alliance
Mar 23, 2021 · Artificial Intelligence

How to Recognize Credit Card Numbers with OpenCV: A Step‑by‑Step Tutorial

This tutorial walks through a project‑based OpenCV workflow that reads a digit template, preprocesses both template and credit‑card images, extracts individual numbers, matches them against the template, and finally overlays the recognized digits onto the original image, illustrating core computer‑vision techniques.

Computer Vision · Image Processing · OCR
0 likes · 10 min read
Amap Tech
Mar 22, 2021 · Artificial Intelligence

Visual Technology for Automated POI Name Generation: STR, Text Detection, and Naming Practices

Amap’s visual‑technology pipeline automatically generates and updates POI names by crowdsourcing street‑level images, applying deep‑learning scene‑text recognition, dual‑branch classification of text attributes, and a BERT‑plus‑graph‑attention model that selects and orders recognized text, achieving about 95 % naming accuracy.

Computer Vision · Deep Learning · Name Generation
0 likes · 14 min read
58 Tech
Mar 17, 2021 · Artificial Intelligence

Practical Applications of OCR Technology in 58 Information Security Scenarios: Layout Analysis

This article presents the practical deployment of OCR technology within 58’s information‑security workflows, focusing on layout‑analysis techniques for document and credential recognition, detailing rule‑based, template‑matching, object‑detection, and image‑segmentation methods, their implementation steps, experimental results, advantages, limitations, and future directions.

Document Recognition · Layout Analysis · OCR
0 likes · 18 min read
Java Architect Essentials
Mar 7, 2021 · Artificial Intelligence

ID Card OCR Project Using JavaCPP, OpenCV, and Tess4j

This article describes a Java-based ID card number recognition project that integrates Tess4j with JavaCPP to leverage OpenCV functionality without requiring a separate OpenCV installation, outlines required software, troubleshooting steps, recent updates, and provides the project repository link.

ID Card · JavaCPP · OCR
0 likes · 4 min read
Tencent Cloud Developer
Mar 4, 2021 · Artificial Intelligence

WeChat OCR: Implementation of Image Text Extraction Feature

WeChat’s 8.0 update introduced an OCR pipeline that first runs a fast check for text in the image and classifies the image type, then applies a lightweight, multi‑language MobileNetV3‑based DBNet text detector and a recognizer built on a multi‑task CTC/Attention model, and finally merges the results with a rule‑based layout analyzer to deliver accurate, well‑formatted text across diverse languages and document types.

Computer Vision · DBNet · Deep Learning
0 likes · 13 min read
DataFunTalk
Feb 16, 2021 · Artificial Intelligence

Multimedia Content Understanding in Meitu Community: Video Classification, Fingerprinting, and OCR

This article presents Meitu Community's AI‑driven multimedia content analysis pipeline, covering short‑video classification, video fingerprinting, and OCR, detailing model choices, experimental results, and future directions for improving content audit, quality, tagging, and feature engineering.

AI · Computer Vision · Fingerprinting
0 likes · 18 min read
DataFunTalk
Feb 12, 2021 · Artificial Intelligence

PlugNet: A Plug‑in Super‑Resolution Unit for Low‑Quality Text Recognition in Natural Scene OCR

This article introduces ImageDT's PlugNet, which combines deep‑learning OCR and super‑resolution techniques to improve low‑quality text recognition in natural scenes, detailing the company's background, OCR challenges, deep‑learning approaches, super‑resolution methods, the PlugNet architecture, experimental results, and future research directions.

AI · Low-Quality Text · OCR
0 likes · 16 min read
php Courses
Feb 4, 2021 · Information Security

Analyzing and Decoding CAPTCHA Images Using PHP

This article explains how to extract RGB values from a CAPTCHA image with PHP, convert the pixel data into binary patterns, and map those patterns to digits using a predefined dictionary; it reports 100% recognition accuracy and presents the method as a practical backend security technique.

Backend · Image Processing · OCR
0 likes · 4 min read
HaoDF Tech Team
Feb 2, 2021 · Artificial Intelligence

AI‑Based Structuring of Medical Examination Reports: OCR, Text Detection, Classification, and NER

This article describes how a Chinese online medical platform tackled the large‑scale extraction and structuring of hospital report images by combining OCR, deep‑learning text‑region detection, fast text classification, and advanced NER techniques, detailing challenges, algorithm choices, performance results, and remaining issues.

AI · NER · NLP
0 likes · 19 min read
Programmer DD
Dec 17, 2020 · Artificial Intelligence

Turn Screenshots into Editable Text Instantly with TextShot – A Simple OCR Tool

TextShot, a newly released open‑source Python utility by GitHub user ianzhao05, lets you capture any screen region and instantly convert the image to editable text using Tesseract OCR, with multilingual support, hotkey integration, and step‑by‑step installation guidance for Windows and Linux.

OCR · OpenCV · TextShot
0 likes · 6 min read
Top Architect
Dec 4, 2020 · Artificial Intelligence

Java-based ID Card OCR Project Using OpenCV, JavaCPP, and Tess4J

This article introduces a Java OCR project for ID cards that integrates OpenCV, JavaCPP, and Tess4J to perform image preprocessing, region cropping, and character recognition without requiring OpenCV installation, and details its features, encountered issues, system requirements, updates, and source repository.

Computer Vision · ID Card · Java
0 likes · 4 min read
New Oriental Technology
Nov 23, 2020 · Artificial Intelligence

A Seq2Seq Deep Learning Approach for Recognizing Mathematical Formulas in Images

This article presents a deep‑learning Seq2Seq model that converts images of mathematical formulas—including matrices, equations, fractions, and radicals—into LaTeX sequences with over 95% accuracy, detailing data preparation, LaTeX normalization, model architecture, training, inference, and post‑processing techniques.

Deep Learning · Formula Recognition · LaTeX
0 likes · 9 min read
Java Captain
Nov 9, 2020 · Artificial Intelligence

ID Card Number Recognition Project Using JavaCV, OpenCV, and Tess4J

This article introduces a Java-based ID card number recognition project that integrates OpenCV, Tess4J, and JavaCPP to perform OCR without prior training, outlines the encountered library linking issue, lists required software, and details recent updates such as chunked uploads and OpenCV version upgrade.

ID Card · Java · OCR
0 likes · 3 min read
DataFunTalk
Sep 23, 2020 · Artificial Intelligence

PaddleOCR: 2020’s Outstanding Open‑Source OCR Suite with a 3.5 MB Ultra‑Light Model

PaddleOCR, the 2020 breakthrough in open‑source OCR, offers ultra‑light 3.5 MB multilingual models, high F1‑score performance across diverse scenarios, easy installation via pip, comprehensive documentation, custom training support, and deployment options for both server and mobile platforms, all backed by detailed benchmarks and code examples.

OCR · PaddleOCR · Python
0 likes · 8 min read
Amap Tech
Jul 30, 2020 · Artificial Intelligence

Evolution and Practice of Scene Text Recognition Technology in Amap Map Data Production

Amap uses advanced scene text recognition combining detection and recognition modules, deep learning, data synthesis, and result fusion to automate map data production, achieving state-of-the-art performance and automating the majority of POI and road updates, significantly reducing labor costs.

Computer Vision · Deep Learning · OCR
0 likes · 18 min read
Alibaba Cloud Developer
Jul 30, 2020 · Artificial Intelligence

How Amap’s Scene Text Recognition Powers Accurate Maps: Evolution and Future Challenges

This article explains how Amap leverages scene text recognition to automate map data production, detailing the evolution from traditional image algorithms to deep‑learning models, the current detection and recognition framework, performance results, and future research directions for handling blur, data scarcity, and semantic understanding.

Amap · Computer Vision · Deep Learning
0 likes · 19 min read
Alibaba Cloud Developer
Jul 29, 2020 · Artificial Intelligence

How Gaode Maps Boosts Accuracy with Advanced Scene Text Recognition

This article explains how Gaode Maps leverages traditional and deep‑learning based scene text recognition techniques—including character detection, sequence models, data synthesis, and multi‑stage frameworks—to automate POI and road data production with high precision and speed.

Computer Vision · Deep Learning · OCR
0 likes · 20 min read
ITPUB
Jun 6, 2020 · Artificial Intelligence

How to Use the Open‑Source OCR Translator for Videos, Games, and PDFs

This guide explains how to set up and operate a free open‑source OCR‑based translator that captures on‑screen text from videos, games, or PDFs, registers the required Baidu AI API keys, configures translation sources, and demonstrates its performance on real content.

Baidu AI · GitHub · OCR
0 likes · 5 min read
Programmer DD
May 9, 2020 · Artificial Intelligence

ChineseOCR Lite: Ultra‑Lightweight OCR Engine for Vertical Chinese Text

ChineseOCR Lite is an open‑source, ultra‑lightweight OCR solution that supports vertical Chinese text, runs on Linux/macOS via ncnn inference, and packs detection, recognition, and angle classification models into a total of just 17 MB, offering fast and accurate scene‑text processing.

Chinese OCR · Computer Vision · OCR
0 likes · 4 min read
360 Quality & Efficiency
Jan 2, 2020 · Mobile Development

Common Element Locating Strategies in Appium for Mobile Automation

This article introduces Appium's basic element locating techniques—including id, name, class name, XPath, UIAutomator, and relative coordinates—explains how to handle non‑unique elements through iteration or OCR, and demonstrates image‑based locating with OpenCV and screenshot code examples.

Appium · Element Locating · Mobile Automation
0 likes · 5 min read
21CTO
Sep 28, 2019 · Backend Development

Cracking Dazhong Dianping’s CSS Encryption: A Step‑by‑Step Web Scraping Guide

This article walks through the challenges of scraping Dazhong Dianping, explains how the site hides numeric data with custom CSS fonts, and provides a complete Python workflow—including HTTP requests, font extraction, glyph rendering, and OCR—to decode and retrieve the protected information.

CSS encryption · OCR · Python
0 likes · 13 min read
Tencent Cloud Developer
Sep 19, 2019 · Artificial Intelligence

Inside Tencent Cloud OCR: Architecture, Performance, and Integration Guide

The article provides a comprehensive overview of Tencent Cloud’s OCR platform, detailing its service architecture, product capabilities, integration methods, performance metrics, engineering improvements, testing automation, and operational considerations, offering developers practical insights into building and deploying OCR solutions on the cloud.

Cloud AI · Computer Vision · OCR
0 likes · 10 min read