
Practical Applications of OCR Technology in 58 Information Security Scenarios: Layout Analysis

This article presents the practical deployment of OCR technology within 58’s information‑security workflows, focusing on layout‑analysis techniques for document and credential recognition, detailing rule‑based, template‑matching, object‑detection, and image‑segmentation methods, their implementation steps, experimental results, advantages, limitations, and future directions.

58 Tech

The article introduces OCR (Optical Character Recognition) technology applied in 58’s information‑security environment, emphasizing the critical role of layout analysis in improving text detection, recognition, and structured information extraction for various certificates and tickets such as identity cards, driving licenses, and property deeds.

Background: 58’s business spans many scenarios that require recognizing documents and images. With OCR, the company automates identity verification, qualification checks, and form filling, which reduces labor costs and improves the user experience. The information‑security team has already built OCR capabilities for more than ten document types.

OCR and Layout Analysis: General‑purpose OCR returns text lines without any structural information. Layout analysis adds automatic detection of image, text, and table regions, which enables precise field extraction and higher recognition accuracy.

Layout‑Analysis Practices:

Rule‑Based Layout Analysis: Takes the output of generic text detection and text‑line recognition, then applies handcrafted rules (e.g., field length, numeric patterns, positional relationships) to extract key information. The main advantage is a short development cycle (e.g., travel‑card recognition was brought up in two days). The limitations are a dependence on the accuracy of the generic OCR and rule logic that becomes complex when content is noisy or repetitive.
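A minimal sketch of the rule‑based idea, assuming a generic OCR engine has already returned text lines with their positions. The `TextLine` type, the field names, and the plate and date patterns are hypothetical illustrations, not the rules used at 58.

```python
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    x: int  # left edge of the line's bounding box
    y: int  # top edge of the line's bounding box


# Hypothetical content rules; real documents need their own patterns.
PLATE_RE = re.compile(r"^[A-Z]{2}\d{5}$")      # assumed plate format
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # ISO-style date
VIN_RE = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")  # 17 characters, no I/O/Q


def extract_fields(lines):
    """Apply length, pattern, and position rules to generic OCR lines."""
    fields = {}
    # Sort top-to-bottom, then left-to-right, so positional rules are stable.
    ordered = sorted(lines, key=lambda line: (line.y, line.x))
    for line in ordered:
        text = line.text.replace(" ", "").upper()
        if PLATE_RE.match(text):
            fields.setdefault("plate", text)
        elif VIN_RE.match(text):
            fields.setdefault("vin", text)
        elif DATE_RE.match(text):
            # Positional rule (assumed): the upper date is the registration
            # date and a later one is the issue date.
            key = "register_date" if "register_date" not in fields else "issue_date"
            fields[key] = text
    return fields
```

Lines that match no rule are simply ignored, which is also where this approach becomes fragile: a misread character in the generic OCR output silently drops the field.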

Template‑Matching Layout Analysis: Aligns a test image with a pre‑annotated template to locate target fields. It suits fixed‑format documents such as ID cards and driving licenses. The workflow is: template selection, annotation of field positions, feature‑point extraction (SIFT/SURF), feature matching, perspective transformation, text‑line detection (DBNet, PANNet), recognition (CRNN), and post‑processing. Development is fast, but the method is not robust to distortion and is computationally expensive.
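The perspective‑transformation step can be sketched as follows. This assumes a 3x3 homography `H` mapping template coordinates to test‑image coordinates has already been estimated from the matched feature points (for example with OpenCV's `cv2.findHomography`). The function names and the box format are hypothetical.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)


def project_fields(H, template_fields):
    """Project each annotated template box (x0, y0, x1, y1) into the test image.

    Returns the four projected corners per field, because a perspective
    transform does not generally keep rectangles axis-aligned.
    """
    projected = {}
    for name, (x0, y0, x1, y1) in template_fields.items():
        corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        projected[name] = [apply_homography(H, c) for c in corners]
    return projected
```

An equivalent alternative is to warp the test image into the template frame with the inverse transform and crop at the annotated coordinates directly. Either way, the result is only as good as the estimated `H`, which is why distortion hurts this method.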

Object‑Detection‑Based Layout Analysis: Detects the baseline region of each field with a detector such as SSD, YOLO, or RetinaNet, then estimates the image rotation, defines candidate regions, refines them via gray projection or regression, and finally performs text recognition and correction. This method achieves high recall and accuracy, but it requires multiple models and longer training time, and it can be sensitive to slight tilts.
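As an illustration of the gray‑projection refinement, the sketch below shrinks a candidate region to the rows that contain ink. It assumes dark text on a light background and a grayscale crop given as a list of pixel rows. The function names and the threshold are hypothetical, and the source does not specify how its projection step is implemented.

```python
def row_projection(gray, threshold=128):
    """Count dark (ink) pixels in each row of a grayscale crop."""
    return [sum(1 for px in row if px < threshold) for row in gray]


def refine_vertical_bounds(gray, threshold=128, min_ink=1):
    """Shrink a candidate region to the rows that actually contain ink.

    Returns (top, bottom) as an inclusive row range, or None if no row
    holds at least `min_ink` dark pixels.
    """
    profile = row_projection(gray, threshold)
    rows = [i for i, count in enumerate(profile) if count >= min_ink]
    if not rows:
        return None
    return rows[0], rows[-1]
```

The same projection applied to columns tightens the left and right edges. Because the profile is computed along image rows, it assumes the text is already horizontal, which is consistent with this method's sensitivity to tilt.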

Image‑Segmentation‑Based Layout Analysis: Uses a segmentation network (PANNet extended with a multi‑class branch) to predict pixel‑level field masks, which directly yield the target regions. After affine correction, the text lines are recognized and corrected. This approach delivers the highest precision, simplifies the pipeline, and handles tilted images well, although adjacent fields can occasionally cause their masks to merge.
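To show how a multi‑class mask yields field regions, here is a simplified sketch that turns a per‑pixel class map into one box per field. It assumes class 0 is background and that each field class appears as a single region. The function name and the mask encoding are hypothetical.

```python
def boxes_from_mask(mask, background=0):
    """Return {class_id: (x0, y0, x1, y1)} covering each non-background class.

    `mask` is a list of rows, each a list of integer class IDs.
    Coordinates are inclusive pixel indices.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, cls in enumerate(row):
            if cls == background:
                continue
            if cls not in boxes:
                boxes[cls] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[cls]
                boxes[cls] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes
```

This axis‑aligned version ignores tilt. A tilted document would need a rotated box fitted to each mask and an affine correction before recognition, as the description above indicates.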

Experimental results on various document sets (travel cards, driving licenses, temporary vehicle plates, etc.) show that the segmentation‑based method consistently achieves the best recall (≈98‑99%) and accuracy (≈99‑100%) with moderate processing time (≈200 ms) and a development cycle of about two weeks.

The comparison table (recall, accuracy, processing time, development time in weeks) shows that rule‑based methods are the fastest to implement but the least accurate, while segmentation offers the best trade‑off between precision and effort.

Conclusion and Outlook: The article summarizes the evolution of layout‑analysis techniques within 58’s OCR pipeline: rule‑based methods are ideal for rapid prototyping, template matching for strictly fixed formats, object detection for moderate flexibility, and segmentation for the highest accuracy. Future work includes joint training of detection and segmentation to mitigate field clipping and over‑segmentation, and exploring graph‑network approaches for more complex layout tasks.

References:

Wenhai Wang, “Efficient and Accurate Arbitrary‑Shaped Text Detection with Pixel Aggregation Network”, arXiv:1908.05900.

Minghui Liao, “Real‑time Scene Text Detection with Differentiable Binarization”, arXiv:1911.08947.

Zbigniew Wojna, “Attention‑based Extraction of Structured Information from Street View Imagery”, arXiv:1704.03549.

Baoguang Shi, “An End‑to‑End Trainable Neural Network for Image‑based Sequence Recognition and Its Application to Scene Text Recognition”, arXiv:1507.05717.
