
AI‑Based Structuring of Medical Examination Reports: OCR, Text Detection, Classification, and NER

This article describes how a Chinese online medical platform tackled the large‑scale extraction and structuring of hospital report images by combining OCR, deep‑learning text‑region detection, fast text classification, and advanced NER techniques, detailing challenges, algorithm choices, performance results, and remaining issues.

HaoDF Tech Team

Project Background and Challenges

The online medical platform receives tens of thousands of report images daily. These images contain valuable clinical data but are stored as unstructured pictures, so they cannot be indexed, searched, or referenced efficiently by doctors. The goal is to automatically read, classify, and convert these reports into structured tables.

Technical Overview

The solution relies on artificial‑intelligence techniques, specifically image processing (OCR and text‑region detection) and natural‑language processing (text classification and named‑entity recognition). The pipeline consists of five stages: text‑region detection, OCR, report‑type classification, NER, and structured content extraction.
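The five stages above can be sketched as a chain of functions. This is a minimal illustration of the data flow only; every function name and return value here is a hypothetical placeholder standing in for the models the article names (CRAFT, CRNN, FastText, BERT‑CRF), not the team's actual API.

```python
# Hypothetical sketch of the five-stage pipeline; stub return values
# mimic a blood-routine report line "WBC 6.4 3.5-9.5".

def detect_text_regions(image):
    """Stage 1: return bounding boxes of text lines (e.g., via CRAFT)."""
    return [{"box": (0, 0, 100, 20), "crop": image}]

def recognize_text(crop):
    """Stage 2: OCR a single text-line crop (e.g., via CRNN)."""
    return "WBC 6.4 3.5-9.5"

def classify_report(lines):
    """Stage 3: predict report type from recognized text (e.g., FastText)."""
    return "blood_routine"

def extract_entities(lines):
    """Stage 4: NER over recognized lines (e.g., BERT-CRF)."""
    return [{"name": "WBC", "value": "6.4", "range": "3.5-9.5"}]

def build_table(entities):
    """Stage 5: assemble entities into structured rows."""
    return [(e["name"], e["value"], e["range"]) for e in entities]

def structure_report(image):
    regions = detect_text_regions(image)
    lines = [recognize_text(r["crop"]) for r in regions]
    report_type = classify_report(lines)
    rows = build_table(extract_entities(lines))
    return {"type": report_type, "rows": rows}
```

Each stage can be swapped out independently, which is what allowed the team to replace YOLO with CRAFT, and keyword weighting with FastText, without touching the rest of the pipeline.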

Text‑Region Detection Module

Initially a YOLO‑based detector was used, but it struggled with isolated symbols such as ‘+’ and ‘‑’ that carry clinical meaning. The team switched to the segmentation‑based CRAFT algorithm, which better captures small characters. To avoid costly manual annotation, a synthetic report generator was created to produce training data with realistic backgrounds.
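A synthetic generator of this kind emits text lines plus bounding boxes that serve as detection ground truth once rendered onto a scanned-paper background. The sketch below shows only the annotation side (rendering omitted); the test names, reference ranges, and layout constants are illustrative assumptions, not the team's generator.

```python
import random

# Hypothetical synthetic-report generator: produces (text, box) pairs
# for training a text-region detector. Note the clinically meaningful
# '+' / '-' flags the article says YOLO struggled to detect.
TESTS = [("WBC", "3.5-9.5"), ("RBC", "3.8-5.1"),
         ("HGB", "115-150"), ("PLT", "125-350")]

def synth_report(n_rows=4, line_h=28, x0=40, y0=120, seed=0):
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        name, ref = TESTS[i % len(TESTS)]
        lo, hi = map(float, ref.split("-"))
        value = round(rng.uniform(lo * 0.8, hi * 1.2), 1)
        flag = "+" if value > hi else ("-" if value < lo else "")
        text = f"{name} {value} {flag} {ref}".strip()
        y = y0 + i * line_h
        # Box width approximated from character count (illustrative only).
        rows.append({"text": text, "box": (x0, y, x0 + 9 * len(text), y + line_h - 6)})
    return rows
```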

Text Recognition Module

CRNN was adopted for OCR after testing attention‑based DAN, which proved too heavy for production. Experiments on the internal “Medical Report Text Line Dataset v1.0” showed the in‑house OCR outperformed Baidu and Tencent cloud APIs, though accuracy remained below 85%.

Report Classification Module

After discarding BERT for speed reasons, FastText was selected for classifying reports (e.g., blood routine, liver function). Trained on a manually filtered set of ~5,000 images, FastText achieved 92.2% accuracy, surpassing the earlier keyword‑weighting method.
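FastText's supervised mode expects one example per line with the class prefixed as `__label__<class>`. A sketch of preparing training data in that format is below; the sample texts and label names are illustrative, not the platform's actual taxonomy. Training itself is then a single call to `fasttext.train_supervised(input=path)` in the official library.

```python
# Prepare FastText supervised training lines from (text, label) pairs.
def to_fasttext_line(text, label):
    clean = " ".join(text.split())  # normalize whitespace from OCR output
    return f"__label__{label} {clean}"

samples = [
    ("WBC 6.4 RBC 4.7 HGB 139", "blood_routine"),   # labels are illustrative
    ("ALT 23 AST 19 TBIL 11.2", "liver_function"),
]
lines = [to_fasttext_line(t, l) for t, l in samples]
print(lines[0])  # -> "__label__blood_routine WBC 6.4 RBC 4.7 HGB 139"
```

Because FastText classifies on bag-of-n-grams rather than a deep encoder, it is fast enough for production inference, which is the trade-off the team made against BERT.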

Named Entity Recognition Module

The NER task extracts fields such as “test name”, “value”, and “reference range”. Various models were evaluated: CRF, BiLSTM‑CRF, BERT‑CRF, and a custom BERT‑roll‑back (BERT‑rb) that splits long reports before applying CRF. The BERT‑rb‑CRF model handled reports longer than 512 tokens with performance comparable to standard BERT‑CRF.
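The core of the roll-back idea is splitting a long token sequence into overlapping windows that each fit BERT's 512-token limit, tagging each window, and merging the per-window predictions. The sketch below shows one plausible split-and-merge; the window/stride sizes and the merge heuristic (prefer predictions far from a window's edge) are assumptions, as the article does not detail them.

```python
# Split a long report into overlapping windows for a 512-token encoder,
# then merge per-window tag sequences back onto the full sequence.
MAX_LEN, STRIDE = 512, 384  # assumed values

def split_windows(tokens, max_len=MAX_LEN, stride=STRIDE):
    windows, start = [], 0
    while start < len(tokens):
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows

def merge_tags(n_tokens, window_tags, max_len=MAX_LEN, stride=STRIDE):
    """window_tags: list of (start_offset, tags) per window. A later window
    overwrites a tag only for tokens well inside it (assumed heuristic)."""
    merged = [None] * n_tokens
    for start, tags in window_tags:
        for i, tag in enumerate(tags):
            if merged[start + i] is None or i >= (max_len - stride) // 2:
                merged[start + i] = tag
    return merged
```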

Structured Content Extraction

Extracted entities are assembled into tabular form by locating the test name and then searching the same or next line for its value and range.
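That same-line-or-next-line search can be sketched with regular expressions: anchor on a known test name, then scan the current and following line for a reference range and a standalone numeric value. The known-name set and regex patterns here are illustrative assumptions, not the team's extraction rules.

```python
import re

KNOWN_TESTS = {"WBC", "RBC", "HGB", "PLT"}               # illustrative
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[-~]\s*(\d+(?:\.\d+)?)")
VALUE_RE = re.compile(r"\d+(?:\.\d+)?")

def extract_rows(lines):
    rows = []
    for idx, line in enumerate(lines):
        name = next((t for t in line.split() if t in KNOWN_TESTS), None)
        if not name:
            continue
        # Search the same line plus the next one, per the article.
        scope = line + " " + (lines[idx + 1] if idx + 1 < len(lines) else "")
        rng = RANGE_RE.search(scope)
        ref = f"{rng.group(1)}-{rng.group(2)}" if rng else None
        after = scope.split(name, 1)[1]
        if rng:  # mask the range so its bounds aren't mistaken for the value
            after = after.replace(rng.group(0), " ")
        val = VALUE_RE.search(after)
        rows.append({"name": name,
                     "value": val.group(0) if val else None,
                     "range": ref})
    return rows

print(extract_rows(["WBC 6.4", "3.5-9.5"]))
```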

Remaining Issues

Several challenges remain: CRAFT does not yet detect stamps, charts are occasionally mis‑recognized as text, OCR accuracy still lags behind premium cloud services, classification accuracy has room to rise above 95%, and NER does not yet capture measurement units. Further data collection and model refinement are planned.

References

Key papers cited include YOLO, CRNN, CRAFT, FastText, BERT, and various OCR and NER studies.

Tags: AI, OCR, NLP, text detection, medical imaging, NER
Written by HaoDF Tech Team

HaoDF Online tech practice and sharing—join us to discuss and help create quality healthcare through technology.
