AI‑Based Structuring of Medical Examination Reports: OCR, Text Detection, Classification, and NER
This article describes how a Chinese online medical platform tackled the large‑scale extraction and structuring of hospital report images by combining OCR, deep‑learning text‑region detection, fast text classification, and advanced NER techniques. It details the challenges encountered, the algorithm choices, performance results, and remaining open issues.
Project Background and Challenges
The online medical service receives tens of thousands of report images daily. These images contain valuable clinical data but are stored as unstructured pictures, which prevents indexing, retrieval, and efficient reference by doctors. The goal is to automatically read, classify, and convert these reports into structured tables.
Technical Overview
The solution relies on artificial‑intelligence techniques, specifically image processing (OCR and text‑region detection) and natural‑language processing (text classification and named‑entity recognition). The pipeline consists of five stages: text‑region detection, OCR, report‑type classification, NER, and structured content extraction.
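The five stages can be sketched as a simple sequential pipeline. The function names and stub return values below are illustrative placeholders, not the team's actual APIs; each stub stands in for the model described in the corresponding section.

```python
from typing import Any, Dict, List, Tuple

def detect_text_regions(image: bytes) -> List[Dict[str, Any]]:
    """Stage 1: locate text boxes in the image (stubbed for illustration)."""
    return [{"box": (0, 0, 100, 20), "crop": image}]

def recognize_text(region: Dict[str, Any]) -> str:
    """Stage 2: OCR one cropped text region (stubbed)."""
    return "WBC 6.2 4.0-10.0"

def classify_report(lines: List[str]) -> str:
    """Stage 3: report-type classification over the recognized lines (stubbed)."""
    return "blood_routine"

def tag_entities(line: str) -> List[Tuple[str, str]]:
    """Stage 4: NER over one text line, returning (token, label) pairs (stubbed)."""
    return [("WBC", "test_name"), ("6.2", "value"), ("4.0-10.0", "reference_range")]

def structure_report(image: bytes) -> Dict[str, Any]:
    """Run the five-stage pipeline end to end."""
    regions = detect_text_regions(image)            # 1. text-region detection
    lines = [recognize_text(r) for r in regions]    # 2. OCR
    report_type = classify_report(lines)            # 3. classification
    tagged = [tag_entities(line) for line in lines] # 4. NER
    # 5. structured extraction: fold each line's entities into one row
    rows = [{label: tok for tok, label in ents} for ents in tagged]
    return {"type": report_type, "rows": rows}
```

Keeping the stages behind narrow interfaces like this lets each model be swapped independently, which matters here since the team replaced the detector and evaluated several NER models over the project's lifetime.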
Text‑Region Detection Module
Initially a YOLO‑based detector was used, but it struggled with isolated symbols such as ‘+’ and ‘‑’ that carry clinical meaning. The team switched to the segmentation‑based CRAFT algorithm, which better captures small characters. To avoid costly manual annotation, a synthetic report generator was created to produce training data with realistic backgrounds.
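The synthetic-generator idea can be sketched as below. This is a minimal sketch under stated assumptions: the vocabularies, monospace font metrics, and layout constants are all invented here, and a real generator would render the text onto realistic background images rather than just emitting coordinates.

```python
import random

# Illustrative vocabularies; the real generator used hospital report
# templates and realistic backgrounds (everything here is an assumption).
TESTS = ["WBC", "RBC", "HGB", "PLT", "ALT", "AST"]
FLAGS = ["", "+", "-"]  # isolated symbols that carry clinical meaning

CHAR_W, CHAR_H, MARGIN, LINE_GAP = 12, 20, 10, 8  # assumed monospace metrics

def synth_line(rng: random.Random, row: int) -> list:
    """Generate one synthetic report line plus token-level bounding boxes.

    Boxes assume a monospace font of CHAR_W x CHAR_H pixels, so the
    ground-truth annotation comes for free with the generated text.
    """
    tokens = [rng.choice(TESTS), f"{rng.uniform(1, 20):.1f}", rng.choice(FLAGS)]
    tokens = [t for t in tokens if t]  # the flag may be empty
    x = MARGIN
    y = MARGIN + row * (CHAR_H + LINE_GAP)
    boxes = []
    for tok in tokens:
        w = len(tok) * CHAR_W
        boxes.append({"text": tok, "box": (x, y, x + w, y + CHAR_H)})
        x += w + CHAR_W  # one character of spacing between tokens
    return boxes

def synth_report(seed: int = 0, n_lines: int = 5) -> list:
    """Generate a full synthetic report as a list of annotated lines."""
    rng = random.Random(seed)
    return [synth_line(rng, i) for i in range(n_lines)]
```

Because every character position is known at generation time, this style of generator yields exact detection labels, including for the small '+' and '-' flags that a YOLO-style detector tended to miss.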
Text Recognition Module
CRNN was adopted for OCR after testing attention‑based DAN, which proved too heavy for production. Experiments on the internal “Medical Report Text Line Dataset v1.0” showed the in‑house OCR outperformed Baidu and Tencent cloud APIs, though accuracy remained below 85%.
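CRNN produces per-frame character scores that are decoded with CTC: consecutive repeats are collapsed, then blanks are dropped. A minimal greedy decoder, assuming index 0 is the blank class, looks like this:

```python
BLANK = 0  # conventional CTC blank index (assumption: blank is class 0)

def ctc_greedy_decode(frame_logits, charset):
    """Greedy CTC decoding of CRNN output.

    frame_logits: one list of class scores per time frame, index 0 = blank.
    charset: the characters corresponding to class indices 1..N.
    """
    # 1. Take the best class at every frame.
    path = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_logits]
    # 2. Collapse consecutive repeats, then drop blanks.
    out, prev = [], None
    for idx in path:
        if idx != prev and idx != BLANK:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

Note that a blank between two identical indices keeps them as two characters, which is how CTC represents doubled letters or digits ("66" vs "6") in values.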
Report Classification Module
After discarding BERT for speed reasons, FastText was selected for classifying reports (e.g., blood routine, liver function). Trained on a manually filtered set of ~5,000 images, FastText achieved 92.2% accuracy, surpassing the earlier keyword‑weighting method.
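fastText is fast because inference is just an average of token embeddings followed by a linear softmax. The sketch below illustrates that scoring scheme with tiny hand-written parameters; the embeddings, weights, and vocabulary are toy stand-ins, not the team's trained model (which in practice would be trained with the `fasttext` library).

```python
import math

# Toy parameters standing in for a trained fastText model (illustrative only).
EMBED = {                       # token -> 2-d embedding
    "hemoglobin": [1.0, 0.0],
    "platelet":   [0.8, 0.2],
    "alt":        [0.0, 1.0],
    "bilirubin":  [0.1, 0.9],
}
CLASSES = ["blood_routine", "liver_function"]
W = [[1.0, -1.0],               # one weight row per class
     [-1.0, 1.0]]

def predict(tokens):
    """fastText-style inference: average embeddings, linear scores, softmax."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    if not vecs:
        return None, 0.0
    avg = [sum(col) / len(vecs) for col in zip(*vecs)]
    scores = [sum(w * x for w, x in zip(row, avg)) for row in W]
    exps = [math.exp(s) for s in scores]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs[best]
```

Because the whole forward pass is one average and one matrix-vector product, this model classifies a report in microseconds, which is why it beat BERT on the latency budget despite the simpler representation.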
Named Entity Recognition Module
The NER task extracts fields such as “test name”, “value”, and “reference range”. Various models were evaluated: CRF, BiLSTM‑CRF, BERT‑CRF, and a custom BERT‑roll‑back (BERT‑rb) that splits long reports before applying CRF. The BERT‑rb‑CRF model handled reports longer than 512 tokens with performance comparable to standard BERT‑CRF.
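The core of the roll-back idea is splitting a long token sequence into overlapping windows that each fit the encoder's 512-token limit, tagging each window, and discarding predictions in the overlapped region when merging. A sketch of the splitting step, with window and overlap sizes chosen here as assumptions rather than taken from the team's configuration:

```python
def split_for_bert(tokens, max_len=512, overlap=64):
    """Split a token sequence into overlapping windows of at most max_len.

    Returns (start_offset, chunk) pairs. The overlap gives the tagger
    context at chunk boundaries; boundary predictions can be "rolled back"
    to the neighboring chunk when the tagged windows are merged.
    """
    if len(tokens) <= max_len:
        return [(0, tokens)]
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks
```

Each chunk is then tagged independently (e.g. by a BERT-CRF head), and the start offsets let the per-chunk label sequences be stitched back into one sequence covering the full report.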
Structured Content Extraction
Extracted entities are assembled into tabular form by locating the test name and then searching the same or next line for its value and range.
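That same-line-then-next-line search can be sketched with simple patterns. The regexes and the row format are illustrative assumptions; real reports vary widely across hospitals and the production system works from NER labels rather than raw patterns.

```python
import re

# Illustrative patterns (assumptions, not the production rules).
VALUE_RE = re.compile(r"\d+(?:\.\d+)?")
RANGE_RE = re.compile(r"\d+(?:\.\d+)?\s*[-~]\s*\d+(?:\.\d+)?")

def extract_rows(lines, test_names):
    """Assemble {test, value, range} rows from OCR'd text lines.

    For each known test name, the value and reference range are searched
    for on the same line first, then on the next line.
    """
    rows = []
    for i, line in enumerate(lines):
        for name in test_names:
            if name not in line:
                continue
            rest = line.split(name, 1)[1]
            candidates = [rest] + ([lines[i + 1]] if i + 1 < len(lines) else [])
            value = rng = None
            for text in candidates:
                m = RANGE_RE.search(text)
                if rng is None and m:
                    rng = m.group(0)
                    # Remove the range so its digits are not reused as the value.
                    text = text[:m.start()] + text[m.end():]
                if value is None:
                    v = VALUE_RE.search(text)
                    if v:
                        value = v.group(0)
                if value and rng:
                    break
            rows.append({"test": name, "value": value, "range": rng})
    return rows
```

Stripping the matched range before looking for the value is the one subtlety: otherwise a line like "WBC 4.0-10.0" would wrongly report 4.0 as the measured value.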
Remaining Issues
Several challenges persist: CRAFT does not detect stamps, charts are occasionally mis‑recognized as text, OCR accuracy still lags behind premium cloud services, classification accuracy could be pushed above 95%, and the NER model does not yet capture measurement units. Further data collection and model refinement are planned.
References
Key papers cited include YOLO, CRNN, CRAFT, FastText, BERT, and various OCR and NER studies.
HaoDF Tech Team