Efficient Scene Text Detection Framework with Feature Pyramid and Expanded High-Level Feature Maps
The paper presents an efficient scene‑text detector that expands the high‑level SSD feature maps and integrates a feature‑pyramid network. Direction‑aware segment‑and‑link predictions let it reconstruct arbitrarily long, rotated text. The authors report higher recall and precision at real‑time speed, outperforming recent methods on the ICDAR benchmarks and in a menu‑recognition test.
Scene text detection is crucial for many applications but remains challenging due to large variations in aspect ratio, scale, and orientation.
This work proposes an efficient detection framework that combines an expanded high‑level feature map with a feature‑pyramid network (FPN) built on top of an SSD backbone. Text lines are decomposed into small, direction‑aware segments; an 8‑neighbor link prediction decides which adjacent segments belong together, allowing reconstruction of arbitrarily long and rotated text.
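The grouping step described above can be illustrated with a small union‑find routine. This is a minimal sketch, not the authors' implementation: the function name `group_segments`, the `(a, b)` link representation, and the 0.5 score threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def group_segments(num_segments, links, scores, threshold=0.5):
    """Group segment indices into text lines by merging confident links.

    links  -- list of (a, b) index pairs between neighboring segments
    scores -- predicted link confidence for each pair (hypothetical values)
    """
    parent = list(range(num_segments))

    def find(i):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in zip(links, scores):
        if score >= threshold:  # keep only links predicted as positive
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_segments):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Five segments; the weak link between 2 and 3 is dropped,
# so the segments fall into two separate text lines.
lines = group_segments(5, [(0, 1), (1, 2), (3, 4), (2, 3)], [0.9, 0.8, 0.7, 0.2])
```

Each resulting group is one candidate text line, which is why the length of the text is not bounded by the receptive field of a single prediction.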
Key components:
Interval sampling to enlarge high‑level feature maps, preserving resolution for small texts.
Fusion of deep and shallow features to construct a multi‑level pyramid (conv4_3_f, fc7_f, conv6_2_f, …), each level with 256 channels.
Segment‑and‑link prediction on each pyramid level, modeling eight possible neighbor relations.
Geometric post‑processing that fits a line to linked segments and derives final bounding boxes.
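As an illustration of the geometric post‑processing step, the sketch below fits a line through the centers of a group of linked segments and derives one rotated box from it. It is a simplified stand‑in under stated assumptions: segments are given as `(cx, cy, w, h)` tuples, the line is fitted by total least squares (principal axis of the centers), and the function name `merge_segments` is hypothetical. The paper's exact procedure may differ.

```python
import math


def merge_segments(segments):
    """Combine linked segments (cx, cy, w, h) into one box (cx, cy, w, h, theta)."""
    n = len(segments)
    mean_x = sum(s[0] for s in segments) / n
    mean_y = sum(s[1] for s in segments) / n

    # Second moments of the segment centers around their mean.
    sxx = sum((s[0] - mean_x) ** 2 for s in segments)
    syy = sum((s[1] - mean_y) ** 2 for s in segments)
    sxy = sum((s[0] - mean_x) * (s[1] - mean_y) for s in segments)

    # Orientation of the principal axis, i.e. the total-least-squares line.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Project each center onto the line and extend by half the segment width.
    spans = []
    for cx, cy, w, _ in segments:
        t = (cx - mean_x) * ux + (cy - mean_y) * uy
        spans.append((t - w / 2.0, t + w / 2.0))
    lo = min(a for a, _ in spans)
    hi = max(b for _, b in spans)
    mid = (lo + hi) / 2.0

    height = sum(s[3] for s in segments) / n  # average segment height
    return (mean_x + mid * ux, mean_y + mid * uy, hi - lo, height, theta)


# Three horizontal segments, each 10 wide and 8 tall, merge into a 30 x 8 box.
box = merge_segments([(10, 10, 10, 8), (20, 10, 10, 8), (30, 10, 10, 8)])
```

The returned tuple is the box center, its length along the fitted line, its height, and the line angle in radians.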
Experiments on ICDAR2013 and ICDAR2015 show that expanding high‑level maps improves recall, while adding the pyramid further boosts precision. Compared with TextBoxes++, PixelLink and other state‑of‑the‑art methods, the proposed approach achieves a favorable trade‑off between speed (FPS) and accuracy.
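ICDAR detection results are conventionally reported as precision, recall, and their harmonic mean (F‑measure), which is the usual way to weigh a recall gain against a precision gain. A minimal sketch of that calculation follows; the numbers in the example are illustrative only and are not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative values only, not taken from the paper's tables.
score = f_measure(0.8, 0.6)
```

Because the harmonic mean is dominated by the smaller of the two values, raising recall helps the F‑measure only if precision does not fall by a comparable amount.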
The system is also deployed in a real‑world menu‑recognition scenario, where it outperforms SegLink by about 5% on a 500‑image test set.
Future work will explore pixel‑level segmentation (inspired by PixelLink) and joint detection‑segmentation architectures.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.