
Improving Small Object Detection for UI2CODE via Data Augmentation and Model Optimization

This study improves UI2CODE’s detection of tiny UI components in three ways: augmenting the training data with pasted copies of small objects, upgrading the detector from Faster RCNN to FPN and then Cascade FPN, and refining box positions with smoothing, binarization, and projection. Both FPN variants score higher than Faster RCNN on small‑object mAP/mAR, and the pipeline serves as a basis for broader UI parsing.

Xianyu Technology

Dec 11, 2019


Background: In computer vision, detecting "small objects" (e.g., traffic lights in autonomous driving or early lesions in medical images) matters for both user experience and automation. UI2CODE, a tool from Xianyu Tech, parses UI elements from screenshots to generate code. Small UI components such as price tags or icons are often missed, and every missed component makes the generated code inaccurate.

Challenges: COCO defines small objects as those with an area under 32×32 pixels. The main difficulties are (1) class imbalance: small objects are scarce, so the loss is dominated by larger objects; (2) feature loss: pooling layers in deep networks discard fine‑grained details; (3) localization precision: a shift of a few pixels sharply reduces IoU for a small box.
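To make point (3) concrete, here is a small worked example. It is only an illustration under assumed numbers: the box sizes (16 and 128 pixels) and the 4-pixel shift are chosen for the example and do not come from the study.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square box and the same box moved diagonally by `shift` pixels."""
    box = (0, 0, size, size)
    moved = (shift, shift, size + shift, size + shift)
    return iou(box, moved)


small = shifted_iou(16, 4)    # 16x16 box, 4-pixel shift
large = shifted_iou(128, 4)   # 128x128 box, same shift
print(f"small: {small:.2f}, large: {large:.2f}")  # small: 0.39, large: 0.88
```

The same 4-pixel error leaves the large box comfortably above a 0.5 IoU threshold but pushes the small box below it, which is why localization errors cost small objects disproportionately.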

Proposed Solutions:

1) Data Augmentation: Randomly copy small objects and paste them at positions that do not overlap existing objects, keeping a 5‑pixel margin from the image borders and applying up to 5% scale variation.

2) Model Optimization: Progressively upgrade the detector from Faster RCNN → Feature Pyramid Network (FPN) → Cascade FPN, leveraging multi‑scale features and staged IoU thresholds to improve recall and precision for small targets.

3) Position Correction: Apply Gaussian smoothing and adaptive local binarization to obtain a binary mask, then use horizontal and vertical projections of that mask to refine bounding boxes to pixel‑level accuracy.
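The placement rule in step 1 can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `place_copy`, the rejection-sampling loop, the `max_tries` limit, and the reading of "up to 5% scale variation" as a symmetric ±5% jitter are all assumptions. It only chooses where a pasted copy goes; copying the pixels themselves is left out.

```python
import random


def overlaps(a, b):
    """True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_copy(obj_w, obj_h, img_w, img_h, boxes, rng,
               margin=5, max_scale_jitter=0.05, max_tries=50):
    """Pick a paste location for a small object, or return None if none is found.

    The copy is rescaled by a random factor in [1 - jitter, 1 + jitter], kept
    `margin` pixels away from every image border, and rejected if it overlaps
    any box in `boxes`.
    """
    scale = 1.0 + rng.uniform(-max_scale_jitter, max_scale_jitter)
    w = max(1, round(obj_w * scale))
    h = max(1, round(obj_h * scale))
    if w + 2 * margin > img_w or h + 2 * margin > img_h:
        return None  # the object cannot fit inside the margins
    for _ in range(max_tries):
        x = rng.randint(margin, img_w - margin - w)
        y = rng.randint(margin, img_h - margin - h)
        candidate = (x, y, x + w, y + h)
        if not any(overlaps(candidate, b) for b in boxes):
            return candidate
    return None


# Hypothetical 375x667 screenshot with one existing annotated component.
existing = [(100, 100, 200, 200)]
new_box = place_copy(20, 12, 375, 667, existing, random.Random(0))
```

Appending each accepted box to `boxes` before placing the next copy keeps the pasted objects from overlapping one another as well.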
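The "staged IoU thresholds" in step 2 can be illustrated by looking at which proposals count as positive training samples at each stage. The thresholds 0.5, 0.6, and 0.7 are the ones used in the Cascade R-CNN paper and are assumed here, since the summary does not state the values used for UI2CODE. The sketch covers only label assignment; a real cascade also regresses the boxes between stages, so later stages receive better-aligned proposals than the fixed ones shown.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """For each stage threshold, list the proposals labeled positive against `gt`."""
    return {t: [p for p in proposals if iou(p, gt) >= t] for t in thresholds}


# Hypothetical 20x20 ground-truth box with four progressively worse proposals.
gt = (10, 10, 30, 30)
proposals = [(12, 10, 32, 30), (13, 11, 33, 31), (14, 12, 34, 32), (18, 16, 38, 36)]
stages = positives_per_stage(proposals, gt)
counts = {t: len(ps) for t, ps in stages.items()}
print(counts)  # {0.5: 3, 0.6: 2, 0.7: 1}
```

Each stage trains on a stricter definition of a match, so the final stage specializes in tight localization, which is the property small boxes need most.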
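The projection part of step 3 can be sketched in plain Python. The sketch assumes the Gaussian smoothing and adaptive binarization have already produced a binary mask (1 for foreground, 0 for background); the function name `tighten_box` and the half-open box convention `(x1, y1, x2, y2)` with exclusive `x2` and `y2` are illustrative choices, not details from the study.

```python
def tighten_box(mask, box):
    """Shrink a coarse box to the foreground extent found by projection.

    `mask` is a list of rows of 0/1 values; `box` is (x1, y1, x2, y2) with
    x2 and y2 exclusive. Returns the box unchanged if it contains no foreground.
    """
    x1, y1, x2, y2 = box
    crop = [row[x1:x2] for row in mask[y1:y2]]
    row_profile = [sum(r) for r in crop]        # horizontal projection: one sum per row
    col_profile = [sum(c) for c in zip(*crop)]  # vertical projection: one sum per column
    ys = [i for i, v in enumerate(row_profile) if v > 0]
    xs = [i for i, v in enumerate(col_profile) if v > 0]
    if not ys or not xs:
        return box
    return (x1 + xs[0], y1 + ys[0], x1 + xs[-1] + 1, y1 + ys[-1] + 1)


# Hypothetical 8-row by 10-column mask with foreground in rows 3-4, columns 4-6.
mask = [[0] * 10 for _ in range(8)]
for y in (3, 4):
    for x in (4, 5, 6):
        mask[y][x] = 1

refined = tighten_box(mask, (2, 1, 9, 7))
print(refined)  # (4, 3, 7, 5)
```

Taking the first and last non-zero entries of each profile works when the crop holds a single component on a clean background; with noisy masks, a minimum-count threshold on the profiles would be a natural variation.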

Results: Experiments show that FPN and Cascade FPN outperform Faster RCNN on small‑object mAP/mAR across confidence thresholds (0.5–0.95). Cascade FPN yields the best overall metrics when the confidence threshold is 0.5.

Outlook: The small‑object detection pipeline forms the basis for UI element parsing. Possible extensions include analysis of other complex backgrounds, generative up‑sampling, and class‑aware position refinement for higher precision.
