Hybrid Computer Vision and Deep Learning for Automated UI Background Color Extraction and Assertion
This article presents a hybrid pipeline that combines traditional computer vision techniques with deep learning models to automatically extract and verify text and background colors in UI automation screenshots. It addresses challenges such as limited training data and complex borders, cutting manual inspection costs while maintaining high accuracy and robustness in production environments.
Background: UI automation tasks generate massive volumes of screenshots, and checking colors in them by hand is slow and prone to errors caused by visual fatigue. To address this, a hybrid pipeline combining traditional image processing with deep learning was developed to extract text and background colors accurately for automated assertions, significantly reducing manual inspection costs.
Key Highlights: The proposed workflow integrates traditional image processing with deep learning to effectively separate foreground and background elements. It overcomes challenges related to high color accuracy requirements, large-scale data classification, and algorithm generalization, ensuring the model remains adaptable to new color categories without extensive retraining.
Performance Metrics: The evaluation metrics are detection rate and false positive rate. On the test set, the classification model achieved 92.82% accuracy, while color extraction reached 100% accuracy. The background color comparison algorithm achieved a 95.45% detection rate with a 0.00% false positive rate. In production use at Lark, the algorithm maintains over 98% accuracy across daily automated tasks.
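The two metrics can be made concrete with a short sketch. The article does not spell out its formulas, so the conventional definitions are assumed here: detection rate as the share of truly mismatched screenshots that are flagged, and false positive rate as the share of correct screenshots that are flagged by mistake. The function names and the counts in the usage lines are hypothetical and are not the article's data.

```python
def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Share of real color mismatches that the checker flagged."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of correct screenshots that the checker flagged by mistake."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0


# Hypothetical counts, for illustration only.
print(f"detection rate: {detection_rate(19, 1):.2%}")
print(f"false positive rate: {false_positive_rate(0, 80):.2%}")
```

A zero false positive rate matters in this setting because every false alarm sends a screenshot back to a human reviewer, which is the cost the pipeline is meant to remove.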
Technical Challenges & Solutions: Initial challenges included handling image borders, recognizing special characters, and mitigating overfitting due to limited training samples. To address data scarcity, a Long Skip Connection architecture was implemented to merge low- and high-dimensional features, enhancing feature reusability. Training efficiency was improved using the One Cycle Learning Rate policy, which accelerates convergence in complex models.
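To show the shape of the One Cycle policy mentioned above, here is a minimal standard-library sketch: the learning rate warms up to a peak and then anneals to a very small value, both along cosine curves. The function name `one_cycle_lr` and the default values (`pct_start`, `div_factor`, `final_div_factor`) are assumptions chosen for illustration; the article does not state which settings or framework were used.

```python
import math


def _cosine_interpolate(start: float, end: float, t: float) -> float:
    """Move from start (t=0) to end (t=1) along a half cosine."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * t))


def one_cycle_lr(step: int, total_steps: int, max_lr: float,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """Learning rate at `step` under a two-phase one-cycle schedule."""
    if total_steps <= 0 or not 0.0 < pct_start < 1.0:
        raise ValueError("total_steps must be > 0 and 0 < pct_start < 1")
    step = max(0, min(step, total_steps))      # clamp to the schedule range
    initial_lr = max_lr / div_factor           # starting learning rate
    min_lr = initial_lr / final_div_factor     # learning rate at the very end
    warmup_steps = pct_start * total_steps
    if step < warmup_steps:
        # Phase 1: warm up from initial_lr to max_lr.
        return _cosine_interpolate(initial_lr, max_lr, step / warmup_steps)
    # Phase 2: anneal from max_lr down to min_lr.
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return _cosine_interpolate(max_lr, min_lr, t)


# Print a few points of a hypothetical 100-step schedule.
for s in (0, 15, 30, 65, 100):
    print(s, one_cycle_lr(s, total_steps=100, max_lr=0.1))
```

The brief period at a high learning rate is what is usually credited with faster convergence, while the long annealing tail lets the weights settle.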
Traditional Computer Vision Approach: Initial attempts took the largest connected component in the HSV color space as the background. This reached roughly 95% accuracy on light backgrounds, but accuracy dropped to roughly 57% on dark backgrounds, where the background is not always the dominant color and border themes vary.
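The connected-component idea can be sketched with the standard library alone. The code below quantizes each pixel in HSV, groups 4-connected pixels that fall into the same bin, and returns the mean RGB color of the largest group. The bin counts, the 4-connectivity, and the function names are assumptions made for illustration; the article does not describe the original implementation at this level of detail.

```python
import colorsys
from collections import deque


def quantize_hsv(rgb, bins=(12, 4, 4)):
    """Map an (r, g, b) pixel in 0..255 to a coarse HSV bin (assumed bin sizes)."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    return tuple(min(int(x * n), n - 1) for x, n in zip((h, s, v), bins))


def largest_component_color(image):
    """Mean RGB of the largest 4-connected region sharing one HSV bin.

    `image` is a non-empty list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    labels = [[quantize_hsv(px) for px in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited seed pixel.
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and labels[ny][nx] == labels[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    count = len(best)
    return tuple(round(sum(image[y][x][c] for y, x in best) / count) for c in range(3))


# Tiny synthetic "screenshot": white background with two black text pixels.
W, K = (255, 255, 255), (0, 0, 0)
print(largest_component_color([[W, W, W, W], [W, K, K, W], [W, W, W, W]]))
```

The sketch also makes the reported weakness visible: whenever text, an icon, or a themed border forms a larger region than the true background, the largest component is simply the wrong answer.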
Improved Hybrid Algorithm: The refined pipeline first applies data augmentation and HSV conversion to generate multiple enhanced inputs. A deep convolutional network classifies regions into text, background, or other categories. If confidence is low, the system falls back to traditional CV methods. Extracted colors are then compared against standard palettes using normalized Manhattan distance to determine similarity.
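The last two steps, the confidence-based fallback and the palette comparison, can be sketched as follows. The normalization constant (dividing by 3 × 255 so the distance lies in [0, 1]), the thresholds `min_confidence` and `max_distance`, and all function names are assumptions for illustration; the article only states that a normalized Manhattan distance is used and that low-confidence cases fall back to traditional CV.

```python
from typing import Callable, Dict, Optional, Tuple

Color = Tuple[int, int, int]


def normalized_manhattan(a: Color, b: Color) -> float:
    """Manhattan (L1) distance between two RGB colors, scaled to [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (3 * 255)


def match_palette(color: Color, palette: Dict[str, Color],
                  max_distance: float = 0.05) -> Tuple[Optional[str], float]:
    """Return the closest palette name, or None if nothing is close enough."""
    name, reference = min(palette.items(),
                          key=lambda item: normalized_manhattan(color, item[1]))
    distance = normalized_manhattan(color, reference)
    return (name if distance <= max_distance else None, distance)


def extract_background(model_color: Color, confidence: float,
                       cv_fallback: Callable[[], Color],
                       min_confidence: float = 0.8) -> Color:
    """Trust the network when it is confident, otherwise use the CV method."""
    if confidence >= min_confidence:
        return model_color
    return cv_fallback()


# Hypothetical palette and inputs.
palette = {"white": (255, 255, 255), "black": (0, 0, 0)}
color = extract_background((250, 250, 250), confidence=0.55,
                           cv_fallback=lambda: (254, 254, 254))
print(match_palette(color, palette))
```

Returning `None` when no palette entry is within the threshold keeps an unknown color from being silently mapped to its nearest neighbor, which is one way a comparison step can avoid false matches.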
Business Application & Future Plans: The algorithm is successfully deployed in Lark's UI component testing and e-commerce ad compliance checks, delivering substantial efficiency gains. Future improvements include integrating OCR for better classification, employing adversarial training to clarify decision boundaries, applying label smoothing to mitigate overconfidence, and refining color segmentation thresholds for enhanced precision.
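Of the planned improvements, label smoothing is the easiest to illustrate. In its common form, a one-hot target is mixed with a uniform distribution so the classifier is never trained toward a probability of exactly 1. The sketch below uses the three region classes named earlier (text, background, other); the smoothing factor `epsilon=0.1` and the function name are assumptions, since the article gives no settings.

```python
CLASSES = ("text", "background", "other")


def smooth_labels(target_index: int, num_classes: int, epsilon: float = 0.1):
    """Mix a one-hot target with a uniform distribution over all classes."""
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    uniform_share = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform_share if i == target_index else uniform_share
        for i in range(num_classes)
    ]


# Smoothed target for a region whose true class is "background".
target = smooth_labels(CLASSES.index("background"), len(CLASSES))
print(dict(zip(CLASSES, target)))
```

Because the target for the true class stays below 1, the network has less incentive to push its logits to extremes, which is the overconfidence the planned change is meant to reduce.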
ByteDance Terminal Technology
Official account of ByteDance Terminal Technology, sharing technical insights and team updates.
