Automated Visual Assertion for Search Template Rendering Using Logical Block Modeling
The article describes a visual‑based automation framework that extracts logical blocks and relative positions from template screenshots, builds a mathematical model, and uses an expert‑system approach to automatically detect layout anomalies across diverse search result templates, reducing manual testing effort.
Background
The search template testing team has long relied on manual testing to catch style issues such as blank pages, widget occlusion, misalignment, and missing elements. Manual checking becomes prohibitively costly across the matrix of device models, OS versions, and apps, prompting the need for an automated solution.
Traditional assertion methods compare a test screenshot against a baseline image with a pixel-wise diff. This approach struggles when no reference image is available, is brittle to minor pixel-level differences, and cannot handle dynamic effects, so manual verification costs remain high.
A more generic visual assertion method is required to cover all testing scenarios and reliably detect common layout problems.
Idea
The natural idea is to let a computer automatically recognize the style defects a human tester would see, for example by training a deep-learning model on large numbers of positive and negative samples. However, a single template (e.g., Baidu Baike) can have many distinct visual presentations (with or without images), making it hard to abstract stable features for classification.
If we model only the visual structure and layout while ignoring textual differences, we can converge the various presentations of the same template into a limited set of structured models, forming an expert system that can automatically predict and classify anomalies. The two most important visual elements are logical blocks and their positions. Logical blocks are the basic elements whose addition or loss causes layout issues, while the relative positions between blocks define the overall structure; changes in these positions manifest as occlusion or misalignment.
By modeling templates based on logical blocks and positional information, a single model can represent a specific style of a template. Training with many positive samples builds a knowledge base; matching a new screenshot against this knowledge base yields accurate assertions.
Solution
The approach abstracts a template’s visual structure into a mathematical model using logical blocks and relative positions, then applies an expert‑system style automatic classification and assertion.
The image modeling pipeline includes several steps:
a) Element Extraction – Using background‑color based contour detection, the image is scanned horizontally and vertically to segment regions that share the same color as the background. Multiple scan passes produce finer‑grained segments, yielding the contours of visual elements.
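A minimal sketch of this projection-style segmentation, assuming a uniform white (255) background; the box format and recursion depth are illustrative, not the team's actual implementation:

```python
import numpy as np

def split_spans(mask_1d):
    """Return (start, end) index spans where mask_1d is True (non-background)."""
    spans, start = [], None
    for i, v in enumerate(mask_1d):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(mask_1d)))
    return spans

def segment(img, bg=255, depth=2):
    """Alternately cut along rows and columns wherever a full row/column
    matches the background color; repeated passes yield finer segments."""
    boxes = [(0, 0, img.shape[1], img.shape[0])]  # (x, y, w, h)
    for level in range(depth * 2):
        axis = 1 if level % 2 == 0 else 0  # even passes cut rows, odd cut columns
        new_boxes = []
        for x, y, w, h in boxes:
            region = img[y:y + h, x:x + w]
            non_bg = (region != bg).any(axis=axis)
            for s, e in split_spans(non_bg):
                if axis == 1:  # horizontal cut: spans are row ranges
                    new_boxes.append((x, y + s, w, e - s))
                else:          # vertical cut: spans are column ranges
                    new_boxes.append((x + s, y, e - s, h))
        boxes = new_boxes
    return boxes
```

Each pass tightens the boxes from the previous pass, which is how repeated scanning produces the finer-grained contours the article describes.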
b) Logical Block Merging
Element Classification – After extraction, each block is classified as text or image. Image blocks usually have higher noise values; text blocks are smaller with a long‑to‑short ratio far greater than 1. These heuristics separate text from images.
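The two heuristics could be combined roughly as follows; the thresholds and the gradient-based "noise" proxy are assumptions for illustration, since the article does not give concrete values:

```python
import numpy as np

def classify_block(patch, ratio_thresh=3.0, noise_thresh=30.0):
    """Label a cropped grayscale block as 'text' or 'image'.

    Heuristics from the article: image blocks have higher noise, while
    text blocks are elongated (long/short side ratio far greater than 1).
    Here "noise" is approximated by the mean absolute horizontal gradient.
    """
    h, w = patch.shape[:2]
    aspect = max(h, w) / max(1, min(h, w))
    g = patch.astype(np.int32)
    noise = float(np.mean(np.abs(np.diff(g, axis=1))))
    if noise >= noise_thresh:       # dense variation -> likely a photo
        return "image"
    return "text" if aspect >= ratio_thresh else "image"
```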
Text Block Merging – Dispersed text fragments belonging to the same logical block are merged based on three rules: (1) horizontal/vertical distance not exceeding a single‑line height, (2) consistent primary color (e.g., black titles vs. gray body), and (3) special handling such as ignoring red highlights in body text and not merging across images.
Horizontal/vertical merging is performed separately according to the orientation of the text fragments.
After merging, each group of related fragments is represented by a single text block with one bounding box.
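The merging rules above can be sketched as a pairwise test plus a greedy union; the fragment format, color labels, and line-height threshold are illustrative assumptions:

```python
def can_merge(a, b, line_height):
    """a, b: dicts with 'box' = (x, y, w, h) and a 'color' label.
    Rule 1: horizontal/vertical gap within one line height.
    Rule 2: consistent primary color (e.g., black title vs. gray body)."""
    if a["color"] != b["color"]:
        return False
    ax, ay, aw, ah = a["box"]
    bx, by, bw, bh = b["box"]
    h_gap = max(bx - (ax + aw), ax - (bx + bw), 0)
    v_gap = max(by - (ay + ah), ay - (by + bh), 0)
    return h_gap <= line_height and v_gap <= line_height

def merge_fragments(fragments, line_height):
    """Greedily union fragments into logical text blocks."""
    blocks = [dict(f) for f in fragments]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if can_merge(blocks[i], blocks[j], line_height):
                    x1, y1, w1, h1 = blocks[i]["box"]
                    x2, y2, w2, h2 = blocks[j]["box"]
                    x, y = min(x1, x2), min(y1, y2)
                    w = max(x1 + w1, x2 + w2) - x
                    h = max(y1 + h1, y2 + h2) - y
                    blocks[i]["box"] = (x, y, w, h)
                    del blocks[j]
                    merged = True
                    break
            if merged:
                break
    return blocks
```

The special-case rules (ignoring red highlights, not merging across images) would add further predicates to `can_merge` but do not change the overall union structure.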
c) Icon Recognition – Icons that indicate business functions or navigation (e.g., lightning symbol for MIP pages, right‑arrow for links) are detected by sliding a template of the original‑size icon over the merged text area and computing similarity using the normalized cross‑correlation method (CV_TM_CCORR_NORMED). Matches above a threshold are identified as icons.
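A numpy-only sketch of the sliding-window match using the same normalized cross-correlation score that underlies OpenCV's CV_TM_CCORR_NORMED (in production one would call `cv2.matchTemplate` with `cv2.TM_CCORR_NORMED` instead; the threshold here is an assumption):

```python
import numpy as np

def match_icon(image, template, threshold=0.95):
    """Slide `template` over grayscale `image` and return the top-left
    positions where the normalized cross-correlation exceeds `threshold`:
        score = sum(T * W) / sqrt(sum(T^2) * sum(W^2))
    for each window W the size of the template T."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template.astype(np.float64)
    t_norm = np.sqrt((t * t).sum())
    hits = []
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            win = image[y:y + th, x:x + tw].astype(np.float64)
            denom = t_norm * np.sqrt((win * win).sum())
            if denom == 0:
                continue  # all-background window, no correlation defined
            if (t * win).sum() / denom >= threshold:
                hits.append((x, y))
    return hits
```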
d) Template Modeling – Using the top‑left coordinates of each logical block, the horizontal and vertical distances between blocks are computed. These distances become edge weights in a fully‑connected directed graph. Sorting nodes by their y‑coordinate yields an N×N matrix M where each entry stores the normalized x‑ and y‑differences (e.g., M[0][1] = [(x0‑x1)/img_width, (y0‑y1)/img_height]). This three‑dimensional array serves as the mathematical model of the image.
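The matrix construction described above can be sketched directly; the helper name and tie-breaking on x for equal y-coordinates are assumptions:

```python
import numpy as np

def build_model(boxes, img_w, img_h):
    """Build the N x N x 2 relative-position model of a screenshot.

    `boxes` are top-left (x, y) coordinates of logical blocks. Nodes are
    sorted by y-coordinate so the matrix is stable across screenshots of
    the same style; entry [i][j] holds the normalized x and y differences.
    """
    pts = sorted(boxes, key=lambda p: (p[1], p[0]))
    n = len(pts)
    model = np.zeros((n, n, 2))
    for i in range(n):
        for j in range(n):
            model[i][j] = ((pts[i][0] - pts[j][0]) / img_w,
                           (pts[i][1] - pts[j][1]) / img_h)
    return model
```

Normalizing by image width and height makes the model resolution-independent, so the same template style produces (nearly) the same matrix on different screen sizes.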
Application
In production, each template may have multiple enumerated styles. By recalling large volumes of real traffic and applying the modeling pipeline, multiple models are generated. Models with identical node counts are compared via their matrix entries; those with ≤5% error are merged, forming a converged model set that becomes the initial knowledge base. During testing, a new screenshot is modeled and matched against the knowledge base to determine if the layout is abnormal.
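The matching step amounts to a tolerance comparison against every known-good model; this sketch assumes the N x N x 2 matrix form above and uses the article's 5% error bound:

```python
import numpy as np

def models_match(m1, m2, tol=0.05):
    """Two models match when they have the same node count and every
    normalized distance entry differs by at most `tol` (5% in the article)."""
    if m1.shape != m2.shape:
        return False
    return bool(np.max(np.abs(m1 - m2)) <= tol)

def assert_layout(model, knowledge_base, tol=0.05):
    """Return True (layout OK) if the screenshot's model matches any
    known-good model in the knowledge base; False flags an anomaly."""
    return any(models_match(model, known, tol) for known in knowledge_base)
```

The same `models_match` predicate also drives knowledge-base convergence: newly recalled models that match an existing one are merged rather than stored separately.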
For non‑deterministic queries, manual verification can continuously enrich the knowledge base; for trained queries, the system can accurately identify anomalies, enabling precise functional testing of templates (e.g., Baike template).
Benefits
Solves the problem of automatic visual assertion for templates, providing a foundation for automated functional verification.
Achieves full coverage of intra‑template styles, extending test coverage from template‑level to sub‑style level, thereby improving template quality and defect detection.
Completed training for the top‑25 online templates with element segmentation accuracy >95%, generating 170 functional cases for offline regression and successfully recalling multiple bad cases.
Author
Zhou Qichao – Senior Test Development Engineer at Baidu, specializing in frontend automation testing and applying image‑based techniques to UI testing.