How AI Powers Automated Visual UI Inspection: Inside the MonaLisa Platform
An in‑depth look at MonaLisa, an AI‑driven visual UI inspection platform that automates pixel‑level checks by converting design files to HTML, matching DOM structures, and leveraging OpenCV and MobileNetV2 to generate reports, reduce manual rework, and streamline front‑end development workflows.
Introduction
The speaker, Shen Jiang from NetBank Mobile Channel, presents MonaLisa, an intelligent visual inspection solution that uses image algorithms to automate UI checks.
Why Visual Inspection Matters
Manual visual checks cause repeated back‑and‑forth between developers and designers, consuming days of effort and risking brand inconsistency, user confusion, and potential churn.
Exploring New Solutions
Initial attempts at automatic code modification proved too risky for a banking environment, so the team focused on automated annotation.
Image‑Based Smart Inspection
The overall approach is called the "image‑algorithm based smart inspection solution" and consists of two main strategies:
Solution 1: Pixel‑diff using computer vision (ImageDiff) – compares rendered screenshots but lacks insight into specific attribute differences.
Solution 2: DOM‑level pixel inspection – matches design‑derived HTML with deployed HTML at the DOM node level.
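To make the limitation of Solution 1 concrete, here is a minimal pixel‑diff sketch in plain Python. It is not the ImageDiff implementation from the talk; the image representation (nested lists of RGB tuples) and the names pixel_diff and bounding_box are assumptions chosen for illustration. Note that the output only says where pixels differ, not which style attribute caused the difference.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels (hypothetical in-memory format)


def pixel_diff(design: Image, actual: Image, tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) for every pixel where any channel differs by more than tolerance."""
    same_shape = len(design) == len(actual) and all(
        len(a) == len(b) for a, b in zip(design, actual)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    diffs = []
    for y, (row_d, row_a) in enumerate(zip(design, actual)):
        for x, (p_d, p_a) in enumerate(zip(row_d, row_a)):
            if max(abs(c1 - c2) for c1, c2 in zip(p_d, p_a)) > tolerance:
                diffs.append((x, y))
    return diffs


def bounding_box(points: List[Tuple[int, int]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (left, top, right, bottom) box covering all differing pixels."""
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

A diff region like this can be highlighted on a screenshot, but it cannot tell a developer whether the font size, the color, or the position is wrong, which is why the DOM‑level approach is needed.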
DOM‑Based Inspection Workflow
Convert design mockups (Sketch) to HTML.
Deploy code to obtain the live HTML.
Cross‑match DOM nodes of the two HTML trees.
Compare styles (font, size, color, position) for matched nodes.
This yields a detailed inspection report.
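The style comparison step for one matched pair of nodes could look roughly like the sketch below. The property names, the half‑pixel tolerance, and the compare_styles helper are assumptions for illustration; the talk does not specify how MonaLisa represents styles or which tolerances it applies.

```python
from typing import Dict, List, Union

Style = Dict[str, Union[str, int, float]]


def _normalize(value):
    """Compare strings case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower() if isinstance(value, str) else value


def compare_styles(design: Style, actual: Style, px_tolerance: float = 0.5) -> List[dict]:
    """List every property whose value differs between the design node and the live node."""
    findings = []
    for prop in sorted(set(design) | set(actual)):
        expected, found = design.get(prop), actual.get(prop)
        if isinstance(expected, (int, float)) and isinstance(found, (int, float)):
            # Numeric values (sizes, offsets) are allowed a small rendering tolerance.
            mismatch = abs(expected - found) > px_tolerance
        else:
            mismatch = _normalize(expected) != _normalize(found)
        if mismatch:
            findings.append({"property": prop, "expected": expected, "actual": found})
    return findings
```

Each finding names the property and both values, which is the attribute‑level detail that a pure pixel diff cannot provide.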
Precise Design‑to‑HTML Parsing
The "precise design parsing" pipeline cleans Sketch layers, maps macOS view properties to DOM attributes, and generates HTML tags (div for text, img for images).
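As a rough illustration of the last stage of that pipeline, the sketch below turns an already‑cleaned layer record into an absolutely positioned HTML tag. The dictionary fields (kind, frame, font_size, color, text, src) are hypothetical and do not reflect the real Sketch layer schema or MonaLisa's intermediate format.

```python
from html import escape


def layer_to_html(layer: dict) -> str:
    """Render one cleaned layer as a div (text) or an img (image)."""
    frame = layer["frame"]
    style = (
        f"position:absolute;left:{frame['x']}px;top:{frame['y']}px;"
        f"width:{frame['width']}px;height:{frame['height']}px"
    )
    if layer["kind"] == "text":
        style += f";font-size:{layer['font_size']}px;color:{layer['color']}"
        return f'<div style="{style}">{escape(layer["text"])}</div>'
    if layer["kind"] == "image":
        return f'<img src="{escape(layer["src"], quote=True)}" style="{style}">'
    raise ValueError(f"unsupported layer kind: {layer['kind']}")
```

Keeping the geometry and typography in inline styles means the design‑derived HTML can later be compared property by property against the deployed page.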
Implementation Details
The Sketch plugin is written in Objective‑C, since the CocoaScript API is limited. It is executed through sketchtool:
/Applications/Sketch.app/Contents/MacOS/sketchtool \
run /monalisa/plugin/bkcodego.sketchplugin \
bkcodego.id.detail-parser \
--without-activating=YES \
--new-instance=YES \
--context="{\"file\":\"/data/test.sketch\", \"output\":\"/data/output/v1\"}"
Clustered Service Architecture
Parsing tasks are dispatched through a message queue to a MacMini cluster, handling peak loads and non‑standard network environments.
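The dispatch pattern can be sketched in a single process with Python's standard queue and threads. This is only a stand‑in: the talk does not name the message queue product or describe the worker protocol, so run_parse_jobs, the job fields, and the parse callback are all assumptions. In a real deployment the queue would be an external broker and each worker would be a Mac mini running the parsing plugin.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_parse_jobs(jobs: Iterable[dict], parse: Callable[[dict], dict],
                   worker_count: int = 3) -> Dict[object, dict]:
    """Fan jobs out to worker threads through a queue and collect {job id: result}."""
    tasks: queue.Queue = queue.Queue()
    results: Dict[object, dict] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = tasks.get()
            if job is None:              # sentinel: no more work for this worker
                tasks.task_done()
                return
            try:
                outcome = parse(job)
            except Exception as exc:     # one failed file must not stop the worker
                outcome = {"error": str(exc)}
            with lock:
                results[job["id"]] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)                  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results
```

Because workers pull from the queue at their own pace, a burst of submissions simply lengthens the queue instead of overloading any single machine.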
Node Matching with OpenCV + MobileNetV2
Metadata (HTML URLs, rendered DOM trees, screenshots) is collected via headless Chrome. After box correction and merging overlapping elements, a two‑stage matching process runs:
Hash‑based similarity filtering (average hash, thresholds 0.5 and 0.99).
Cross‑traversal of DOM trees to compare shape, text, image similarity, followed by style comparison (font, size, color, position).
The final output is a visual inspection report.
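The first matching stage can be illustrated with a small average‑hash sketch. It assumes each node screenshot has already been converted to grayscale and downscaled to a small fixed grid (with OpenCV this is typically done using cv2.cvtColor and cv2.resize). How the two thresholds are applied is also an assumption: here, a similarity below 0.5 rejects the pair, 0.99 or above accepts it directly, and anything in between goes on to the detailed second stage. The talk gives the threshold values but not this exact rule.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hash_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of matching bits (1.0 means identical hashes)."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def triage(similarity: float, reject_below: float = 0.5, accept_at: float = 0.99) -> str:
    """Assumed use of the two thresholds: reject, accept, or defer to stage two."""
    if similarity < reject_below:
        return "reject"
    if similarity >= accept_at:
        return "match"
    return "needs_detailed_comparison"
```

A cheap filter like this prunes most candidate pairs before the more expensive shape, text, and image comparisons run.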
Platformization
The capabilities are exposed through the MonaLisa platform, allowing developers to trigger automated checks without writing scripts themselves, cutting inspection time from days to a single day for medium‑large projects.
Future Directions
Support Figma as an alternative design source.
Migrate parsing services from macOS‑only MacMini clusters to standard Linux clusters.
Extend inspection to online visual monitoring and multi‑state component checks.
Open‑source the parsing and inspection services.
Conclusion
The MonaLisa platform demonstrates how AI‑driven image processing and DOM analysis can dramatically improve UI visual quality assurance, reduce manual effort, and pave the way for scalable, automated front‑end validation.
Alipay Experience Technology
Exploring ultimate user experience and best engineering practices
