How AI Powers Automated Visual UI Inspection: Inside the MonaLisa Platform

An in‑depth look at MonaLisa, an AI‑driven visual UI inspection platform that automates pixel‑level checks by converting design files to HTML, matching DOM structures, and leveraging OpenCV and MobileNetV2 to generate reports, reduce manual rework, and streamline front‑end development workflows.

Alipay Experience Technology

Introduction

The speaker, Shen Jiang from NetBank Mobile Channel, presents MonaLisa, an intelligent visual inspection solution that uses algorithms to automate UI checks.

Why Visual Inspection Matters

Manual visual checks cause repeated back‑and‑forth between developers and designers, consuming days of effort and risking brand inconsistency, user confusion, and potential churn.

Exploring New Solutions

Initial attempts at automatic code modification proved too risky for a banking environment, so the team focused on automated annotation.

Image‑Based Smart Inspection

The overall approach is called the "image‑algorithm based smart inspection solution" and consists of two main strategies:

Solution 1: Pixel‑diff using computer vision (ImageDiff) – compares rendered screenshots but lacks insight into specific attribute differences.

Solution 2: DOM‑level pixel inspection – matches design‑derived HTML with deployed HTML at the DOM node level.
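To make the limitation of Solution 1 concrete, here is a minimal pixel-diff sketch. It is not the MonaLisa ImageDiff implementation: the function name `pixel_diff`, the `tolerance` parameter, and the representation of screenshots as nested lists of grayscale values are all assumptions made for illustration.

```python
def pixel_diff(expected, actual, tolerance=0):
    """Compare two equally sized grayscale images given as lists of rows.

    Returns the (x, y) coordinates whose values differ by more than
    `tolerance`, plus the share of pixels that changed.
    """
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    changed = [
        (x, y)
        for y, (row_e, row_a) in enumerate(zip(expected, actual))
        for x, (pe, pa) in enumerate(zip(row_e, row_a))
        if abs(pe - pa) > tolerance
    ]
    total = sum(len(row) for row in expected)
    return {"changed": changed, "ratio": len(changed) / total if total else 0.0}


# Hypothetical 3x2 "screenshots": one pixel is clearly off, one is within tolerance.
design = [[255, 255, 255], [255, 0, 255]]
build = [[255, 255, 255], [255, 40, 250]]
result = pixel_diff(design, build, tolerance=8)
print(result["changed"])
```

The output only says where pixels differ. It cannot say whether the cause is a wrong font size, color, or offset, which is the gap the DOM-level approach is meant to close.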

DOM‑Based Inspection Workflow

Convert design mockups (Sketch) to HTML.

Deploy code to obtain the live HTML.

Cross‑match DOM nodes of the two HTML trees.

Compare styles (font, size, color, position) for matched nodes.

This yields a detailed inspection report.
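The last two workflow steps can be sketched as follows. This is a simplified illustration, not the platform's code: nodes are plain dictionaries, matching is done by identical text content (the real matcher is more involved), and the names `compare_styles`, `inspect_page`, and `px_tolerance` are hypothetical.

```python
STYLE_KEYS = ("font", "size", "color", "x", "y")


def compare_styles(design_node, live_node, px_tolerance=1):
    """Return {property: (design_value, live_value)} for every mismatch."""
    diffs = {}
    for key in STYLE_KEYS:
        want, got = design_node.get(key), live_node.get(key)
        if isinstance(want, (int, float)) and isinstance(got, (int, float)):
            # Numeric properties may drift by a small amount without being reported.
            if abs(want - got) > px_tolerance:
                diffs[key] = (want, got)
        elif want != got:
            diffs[key] = (want, got)
    return diffs


def inspect_page(design_nodes, live_nodes):
    """Match nodes by text (an assumption) and report style differences."""
    live_by_text = {node["text"]: node for node in live_nodes}
    report = []
    for node in design_nodes:
        live = live_by_text.get(node["text"])
        if live is None:
            report.append({"text": node["text"], "status": "missing"})
            continue
        diffs = compare_styles(node, live)
        status = "mismatch" if diffs else "ok"
        report.append({"text": node["text"], "status": status, "diffs": diffs})
    return report


design = [
    {"text": "Pay", "font": "PingFang SC", "size": 16, "color": "#1677ff", "x": 24, "y": 100},
    {"text": "Cancel", "font": "PingFang SC", "size": 14, "color": "#333333", "x": 24, "y": 150},
]
live = [
    {"text": "Pay", "font": "PingFang SC", "size": 14, "color": "#1677ff", "x": 24, "y": 101},
]
report = inspect_page(design, live)
for entry in report:
    print(entry)
```

Here the "Pay" node is flagged for its font size only (the 1px vertical drift is within tolerance), and "Cancel" is reported as missing from the live page.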

Precise Design‑to‑HTML Parsing

The "precise design parsing" pipeline cleans Sketch layers, maps macOS view properties to DOM attributes, and generates HTML tags (div for text, img for images).
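As a rough illustration of that mapping, the sketch below turns already-extracted layer records into HTML. The dictionary fields (`type`, `visible`, `frame`, `fontSize`, `src`) are assumed for the example and are not the Sketch file format or the plugin's actual output; "cleaning" is reduced here to dropping hidden layers.

```python
from html import escape


def layer_to_html(layer):
    """Map one layer record to an absolutely positioned HTML tag."""
    frame = layer["frame"]
    style = (
        f"position:absolute;left:{frame['x']}px;top:{frame['y']}px;"
        f"width:{frame['width']}px;height:{frame['height']}px"
    )
    if layer["type"] == "text":
        style += f";font-size:{layer['fontSize']}px;color:{layer['color']}"
        return f'<div style="{style}">{escape(layer["text"])}</div>'
    if layer["type"] == "image":
        return f'<img style="{style}" src="{escape(layer["src"], quote=True)}">'
    raise ValueError(f"unsupported layer type: {layer['type']}")


def render_artboard(layers):
    """Drop hidden layers (a stand-in for layer cleaning), then emit HTML."""
    visible = [layer for layer in layers if layer.get("visible", True)]
    return "\n".join(layer_to_html(layer) for layer in visible)


layers = [
    {"type": "text", "text": "A & B", "fontSize": 14, "color": "#333333",
     "frame": {"x": 0, "y": 0, "width": 80, "height": 20}},
    {"type": "text", "text": "draft note", "fontSize": 12, "color": "#999999",
     "visible": False, "frame": {"x": 0, "y": 30, "width": 80, "height": 20}},
    {"type": "image", "src": "logo.png",
     "frame": {"x": 0, "y": 60, "width": 40, "height": 40}},
]
html = render_artboard(layers)
print(html)
```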

Implementation Details

The Sketch plugin is written in Objective‑C because the CocoaScript API is limited. It is executed via sketchtool:

/Applications/Sketch.app/Contents/MacOS/sketchtool \
  run /monalisa/plugin/bkcodego.sketchplugin \
  bkcodego.id.detail-parser \
  --without-activating=YES \
  --new-instance=YES \
  --context="{\"file\":\"/data/test.sketch\", \"output\":\"/data/output/v1\"}"
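When a service launches this command programmatically, hand-escaping the JSON passed to `--context` is error-prone. The sketch below, an assumption about how a caller might do it rather than the team's code, builds the argument list in Python and lets `json.dumps` produce the context string. The paths and command identifier are copied from the invocation above, the flag is spelled `--without-activating` as sketchtool names it, and the helper name `build_parse_command` is hypothetical.

```python
import json
import shlex

SKETCHTOOL = "/Applications/Sketch.app/Contents/MacOS/sketchtool"
PLUGIN = "/monalisa/plugin/bkcodego.sketchplugin"
COMMAND_ID = "bkcodego.id.detail-parser"


def build_parse_command(sketch_file, output_dir):
    """Return the argv list for one parsing run; nothing is executed here."""
    context = json.dumps({"file": sketch_file, "output": output_dir})
    return [
        SKETCHTOOL, "run", PLUGIN, COMMAND_ID,
        "--without-activating=YES",
        "--new-instance=YES",
        f"--context={context}",
    ]


cmd = build_parse_command("/data/test.sketch", "/data/output/v1")
# shlex.join shows a shell-safe rendering, useful for logs.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` on a machine with Sketch installed would avoid a shell entirely, so no quoting of the JSON is needed.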

Clustered Service Architecture

Parsing tasks are dispatched through a message queue to a MacMini cluster, handling peak loads and non‑standard network environments.
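The dispatch pattern can be illustrated with an in-process stand-in. The talk does not name the message queue product or the worker protocol, so everything here is an assumption: `queue.Queue` plays the broker, threads play the Mac mini workers, and `fake_parse` replaces the real Sketch parsing step.

```python
import queue
import threading


def run_workers(tasks, worker_count, parse):
    """Drain `tasks` with `worker_count` workers and collect results by task."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            outcome = parse(task)
            with lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_parse(sketch_path):
    # Placeholder for the real parsing job on a worker machine.
    return sketch_path.replace(".sketch", ".html")


results = run_workers(["a.sketch", "b.sketch", "c.sketch"], worker_count=2, parse=fake_parse)
print(sorted(results.items()))
```

Because workers pull tasks rather than having tasks pushed to them, a burst of submissions simply lengthens the queue instead of overloading any single machine.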

Node Matching with OpenCV + MobileNetV2

Metadata (HTML URLs, rendered DOM trees, screenshots) is collected via headless Chrome. After box correction and merging overlapping elements, a two‑stage matching process runs:

Hash‑based similarity filtering (average hash, thresholds 0.5 and 0.99).

Cross‑traversal of DOM trees to compare shape, text, image similarity, followed by style comparison (font, size, color, position).

The final output is a visual inspection report.
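The first matching stage can be sketched in plain Python. Two things here are assumptions, not statements from the talk: the inputs are taken to be grayscale thumbnails already downscaled to 8x8, and the two thresholds are read as "discard below 0.5, accept at or above 0.99, send everything in between to the second stage". The function names are hypothetical.

```python
def average_hash(gray):
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hash_similarity(hash_a, hash_b):
    """Share of identical bits (1 minus the normalized Hamming distance)."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a == b for a, b in zip(hash_a, hash_b)) / len(hash_a)


def triage(similarity, low=0.5, high=0.99):
    """Assumed reading of the two thresholds."""
    if similarity < low:
        return "reject"
    if similarity >= high:
        return "match"
    return "candidate"  # needs the slower cross-traversal comparison


# Hypothetical 8x8 thumbnails: left half dark, right half bright.
base = [[0] * 4 + [255] * 4 for _ in range(8)]
near = [row[:] for row in base]
near[0][0] = 255                      # one pixel differs
inverted = [[255] * 4 + [0] * 4 for _ in range(8)]

h = average_hash(base)
sim_same = hash_similarity(h, average_hash(base))
sim_near = hash_similarity(h, average_hash(near))
sim_inv = hash_similarity(h, average_hash(inverted))
print(triage(sim_same), triage(sim_near), triage(sim_inv))
```

A cheap filter like this keeps the more expensive shape, text, and image comparisons limited to node pairs that are plausibly the same element.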

Platformization

The capabilities are exposed through the MonaLisa platform, so developers can trigger automated checks without writing scripts themselves. For medium‑to‑large projects this cuts inspection time from days to a single day.

Future Directions

Support Figma as an alternative design source.

Migrate parsing services from macOS‑only MacMini clusters to standard Linux clusters.

Extend inspection to online visual monitoring and multi‑state component checks.

Open‑source the parsing and inspection services.

Conclusion

The MonaLisa platform demonstrates how AI‑driven image processing and DOM analysis can dramatically improve UI visual quality assurance, reduce manual effort, and pave the way for scalable, automated front‑end validation.
