Weex Page Mocking with Puppeteer for Large‑Scale UI Sample Generation
To address the shortage of annotated UI data for UI2CODE, the team uses Puppeteer to load Weex pages, traverse the DOM to collect text and image elements, record their styles and positions, and screenshot the page. By repeatedly swapping node content, they automatically generate thousands of realistic, labeled UI samples from a few hundred templates, greatly cutting manual labeling effort and boosting model accuracy.
In the UI2CODE project, deep learning is applied to detect UI components on design images. Training such models requires a massive amount of labeled data, but real UI samples are scarce and costly to annotate.
To overcome data scarcity, the team generates synthetic samples by mocking Weex pages. Weex provides a complete DOM tree, allowing easy replacement of text and image nodes to create diverse samples while preserving realistic layout.
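The substitution step can be sketched as a pure function over the collected nodes. This is a minimal sketch, not the team's actual implementation: `textPool`, `imagePool`, and `mockNodes` are hypothetical names, and in practice the pools would be corpora of real product titles and image URLs.

```javascript
// Hypothetical pools of replacement content (illustrative values only).
const textPool = ['Summer sale', 'Free shipping', 'New arrivals'];
const imagePool = ['https://example.com/a.png', 'https://example.com/b.png'];

// Swap the content of collected text/image nodes while leaving their
// class names and styles untouched, so each pass preserves the layout
// but yields a visually new sample.
function mockNodes(nodes, pick = (pool) => pool[Math.floor(Math.random() * pool.length)]) {
  for (const node of nodes) {
    if (node.className && node.className.indexOf('weex-text') > -1) {
      node.textContent = pick(textPool);
    } else if (node.className && node.className.indexOf('weex-image') > -1) {
      node.src = pick(imagePool);
    }
  }
  return nodes;
}
```

Because only content is replaced, every generated sample keeps a realistic layout, and the recorded positions stay valid as labels.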
The workflow uses Puppeteer to launch headless Chrome, emulate an iPhone 6 environment, load a Weex page, and then traverse the DOM to collect target controls (text, image, shape). Collected nodes are filtered by style and class name, and their positions are recorded.
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: true
});
const page = await browser.newPage();
// Emulate the device before navigating so the user agent and
// viewport already apply when the page loads.
await page.emulate({
  name: 'iPhone 6',
  userAgent: 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
  viewport: {
    width: 750,
    height: 1334,
    deviceScaleFactor: 1,
    isMobile: true,
    hasTouch: true,
    isLandscape: false
  }
});
await page.goto(nowUrls, {waitUntil: ['load', 'domcontentloaded', 'networkidle0']});

// The traversal below runs in the page context (e.g. inside page.evaluate()).
let d_root = document.querySelectorAll('.weex-root');
let nodes_root = [];
collectChildren(d_root, nodes_root);

function collectChildren(d, _nodes) {
  for (var i = 0, l = d.length; i < l; i++) {
    let hasPushed = false;
    // Only element nodes (nodeType 1) and text nodes (nodeType 3) matter.
    if (d[i].nodeType !== 1 && d[i].nodeType !== 3) {
      continue;
    }
    if (d[i].style) {
      // A non-trivial background color marks a shape control.
      let backgroundColorValue = d[i].style['background-color'];
      if (backgroundColorValue && backgroundColorValue !== 'rgb(255, 255, 255)' && backgroundColorValue !== 'rgb(0, 0, 0)' && backgroundColorValue !== 'transparent') {
        _nodes.push(d[i]);
        hasPushed = true;
      }
    }
    if (d[i].hasChildNodes()) {
      collectChildren(d[i].childNodes, _nodes);
    } else {
      let _node = d[i];
      let _className = _node.className;
      // Text nodes carry no class name; fall back to the parent's.
      if (!_className && _node.nodeName === '#text') {
        _className = _node.parentNode.className;
      }
      if (_className && !hasPushed) {
        if (_className.indexOf('weex-text') > -1 || _className.indexOf('weex-image') > -1) {
          _nodes.push(d[i]);
        }
      }
    }
  }
  return _nodes;
}

function getRealStyle(node, attrKey) {
  // Prefer an explicitly set attribute; otherwise fall back to the computed style.
  let wvStyle = window.getComputedStyle(node);
  if (node[attrKey] && node[attrKey] !== '') {
    return node[attrKey];
  } else {
    return wvStyle[attrKey];
  }
}
function getViewPosition(node) {
  // Record each control's bounding box as (x, y, width, height).
  const {top, left, bottom, right} = node.getBoundingClientRect();
  return {
    "y": top,
    "x": left,
    "height": bottom - top,
    "width": right - left
  };
}
await page.screenshot({
  path: pngName,
  fullPage: true
});

After extracting element attributes and positions, the page is screenshotted to obtain the sample image. Mask layers (pop-up dialogs) are filtered out so that only top-layer controls are annotated. By repeatedly substituting text and image nodes, thousands of labeled samples can be generated from a few hundred Weex pages, dramatically reducing manual annotation effort and improving model accuracy.
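Each screenshot can then be paired with the recorded control positions to form one labeled training sample. The following is a minimal sketch of what such a record might look like; `buildSample` and the field names are illustrative assumptions, not the project's actual label schema. The position shape matches what getViewPosition returns.

```javascript
// Combine a screenshot path with the collected control annotations into
// one labeled sample (function and field names here are illustrative).
function buildSample(pngName, controls) {
  return {
    image: pngName,
    labels: controls.map((c) => ({
      type: c.type,            // 'text' | 'image' | 'shape'
      x: c.position.x,
      y: c.position.y,
      width: c.position.width,
      height: c.position.height
    }))
  };
}

const sample = buildSample('sample_0001.png', [
  {type: 'text', position: {x: 24, y: 100, width: 300, height: 36}}
]);
console.log(JSON.stringify(sample));
```

Serializing one record per screenshot produces the image/label pairs the detection model is trained on.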
Xianyu Technology
Official account of the Xianyu technology team