
Detect Front‑End UI Components with Pipcook: A Complete Object‑Detection Guide

This tutorial walks you through using Pipcook to train an object‑detection model that automatically identifies and locates front‑end UI components in screenshots, covering data preparation in Pascal VOC format, pipeline configuration, model training, and inference with sample code.

Alibaba Terminal Technology

Jul 1, 2020


Background

In front‑end development you may have many UI screenshots and need an automatic way to recognize which components (buttons, switches, inputs, and so on) appear and where each one is located. In deep learning this task is called object detection: for every object it finds, the model predicts a class and a bounding box.

Scenario Example

Consider a screenshot that contains several components, such as buttons, switches, and input fields. After training, the model returns a prediction like the following JSON:

{
  "boxes": [
    [83, 31, 146, 71],
    [210, 48, 256, 78],
    [403, 30, 653, 72],
    [717, 41, 966, 83]
  ],
  "classes": [0, 1, 2, 2],
  "scores": [0.95, 0.93, 0.96, 0.99]
}

The corresponding label map is:

{
  "button": 0,
  "switch": 1,
  "input": 2
}

Explanation:

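boxes: coordinates of each detected component as (xmin, ymin, xmax, ymax), i.e. the top-left and bottom-right corners of its bounding box.

classes: numeric class IDs that map to component types via the label map.

scores: confidence scores; only results above a chosen threshold are kept.

The three arrays are aligned by index: boxes[i], classes[i], and scores[i] describe the same detection.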
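
Turning this raw output into readable detections takes only a few lines. The sketch below assumes the prediction has exactly the boxes/classes/scores shape shown above; the helper decodeDetections and its 0.5 default threshold are hypothetical choices for illustration, not part of Pipcook's API.

```javascript
// Label map from the example above: component name -> class ID.
const labelMap = { button: 0, switch: 1, input: 2 };

// Invert the label map so a class ID can be turned back into a name.
const idToLabel = Object.fromEntries(
  Object.entries(labelMap).map(([name, id]) => [id, name])
);

// Hypothetical helper: pair each box with its label and score (the arrays
// are aligned by index) and keep only detections at or above the threshold.
function decodeDetections(prediction, threshold = 0.5) {
  return prediction.boxes
    .map((box, i) => {
      const [xmin, ymin, xmax, ymax] = box;
      return {
        label: idToLabel[prediction.classes[i]],
        score: prediction.scores[i],
        xmin, ymin, xmax, ymax,
      };
    })
    .filter((d) => d.score >= threshold);
}

// Usage with the sample prediction from this section.
const prediction = {
  boxes: [[83, 31, 146, 71], [210, 48, 256, 78], [403, 30, 653, 72], [717, 41, 966, 83]],
  classes: [0, 1, 2, 2],
  scores: [0.95, 0.93, 0.96, 0.99],
};
console.log(decodeDetections(prediction, 0.94).map((d) => d.label));
// prints [ 'button', 'input', 'input' ]
```

Raising the threshold trades recall for precision: at 0.94 the switch (score 0.93) is dropped.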

Data Preparation

Object‑detection models require datasets in a standard format. This tutorial uses the Pascal VOC format, which stores each image together with an XML annotation file. A typical directory layout is:

train/
  1.jpg
  1.xml
  2.jpg
  2.xml
  ...
validation/
  1.jpg
  1.xml
  ...
test/
  1.jpg
  1.xml
  ...
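
Before training, it can help to confirm that every image in each split has a matching annotation file. This is a minimal Node.js sketch, assuming images use .jpg, .jpeg, or .png and that each annotation shares its image's base name, as in the layout above; findUnpaired is a hypothetical helper, not a Pipcook utility.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: report images without an .xml annotation and
// annotations without an image, matching on base name (1.jpg <-> 1.xml).
function findUnpaired(fileNames) {
  const images = new Set();
  const annotations = new Set();
  for (const name of fileNames) {
    const ext = path.extname(name);
    const base = path.basename(name, ext);
    const lower = ext.toLowerCase();
    if (lower === '.jpg' || lower === '.jpeg' || lower === '.png') images.add(base);
    else if (lower === '.xml') annotations.add(base);
  }
  return {
    imagesWithoutXml: [...images].filter((b) => !annotations.has(b)),
    xmlWithoutImage: [...annotations].filter((b) => !images.has(b)),
  };
}

// Check each split directory that exists (run from the dataset root).
for (const split of ['train', 'validation', 'test']) {
  if (!fs.existsSync(split)) continue;
  console.log(split, findUnpaired(fs.readdirSync(split)));
}
```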

Each XML file contains fields such as <folder>, <filename>, and <size>, plus one <object> entry per annotated component, holding <name> (the component type) and <bndbox> (its position, given as <xmin>, <ymin>, <xmax>, and <ymax>).
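
To make the annotation format concrete, the following sketch writes one such file from plain data. It covers only the core Pascal VOC fields; annotation tools typically add more (for example <path>, <source>, <pose>, <truncated>). The helper toVocXml and the 1000 x 120 image size in the usage are hypothetical.

```javascript
// Escape characters that are not allowed verbatim in XML text content.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: serialize one image's boxes as minimal Pascal VOC XML.
function toVocXml({ folder, filename, width, height, depth = 3, objects }) {
  const objectXml = objects
    .map(({ name, xmin, ymin, xmax, ymax }) => [
      '  <object>',
      `    <name>${escapeXml(name)}</name>`,
      '    <difficult>0</difficult>',
      '    <bndbox>',
      `      <xmin>${xmin}</xmin>`,
      `      <ymin>${ymin}</ymin>`,
      `      <xmax>${xmax}</xmax>`,
      `      <ymax>${ymax}</ymax>`,
      '    </bndbox>',
      '  </object>',
    ].join('\n'))
    .join('\n');

  return [
    '<annotation>',
    `  <folder>${escapeXml(folder)}</folder>`,
    `  <filename>${escapeXml(filename)}</filename>`,
    '  <size>',
    `    <width>${width}</width>`,
    `    <height>${height}</height>`,
    `    <depth>${depth}</depth>`,
    '  </size>',
    objectXml,
    '</annotation>',
  ].join('\n');
}

console.log(toVocXml({
  folder: 'train',
  filename: '1.jpg',
  width: 1000, // hypothetical image size
  height: 120,
  objects: [{ name: 'button', xmin: 83, ymin: 31, xmax: 146, ymax: 71 }],
}));
```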

Start Training

With the dataset ready, create a Pipcook pipeline JSON that strings together the required plugins:

{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": { "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/component-recognition-detection/component-recognition-detection.zip" }
    },
    "dataAccess": { "package": "@pipcook/plugins-coco-data-access" },
    "modelDefine": { "package": "@pipcook/plugins-detectron-fasterrcnn-model-define" },
    "modelTrain": { "package": "@pipcook/plugins-detectron-model-train", "params": { "steps": 100000 } },
    "modelEvaluate": { "package": "@pipcook/plugins-detectron-model-evaluate" }
  }
}
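
If you want to vary steps, for example to run a short smoke test before the full 100000-step training, the pipeline file can be generated instead of edited by hand. This sketch reproduces only the plugin names and parameters shown above and implies no other plugin options; buildPipeline and the file name make-pipeline.js are hypothetical.

```javascript
// make-pipeline.js (hypothetical name): print the pipeline JSON to stdout.
// Only the plugins and params from the tutorial's config are used.
function buildPipeline({ datasetUrl, steps = 100000 }) {
  return {
    plugins: {
      dataCollect: {
        package: '@pipcook/plugins-object-detection-pascalvoc-data-collect',
        params: { url: datasetUrl },
      },
      dataAccess: { package: '@pipcook/plugins-coco-data-access' },
      modelDefine: { package: '@pipcook/plugins-detectron-fasterrcnn-model-define' },
      modelTrain: { package: '@pipcook/plugins-detectron-model-train', params: { steps } },
      modelEvaluate: { package: '@pipcook/plugins-detectron-model-evaluate' },
    },
  };
}

const config = buildPipeline({
  datasetUrl: 'http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/component-recognition-detection/component-recognition-detection.zip',
});
console.log(JSON.stringify(config, null, 2));
```

Redirecting the output, as in node make-pipeline.js > object-detection.json, produces a file equivalent to the one above.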

Run the pipeline on a machine with an NVIDIA GPU and CUDA 10.2:

pipcook run object-detection.json --verbose --tuna

Watch the training log to confirm that the loss decreases as training progresses. A line near the end of the run looks like this:

[06/28 12:28:32] iter: 100000 total_loss: 0.032 loss_cls: 0.122 ...
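
On a long run it is easier to judge the trend from extracted numbers than by reading the log. This sketch assumes log lines follow the "iter: N name: value" pattern of the example line; parseTrainingLog is hypothetical, and its regular expressions may need adjusting for other log formats.

```javascript
// Hypothetical helper: extract the iteration and every field whose name
// contains "loss" from one training log line; returns null otherwise.
function parseTrainingLog(line) {
  const iterMatch = line.match(/iter:\s*(\d+)/);
  if (!iterMatch) return null; // not a training-progress line
  const losses = {};
  for (const [, name, value] of line.matchAll(/(\w*loss\w*):\s*([\d.eE+-]+)/g)) {
    losses[name] = Number(value);
  }
  return { iter: Number(iterMatch[1]), losses };
}

const entry = parseTrainingLog('[06/28 12:28:32] iter: 100000 total_loss: 0.032 loss_cls: 0.122');
console.log(entry.iter, entry.losses);
```

Applied to every line of a saved log, the (iter, total_loss) pairs can be compared or plotted to check that the loss is falling.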

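After training, Pipcook generates an npm package in the output directory. Install its dependencies, then return to the parent directory:

cd output
BOA_TUNA=1 npm install
cd ..

From the directory that contains output, run inference with a small script (the file name predict.js is only an example):

// predict.js: load the generated package and predict on one image
const predict = require('./output');

(async () => {
  const result = await predict('./test.jpg');
  console.log(result);
})();

Run the script with node predict.js.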

The prediction result contains boxes, classes, and scores as described earlier.

Creating Your Own Dataset

To build a custom dataset, follow three steps:

Collect images: Gather raw UI screenshots without annotations.

Annotate: Use a tool such as labelImg to draw a bounding box around each component and assign its label.

Train: Organize the annotated files into the Pascal VOC folder structure shown above and run the Pipcook pipeline.
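
The organizing step can be scripted. This sketch assumes labelImg saved each .xml next to its .jpg in one flat folder. The 80/10/10 split, the seed, the directory names annotated/ and dataset/, and the helpers splitDataset and copySplits are all hypothetical choices for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Small seeded PRNG (mulberry32) so the same seed always yields the same split.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle base names (e.g. "1" for 1.jpg + 1.xml) with Fisher-Yates,
// then cut them into train/validation/test lists.
function splitDataset(baseNames, { trainRatio = 0.8, validationRatio = 0.1, seed = 42 } = {}) {
  const names = [...baseNames];
  const rand = mulberry32(seed);
  for (let i = names.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [names[i], names[j]] = [names[j], names[i]];
  }
  const nTrain = Math.floor(names.length * trainRatio);
  const nValidation = Math.floor(names.length * validationRatio);
  return {
    train: names.slice(0, nTrain),
    validation: names.slice(nTrain, nTrain + nValidation),
    test: names.slice(nTrain + nValidation),
  };
}

// Copy each image/annotation pair into <outDir>/<split>/ (assumes .jpg images).
function copySplits(srcDir, outDir, splits) {
  for (const [split, names] of Object.entries(splits)) {
    const dir = path.join(outDir, split);
    fs.mkdirSync(dir, { recursive: true });
    for (const name of names) {
      for (const ext of ['.jpg', '.xml']) {
        fs.copyFileSync(path.join(srcDir, name + ext), path.join(dir, name + ext));
      }
    }
  }
}

// Example: split every annotated sample in ./annotated into ./dataset.
if (fs.existsSync('annotated')) {
  const baseNames = fs.readdirSync('annotated')
    .filter((f) => f.endsWith('.xml'))
    .map((f) => path.basename(f, '.xml'));
  copySplits('annotated', 'dataset', splitDataset(baseNames));
}
```

Shuffling with a fixed seed avoids putting, say, all screenshots of one page into the test split just because their file names sort together, while keeping the split reproducible.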

Summary

You have learned how to detect multiple front‑end components in an image using Pipcook, from data preparation to model training and inference. The next tutorial will explore image style transfer with Pipcook, such as converting photos to oil‑painting style.
