
How Frontend Code Is Automatically Generated: Inside Alibaba’s AI‑Powered D2C Pipeline

This article explains how Alibaba's front-end intelligence project automatically generated 79.34% of the Double-11 UI code. It covers why images are used as input, the layered image-processing pipeline, background and foreground analysis, traditional versus deep-learning detection and their fusion, evaluation results, and real-world deployments.

Alibaba Terminal Technology

Dec 5, 2019


Overview

As one of the four technical directions of Alibaba's Front-End Committee, the front-end intelligence project was put to the test during the 2019 Double-11 shopping festival, where it automatically generated 79.34% of the code for the Tmall and Taobao Double-11 venue pages. This article shares the technical ideas and challenges behind automatic front-end code generation.

Why Use Images as Input

Converting design drafts into UI code by hand is laborious. Using images exported from Sketch or Photoshop as input provides deterministic information while keeping the pipeline independent of upstream design tools. Images also make it possible to recognize layout types that are not explicitly expressed in the design file (e.g., listview, gridview), and they support broader scenarios such as automated testing.

Layer Processing Layer

The D2C (Design to Code) layer-processing capability identifies element categories and extracts their styles, feeding the results into the layout-algorithm layer that follows.

Layout Analysis

Layout analysis separates foreground UI fragments from background using machine‑vision algorithms. Background analysis extracts color, gradient direction and connected regions; foreground analysis uses deep‑learning to merge and recognize GUI fragments.


Background Analysis

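Step 1: Detect background blocks with the Sobel, Laplacian and Canny edge detectors and compute the gradient direction. The discrete Laplacian template, in its standard 4-neighbour form, is:

$$
\nabla^2 f(x, y) \approx f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y),
\qquad
\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$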
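A minimal NumPy-only sketch of Step 1, assuming a grayscale input: it correlates the image with the standard 3×3 Sobel kernels to obtain the gradient direction, and with the 4-neighbour Laplacian to flag edges. The helper names `filter2d` and `background_gradient` are hypothetical; a production pipeline would more likely call OpenCV's `cv2.Sobel`, `cv2.Laplacian` and `cv2.Canny` (Canny is not reproduced here).

```python
import numpy as np

# Standard 3x3 kernels (x grows to the right, y grows downward).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN_4 = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)


def filter2d(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating edge pixels (hypothetical helper)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            # shift the padded image so that (dy, dx) lines up with the kernel cell
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def background_gradient(gray):
    """Return (gradient magnitude, gradient direction in degrees, Laplacian response)."""
    g = gray.astype(float)
    gx = filter2d(g, SOBEL_X)
    gy = filter2d(g, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx))  # 0 degrees = brightness increases to the right
    laplacian = filter2d(g, LAPLACIAN_4)  # zero on linear ramps, non-zero at edges
    return magnitude, direction, laplacian
```

On a smooth left-to-right gradient such as `np.tile(np.arange(8) * 10.0, (6, 1))`, the interior direction is 0° and the Laplacian is zero: a linear gradient has no second derivative, which separates a gradient background from an element edge.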

Step 2: Apply flood fill (also called "water fill"), seeded at a background pixel, to remove noise from gradient backgrounds.

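```python
import os

import cv2
import numpy as np


def fill_color_diffuse_water_from_img(task_out_dir, image, x, y, thres_up=(10, 10, 10),
                                      thres_down=(10, 10, 10), fill_color=(255, 255, 255)):
    # image height and width
    h, w = image.shape[:2]
    # cv2.floodFill requires a single-channel uint8 mask 2 pixels larger than the image
    mask = np.zeros([h + 2, w + 2], np.uint8)
    # Flood fill from seed (x, y). With FLOODFILL_FIXED_RANGE each pixel is compared with
    # the seed colour: seed - thres_down <= pixel <= seed + thres_up. Fills `image` in place.
    cv2.floodFill(image, mask, (x, y), fill_color, thres_down, thres_up, cv2.FLOODFILL_FIXED_RANGE)
    # save the intermediate result (the task_out_dir/ui directory must already exist)
    cv2.imwrite(os.path.join(task_out_dir, "ui", "tmp2.png"), image)
    return image, mask
```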
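The function above delegates to OpenCV. To make the `FLOODFILL_FIXED_RANGE` behaviour concrete, here is a simplified single-channel analogue in plain NumPy; `flood_fill_fixed_range` is a hypothetical name and this is not OpenCV's implementation. Every 4-connected pixel is compared with the seed value, so small noise near the background colour is absorbed while foreground pixels are left untouched.

```python
from collections import deque

import numpy as np


def flood_fill_fixed_range(gray, seed, lo=10, up=10, fill=255):
    """Simplified analogue of cv2.floodFill(..., cv2.FLOODFILL_FIXED_RANGE) (hypothetical helper).

    Each 4-connected pixel is compared with the seed value, not with its neighbour.
    Returns the filled copy and a boolean mask of the filled pixels.
    """
    out = gray.copy()
    h, w = gray.shape
    sx, sy = seed  # (x, y), matching OpenCV's seedPoint order
    ref = int(gray[sy, sx])
    mask = np.zeros((h, w), dtype=bool)
    mask[sy, sx] = True
    queue = deque([(sy, sx)])
    while queue:
        r, c = queue.popleft()
        out[r, c] = fill
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                # fixed range: accept if within [seed - lo, seed + up]
                if ref - lo <= int(gray[nr, nc]) <= ref + up:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return out, mask
```

Because the comparison is against the seed rather than the neighbouring pixel, a fixed range only absorbs regions whose total colour drift stays within the tolerances; OpenCV's default (floating) range compares neighbouring pixels instead.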

The resulting image after background processing is shown below.

Foreground Analysis

Foreground analysis first applies connected-component analysis so that components are not broken into fragments, then uses machine learning to classify component types, merging fragments iteratively until no small fragments remain. An example of a complete item extracted from a waterfall-flow layout is shown.
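A rough sketch of the connected-component step, assuming a boolean foreground mask (the pixels that survived background removal): 8-connected regions become bounding boxes, and boxes are then merged repeatedly until nothing changes. The source uses a machine-learning classifier to decide which fragments belong together; the purely geometric `gap` rule below is a stand-in assumption, and `component_boxes` and `merge_boxes` are hypothetical names. OpenCV's `cv2.connectedComponentsWithStats` provides the labeling step directly.

```python
from collections import deque

import numpy as np


def component_boxes(fg):
    """Label 8-connected regions of a boolean mask; return inclusive boxes (x0, y0, x1, y1)."""
    h, w = fg.shape
    seen = np.zeros_like(fg, dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not fg[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue = deque([(r, c)])
            x0, y0, x1, y1 = c, r, c, r
            while queue:  # breadth-first search over one component
                y, x = queue.popleft()
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def merge_boxes(boxes, gap=2):
    """Merge boxes that overlap once one is grown by `gap` pixels; repeat until stable.

    Assumption: a geometric proximity rule standing in for the source's ML-based merging.
    """
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if (a[0] - gap <= b[2] and b[0] <= a[2] + gap
                        and a[1] - gap <= b[3] and b[1] <= a[3] + gap):
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes
```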

Traditional vs. Deep‑Learning Methods

Traditional edge-gradient and connected-component methods are precise and fast but have low recall. One-stage detectors (YOLO, SSD) have high recall but less accurate localization; two-stage detectors (Faster R-CNN) reach higher mAP at the cost of speed. Fusing the traditional and deep-learning methods aims to achieve high precision, high recall and accurate localization together.

The fusion proceeds as follows:

1. Run the traditional and deep-learning pipelines in parallel to obtain trbox (traditional boxes) and dlbox (deep-learning boxes).
2. Filter trbox by IOU with dlbox, using a threshold of 0.8.
3. Filter dlbox by IOU with the filtered trbox, again using a threshold of 0.8.
4. Adjust each dlbox edge toward the nearest straight line within a pixel threshold, without crossing trbox edges.
5. Output the fused boxes.
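The fusion steps can be sketched as follows. The source does not say whether "filter" keeps or discards the matching boxes, so this sketch assumes one plausible reading: a trbox survives only when some dlbox confirms it (IOU > 0.8), and a dlbox is dropped when it duplicates a surviving trbox, whose edges are assumed to be more precise. Boxes are `(x0, y0, x1, y1)` with exclusive right and bottom edges; `iou` and `fuse` are hypothetical names, and the edge-adjustment step is omitted.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1) with exclusive x1/y1."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(trboxes, dlboxes, thr=0.8):
    # Assumed step 2: keep a traditional box only if a deep-learning box confirms it.
    kept_tr = [t for t in trboxes if any(iou(t, d) > thr for d in dlboxes)]
    # Assumed step 3: drop deep-learning boxes that duplicate a kept traditional box.
    kept_dl = [d for d in dlboxes if not any(iou(d, t) > thr for t in kept_tr)]
    # Step 4 (adjusting dlbox edges toward nearby straight lines) is omitted in this sketch.
    return kept_tr + kept_dl
```

Under this reading, traditional false positives with no deep-learning support are discarded, while boxes only the deep-learning detector found are kept, which matches the stated goal of combining the precision of one with the recall of the other.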

Evaluation

On 50 Xianyu waterfall-flow screenshots containing 96 cards, the traditional method detected 65 cards, deep learning 97, and the fused approach 98; the fused approach achieved higher precision, recall and IOU than either method alone.

Complex Background Content Extraction

Complex background extraction aims to retrieve specific content (text, overlay layers) from noisy backgrounds. Traditional image processing struggles to reach good accuracy and recall, and semantic segmentation cannot recover occluded pixels. The proposed solution combines object detection, to recall the content, with an SR-GAN that restores the foreground elements.

Why Use a GAN?

SR-GAN preserves high-frequency detail through its feature-map loss, reduces false detections through its adversarial loss, and can restore the pixel values of transparent overlays, which semantic segmentation cannot do.
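To make the two losses concrete, here is a sketch of an SRGAN-style generator objective in NumPy. It assumes the feature maps have already been extracted by a fixed network (the original SRGAN uses VGG) and that `d_fake` holds the discriminator's probabilities that the generated images are real. The 10⁻³ adversarial weight follows the original SRGAN paper; the weighting used in Alibaba's pipeline is not stated, and `generator_loss` is a hypothetical name.

```python
import numpy as np


def generator_loss(feat_real, feat_fake, d_fake, adv_weight=1e-3, eps=1e-12):
    """SRGAN-style generator objective: feature-map (content) loss + weighted adversarial loss.

    feat_real / feat_fake: feature maps of the target and generated images (assumed precomputed).
    d_fake: discriminator probabilities that the generated images are real.
    """
    # Content loss: mean squared error between feature maps, which penalises lost
    # high-frequency structure more than a plain pixel-wise loss does.
    content = np.mean((np.asarray(feat_real) - np.asarray(feat_fake)) ** 2)
    # Adversarial loss: -log D(G(x)), averaged here (the paper sums over the batch).
    adversarial = np.mean(-np.log(np.asarray(d_fake) + eps))
    return content + adv_weight * adversarial
```

The small adversarial weight keeps the content term dominant, so the generator mainly matches the target's features while the adversarial term pushes outputs toward realistic textures.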

Training Pipeline

Business Applications

The method is deployed in the imgcook image pipeline (73% to 92% accuracy) and in Alibaba's automated testing of Double-11 modules (over 97% accuracy and recall).

Future Work

Plans include richer layout recognition (listview, gridview, waterfall flow), better accuracy on small objects using FPN (Feature Pyramid Networks) and Cascade R-CNN, coverage of more pages, and an image-sample generator to lower integration cost.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
