Applying a Self-Attention-Based Machine Learning Model to Design-to-Code Layout Prediction
vivo's frontend team built a self-attention-based machine learning model that predicts web page layout types (column, row, or absolute) from node dimensions and positions. The model resolves parent-child and sibling relationships for design-to-code conversion and reached 99.4% accuracy using more than 20,000 manually labeled, crawled, and generated samples. The article also outlines directions for further improvement.
This article discusses how vivo's frontend team applied machine learning with a self-attention mechanism to the design-to-code (D2C) conversion problem, specifically to web page layout prediction.
Background: Traditional D2C tools can export styles from design mockups but cannot determine web page layout. The team developed a D2C tool that uses ML to automatically predict layout patterns.
Problem Definition: Web layout prediction requires solving two problems: (1) parent-child relationships between nodes, and (2) positional relationships between sibling nodes (vertical, horizontal, or absolute positioning).
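To make the two sub-problems concrete, here is a purely geometric, rule-based baseline. It is an illustration only, not vivo's model: it assumes every node is an axis-aligned box `(x, y, w, h)`, attaches each node to the smallest strictly larger box that fully contains it (parent-child), and labels a sibling group 'col' or 'row' only when the siblings stack without overlapping (otherwise 'absolute'). `Node`, `build_tree`, and `classify_siblings` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    x: float
    y: float
    w: float
    h: float
    children: List["Node"] = field(default_factory=list)

    def contains(self, other: "Node") -> bool:
        # True if `other` lies entirely inside this node's bounding box
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def build_tree(nodes: List[Node]) -> List[Node]:
    """Parent-child: attach each node to the smallest larger node containing it."""
    roots = []
    for node in nodes:
        # Require a strictly larger area so identical boxes cannot form a cycle
        candidates = [p for p in nodes
                      if p.contains(node) and p.w * p.h > node.w * node.h]
        if candidates:
            parent = min(candidates, key=lambda p: p.w * p.h)
            parent.children.append(node)
        else:
            roots.append(node)
    return roots


def classify_siblings(children: List[Node]) -> str:
    """Sibling relationship: 'col', 'row', or 'absolute' (hypothetical heuristic)."""
    by_y = sorted(children, key=lambda n: n.y)
    if all(a.y + a.h <= b.y for a, b in zip(by_y, by_y[1:])):
        return "col"       # each sibling starts below the previous one's bottom edge
    by_x = sorted(children, key=lambda n: n.x)
    if all(a.x + a.w <= b.x for a, b in zip(by_x, by_x[1:])):
        return "row"       # each sibling starts right of the previous one's right edge
    return "absolute"      # siblings overlap, so neither flow direction fits


page = Node("page", 0, 0, 100, 100)
header = Node("header", 0, 0, 100, 20)
logo = Node("logo", 5, 5, 10, 10)
title = Node("title", 20, 5, 50, 10)
body = Node("body", 0, 30, 100, 70)
roots = build_tree([page, header, logo, title, body])
print(classify_siblings(page.children), classify_siblings(header.children))
```

Hand-written rules like these only cover the clean cases; the approach described in this article learns the mapping from labeled data instead.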
Why Self-Attention: Unlike RNNs and LSTMs, which process a sequence one step at a time, self-attention computes over all nodes in a sequence in parallel, which significantly improves training efficiency. Through global attention weights, each node relates to every other node, so contextual information for all nodes is computed simultaneously.
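The parallelism claim can be seen in a minimal NumPy sketch of standard single-head scaled dot-product self-attention. The article does not say which attention variant vivo uses, so this is an assumption; the projection matrices `Wq`, `Wk`, `Wv` are random here, whereas a trained model learns them. All pairwise node scores come from one matrix product instead of a step-by-step recurrence.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (n, d_model) node embeddings.
    Returns (n, d_k) contextual embeddings and the (n, n) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # all n nodes are projected at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n): every node scored against every node
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # each output mixes information from all nodes


rng = np.random.default_rng(0)
n, d_model, d_k = 5, 4, 8
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (5, 8) (5, 5)
```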
Model Design: Each input node is described by its width, height, and x and y coordinates. The output is a layout type: 'col' (vertical), 'row' (horizontal), or 'absolute'. The model uses self-attention to generate contextual embeddings, and a feedforward neural network then performs the final layout classification.
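The pipeline above (node features, then self-attention, then a feedforward classifier over three labels) can be sketched as an untrained NumPy forward pass. Many details are assumptions not stated in the article: features are normalized to the parent box, a residual connection is added around attention, sibling embeddings are mean-pooled into one vector per group, and the layer sizes are arbitrary. `LayoutClassifierSketch` and `normalize` are hypothetical names, and because the weights are random the predicted label carries no meaning; the sketch only shows how data flows.

```python
import numpy as np

LABELS = ["col", "row", "absolute"]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalize(boxes, parent):
    """Scale (w, h, x, y) boxes to [0, 1] relative to a parent box (pw, ph, px, py)."""
    pw, ph, px, py = parent
    b = np.asarray(boxes, dtype=float)
    return np.stack([b[:, 0] / pw, b[:, 1] / ph,
                     (b[:, 2] - px) / pw, (b[:, 3] - py) / ph], axis=1)


class LayoutClassifierSketch:
    """Hypothetical forward pass: (n, 4) node features -> probabilities over LABELS."""

    def __init__(self, d_model=16, d_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random init; a real model would learn these weights
        self.W_in = rng.normal(scale=s, size=(4, d_model))        # embeds (w, h, x, y)
        self.Wq, self.Wk, self.Wv = (
            rng.normal(scale=s, size=(d_model, d_model)) for _ in range(3))
        self.W1 = rng.normal(scale=s, size=(d_model, d_hidden))   # feedforward layer
        self.W2 = rng.normal(scale=s, size=(d_hidden, len(LABELS)))  # output layer

    def forward(self, feats):
        h = feats @ self.W_in                                 # (n, d_model) node embeddings
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))        # (n, n) attention weights
        ctx = h + attn @ v                                    # residual + contextual embedding
        pooled = ctx.mean(axis=0)                             # one vector per sibling group
        hidden = np.maximum(0.0, pooled @ self.W1)            # ReLU feedforward layer
        return softmax(hidden @ self.W2)                      # probabilities over LABELS


siblings = [(100, 20, 0, 0), (100, 70, 0, 30)]  # (w, h, x, y), stacked vertically
feats = normalize(siblings, parent=(100, 100, 0, 0))
probs = LayoutClassifierSketch().forward(feats)
print(LABELS[int(np.argmax(probs))], probs.round(3))
```

In training, these probabilities would be compared with the labeled layout type through a cross-entropy loss, and all weight matrices would be updated by gradient descent.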
Data Preparation: Three data sources were used: (1) manually labeled design mockups (highest quality), (2) crawled real web pages, labeled by analyzing their CSS, and (3) data generated automatically by a web generator. In total, more than 20,000 samples were collected.
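The third data source can be illustrated with a toy generator of labeled sibling groups. This is an assumption-laden sketch, not vivo's web generator: it emits `(w, h, x, y)` boxes in pixel units directly instead of rendering web pages, and it builds each label into the geometry ('col' boxes stack downward, 'row' boxes line up left to right, 'absolute' boxes overlap on both axes). `generate_sample` and its size ranges are hypothetical.

```python
import random

LABELS = ("col", "row", "absolute")


def generate_sample(rng: random.Random, n_min=2, n_max=6):
    """Return (boxes, label); boxes are (w, h, x, y) tuples in pixels (hypothetical)."""
    label = rng.choice(LABELS)
    n = rng.randint(n_min, n_max)
    boxes, x, y = [], 0, 0
    for _ in range(n):
        w, h = rng.randint(20, 120), rng.randint(10, 60)
        if label == "col":
            boxes.append((w, h, rng.randint(0, 40), y))  # free x offset, stacked on y
            y += h + rng.randint(0, 16)                   # next box starts below this one
        elif label == "row":
            boxes.append((w, h, x, rng.randint(0, 20)))  # free y offset, lined up on x
            x += w + rng.randint(0, 16)                   # next box starts right of this one
        else:
            # absolute: each box starts inside the previous one, so they overlap on both axes
            if boxes:
                pw, ph, px, py = boxes[-1]
                x, y = px + rng.randint(0, pw - 1), py + rng.randint(0, ph - 1)
            boxes.append((w, h, x, y))
    return boxes, label


rng = random.Random(42)  # fixed seed for a reproducible dataset
dataset = [generate_sample(rng) for _ in range(1000)]
print(dataset[0])
```

Because the generator fixes the label before placing any box, every sample is correct by construction, which is what makes generated data cheap to scale compared with manual labeling.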
Results: The model achieved 99.4% accuracy in layout prediction.
Optimization Directions: (1) Handling element wrapping in lists, (2) Improving grouping for non-intersecting nodes like icons and text in grids, (3) General layout recognition for functional components, (4) Using reinforcement learning for better data generation.
vivo Internet Technology