Subtoken‑TranX: Front‑end JavaScript Code Generation for Industrial Use

Subtoken‑TranX, a joint effort by Alibaba’s DaTaobao team and Peking University, converts natural‑language requirements into JavaScript. It is trained on a curated dataset of 2,489 pairs and combines subtoken‑level AST generation with task augmentation based on variable semantics. It is reported to be more accurate than the standard TranX and Transformer models, and it now powers Alibaba’s BizCook front‑end production platform.

DaTaobao Tech

Alibaba's DaTaobao technical team and Prof. Li Ge's group at Peking University jointly built the first code‑generation system adopted in an industrial front‑end development environment. The system, deployed on the BizCook platform, generates JavaScript code from natural‑language requirements.

Automatic code generation aims to translate natural‑language descriptions into formal code. Existing deep‑learning models, even large Transformers, struggle with this task, particularly when little paired training data is available. The authors constructed a high‑quality dataset of 2,489 JavaScript expression pairs, categorized into String‑Template Expressions (STE), OR‑Logic Expressions (OLE), Conditional Expressions (CE), and Data‑Processing Expressions (DPE).
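To make the four categories concrete, here is a small sketch with one hand-written expression per category. The descriptions, variable names, and the `evaluate` helper are all invented for illustration and are not taken from the BizCook dataset.

```javascript
// Hypothetical examples of the four expression categories.
// Descriptions and variable names are assumptions made for illustration only.
const examples = [
  { category: "STE", description: "show the price followed by 'yuan'",
    code: "`${price} yuan`" },
  { category: "OLE", description: "use the nickname, falling back to the user name",
    code: "nickName || userName" },
  { category: "CE", description: "show 'In stock' when stock is above zero, otherwise 'Sold out'",
    code: "stock > 0 ? 'In stock' : 'Sold out'" },
  { category: "DPE", description: "keep only the items that are on sale",
    code: "items.filter((item) => item.onSale)" },
];

// Evaluate one expression string against a small context object.
// This helper exists only to show what each expression computes.
function evaluate(code, context) {
  const names = Object.keys(context);
  const values = names.map((name) => context[name]);
  return new Function(...names, `return (${code});`)(...values);
}

const context = {
  price: 25,
  nickName: "",
  userName: "alice",
  stock: 0,
  items: [{ id: 1, onSale: true }, { id: 2, onSale: false }],
};

const results = examples.map((example) => ({
  category: example.category,
  value: evaluate(example.code, context),
}));
```

Each entry pairs a requirement written in plain language with the short expression a front-end developer might write for it, which is the kind of input and output the system is described as handling.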

Data preprocessing includes code normalization via Esprima/Escodegen, placeholder substitution for string literals, and simplification of member accesses. To mitigate data scarcity, a task‑augmentation strategy incorporates a variable‑semantic table, providing auxiliary supervision for variable naming.
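The placeholder step can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses a regular expression instead of the Esprima/Escodegen AST round trip mentioned above, and the `STR_n` placeholder format and both function names are assumptions.

```javascript
// Replace each quoted string literal with a numbered placeholder and keep
// the originals so they can be restored after generation.
// Simplification: a regex over single- and double-quoted literals; template
// literals and other cases a real parser would handle are ignored.
function maskStringLiterals(code) {
  const literals = [];
  const pattern = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g;
  const masked = code.replace(pattern, (match) => {
    let index = literals.indexOf(match);
    if (index === -1) {
      index = literals.length;
      literals.push(match);
    }
    return `STR_${index}`; // hypothetical placeholder format
  });
  return { masked, literals };
}

// Put the original literals back in place of the placeholders.
function restoreStringLiterals(masked, literals) {
  return masked.replace(/STR_(\d+)/g, (match, index) => {
    const position = Number(index);
    return position < literals.length ? literals[position] : match;
  });
}

const source = 'status === "paid" ? "Paid" : "Pending"';
const { masked, literals } = maskStringLiterals(source);
const restored = restoreStringLiterals(masked, literals);
```

Masking literals this way keeps arbitrary display text out of the model's vocabulary, so the model only has to learn where a string goes, not what it says.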

The core model, Subtoken‑TranX, extends the TranX AST‑based generator to operate at the subtoken level, using a special <EOT> token to mark the end of subtoken sequences. This reduces vocabulary size and over‑fitting on small datasets.
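The role of `<EOT>` can be illustrated with a short sketch. The splitting rule used here (camelCase and underscore boundaries) is an assumption, since the text above only says that generation happens at the subtoken level, and the function names are hypothetical.

```javascript
const EOT = "<EOT>";

// Split an identifier into subtokens and append the end-of-token marker.
// Assumed rule: break at camelCase boundaries, digit runs, and underscores.
function toSubtokens(identifier) {
  const pattern = /[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+|_+|[^A-Za-z0-9_]+/g;
  const parts = identifier.match(pattern) || [identifier];
  return [...parts, EOT];
}

// Rebuild whole tokens from a flat subtoken stream by cutting at each <EOT>.
function mergeSubtokens(stream) {
  const tokens = [];
  let current = "";
  for (const piece of stream) {
    if (piece === EOT) {
      tokens.push(current);
      current = "";
    } else {
      current += piece;
    }
  }
  return tokens;
}

const stream = [...toSubtokens("itemPriceList"), ...toSubtokens("user_name")];
const merged = mergeSubtokens(stream);
```

Because the decoder emits pieces such as `item`, `Price`, and `List` rather than the whole name, it needs the marker to know where one identifier ends and the next begins. The same pieces can be reused across many identifiers, which is why the vocabulary shrinks.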

Experiments on the BizCook test set show that Subtoken‑TranX outperforms the original TranX and standard Transformer models, both with and without task augmentation. Task‑augmented models achieve higher accuracy, edit similarity, and F1 scores, especially in variable usage. The approach meets the performance requirements for real‑world deployment and has been adopted in Alibaba’s large‑scale promotion platform.
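Two of the metrics named above can be sketched as follows. The summary does not give exact definitions, so this assumes exact-match accuracy and a character-level edit similarity of 1 minus the Levenshtein distance divided by the longer string's length. The paper's definitions may differ, for example by working on tokens.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Assumed definition: 1 - distance / length of the longer string.
function editSimilarity(predicted, reference) {
  const longest = Math.max(predicted.length, reference.length);
  return longest === 0 ? 1 : 1 - levenshtein(predicted, reference) / longest;
}

// Fraction of predictions that match their reference exactly.
function exactMatchAccuracy(pairs) {
  const hits = pairs.filter((pair) => pair.predicted === pair.reference).length;
  return hits / pairs.length;
}

// Hypothetical predictions, invented for illustration.
const pairs = [
  { predicted: "a || b", reference: "a || b" },
  { predicted: "a || b", reference: "a || c" },
];
const accuracy = exactMatchAccuracy(pairs);
const similarity = editSimilarity(pairs[1].predicted, pairs[1].reference);
```

Edit similarity gives partial credit to a prediction that is wrong by a single variable name, whereas exact-match accuracy counts it as a complete miss, so the two metrics are usually reported together.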

Overall, the study demonstrates that combining task augmentation with a subtoken‑level AST generator effectively addresses data‑limited code‑generation scenarios in front‑end development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
