hyengine: Unifying Mobile Script Execution with JIT and Multi‑Language Support

This article introduces hyengine, a lightweight, high‑performance engine designed to run multiple scripting languages such as JavaScript, WebAssembly, and Python on mobile devices. It covers the architecture, JIT compilation, optimizer passes, memory allocator, and garbage collector, and compares performance against LLVM and other runtimes.

Alibaba Terminal Technology

Background

Mobile Taobao has used many script engines (JavaScriptCore, Duktape, V8, QuickJS) and languages (JavaScript, Python, WebAssembly, Lua). To reduce package size and improve performance, the team started exploring a unified engine, hyengine, that can execute multiple languages with a small footprint.

Design Overview

hyengine aims to be lightweight, high‑performance, and multi‑language. It currently builds on wasm3 for WebAssembly and QuickJS for JavaScript, achieving a 2‑3× speed‑up for wasm/js execution while keeping the binary size minimal. JIT compilation is currently supported only for arm64 on Android.

Implementation

hyengine consists of two major parts: a compiler and a VM. The compiler is split into a frontend, a middle‑end, and a backend. The backend implements JIT compilation for wasm and QuickJS, reusing existing engines where possible.

Assembler

// Name: ADC
// Arch: 32-bit variant
static inline void ADC_W_W_W(uint32_t *buffer, int8_t rd, int8_t rn, int8_t rm) {
    uint32_t code = 0b00011010000000000000000000000000;
    code |= IMM5(rm) << 16;
    code |= IMM5(rn) << 5;
    code |= IMM5(rd);
    *buffer = code;
}
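
The encoder above depends on an IMM5 helper that the excerpt does not show. The following is a minimal sketch, assuming IMM5 simply masks its argument to five bits; encode_adc_w is a hypothetical wrapper added only so the result can be inspected.

```c
#include <stdint.h>

/* Assumption: IMM5 keeps the low five bits (a register number 0..31). */
#define IMM5(v) ((uint32_t)(v) & 0x1Fu)

/* Same encoder as in the excerpt: ADC Wd, Wn, Wm (32-bit variant). */
static inline void ADC_W_W_W(uint32_t *buffer, int8_t rd, int8_t rn, int8_t rm) {
    uint32_t code = 0x1A000000u;   /* base opcode, all register fields zero */
    code |= IMM5(rm) << 16;        /* Rm: bits 20..16 */
    code |= IMM5(rn) << 5;         /* Rn: bits 9..5 */
    code |= IMM5(rd);              /* Rd: bits 4..0 */
    *buffer = code;
}

/* Hypothetical helper for illustration: encode one ADC and return the word. */
static uint32_t encode_adc_w(int8_t rd, int8_t rn, int8_t rm) {
    uint32_t word = 0;
    ADC_W_W_W(&word, rd, rn, rm);
    return word;
}
```

With these definitions, encode_adc_w(0, 1, 2) produces 0x1A020020, which is the ARM64 encoding of adc w0, w1, w2.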

Disassembler

#define IS_MOV_X_X(ins) \
    (IMM11((ins) >> 21) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 21) && \
    IMM11((ins) >> 5) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 5))
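
The matcher compares the fixed bit fields of an instruction word against a template and ignores the register fields. The sketch below shows the idea under two assumptions that the excerpt does not confirm: IMM11 masks to eleven bits, and the template is the encoding of mov x0, x0 (an alias of orr x0, xzr, x0). The helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IMM11 keeps the low eleven bits of its argument. */
#define IMM11(v) ((uint32_t)(v) & 0x7FFu)

/* Assumption: the template is MOV X0, X0, i.e. ORR X0, XZR, X0. */
#define HY_INS_TEMPLATE_MOV_X_X 0xAA0003E0u

/* Bits 31..21 hold the opcode and shift type; bits 15..5 hold imm6 and Rn. */
#define IS_MOV_X_X(ins) \
    (IMM11((ins) >> 21) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 21) && \
     IMM11((ins) >> 5) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 5))

/* Hypothetical helpers: test a word and extract its operand registers. */
static bool is_mov_x_x(uint32_t ins) { return IS_MOV_X_X(ins); }
static unsigned mov_rd(uint32_t ins) { return ins & 0x1Fu; }          /* bits 4..0 */
static unsigned mov_rm(uint32_t ins) { return (ins >> 16) & 0x1Fu; }  /* bits 20..16 */
```

For example, 0xAA1603E1 (mov x1, x22) matches and decodes to Rd = 1 and Rm = 22, while 0xAA160041 (orr x1, x2, x22) is rejected because Rn is not the zero register.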

Wasm Compilation

The JIT walks the wasm module, estimates memory, and translates each opcode to ARM64 instructions. Example for i32.add:

case OP_I32_ADD: {
    LDR_X_X_I(alloc + codeOffset++, R8, R19, (spOffset - 2) * sizeof(void *));
    LDR_X_X_I(alloc + codeOffset++, R9, R19, (spOffset - 1) * sizeof(void *));
    ADD_W_W_W(alloc + codeOffset++, R9, R8, R9);
    STR_X_X_I(alloc + codeOffset++, R9, R19, (spOffset - 2) * sizeof(void *));
    spOffset--;
    break;
}
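
To make the translation concrete, the sketch below emits the same four instructions into a buffer. The encoder bodies are reconstructed from the ARM64 instruction formats and are not hyengine's own code; emit_i32_add and SLOT_SIZE are hypothetical names, and x19 is assumed to hold the base of the wasm operand stack, as the excerpt suggests.

```c
#include <stddef.h>
#include <stdint.h>

enum { R8 = 8, R9 = 9, R19 = 19 };
enum { SLOT_SIZE = 8 };  /* sizeof(void *) on arm64 */

/* LDR Xt, [Xn, #offset]: offset must be a multiple of 8 below 32768. */
static void LDR_X_X_I(uint32_t *buf, int rt, int rn, size_t offset) {
    *buf = 0xF9400000u | ((uint32_t)(offset / 8) << 10)
         | ((uint32_t)rn << 5) | (uint32_t)rt;
}

/* STR Xt, [Xn, #offset]: same addressing form as LDR above. */
static void STR_X_X_I(uint32_t *buf, int rt, int rn, size_t offset) {
    *buf = 0xF9000000u | ((uint32_t)(offset / 8) << 10)
         | ((uint32_t)rn << 5) | (uint32_t)rt;
}

/* ADD Wd, Wn, Wm (shifted register, no shift). */
static void ADD_W_W_W(uint32_t *buf, int rd, int rn, int rm) {
    *buf = 0x0B000000u | ((uint32_t)rm << 16)
         | ((uint32_t)rn << 5) | (uint32_t)rd;
}

/* Emits i32.add: pop two operands, push their sum. Returns the new code offset. */
static size_t emit_i32_add(uint32_t *alloc, size_t codeOffset, int *spOffset) {
    size_t lhs = (size_t)(*spOffset - 2) * SLOT_SIZE;
    size_t rhs = (size_t)(*spOffset - 1) * SLOT_SIZE;
    LDR_X_X_I(alloc + codeOffset++, R8, R19, lhs);
    LDR_X_X_I(alloc + codeOffset++, R9, R19, rhs);
    ADD_W_W_W(alloc + codeOffset++, R9, R8, R9);
    STR_X_X_I(alloc + codeOffset++, R9, R19, lhs);
    (*spOffset)--;  /* two operands consumed, one result produced */
    return codeOffset;
}
```

Starting with two values on the stack (spOffset = 2), this writes ldr x8, [x19], ldr x9, [x19, #8], add w9, w8, w9, and str x9, [x19], and leaves spOffset at 1.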

Optimizer

The optimizer splits a method into basic blocks, runs a series of passes, and then merges the blocks back. Key passes include block‑level simplifications, register allocation, and feature matching.
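
The block-splitting step can be illustrated with the classic leader rule: the first instruction, every branch target, and every instruction following a branch each start a new block. The sketch below is a simplified illustration that decodes only B and B.cond; hyengine's actual splitter is not shown in the source and presumably handles more instruction forms. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extends the low `bits` bits of v. */
static int32_t sign_extend(uint32_t v, int bits) {
    uint32_t m = 1u << (bits - 1);
    return (int32_t)((v ^ m) - m);
}

/* Decodes B and B.cond. On success stores the target instruction index. */
static bool branch_target(uint32_t ins, size_t index, size_t *target) {
    int32_t off;
    if ((ins & 0xFC000000u) == 0x14000000u)        /* B imm26 */
        off = sign_extend(ins & 0x03FFFFFFu, 26);
    else if ((ins & 0xFF000010u) == 0x54000000u)   /* B.cond imm19 */
        off = sign_extend((ins >> 5) & 0x7FFFFu, 19);
    else
        return false;
    *target = (size_t)((int64_t)index + off);      /* offsets count instructions */
    return true;
}

/* Sets leaders[i] for each instruction that starts a basic block.
 * Returns the number of blocks. */
static size_t find_leaders(const uint32_t *code, size_t n, bool *leaders) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) leaders[i] = (i == 0);
    for (size_t i = 0; i < n; i++) {
        size_t target;
        if (!branch_target(code[i], i, &target)) continue;
        if (target < n) leaders[target] = true;    /* branch destination */
        if (i + 1 < n) leaders[i + 1] = true;      /* instruction after the branch */
    }
    for (size_t i = 0; i < n; i++) count += leaders[i];
    return count;
}
```

Each pass can then work on one block at a time, since control only enters a block at its leader and only leaves at its last instruction.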

Register Allocation

hyengine uses a simple frequency‑based linear‑scan allocator: the most frequently accessed stack offsets are assigned to registers, which reduces memory loads and stores.
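
The frequency ranking at the heart of this approach can be sketched as follows. This is an illustration only: it counts accesses per stack slot and hands the available registers to the hottest slots, ignoring live ranges and any other detail of the real allocator. MAX_SLOTS and assign_registers are hypothetical names.

```c
#include <stddef.h>

#define MAX_SLOTS 64   /* hypothetical bound on distinct stack slots */

/* Assigns the num_regs most frequently accessed stack slots to registers.
 * accesses[i] is the slot index touched by the i-th load or store.
 * On return, reg_of_slot[s] is a register index (0..num_regs-1) or -1. */
static void assign_registers(const int *accesses, size_t n,
                             int num_regs, int reg_of_slot[MAX_SLOTS]) {
    int freq[MAX_SLOTS] = {0};

    /* Count how often each slot is accessed. */
    for (size_t i = 0; i < n; i++)
        if (accesses[i] >= 0 && accesses[i] < MAX_SLOTS)
            freq[accesses[i]]++;

    for (int s = 0; s < MAX_SLOTS; s++) reg_of_slot[s] = -1;

    /* Repeatedly pick the hottest slot that has no register yet. */
    for (int r = 0; r < num_regs; r++) {
        int best = -1;
        for (int s = 0; s < MAX_SLOTS; s++)
            if (freq[s] > 0 && reg_of_slot[s] < 0 &&
                (best < 0 || freq[s] > freq[best]))
                best = s;
        if (best < 0) break;   /* fewer used slots than registers */
        reg_of_slot[best] = r;
    }
}
```

Slots left at -1 stay in memory, so only their loads and stores remain in the generated code.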

Register Parameter Passing

0x1057e405c: add    x0, x19, #0x10
0x1057e4060: mov    x1, x22
0x1057e4064: bl     0x1057e4064
0x1057e4068: mov    x22, x0

Feature Matching

0x104934038: sub    w22, w20, #0x2

Optimization Results

After applying the passes, code size dropped from 63 to 32 instructions, and execution time for the fibonacci benchmark fell from 1716 ms to 493 ms, a speed‑up of roughly 3.5× (and ≈1.6× faster than LLVM).

QuickJS Compilation

The QuickJS JIT emits several instructions per opcode. The example for OP_object calls JS_NewObject, stores the result on the stack, and then checks for an exception:

// *sp++ = JS_NewObject(ctx);
case OP_object: {
    MOV_FUNCTION_ADDRESS_TO_REG(R8, JS_NewObject);
    MOV_X_X(NEXT_INSTRUCTION, R0, CTX_REG);
    BLR_X(NEXT_INSTRUCTION, R8);
    STR_X_X_I(NEXT_INSTRUCTION, R0, R26, SP_OFFSET(0));
    CHECK_EXCEPTION(R0, R9);
    break;
}

Memory Allocator (hymalloc)

hymalloc divides memory into 19 regions (18 small, 1 large). Each region contains pools of fixed‑size items. Allocation grabs the head of a free‑item list; if empty, a new pool is requested from the system.

static void* _HYMallocFixedSize(HYMRegion *region, size_t size) {
    if (region->free_item_list == NULL) {
        size_t item_size = region->item_size ? region->item_size : size;
        int ret = _HYMAllocPool(region, region->pool_initial_item_count, item_size);
        if (!ret) return NULL;
    }
    HYMItem *item = region->free_item_list;
    region->free_item_list = item->next;
    item->region = region;
    item->flags = 0;
    return &item->ptr;
}
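
A self-contained sketch of the same free-list scheme is shown below, including the pool refill and the free path that the excerpt omits. It illustrates the technique and is not hymalloc's implementation: the Header layout, pool_grow, pool_alloc, and pool_free are hypothetical, pools come from malloc, and pools are never returned to the system.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Region Region;

typedef struct Header {
    Region        *region;  /* owning region, read back by pool_free */
    struct Header *next;    /* next free item while on the free list */
} Header;

struct Region {
    Header *free_list;      /* head of the free-item list */
    size_t  item_size;      /* payload bytes per item */
    size_t  pool_items;     /* items carved out of each new pool */
};

#define ALIGNMENT   (_Alignof(max_align_t))
#define ALIGN_UP(n) (((n) + ALIGNMENT - 1) & ~(ALIGNMENT - 1))
#define HEADER_SIZE ALIGN_UP(sizeof(Header))

/* Requests one pool from the system and threads its items onto the free list. */
static int pool_grow(Region *r) {
    size_t stride = HEADER_SIZE + ALIGN_UP(r->item_size);
    if (r->pool_items == 0) return 0;
    char *pool = malloc(stride * r->pool_items);
    if (pool == NULL) return 0;
    for (size_t i = 0; i < r->pool_items; i++) {
        Header *h = (Header *)(pool + i * stride);
        h->region = r;
        h->next = r->free_list;
        r->free_list = h;
    }
    return 1;
}

/* Pops the head of the free list, growing the region first if it is empty. */
static void *pool_alloc(Region *r) {
    if (r->free_list == NULL && !pool_grow(r)) return NULL;
    Header *h = r->free_list;
    r->free_list = h->next;
    return (char *)h + HEADER_SIZE;
}

/* Pushes the item back onto the free list of the region that owns it. */
static void pool_free(void *ptr) {
    Header *h = (Header *)((char *)ptr - HEADER_SIZE);
    h->next = h->region->free_list;
    h->region->free_list = h;
}
```

Both allocation and release are a handful of pointer updates, and because the free list is last-in, first-out, a freed item is the next one handed out, which tends to keep recently used memory warm in cache.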

Garbage Collector (hygc)

hygc replaces QuickJS’s reference‑count + mark‑sweep with a multi‑threaded tri‑color mark‑sweep collector. The GC thread marks reachable objects while the JS thread continues execution; the JS thread later frees objects after the GC thread finishes.
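
The tri-color invariant can be sketched on a single thread as follows: white objects are unvisited, gray objects are reachable but not yet scanned, and black objects are fully scanned. The sketch deliberately leaves out everything that makes hygc concurrent (thread synchronization and write barriers); the Object layout and function names are hypothetical.

```c
#include <stddef.h>

typedef enum { WHITE, GRAY, BLACK } Color;

#define MAX_REFS 4

typedef struct Object {
    Color          color;
    struct Object *refs[MAX_REFS];  /* outgoing references, NULL if unused */
} Object;

/* Marks everything reachable from the roots. `stack` needs room for one
 * entry per heap object, since each object turns gray at most once. */
static void mark(Object **roots, size_t nroots, Object **stack) {
    size_t top = 0;
    for (size_t i = 0; i < nroots; i++) {
        if (roots[i]->color == WHITE) {
            roots[i]->color = GRAY;
            stack[top++] = roots[i];
        }
    }
    while (top > 0) {
        Object *o = stack[--top];
        for (size_t i = 0; i < MAX_REFS; i++) {
            Object *child = o->refs[i];
            if (child != NULL && child->color == WHITE) {
                child->color = GRAY;       /* discovered, scan later */
                stack[top++] = child;
            }
        }
        o->color = BLACK;                  /* all children visited */
    }
}

/* Counts unreachable (white) objects and resets survivors to white
 * so the next collection cycle starts clean. */
static size_t sweep(Object *heap, size_t n) {
    size_t dead = 0;
    for (size_t i = 0; i < n; i++) {
        if (heap[i].color == WHITE) dead++;
        else heap[i].color = WHITE;
    }
    return dead;
}
```

Unlike pure reference counting, marking from the roots handles reference cycles naturally: two objects that point only at each other stay white and are swept.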

The CHECK_EXCEPTION macro used in the QuickJS OP_object example above expands to:

#define CHECK_EXCEPTION(reg, tmp) \
    MOV_X_I(NEXT_INSTRUCTION, tmp, ((uint64_t)JS_TAG_EXCEPTION<<56)); \
    CMP_X_X_S_I(NEXT_INSTRUCTION, reg, tmp, LSL, 0); \
    B_C_L(NEXT_INSTRUCTION, NE, 4 * sizeof(uint32_t)); \
    EXCEPTION(tmp)

Performance Results

Benchmarks on an M1 Mac and a Huawei Mate 8 show significant improvements for both wasm and JavaScript workloads after the optimizations. The charts below plot the results.

[Figure: Wasm performance chart]
[Figure: JS performance chart]

Future Plans

Planned work includes continued performance tuning, a custom bytecode backend, additional optimizer passes, hotspot method assembly, and support for languages beyond JS, wasm, and Python.
