hyengine: Unifying Mobile Script Execution with JIT and Multi‑Language Support
This article introduces hyengine, a lightweight, high‑performance engine designed to run multiple scripting languages such as JavaScript, WebAssembly, and Python on mobile devices, detailing its architecture, JIT compilation, optimizer passes, memory allocator, garbage collector, and performance benchmarks compared to LLVM and other runtimes.
Background
Mobile Taobao has used many script engines (JavaScriptCore, Duktape, V8, QuickJS) and languages (js, python, wasm, lua). To reduce package size and improve performance, the team started exploring a unified engine called hyengine that can execute multiple languages with a small footprint.
Design Overview
hyengine aims to be lightweight, fast, and multi-language. It currently builds on wasm3 for WebAssembly and QuickJS for JavaScript and adds a JIT compiler on top of them. This speeds up wasm and JS execution by 2-3× while keeping the binary size small. The JIT currently targets arm64 only.
Implementation
hyengine consists of two major parts: a compiler and a VM. The compiler is split into a frontend, a middle-end, and a backend. The backend implements JIT compilation for wasm and QuickJS bytecode, reusing the existing engines where possible.
Assembler
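The sketch below shows this encoder style in self-contained form. It builds `ADD Wd, Wn, Wm` (shifted-register form, no shift), the instruction behind the wasm i32.add translation, by ORing 5-bit register numbers into a fixed opcode template. The names `reg5` and `encode_add_w` are hypothetical stand-ins for hyengine's IMM5 and ADD_W_W_W. The template 0x0B000000 is the A64 encoding of ADD (shifted register) with sf=0, S=0, shift=0, imm6=0.

```c
#include <stdint.h>

/* Keep only the low 5 bits: AArch64 register fields are 5 bits wide. */
static inline uint32_t reg5(int8_t r) {
    return (uint32_t)r & 0x1fu;
}

/* ADD Wd, Wn, Wm (shifted register, LSL #0), 32-bit variant. */
static inline void encode_add_w(uint32_t *buffer, int8_t rd, int8_t rn, int8_t rm) {
    uint32_t code = 0x0B000000u;   /* opcode template with all register fields zero */
    code |= reg5(rm) << 16;        /* Rm: bits 20..16 */
    code |= reg5(rn) << 5;         /* Rn: bits 9..5   */
    code |= reg5(rd);              /* Rd: bits 4..0   */
    *buffer = code;
}
```

For example, `encode_add_w(&word, 0, 1, 2)` stores 0x0B020020, which is the encoding of `add w0, w1, w2`. Because encoding is pure bit arithmetic, each encoder inlines into a handful of shifts and ORs.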
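The encoder for ADC (32-bit variant):
```c
#include <stdint.h>

// Name: ADC
// Arch: 32-bit variant (ADC Wd, Wn, Wm)
// IMM5(x), defined elsewhere in hyengine, masks x to a 5-bit register field.
static inline void ADC_W_W_W(uint32_t *buffer, int8_t rd, int8_t rn, int8_t rm) {
    uint32_t code = 0b00011010000000000000000000000000; // opcode template 0x1A000000
    code |= IMM5(rm) << 16; // Rm: bits 20..16
    code |= IMM5(rn) << 5;  // Rn: bits 9..5
    code |= IMM5(rd);       // Rd: bits 4..0
    *buffer = code;
}
```
Disassembler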
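For disassembly and pattern matching, the same template idea runs in reverse: mask out the register fields and compare what remains with a template. The sketch below does this for `MOV Xd, Xm`, which AArch64 encodes as `ORR Xd, XZR, Xm` (0xAA0003E0 for `mov x0, x0`). The mask constant and function names are hypothetical, not hyengine's.

```c
#include <stdint.h>
#include <stdbool.h>

/* MOV Xd, Xm == ORR Xd, XZR, Xm: Rn is 31 (XZR) and imm6 is 0. */
#define MOV_X_X_TEMPLATE   0xAA0003E0u
/* Keep bits 31..21 and 15..5, i.e. everything except Rm and Rd. */
#define MOV_X_X_FIXED_MASK 0xFFE0FFE0u

static bool is_mov_x_x(uint32_t ins) {
    return (ins & MOV_X_X_FIXED_MASK) == (MOV_X_X_TEMPLATE & MOV_X_X_FIXED_MASK);
}

/* Extract the destination (Rd) and source (Rm) of a MOV Xd, Xm. */
static void decode_mov_x_x(uint32_t ins, int *rd, int *rm) {
    *rd = (int)(ins & 0x1fu);
    *rm = (int)((ins >> 16) & 0x1fu);
}
```

Two examples show the matcher at work:
- 0xAA0003F6 (`mov x22, x0`) matches, with rd = 22 and rm = 0.
- 0x91004260 (`add x0, x19, #0x10`) does not match.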
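A matcher for MOV Xd, Xm compares only the fixed opcode bits (31..21 and 15..5) with a template instruction and ignores the Rm and Rd fields:
```c
// IMM11(x) masks x to 11 bits. The argument is parenthesized so that
// expressions such as IS_MOV_X_X(a | b) expand correctly.
#define IS_MOV_X_X(ins) \
    (IMM11((ins) >> 21) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 21) && \
     IMM11((ins) >> 5) == IMM11(HY_INS_TEMPLATE_MOV_X_X >> 5))
```
Wasm Compilation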
The JIT walks the wasm module, estimates memory, and translates each opcode to ARM64 instructions. Example for i32.add:
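```c
case OP_I32_ADD: {
    // Operand-stack slots are addressed relative to R19; spOffset is the
    // stack depth tracked by the compiler while it walks the function.
    LDR_X_X_I(alloc + codeOffset++, R8, R19, (spOffset - 2) * sizeof(void *)); // lhs
    LDR_X_X_I(alloc + codeOffset++, R9, R19, (spOffset - 1) * sizeof(void *)); // rhs
    ADD_W_W_W(alloc + codeOffset++, R9, R8, R9);                               // w9 = w8 + w9
    STR_X_X_I(alloc + codeOffset++, R9, R19, (spOffset - 2) * sizeof(void *)); // result replaces lhs
    spOffset--; // two operands popped, one result pushed
    break;
}
```
Optimizer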
The optimizer splits a method into basic blocks, runs a series of passes, and then merges the blocks back. Key passes include block‑level simplifications, register allocation, and feature matching.
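Splitting code into basic blocks usually follows the leader rule. A new block starts at:
- the first instruction,
- every branch target,
- every instruction that follows a branch.

The sketch below applies that rule to a hypothetical IR in which each instruction records only whether it branches and to where. hyengine's actual IR and pass interface are not shown in the source.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool is_branch;   /* conditional or unconditional branch */
    int target;       /* index of the branch target, or -1 */
} BlockIns;

/* Mark block leaders in leader[0..n-1]; returns the number of basic blocks. */
static int find_leaders(const BlockIns *code, int n, bool *leader) {
    if (n <= 0) return 0;
    memset(leader, 0, (size_t)n * sizeof(bool));
    leader[0] = true;                              /* first instruction */
    for (int i = 0; i < n; i++) {
        if (!code[i].is_branch) continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;         /* branch target */
        if (i + 1 < n)
            leader[i + 1] = true;                  /* instruction after a branch */
    }
    int blocks = 0;
    for (int i = 0; i < n; i++) blocks += leader[i];
    return blocks;
}
```

Each leader begins a block that runs up to the next leader. Per-block passes can then work on straight-line code before the blocks are merged back.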
Register Allocation
hyengine uses a simple frequency-based allocator. It counts how often each stack offset is accessed and assigns registers to the most frequently accessed offsets, which reduces memory loads and stores.
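Here is a minimal sketch of frequency-based allocation. It assumes the accesses to each stack slot have already been counted. It then sorts the slots by count and gives the hottest ones the available registers; every other slot stays in memory. The slot limit, the register pool (x20-x28), and all names are hypothetical.

```c
#include <stdlib.h>

#define MAX_SLOTS 64
#define FIRST_ALLOC_REG 20   /* hypothetical pool: x20..x28 */
#define NUM_ALLOC_REGS 9

typedef struct { int slot; int count; } SlotUse;

/* qsort comparator: most frequently accessed first, ties by slot number. */
static int by_count_desc(const void *a, const void *b) {
    const SlotUse *x = a, *y = b;
    if (x->count != y->count) return (y->count > x->count) - (y->count < x->count);
    return (x->slot > y->slot) - (x->slot < y->slot);
}

/* counts[s] = number of loads/stores touching stack slot s (0 <= s < n).
 * On return reg_of[s] is the register number for slot s, or -1 if the slot
 * stays in memory. Returns the number of slots placed in registers,
 * or -1 if n is out of range. */
static int allocate_by_frequency(const int *counts, int n, int *reg_of) {
    SlotUse uses[MAX_SLOTS];
    if (n < 0 || n > MAX_SLOTS) return -1;
    for (int s = 0; s < n; s++) {
        uses[s].slot = s;
        uses[s].count = counts[s];
        reg_of[s] = -1;
    }
    qsort(uses, (size_t)n, sizeof uses[0], by_count_desc);
    int assigned = 0;
    for (int i = 0; i < n && assigned < NUM_ALLOC_REGS; i++) {
        if (uses[i].count == 0) break;             /* unused slots need no register */
        reg_of[uses[i].slot] = FIRST_ALLOC_REG + assigned;
        assigned++;
    }
    return assigned;
}
```

Unlike linear scan, this sketch ignores live ranges: a slot keeps its register for the whole method. Drawing from callee-saved registers (x19-x28 under AAPCS64) means the values survive calls without spills around each call site.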
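Register Parameter Passing
Call arguments are passed in registers (x0, x1) rather than through memory. The result returned in x0 is moved straight into a register-allocated slot (x22):
```
0x1057e405c: add x0, x19, #0x10
0x1057e4060: mov x1, x22
0x1057e4064: bl 0x1057e4064
0x1057e4068: mov x22, x0
```
Feature Matching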
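Feature matching works like a peephole pass: it scans for a known short instruction sequence and rewrites it as a cheaper equivalent. The sketch below implements one such rule on a hypothetical three-address IR. It rewrites `mov tmp, #imm` followed by `sub rd, rn, tmp` as `sub rd, rn, #imm`, provided that:
- the immediate fits the 12-bit field of A64 SUB (immediate), and
- `tmp` is not read afterwards.

hyengine's actual patterns are not listed in the source.

```c
#include <stdbool.h>

typedef enum { PEEP_MOV_IMM, PEEP_SUB_REG, PEEP_SUB_IMM, PEEP_NOP } PeepOp;
typedef struct { PeepOp op; int rd, rn, rm; long imm; } PeepIns; /* unused regs = -1 */

/* Straight-line check: is register r read at or after `from`
 * before it is overwritten? */
static bool read_before_redefined(const PeepIns *code, int n, int from, int r) {
    for (int i = from; i < n; i++) {
        const PeepIns *c = &code[i];
        if (c->op == PEEP_NOP) continue;
        if (c->op == PEEP_SUB_REG && (c->rn == r || c->rm == r)) return true;
        if (c->op == PEEP_SUB_IMM && c->rn == r) return true;
        if (c->rd == r) return false;               /* overwritten first */
    }
    return false;
}

/* Rewrite "mov tmp, #imm ; sub rd, rn, tmp" into "sub rd, rn, #imm".
 * Returns the number of rewrites; each folded mov becomes a NOP. */
static int fold_sub_imm(PeepIns *code, int n) {
    int rewrites = 0;
    for (int i = 0; i + 1 < n; i++) {
        PeepIns *mov = &code[i], *sub = &code[i + 1];
        if (mov->op != PEEP_MOV_IMM || sub->op != PEEP_SUB_REG) continue;
        if (sub->rm != mov->rd || sub->rn == mov->rd) continue;
        if (mov->imm < 0 || mov->imm > 4095) continue;      /* must fit imm12 */
        if (sub->rd != mov->rd && read_before_redefined(code, n, i + 2, mov->rd))
            continue;                                       /* tmp still needed */
        sub->op = PEEP_SUB_IMM;
        sub->imm = mov->imm;
        sub->rm = -1;
        mov->op = PEEP_NOP;
        rewrites++;
    }
    return rewrites;
}
```

Consider a hypothetical `mov w9, #2` followed by `sub w22, w20, w9`. The pass reduces the pair to `sub w22, w20, #2`.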
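After feature matching, subtracting a small constant compiles to a single subtract-immediate instruction:
```
0x104934038: sub w22, w20, #0x2
```
Optimization Results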
After applying the passes, the generated code shrank from 63 to 32 instructions. Execution time for the fibonacci benchmark fell from 1716 ms to 493 ms, about 3.5× faster, which is roughly 1.6× faster than LLVM.
QuickJS Compilation
QuickJS JIT generates many instructions per opcode. Example for OP_object:
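```c
// *sp++ = JS_NewObject(ctx);
case OP_object: {
    MOV_FUNCTION_ADDRESS_TO_REG(R8, JS_NewObject);      // load the callee's address
    MOV_X_X(NEXT_INSTRUCTION, R0, CTX_REG);             // first argument: ctx
    BLR_X(NEXT_INSTRUCTION, R8);                        // call JS_NewObject(ctx)
    STR_X_X_I(NEXT_INSTRUCTION, R0, R26, SP_OFFSET(0)); // store the result in the top stack slot
    CHECK_EXCEPTION(R0, R9);                            // take the exception path if needed
    break;
}
```
Memory Allocator (hymalloc)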
hymalloc divides memory into 19 regions (18 small, 1 large). Each region contains pools of fixed‑size items. Allocation grabs the head of a free‑item list; if empty, a new pool is requested from the system.
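The sketch below shows the region, pool, and free-list scheme in self-contained form. It rests on several assumptions, all hypothetical and not hymalloc's actual values:
- Small size classes grow in 16-byte steps.
- Pools are carved from `malloc`.
- Each item header records its owning region so that free can find it.
- As a simplification, large requests go straight to `malloc` instead of a pooled large region.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_REGIONS 18   /* region i serves sizes up to (i + 1) * SIZE_STEP */
#define SIZE_STEP 16       /* hypothetical size-class step */
#define POOL_ITEMS 64      /* hypothetical items per pool */

typedef struct Region Region;

typedef struct Item {
    struct Item *next;     /* free-list link, meaningful only while free */
    Region *region;        /* owning region; NULL for large allocations */
    max_align_t payload[]; /* user memory, suitably aligned */
} Item;

struct Region {
    size_t item_size;      /* payload bytes per item */
    Item *free_list;       /* head of the free-item list */
};

static Region regions[SMALL_REGIONS];

/* Carve one pool from the system and push every item onto the free list.
 * Returns 1 on success, 0 on failure. Pools are never returned. */
static int alloc_pool(Region *r) {
    const size_t align = _Alignof(max_align_t);
    size_t stride = sizeof(Item) + r->item_size;
    stride = (stride + align - 1) / align * align;
    char *pool = malloc(stride * POOL_ITEMS);
    if (pool == NULL) return 0;
    for (size_t i = 0; i < POOL_ITEMS; i++) {
        Item *it = (Item *)(pool + i * stride);
        it->next = r->free_list;
        r->free_list = it;
    }
    return 1;
}

void *sk_malloc(size_t size) {
    size_t idx = size == 0 ? 0 : (size - 1) / SIZE_STEP;
    if (idx >= SMALL_REGIONS) {                 /* large: one item per request */
        Item *it = malloc(sizeof(Item) + size);
        if (it == NULL) return NULL;
        it->region = NULL;
        return it->payload;
    }
    Region *r = &regions[idx];
    r->item_size = (idx + 1) * SIZE_STEP;
    if (r->free_list == NULL && !alloc_pool(r)) return NULL;
    Item *it = r->free_list;                    /* pop the head of the free list */
    r->free_list = it->next;
    it->region = r;
    return it->payload;
}

void sk_free(void *p) {
    if (p == NULL) return;
    Item *it = (Item *)((char *)p - offsetof(Item, payload));
    if (it->region == NULL) { free(it); return; }
    it->next = it->region->free_list;           /* push back onto the free list */
    it->region->free_list = it;
}
```

Allocation and free in a small region are each a constant-time pop or push on a singly linked list. A freed item is the next one handed out for the same size class.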
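hymalloc's fixed-size allocation path:
```c
static void* _HYMallocFixedSize(HYMRegion *region, size_t size) {
    // Free list exhausted: request a new pool from the system.
    if (region->free_item_list == NULL) {
        // A region with item_size == 0 (presumably the large region)
        // sizes its items by the request instead.
        size_t item_size = region->item_size ? region->item_size : size;
        int ret = _HYMAllocPool(region, region->pool_initial_item_count, item_size);
        if (!ret) return NULL; // pool allocation failed
    }
    // Pop the head of the free-item list.
    HYMItem *item = region->free_item_list;
    region->free_item_list = item->next;
    item->region = region; // record the owning region
    item->flags = 0;
    return &item->ptr;     // hand out the address of the item's payload field
}
```
Garbage Collector (hygc)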
hygc replaces QuickJS’s reference counting plus mark-sweep with a multi-threaded tri-color mark-sweep collector. A GC thread marks reachable objects while the JS thread keeps executing. Once marking finishes, the JS thread frees the objects that were left unmarked.
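The CHECK_EXCEPTION macro used by the QuickJS JIT's OP_object handler compares the returned value with the exception tag. When they differ, it branches past the exception path:
```c
// Emitted after calls into the QuickJS runtime.
#define CHECK_EXCEPTION(reg, tmp) \
    MOV_X_I(NEXT_INSTRUCTION, tmp, ((uint64_t)JS_TAG_EXCEPTION<<56)); /* tag in the top byte   */ \
    CMP_X_X_S_I(NEXT_INSTRUCTION, reg, tmp, LSL, 0);                  /* compare with result   */ \
    B_C_L(NEXT_INSTRUCTION, NE, 4 * sizeof(uint32_t));                /* no exception: skip    */ \
    EXCEPTION(tmp)                                                    /* exception path        */
```
Performance Results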
Benchmarks on an M1 Mac and Huawei Mate 8 show significant improvements for both wasm and JavaScript workloads after the optimizations. Images illustrate the performance curves.
Future Plans
Continue performance tuning, add a custom bytecode backend, expand optimizer passes, implement hotspot method assembly, and explore multi‑language support beyond JS, wasm, and Python.
