How Alipay’s AR ‘Air Writing’ Brings Fortune Characters to Life in Mini‑Games
This article describes how Alipay's 2022 'Air Writing' AR feature was built for the Five-Fortune project. It covers the technical challenges, the front-end stack (Vite, Oasis Engine, ARSession, and WebGL shaders), the implementation steps for drawing, vectorizing, and extruding characters, playback handling, and memory and performance optimizations.
Background
AR technology has become popular in games, medical, transportation, fashion, and education, but its use in marketing interaction is still limited. The Alipay Five‑Fortune project introduced an AR "air writing" feature to let users create and place fortune characters in real space.
Technical Constraints
Hardware limits: H5-based interactive marketing would normally rely on WebXR, which is still immature; AR capability is only available through native ARKit/ARCore.
High development cost: requires coordination of front‑end, client, and algorithm teams.
Performance bottlenecks: AR and AI workloads heavily load CPU and GPU.
Technology Stack
Alipay Mini-Game Container: provides rendering via a wrapped OpenGL API, exposing native capabilities through JSAPI. It does not implement a DOM, so the GUI is drawn with WebGL.
ARSession: abstracts AR capabilities from ARCore/ARKit and device sensors.
Oasis Engine: a game engine that simplifies 3D development and bridges H5 APIs to the mini-game container.
Development Process
1. Create Front‑End Scaffold
Vite is used for fast transpiling and bundling. A custom mini-game scaffold supports TypeScript and NPM packages (see create-oasis-app).
2. Adapt to Mini‑Game Container
The adapter normalizes API differences, allowing code such as new Image() to work in both web and mini-program environments.
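As an illustration of how such an adapter can work, here is a minimal, hypothetical sketch: it patches the browser globals that web code expects (here, Image) onto the container's own factory-style API. The names installAdapter and createImage are assumptions for the example, not the project's actual API.

```javascript
// Hypothetical adapter sketch: map web globals onto container factories.
function installAdapter(globalObj, container) {
  if (typeof globalObj.Image === "undefined") {
    // Web code calls `new Image()`; the container exposes a factory instead.
    globalObj.Image = function Image() {
      return container.createImage();
    };
  }
}

// Usage with a stand-in container object:
const fakeContainer = {
  createImage() {
    return { src: "", onload: null }; // minimal image-like object
  },
};
const globalLike = {};
installAdapter(globalLike, fakeContainer);
const img = new globalLike.Image();
img.src = "fortune.png";
```

The same pattern extends to other globals (XMLHttpRequest, canvas creation, and so on), which is what lets a single codebase run on both the web and the mini-game container.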
3. Render Camera Background
ARSession supplies per‑frame camera data. A full‑screen plane is created and a YUV‑to‑RGB shader converts the camera texture to RGB for display.
const yuv_vs = `
attribute vec3 POSITION;
attribute vec2 TEXCOORD_0;
uniform mat4 u_uvMatrix;
varying vec2 v_uv;
void main() {
  vec2 flipUV = TEXCOORD_0;
  flipUV.y = 1.0 - flipUV.y;
  v_uv = (u_uvMatrix * vec4(flipUV, 1.0, 1.0)).xy;
  gl_Position = vec4(POSITION.xy, 1.0, 1.0);
}
`;

const yuv_fs_Android = `
uniform sampler2D u_frameY;
uniform sampler2D u_frameUV;
varying vec2 v_uv;
void main() {
  // Android camera frames are NV21: Y plane, then interleaved VU pairs
  float y = texture2D(u_frameY, v_uv).a;
  vec4 uvColor = texture2D(u_frameUV, v_uv);
  float u = uvColor.a - 0.5;
  float v = uvColor.r - 0.5;
  float r = y + 1.13983 * v;
  float g = y - 0.39465 * u - 0.58060 * v;
  float b = y + 2.03211 * u;
  gl_FragColor = vec4(r, g, b, 1.0);
}
`;

const yuv_fs_iOS = `
uniform sampler2D u_frameY;
uniform sampler2D u_frameUV;
varying vec2 v_uv;
void main() {
  // iOS camera frames are NV12: Y plane, then interleaved UV pairs
  float y = texture2D(u_frameY, v_uv).a;
  vec4 uvColor = texture2D(u_frameUV, v_uv);
  float u = uvColor.r - 0.5;
  float v = uvColor.a - 0.5;
  float r = y + 1.04 * v;
  float g = y - 0.343 * u - 0.711 * v;
  float b = y + 1.765 * u;
  gl_FragColor = vec4(r, g, b, 1.0);
}
`;
Each frame updates the u_uvMatrix, u_frameY, and u_frameUV uniforms via the onARFrame callback.
myARSession.onARFrame(arframe => {
  updateBackgroundScene(arframe);
});

function updateBackgroundScene(arframe) {
  const w = arframe.width;
  const h = arframe.height;
  const len = w * h;
  if (len <= 0) return;
  // Update u_uvMatrix
  if (arframe.capturedImageMatrix) {
    const matrix = bgMaterial.shaderData.getMatrix("u_uvMatrix");
    matrix.setValueByArray(arframe.capturedImageMatrix);
  }
  // Update textures (created lazily on the first frame)
  if (arframe.capturedImage) {
    const cameraFrame = arframe.capturedImage;
    let textureFrameY = bgMaterial.shaderData.getTexture("u_frameY");
    let textureFrameUV = bgMaterial.shaderData.getTexture("u_frameUV");
    if (!textureFrameY) {
      textureFrameY = new Texture2D(engine, w, h, TextureFormat.Alpha8, false);
      textureFrameY.wrapModeU = textureFrameY.wrapModeV = TextureWrapMode.Clamp;
      bgMaterial.shaderData.setTexture("u_frameY", textureFrameY);
    }
    if (!textureFrameUV) {
      textureFrameUV = new Texture2D(engine, w / 2, h / 2, TextureFormat.LuminanceAlpha, false);
      textureFrameUV.wrapModeU = textureFrameUV.wrapModeV = TextureWrapMode.Clamp;
      bgMaterial.shaderData.setTexture("u_frameUV", textureFrameUV);
    }
    // The Y plane occupies the first w*h bytes; the interleaved UV data follows
    textureFrameY.setPixelBuffer(new Uint8Array(cameraFrame, 0, len));
    textureFrameUV.setPixelBuffer(new Uint8Array(cameraFrame, len));
  }
}
4. Implement Air Writing
The workflow consists of two chains: writing and playback. The writing chain uses a 2D vectorized brush, extrudes the geometry, and adds stickers (glTF models), with a custom DragComponent handling pointer events.
Brush to Vector
We ported the shodo library to the mini-game container for brush rendering, then used potrace (a C++ version called via JSAPI) to convert bitmap strokes to vector paths.
Stickers
Stickers are glTF models loaded at runtime; each sticker receives a DragComponent that implements onPointerDown and onPointerDrag for interactive placement.
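The core of the drag logic can be shown with a simplified, engine-free sketch. In the real project this class would extend Oasis Engine's Script and be driven by its pointer callbacks; here the same offset math runs on a plain object so it is easy to follow, and the data shapes are assumptions for the example.

```javascript
// Engine-free sketch of the DragComponent logic.
class DragComponent {
  constructor(entityPosition) {
    this.position = entityPosition; // { x, y } position of the sticker
    this._offset = null;            // pointer-to-entity offset at grab time
  }

  // Called when the pointer first touches the sticker.
  onPointerDown(pointer) {
    this._offset = {
      x: this.position.x - pointer.x,
      y: this.position.y - pointer.y,
    };
  }

  // Called while the pointer moves; preserving the grab offset keeps the
  // sticker from jumping under the finger.
  onPointerDrag(pointer) {
    if (!this._offset) return;
    this.position.x = pointer.x + this._offset.x;
    this.position.y = pointer.y + this._offset.y;
  }
}

const drag = new DragComponent({ x: 100, y: 100 });
drag.onPointerDown({ x: 90, y: 95 });   // grab near the sticker's corner
drag.onPointerDrag({ x: 120, y: 125 }); // sticker follows the pointer
// drag.position is now { x: 130, y: 130 }
```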
Transition Effects
After 2D drawing, the character transitions to 3D using an orthographic‑to‑perspective interpolation, keeping size consistent while applying a Lottie animation for visual flair.
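One way to keep the character's apparent size constant during such a transition is to place the perspective camera at the distance where its vertical frustum height matches the orthographic view height. The sketch below shows only that matching-distance calculation under this assumption; the project's actual per-frame interpolation is not reproduced here.

```javascript
// Distance at which a perspective camera's visible height equals the
// orthographic view height, so the subject keeps its on-screen size.
function matchingDistance(orthoHeight, fovYDegrees) {
  const fovY = (fovYDegrees * Math.PI) / 180;
  // At distance d, a perspective camera sees a plane of height
  // 2 * d * tan(fovY / 2); solve for d.
  return orthoHeight / (2 * Math.tan(fovY / 2));
}

// Example: an orthographic view 2 units tall and a 60-degree camera.
const d = matchingDistance(2, 60);
// d ≈ 1.732 — placing the camera here preserves the character's size
```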
Playback Implementation
Stroke data is recorded as a three‑level array (character → stroke → point) together with brush metadata and canvas size. Example structure:
{
  char: [
    [[x, y, timestamp], ...],
    ...
  ],
  brush: { icon, extInfo },
  canvas2D: { width, height }
}
Sticker data is stored as an array of [url, x, y] entries.
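Before replay, this nested structure can be flattened into a time-ordered event list. The helper below is an illustrative sketch of that preparation step (its name and output shape are assumptions); the project's actual frame scheduling is not shown.

```javascript
// Flatten character → stroke → point data into draw events whose times
// are rebased so playback starts at 0 ms.
function toPlaybackEvents(charData) {
  const events = [];
  for (let s = 0; s < charData.length; s++) {
    for (const [x, y, timestamp] of charData[s]) {
      events.push({ stroke: s, x, y, timestamp });
    }
  }
  events.sort((a, b) => a.timestamp - b.timestamp);
  const t0 = events.length ? events[0].timestamp : 0;
  return events.map(e => ({ ...e, timeMs: e.timestamp - t0 }));
}

// Example: two strokes recorded 120 ms apart.
const events = toPlaybackEvents([
  [[10, 10, 1000], [20, 15, 1040]],
  [[30, 40, 1120]],
]);
// events[0].timeMs === 0, events[2].timeMs === 120
```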
Data Thinning
Points within 10 pixels of the previously kept point are dropped (the final point of each stroke is always kept) to keep the payload under ~10 KB.
// Euclidean distance between two points
function length(x1, y1, x2, y2) {
  const dx = x2 - x1;
  const dy = y2 - y1;
  return Math.sqrt(dx * dx + dy * dy);
}

for (let i = 0; i < charData.length; i++) {
  const originalStroke = charData[i];
  let last = null;
  charData[i] = [];
  for (let j = 0; j < originalStroke.length; j++) {
    const point = originalStroke[j];
    if (last) {
      const l = length(point[0], point[1], last[0], last[1]);
      // Drop near-duplicate points, but always keep the stroke's last point
      if (l < 10 && j !== originalStroke.length - 1) continue;
    }
    charData[i].push(point);
    last = point;
  }
}
Memory & Performance Optimizations
Peak memory should stay below 200 MB. Optimizations include:
Reducing main canvas resolution based on device capability (scale factors 0.8, 0.6, 0.5).
Lowering AR camera resolution to match the canvas.
Optimizing request handling: bypass base64 conversion by creating the ArrayBuffer directly in C++ and exposing it via JSBinding.
Canvas‑to‑GPU upload: use dirty‑flag checks to avoid unnecessary texture uploads.
Video recording resolution: limit to 720p or 540p depending on device.
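The dirty-flag idea from the canvas-to-GPU item can be sketched in a few lines. This is an illustrative stand-in, not the container's actual upload path; uploadToTexture represents whatever engine call copies the canvas into a GPU texture.

```javascript
// Only re-upload the GUI canvas to its GPU texture on frames where the
// canvas actually changed.
class CanvasTextureUploader {
  constructor(uploadToTexture) {
    this._upload = uploadToTexture;
    this._dirty = true; // first frame always uploads
    this.uploadCount = 0;
  }

  markDirty() {        // call whenever the canvas is redrawn
    this._dirty = true;
  }

  onFrame(canvas) {    // call once per render frame
    if (!this._dirty) return;
    this._upload(canvas);
    this.uploadCount++;
    this._dirty = false;
  }
}

// Usage: three frames, but the canvas only changes once after the start.
const uploader = new CanvasTextureUploader(() => {});
uploader.onFrame({});   // frame 1: initial upload
uploader.onFrame({});   // frame 2: unchanged, skipped
uploader.markDirty();
uploader.onFrame({});   // frame 3: redrawn, uploaded
// uploader.uploadCount === 2
```

Since a full-resolution canvas upload is one of the most expensive per-frame operations, skipping it on unchanged frames directly reduces both GPU bandwidth and CPU copy time.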
Conclusion
The AR air‑writing feature attracted many users and demonstrated a complete pipeline from brush input to 3D rendering, playback, and efficient resource usage. Ongoing work will continue to refine the AR toolchain and explore new interactive experiences.
Alipay Experience Technology
Exploring ultimate user experience and best engineering practices