Taobao AI Virtual Try-On: Offline Data Processing and Performance Optimization
Taobao’s AI virtual try-on system pre-computes fitting results offline and writes them into the Item Center (IC) via scalable ScheduleX tasks. By optimizing pagination, locking, and flow control, it processes millions of apparel items in under thirty minutes at a 99.9% success rate, with checkpoint-resume and monitoring for reliability.
With the rapid growth of e‑commerce, users expect more intuitive shopping experiences, especially for apparel. Taobao’s virtual try‑on project uses AI to provide personalized fitting, aiming to improve conversion by efficiently writing try‑on material into the IC (Item Center) extension structure via scheduled tasks.
Background
Clothing items are non‑standard; users cannot estimate fit from model images alone, leading to low conversion for items lacking complete data.
What Taobao Try‑On Has Achieved
Expanded coverage to dresses, tops, etc.
Supported multiple SKUs per product with length guides.
Improved model realism and added user‑photo try‑on.
Enhanced visual quality of try‑on results.
Cooperation Scenarios
LAZADA: replace missing model images with Southeast‑Asian models.
Taobao detail page, cart: add try‑on badge for users.
BC chat (merchant-consumer chat): real-time try-on entry.
Challenges in Detail Page
Immersive try‑on requires page navigation, breaking the purchase flow.
Recommended items and wardrobe are unsuitable for detail pages.
Real‑time try‑on adds latency and stresses GPU resources.
To address these, the team added an AI try‑on anchor directly on the main image, pre‑computed try‑on results offline, and wrote them into IC.
Offline Task: Writing Try‑On Material to IC
Implementation:

```java
@Override
public ProcessResult process(final JobContext jobContext) throws Exception {
    // handle master (root) task
    if (isRootTask(jobContext)) {
        return processRootTask(jobContext);
    }
    // handle sub-task
    if (StringUtils.equals(jobContext.getTaskName(), SUB_TASK_NAME)) {
        return processDressOfflineDataWritingIcTask(jobContext);
    }
    return new ProcessResult(true);
}

@Override
public ProcessResult reduce(final JobContext jobContext) throws Exception {
    // result aggregation; shown in full below
}
```
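Here `processRootTask` (not shown in the article) splits the total workload into page-range sub-tasks, which the ScheduleX framework then fans out to worker machines. The splitting arithmetic can be sketched without the ScheduleX SDK; the `PageRange` type and the fixed page size below are illustrative assumptions, not the production code:

```java
import java.util.ArrayList;
import java.util.List;

public class PageSplitSketch {
    // Illustrative page-range descriptor for one sub-task (an assumption).
    public record PageRange(long startInclusive, long endExclusive) {}

    // Split [0, total) into pageSize-wide ranges; the root task would hand
    // the resulting list to the framework's map(subTasks, SUB_TASK_NAME).
    public static List<PageRange> split(long total, long pageSize) {
        List<PageRange> ranges = new ArrayList<>();
        for (long offset = 0; offset < total; offset += pageSize) {
            ranges.add(new PageRange(offset, Math.min(offset + pageSize, total)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<PageRange> ranges = split(20_000, 8_000);
        System.out.println(ranges.size());   // 3
        System.out.println(ranges.get(2));   // PageRange[startInclusive=16000, endExclusive=20000]
    }
}
```

With the dynamic pagination described later in this article, `pageSize` would be derived from the total row count and the number of machines rather than a constant.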
Data preprocessing uses ODPS + ScheduleX grid tasks. The preprocessing aggregates multiple images per item into a single JSON field extend_info to reduce downstream QPS.
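The point of the aggregation is that one IC write per item replaces one write per image. A minimal sketch of producing such a field; the field name `tryOnImages`, the item IDs, and the URLs are assumptions for illustration, not the production `extend_info` schema:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ExtendInfoSketch {
    // Aggregate an item's image URLs into a single JSON string, mimicking the
    // extend_info field produced by the ODPS preprocessing job.
    // ("tryOnImages" is an illustrative field name, not the real schema.)
    public static String buildExtendInfo(List<String> imageUrls) {
        String array = imageUrls.stream()
                .map(u -> "\"" + u + "\"")
                .collect(Collectors.joining(","));
        return "{\"tryOnImages\":[" + array + "]}";
    }

    public static void main(String[] args) {
        Map<Long, List<String>> imagesByItem = new LinkedHashMap<>();
        imagesByItem.put(1001L, List.of("https://img.example/a.jpg", "https://img.example/b.jpg"));
        // One downstream write per item instead of one per image keeps IC QPS low.
        imagesByItem.forEach((itemId, urls) ->
                System.out.println(itemId + " -> " + buildExtendInfo(urls)));
    }
}
```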
Example of assembling task context:
```java
/**
 * Assemble offline task context.
 *
 * @param jobContext task basic info
 * @param context    offline task context
 */
public void assembleContextParam(final JobContext jobContext, final DressWritingIcTaskContext context) {
    final JSONObject params;
    try {
        params = JSON.parseObject(jobContext.getInstanceParameters());
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    context.setUpdateType(TaskUpdateTypeEnum.parse(params.getString(UPDATE_TYPE)));
    context.setTableName(params.getString(ODPS_TABLE));
    context.setProjectName(params.getString(ODPS_PROJECT));
    context.setPartition(params.getString(PARTITION));
    context.setTaskId(jobContext.getTaskId());
    context.setJobInstanceId(jobContext.getJobInstanceId());
}
```

Sub-task processing logic:
```java
@Override
public ProcessResult process(final JobContext jobContext) throws Exception {
    if (isRootTask(jobContext)) {
        return processRootTask(jobContext);
    }
    if (StringUtils.equals(jobContext.getTaskName(), SUB_TASK_NAME)) {
        return processDressOfflineDataWritingIcTask(jobContext);
    }
    return new ProcessResult(true);
}
```

Core sub-task method:
```java
/**
 * Sub-task main flow.
 */
private ProcessResult processDressOfflineDataWritingIcTask(final JobContext jobContext) {
    // 1. get sub-task context
    final DressWritingIcTaskContext dataWritingIcTask = (DressWritingIcTaskContext) jobContext.getTask();
    // 2. process records by page
    final TaskUpdateResult taskUpdateResult = processRecordsByPage(dataWritingIcTask);
    // 3. return result
    return new ProcessResult(true, JSONObject.toJSONString(taskUpdateResult));
}
```

Result aggregation and notification:
```java
/**
 * Aggregate results and send DingTalk robot notifications.
 */
@Override
public ProcessResult reduce(final JobContext jobContext) {
    final TaskUpdateResult processResult = dressWritingIcProcessManager.getSuccessCountFromProcessResult(jobContext);
    // assemble data ...
    // send DingTalk notification ...
}
```

Counting successful sub-tasks:
```java
public TaskUpdateResult getSuccessCountFromProcessResult(final JobContext jobContext) {
    final TaskUpdateResult taskUpdateResult = new TaskUpdateResult();
    for (String value : jobContext.getTaskResults().values()) {
        if (StringUtils.isNotBlank(value)) {
            try {
                // merge the fields we need from each sub-task's result
            } catch (Exception e) {
                LoggerUtil.error(logger, e, "Parse taskUpdateResult failed, value:", value);
            }
        }
    }
    return taskUpdateResult;
}
```

Performance Goals
Horizontal scalability: achieve million‑item processing within an hour.
99.9% success rate for labeling.
Breakpoint resume and visualized progress.
Optimization points:
Reduce lock scope and add retry on lock failure.
Dynamic pagination based on total data size and machine count.
Thread‑pool pagination with even distribution.
Replace manual sleep‑based rate limiting with Sentinel’s built‑in flow control.
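On the last point: Sentinel enforces per-resource QPS rules (a FlowRule with grade FLOW_GRADE_QPS), so callers no longer pace themselves with Thread.sleep. The effect of such a rule can be sketched with a minimal fixed-window limiter in plain Java; this is a stand-in for Sentinel's behavior, not its implementation:

```java
// Minimal fixed-window QPS limiter sketching what a Sentinel QPS rule
// enforces; the production system uses Sentinel itself, not this class.
public class QpsLimiterSketch {
    private final long maxPerSecond;
    private long windowStart = System.currentTimeMillis();
    private long count = 0;

    public QpsLimiterSketch(long maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    // Returns true if the call is admitted in the current one-second window.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {
            windowStart = now;  // start a new window
            count = 0;
        }
        return ++count <= maxPerSecond;
    }

    public static void main(String[] args) {
        QpsLimiterSketch limiter = new QpsLimiterSketch(100); // per-machine IC cap
        int admitted = 0;
        for (int i = 0; i < 150; i++) {
            if (limiter.tryAcquire()) {
                admitted++;   // rejected calls would be retried or delayed
            }
        }
        System.out.println(admitted); // 100
    }
}
```

In Sentinel terms this corresponds to loading a FlowRule with count 100 on the IC-write resource, then wrapping each write in an SphU.entry / BlockException check.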
Sample pagination logic comment:
From earlier measurements, a single-threaded update request takes about 40 ms on average, so a single thread on one machine yields roughly 25 QPS. The IC (Item Center) team does not require rate-limit negotiation for flows below the thousand-QPS level; we cap each machine at 100 QPS, and with 10 machines online the theoretical ceiling is 1,000 QPS, though actual throughput may fall below that.

When dispatching sub-tasks under the 100-QPS-per-machine cap, each machine runs 4 threads in parallel. Taking the maximum amount of data one machine can process in a minute as the bound: 60 × 1,000 ms / 30 ms × 4 = 8,000 records, so the page-size threshold configured in the switch is 8,000.

The 40 ms above is the latency of most requests; 30 ms is the latency of the faster requests, which is the value used in the sizing bound.

Result after optimization:
Processed millions of records in half an hour with 99.9% success.
Breakpoint resume prevents duplicate updates.
Monitoring via ScheduleX console, DingTalk alerts, and ODPS trace logs.
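The breakpoint-resume behavior can be sketched as a per-page idempotency check: completed page offsets are recorded so that a restarted run skips work already written to IC. The in-memory set below stands in for whatever persistent progress store the real task uses (an assumption for this sketch):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of checkpoint-resume: completed page offsets are recorded, so a
// restarted sub-task skips pages already written to IC and never updates
// the same records twice. The set stands in for a persistent store.
public class ResumeSketch {
    private final Set<Long> completedOffsets = ConcurrentHashMap.newKeySet();

    // Returns true if the page was processed on this call, false if skipped.
    public boolean processPage(long offset) {
        if (completedOffsets.contains(offset)) {
            return false; // already written in a previous run; skip duplicate update
        }
        // ... write this page's try-on material to IC ...
        completedOffsets.add(offset); // checkpoint only after a successful write
        return true;
    }

    public static void main(String[] args) {
        ResumeSketch task = new ResumeSketch();
        task.processPage(0);
        task.processPage(8000);
        // Simulated restart over the same pages: nothing is re-written.
        System.out.println(task.processPage(0));     // false
        System.out.println(task.processPage(16000)); // true
    }
}
```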
Future Outlook
The team plans to further improve model realism, main‑image composition, and explore more immersive, multi‑dimensional try‑on experiences.
DaTaobao Tech
Official account of DaTaobao Technology