
How Alipay Cut Merchant Bill Complexity by 60% Using a Five‑Step Method

This article details how Alipay's data engineering team applied Elon Musk's five‑step work method to completely refactor a decade‑old merchant billing system, reducing overall complexity by over 60%, improving timeliness by an hour, cutting storage and compute costs by a third, and dramatically lowering operational and maintenance burdens.

Alipay Experience Technology
Editor’s note: The author, a data R&D engineer at Ant Group, applied Musk’s five-step method to the Alipay merchant billing project and shares the experience for anyone looking to reduce system complexity.

0. Overview

Over the past year, the Alipay China data team used Musk’s five-step method to refactor a 10-year-old merchant billing system. The results: overall complexity down 60%, bills delivered one hour earlier, storage cost down 30%, and a dramatic drop in the cost of understanding and maintaining the system. Complexity is the root of many problems: it raises operational costs and slows the business down. Our experience shows that much of this complexity is unnecessary, and that we should simplify rather than keep adding defensive code.

1. Refactoring Background

1.1 What Is a Merchant Bill

Merchants do business through Alipay, and we provide them with transaction statements or vouchers called merchant bills. Merchants can download these bills from the merchant-facing (B-side) portal and compare them with their own business records and fund movements to verify that every transaction and fund movement was processed correctly; this process is called merchant reconciliation.

Alipay offers many bill types: cash flow, transaction orders, asset vouchers, marketing entries, expense bills, and customized bills. They are implemented either as real-time online bills or as offline daily/monthly bills built on ODPS. Online bills serve business queries, while offline bills are used mainly for merchant reconciliation. This article focuses on offline bills.

1.2 Why Refactor

In a nutshell: 10 years of accumulated complexity.

The merchant bill is a core B2B product for merchants. It faces the classic tension between “personalized demands from millions of merchants” and “fast support at controlled cost.” In practice, teams avoided touching existing logic for the sake of stability and added new fields instead, so after a decade the bill had hundreds of fields and many convoluted processing chains. Before the refactor, it comprised thousands of tasks and nearly ten thousand dependencies, with an average processing depth of more than 20 layers and cross-domain couplings that had become a tangled mess.

1.3 Why Now

The logic is overly complex, making the cost of ensuring bill accuracy and timeliness prohibitively high.

Offline bills are used for reconciliation, which is like millions of merchants inspecting the data with a magnifying glass. Every field (amounts, timestamps, order IDs, store IDs) must be correct. The existing complexity frequently caused changes to be missed or to have unintended downstream effects, resulting in hundreds of inquiries per year for second-line developers and tens of thousands of issues for frontline teams.

Timeliness pressure was even greater. To stay ahead of a competitor’s T+1 10 am delivery, Alipay promises bills by 9 am on T+1. After subtracting the time needed for online generation and exception handling, offline bills must be ready by 5:30 am on T+1. From September to December 2023, long processing chains and too little buffer in the scheduling baselines meant that operations handled more than 150 night-time alerts on 67 days, a night-shift ratio of 54.9%.
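The 54.9% figure matches one reading of the numbers: 67 alert days out of the 122 calendar days from September 1 to December 31, 2023. The sketch below checks that arithmetic, plus the 3.5-hour window reserved between the 5:30 am offline deadline and the 9 am promise. The function names are ours, and treating the ratio as alert days over calendar days is our assumption, not a definition given in the source.

```python
from datetime import date, datetime, timedelta


def night_shift_ratio(start: date, end: date, alert_days: int) -> float:
    """Share of calendar days (inclusive) with at least one night-time alert.

    Assumption: this is how the 54.9% figure was derived.
    """
    total_days = (end - start).days + 1
    return alert_days / total_days


def reserved_window(promise: datetime, offline_deadline: datetime) -> timedelta:
    """Time left for online generation and exception handling."""
    return promise - offline_deadline


ratio = night_shift_ratio(date(2023, 9, 1), date(2023, 12, 31), 67)
print(f"night-shift ratio: {ratio:.1%}")  # night-shift ratio: 54.9%

# T+1 timestamps on an arbitrary example date.
window = reserved_window(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 5, 30))
print(window)  # 3:30:00
```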

Therefore, we decided to refactor the merchant bill to reduce complexity, improve the user experience, and lower operational costs.

2. Refactoring Goals

Reduce complexity by 50% to achieve five business outcomes:

Accuracy: Each field’s meaning is clear, and the data is internally consistent.

High Timeliness: Bills are generated one hour earlier.

Good Operations: Critical issues can be re-run within 12 hours (down from 72 hours); routine exceptions are handled within 1 hour; the code is modular and easy to understand.

Easy Extensibility: The system scales well and responds quickly to new business needs without major code changes; gray-scale release with full-link regression testing reduces change risk.

Low Cost: Storage and compute costs drop by roughly one-third while re-run requirements are still met.

3. Applying the Five‑Step Method to Refactor the Bill

Elon Musk has credited a five-step method with helping Tesla and SpaceX simplify complex problems and cut costs. The steps are:

Question – challenge unnecessary requirements.

Reduce – simplify processes and components.

Optimize – improve based on the first two steps.

Accelerate – speed up iteration.

Replace – automate the final hand‑over.

3.1 Question

We asked why there were so many fields, why each field had so much logic, and why the processing chain was so long. Two main activities followed:

Identify which fields are actually used, by how many merchants, and which merchants use them.

Starting from the downstream table fields, trace the processing chain upward to see the original source domains.
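The upward trace in the second activity is essentially a graph walk over column-level lineage. The sketch below assumes lineage is available as a map from each column to the columns it is computed from; every table name, column name, and the table-to-domain mapping are hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: "table.column" -> upstream columns.
# A column with no entry is treated as a source column.
LINEAGE = {
    "bill_cashflow.merchant_order_no": ["mid_trade.out_trade_no", "mid_acct.biz_no"],
    "mid_trade.out_trade_no": ["src_trade.out_trade_no"],
    "mid_acct.biz_no": ["src_acct.biz_no"],
    "bill_cashflow.amount": ["mid_acct.amount"],
    "mid_acct.amount": ["src_acct.amount"],
}

# Hypothetical mapping from source table to business domain.
DOMAIN_OF_TABLE = {"src_trade": "transaction", "src_acct": "accounting"}


def source_domains(column: str) -> set[str]:
    """Walk upstream from a bill column and return the domains of its source columns."""
    domains: set[str] = set()
    seen = {column}
    queue = deque([column])
    while queue:
        node = queue.popleft()
        parents = LINEAGE.get(node, [])
        if not parents:  # reached a source column
            table = node.split(".", 1)[0]
            domains.add(DOMAIN_OF_TABLE.get(table, "unknown"))
        for parent in parents:
            if parent not in seen:  # visit shared ancestors only once
                seen.add(parent)
                queue.append(parent)
    return domains


print(sorted(source_domains("bill_cashflow.merchant_order_no")))  # ['accounting', 'transaction']
```

Running this over every bill column and grouping the results is one way to arrive at a per-domain breakdown of where the fields originate.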

For the cash-flow bill, fewer than one-third of the hundreds of fields are core; about half are personalized (used by fewer than 100 merchants); the rest are unused. The sources are concentrated in the accounting, transaction, payment, fee, settlement, and recharge domains.
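The field survey can be expressed as a simple classification. The source does not say how field usage was measured, so the input here is a hypothetical list of (merchant, field) observations. The 100-merchant threshold comes from the text; treating every field at or above it as core is our simplification.

```python
from collections import defaultdict
from typing import Iterable

PERSONALIZED_THRESHOLD = 100  # "used by fewer than 100 merchants"


def classify_fields(all_fields: Iterable[str], usage: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Label each field core, personalized, or unused by its count of distinct merchants."""
    merchants_per_field: dict[str, set[str]] = defaultdict(set)
    for merchant_id, field in usage:
        merchants_per_field[field].add(merchant_id)

    labels = {}
    for field in all_fields:
        n = len(merchants_per_field.get(field, set()))
        if n == 0:
            labels[field] = "unused"
        elif n < PERSONALIZED_THRESHOLD:
            labels[field] = "personalized"
        else:
            labels[field] = "core"
    return labels


usage = [(f"m{i}", "amount") for i in range(150)] + [("m1", "store_id"), ("m2", "store_id")]
print(classify_fields(["amount", "store_id", "legacy_flag"], usage))
# {'amount': 'core', 'store_id': 'personalized', 'legacy_flag': 'unused'}
```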

Insights:

We don’t need that many fields; focus on core fields first.

Process information by domain and then combine for centralized processing.

3.2 Reduce

Guided by these insights, we redesigned the architecture (see Figure 4).

Key actions:

Decompose each final bill field into domain-specific fields (e.g., the merchant order number is taken from the transaction domain, with the accounting domain as a fallback).

Build intermediate layers per domain, preprocessing needed fields early.

Combine domain results into a wide factor table, then clean and produce final bill fields.

Generate daily detail bills; aggregates (daily, monthly) are derived from them.
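The key actions above can be sketched as a small in-memory pipeline. In production these steps are ODPS tables and jobs; here, plain dictionaries stand in for the per-domain intermediate layers. The shared `flow_id` key, the column names, and the choice of the accounting layer as the driving side of the join are all assumptions made for illustration.

```python
# Hypothetical per-domain intermediate layers, keyed by a shared flow ID.
acct_layer = {
    "F1": {"biz_no": "A-1", "amount": "10.00"},
    "F2": {"biz_no": "A-2", "amount": "5.50"},
}
trade_layer = {"F1": {"out_trade_no": "T-1001"}}  # F2 has no transaction-domain record


def build_factor_table(acct: dict, trade: dict) -> list[dict]:
    """Left-join the transaction layer onto the accounting layer: one wide row per flow."""
    rows = []
    for flow_id, acct_row in acct.items():
        row = {"flow_id": flow_id}
        row.update({f"acct_{k}": v for k, v in acct_row.items()})
        row.update({f"trade_{k}": v for k, v in trade.get(flow_id, {}).items()})
        rows.append(row)
    return rows


def to_bill_row(factor: dict) -> dict:
    """Clean one wide row into final bill fields."""
    return {
        "flow_id": factor["flow_id"],
        # Merchant order number: transaction domain first, accounting domain as fallback.
        "merchant_order_no": factor.get("trade_out_trade_no") or factor["acct_biz_no"],
        "amount": factor["acct_amount"],
    }


daily_detail = [to_bill_row(r) for r in build_factor_table(acct_layer, trade_layer)]
print(daily_detail)
```

Keeping each domain’s preprocessing in its own layer means a change in one source domain should touch only that layer and the final cleaning step.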

At this stage, we do not over‑optimize details because the later “Replace” step will bring back any missing logic.

3.3 Optimize

We tackled several hard problems:

Billing Scope: We clarified that only merchants who have signed up for specific products receive bills, which reduced the number of tasks from dozens to fewer than ten.

Cross-Day JAR Package: Previously, associating the previous day’s transactions with data spanning multiple days consumed heavy resources. We introduced a JAR-based solution on ODPS that splits one large task into many sub-tasks.
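The source does not describe how the JAR divides the work. One common way to split a large cross-day association is by partition: each sub-task matches the business day’s records against a single historical day, and the results are unioned. The Python sketch below illustrates that idea, with a thread pool standing in for ODPS sub-tasks; the record layout and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def match_one_day(t_day_records: list[dict], history_day: list[dict]) -> list[dict]:
    """Sub-task: match T-day records against one historical day partition."""
    index = {row["order_id"]: row for row in history_day}
    return [
        {**rec, "orig_date": index[rec["order_id"]]["date"]}
        for rec in t_day_records
        if rec["order_id"] in index
    ]


def cross_day_match(
    t_day_records: list[dict], history_by_day: dict[str, list[dict]], workers: int = 4
) -> list[dict]:
    """Split one multi-day association into per-day sub-tasks and union the results.

    An order present in several partitions matches once per partition.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(match_one_day, t_day_records, day) for day in history_by_day.values()]
        return [row for f in futures for row in f.result()]


history = {
    "2024-01-01": [{"order_id": "O1", "date": "2024-01-01"}],
    "2024-01-02": [{"order_id": "O2", "date": "2024-01-02"}],
}
refunds = [{"order_id": "O1", "type": "refund"}, {"order_id": "O9", "type": "refund"}]
print(cross_day_match(refunds, history))
```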

Offline-Online Fusion: Online UDFs call HTTP interfaces for fast look-ups, and records whose look-ups fail fall back to the offline cross-day association. This greatly reduces compute cost and improves timeliness.
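The fusion pattern amounts to a fast path with a batch fallback. In the sketch below, `online_lookup` stands in for the UDF’s HTTP call and `offline_join` for the offline cross-day association; both are hypothetical callables supplied by the caller, and the in-memory fakes exist only to make the example run.

```python
from typing import Callable


def enrich(
    records: list[dict],
    online_lookup: Callable[[str], dict],
    offline_join: Callable[[list[dict]], list[dict]],
) -> list[dict]:
    """Enrich records via the online look-up; route any failures to the offline path."""
    enriched, pending = [], []
    for rec in records:
        try:
            enriched.append({**rec, **online_lookup(rec["order_id"])})
        except Exception:  # timeout, HTTP error, missing key: defer to the offline path
            pending.append(rec)
    if pending:
        enriched.extend(offline_join(pending))
    return enriched


# Hypothetical fakes: the online service only knows O1.
ONLINE = {"O1": {"store_id": "S1"}}


def fake_online(order_id: str) -> dict:
    return ONLINE[order_id]  # raises KeyError for unknown orders


def fake_offline(pending: list[dict]) -> list[dict]:
    return [{**rec, "store_id": "S-offline"} for rec in pending]


print(enrich([{"order_id": "O1"}, {"order_id": "O2"}], fake_online, fake_offline))
```

Only the records that fail online go through the expensive offline association, which is where the compute saving would come from.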

Offline Gray-Scale Release: Each task runs once and produces both the original and the gray-scale value; a control module then decides which merchants get which version.
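A minimal sketch of the control module, assuming each output row carries both values (`value_old`, `value_new`) and that merchants enter the gray-scale group either through an explicit allowlist or through a stable hash bucket compared against a rollout percentage. The column names and the hashing scheme are our assumptions; the source only says that a control module decides which version each merchant gets.

```python
import hashlib


def in_gray_group(merchant_id: str, percent: int, allowlist: frozenset[str] = frozenset()) -> bool:
    """Return True if the merchant should receive the gray-scale (new) version."""
    if merchant_id in allowlist:
        return True
    # Stable bucket 0-99: unlike Python's built-in hash() on str, MD5 does not vary between runs.
    bucket = int(hashlib.md5(merchant_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def select_value(row: dict, percent: int, allowlist: frozenset[str] = frozenset()) -> str:
    """Pick the original or gray-scale value for one output row."""
    if in_gray_group(row["merchant_id"], percent, allowlist):
        return row["value_new"]
    return row["value_old"]


row = {"merchant_id": "m42", "value_old": "A-1", "value_new": "T-1001"}
print(select_value(row, percent=0, allowlist=frozenset({"m42"})))  # T-1001
print(select_value(row, percent=0))  # A-1
```

Because both values come out of the same run, widening the rollout only changes the percentage or the allowlist; no task has to be re-run.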

Stability Self-Healing: Recoverable errors trigger an automatic re-run, and a slow-task mechanism automatically launches a copy of any task whose performance has degraded and kills the slow original.
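The two self-healing rules can be sketched as a retry wrapper and a slowness check. `RecoverableError`, the retry count, the backoff, and the 2x-baseline threshold are hypothetical: the source does not say how recoverable errors are classified or what counts as slow, and a real scheduler would carry out the copy and kill actions that the check returns.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RecoverableError(Exception):
    """Hypothetical marker for transient failures such as resource contention."""


def run_with_retry(task: Callable[[], T], max_attempts: int = 3, backoff_s: float = 0.0) -> T:
    """Re-run a task automatically on recoverable errors; other errors propagate at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RecoverableError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts


def slow_task_actions(elapsed_s: float, baseline_s: float, factor: float = 2.0) -> list[str]:
    """Return the actions for a task running well past its historical baseline."""
    if elapsed_s > factor * baseline_s:
        return ["submit_copy", "kill_original"]
    return []


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RecoverableError("transient")
    return "ok"


print(run_with_retry(flaky), attempts["n"])  # ok 3
print(slow_task_actions(elapsed_s=50 * 60, baseline_s=20 * 60))  # ['submit_copy', 'kill_original']
```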

3.4 Accelerate

With the core fields in place, we needed to fill in the remaining personalized fields quickly. We introduced a “pull-through” metric, the merchant flow-through rate, and iterated rapidly against it. It is defined as:

Merchant Flow-Through Rate = (number of merchants all of whose fields pass verification / total number of merchants) × 100%

For cash-flow bills, the rate rose from 41% in the first version to 85% after about ten iterations, and now sits at 99.3% after fine-tuning.
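One way to compute the metric, assuming that “passing verification” means a merchant’s new bill matches the old bill row for row and field for field, and that the denominator is every merchant in the old bill; the source does not define verification this precisely. The second function, also our addition, counts mismatches per field, which indicates which personalized field to fix in the next iteration.

```python
from collections import Counter

# Hypothetical layout: {merchant_id: {row_key: {field: value}}}
Bills = dict[str, dict[str, dict[str, str]]]


def flow_through_rate(old: Bills, new: Bills) -> float:
    """Percentage of merchants whose new bill matches the old one exactly."""
    if not old:
        return 0.0
    passed = sum(1 for merchant, rows in old.items() if new.get(merchant) == rows)
    return passed / len(old) * 100


def mismatches_by_field(old: Bills, new: Bills) -> Counter:
    """Count (merchant, row) mismatches per field to prioritize the next iteration."""
    counts: Counter = Counter()
    for merchant, rows in old.items():
        new_rows = new.get(merchant, {})
        for row_key, old_row in rows.items():
            new_row = new_rows.get(row_key, {})
            for field, value in old_row.items():
                if new_row.get(field) != value:
                    counts[field] += 1
    return counts


old = {"m1": {"r1": {"amount": "1.00", "store_id": "S1"}},
       "m2": {"r1": {"amount": "2.00", "store_id": "S2"}}}
new = {"m1": {"r1": {"amount": "1.00", "store_id": "S1"}},
       "m2": {"r1": {"amount": "2.00", "store_id": ""}}}
print(flow_through_rate(old, new))    # 50.0
print(mismatches_by_field(old, new))  # Counter({'store_id': 1})
```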

3.5 Replace

Once the flow-through rate exceeded 95% and the timeliness issues were resolved, we began replacing the old bill with the new one. Merchants who frequently download and check their bills were switched first, so that feedback arrived quickly, while long-tail edge cases were deferred.
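The cut-over rule can be sketched as a gate plus a ranking. The 95% gate and the timeliness condition come from the text; the per-merchant attributes (`downloads_30d`, `long_tail`), the batch size, and excluding flagged long-tail merchants from early batches are our assumptions about how the prioritization might be encoded.

```python
def next_cutover_batch(
    merchants: list[dict],
    flow_through_rate: float,
    timeliness_ok: bool,
    batch_size: int,
    min_rate: float = 95.0,
) -> list[str]:
    """Return IDs of the next merchants to switch to the new bill, or [] if the gate is closed."""
    if flow_through_rate <= min_rate or not timeliness_ok:
        return []
    candidates = [m for m in merchants if not m["long_tail"]]
    # Frequent downloaders first: they are the quickest source of feedback.
    candidates.sort(key=lambda m: m["downloads_30d"], reverse=True)
    return [m["id"] for m in candidates[:batch_size]]


merchants = [
    {"id": "m1", "downloads_30d": 30, "long_tail": False},
    {"id": "m2", "downloads_30d": 2, "long_tail": False},
    {"id": "m3", "downloads_30d": 45, "long_tail": True},
    {"id": "m4", "downloads_30d": 12, "long_tail": False},
]
print(next_cutover_batch(merchants, 96.0, True, batch_size=2))  # ['m1', 'm4']
```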

4. Refactoring Effects

The outcomes met expectations:

Complexity: The task count for cash-flow bills dropped by more than 60%, and for transaction-order bills by 47%.

Timeliness: Cash-flow bills are generated 1.5 hours earlier, and transaction-order bills 1 hour earlier.

Cost: Storage and compute costs fell by roughly one-third, saving millions of RMB annually.

Accuracy: Daily detail bills now feed the summary, monthly, and historical bills, eliminating internal inconsistencies.
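Deriving every aggregate from the same daily detail rows is what keeps the bills consistent with one another. A minimal sketch, assuming detail rows carry a merchant ID, an ISO `YYYY-MM-DD` date, and a decimal amount string (all hypothetical names): a single `summarize` function with a pluggable period key produces both daily and monthly totals, so the two cannot drift apart. `Decimal` is used instead of `float` to avoid rounding errors on money.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Callable


def summarize(detail_rows: list[dict], period_of: Callable[[str], str]) -> dict[tuple[str, str], Decimal]:
    """Aggregate detail amounts per (merchant, period); every summary comes from the same rows."""
    totals: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for row in detail_rows:
        totals[(row["merchant_id"], period_of(row["date"]))] += Decimal(row["amount"])
    return dict(totals)


def by_day(d: str) -> str:
    return d  # e.g. "2024-01-05"


def by_month(d: str) -> str:
    return d[:7]  # e.g. "2024-01"


detail = [
    {"merchant_id": "m1", "date": "2024-01-01", "amount": "10.00"},
    {"merchant_id": "m1", "date": "2024-01-02", "amount": "5.50"},
    {"merchant_id": "m1", "date": "2024-02-01", "amount": "1.00"},
]
daily = summarize(detail, by_day)
monthly = summarize(detail, by_month)
# Consistency check: January's daily totals add up to January's monthly total.
jan = sum(v for (m, d), v in daily.items() if d.startswith("2024-01"))
print(monthly[("m1", "2024-01")], jan)  # 15.50 15.50
```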

Operations/Understanding Cost: Code coupling decreased dramatically, and the time needed to understand the bill system fell from 6–12 months to about one month.

5. Summary & Reflection

Key takeaways:

Complexity breeds problems: high complexity inflates operational cost and hampers efficiency.

Much complexity is unnecessary: legacy logic often becomes obsolete, and simplifying yields large gains.

Decomposition matters, but rhythm matters more: knowing what to tackle first and keeping the right pace are crucial for progress.

Refactoring itself should be cheaper: future work should aim to productize the refactoring process to lower its cost.

Two career paths for data-warehouse engineers:

Specialist data‑technology expert focusing on code and architecture optimization.

Global data architect bridging data producers and consumers to reduce friction and unlock data’s multiplier effect.
