How Youzan Scaled Order Export to Millions with ES, HBase, and Config‑Driven Design

This article examines the challenges of Youzan's order export system, describes the migration from PHP‑based scripts to an Elasticsearch and HBase stack, and details the step‑by‑step configuration‑driven refactor that enabled million‑order exports with high performance and stability. The refactor covers enum field definitions, Groovy scripts, strategy patterns, a plugin architecture, and quality‑assurance practices.

Youzan Coder

Background and Challenges

Youzan's order export service, owned by the transaction order management team, originally generated reports by pulling data from multiple databases and APIs with PHP scripts. As the platform grew across industries (mall, retail, catering, beauty, education) and modules (transaction, assets, customers, marketing, stores), the number of export fields exceeded 100, and the service needed a more flexible, scalable design.

Initial Architecture Limitations

High CPU usage and blocking when many large‑scale export jobs ran concurrently.

Direct queries to business databases caused slow queries and risked impacting core business performance.

Refactor to Elasticsearch + HBase

The team migrated the export pipeline to an Elasticsearch‑HBase stack. Order search now uses Elasticsearch, while detailed order data resides in HBase and is accessed via APIs. This change delivered:

Support for exports of a million or more orders, with a typical throughput of 10,000 orders per minute.

Elastic scalability of ES and HBase, keeping performance stable despite order volume growth.

Elimination of direct business‑DB access, removing the risk of blocking core services.
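The split between search and detail lookup can be sketched as a two‑phase loop: fetch a page of order numbers from the search index, then load the details for that page in one batch. This is a minimal sketch, not Youzan's implementation. The interfaces `OrderSearch` and `OrderDetailStore`, their method names, and the cursor‑style paging are all assumptions made for illustration, and the detail payload is reduced to a single string.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the ES query: returns up to `size` order numbers that
// sort after `lastOrderNo` (null means "start from the beginning").
interface OrderSearch {
    List<String> searchAfter(String lastOrderNo, int size);
}

// Hypothetical stand-in for the HBase-backed detail API: one batch call per page.
interface OrderDetailStore {
    Map<String, String> fetchDetails(List<String> orderNos);
}

final class TwoPhaseExporter {
    // Streams rows to `sink` page by page and returns the number of exported orders.
    static int export(OrderSearch search, OrderDetailStore store,
                      int pageSize, Consumer<String> sink) {
        int exported = 0;
        String cursor = null;
        while (true) {
            List<String> orderNos = search.searchAfter(cursor, pageSize);
            if (orderNos.isEmpty()) {
                break; // search exhausted
            }
            Map<String, String> details = store.fetchDetails(orderNos);
            for (String orderNo : orderNos) {
                // A missing detail becomes an empty cell instead of failing the export.
                sink.accept(orderNo + "," + details.getOrDefault(orderNo, ""));
                exported++;
            }
            cursor = orderNos.get(orderNos.size() - 1);
        }
        return exported;
    }
}
```

Passing a sink rather than returning a list keeps memory use bounded by one page, which matters once an export reaches millions of rows.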

Configurable Field Definitions

To avoid code changes for each new report field, the export logic was refactored to use an enum‑based, configurable field definition. The pseudo‑code for generating a report line is:

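// Builds one report row by resolving each requested field name to its definition
// and accessor method, then post-processing the extracted value.
public List<String> generateReportLineData(List<String> fields) {
    return StreamUtil.map(fields, field -> {
        try {
            FieldDefinition fieldDef = getFieldDefinition(field);
            FieldMethod method = getMethod(fieldDef);
            String value = method.invoke(this.reportItem);
            return postproc(value);
        } catch (Exception e) {
            // A failing field yields an empty cell instead of aborting the whole row;
            // the exception is passed to the logger so the cause is not lost.
            logger.warn("failed to get value for field: {} orderNo: {}", field, reportItem.getOrderNo(), e);
            return "";
        }
    });
}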

This approach allows new fields to be added by simply defining them in configuration, without touching the processing code.
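To make the enum‑based idea concrete, here is a minimal sketch in which each enum constant carries a column title and an extractor function. The enum `ExportField`, its constants, the map keys, and the `ReportLine` helper are hypothetical names chosen for this example; the article does not show the real definitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical field catalogue: each constant pairs a column title with an extractor.
enum ExportField {
    ORDER_NO("Order No.", order -> order.get("orderNo")),
    BUYER_NAME("Buyer", order -> order.get("buyerName")),
    PAY_AMOUNT("Paid (yuan)", order -> centsToYuan(order.get("payCents")));

    final String title;
    final Function<Map<String, String>, String> extractor;

    ExportField(String title, Function<Map<String, String>, String> extractor) {
        this.title = title;
        this.extractor = extractor;
    }

    // Converts a non-negative integer amount in cents to a two-decimal string.
    static String centsToYuan(String cents) {
        long c = Long.parseLong(cents);
        return String.format(Locale.ROOT, "%d.%02d", c / 100, c % 100);
    }
}

final class ReportLine {
    // Resolves each configured field name to its enum constant and extracts the value.
    // Unknown field names and failing extractors produce an empty cell.
    static List<String> generate(List<String> fieldNames, Map<String, String> order) {
        List<String> cells = new ArrayList<>();
        for (String name : fieldNames) {
            try {
                String value = ExportField.valueOf(name).extractor.apply(order);
                cells.add(value == null ? "" : value);
            } catch (RuntimeException e) {
                cells.add("");
            }
        }
        return cells;
    }
}
```

With this shape, the list of field names comes from configuration, and supporting a new column means adding one enum constant rather than editing the row‑building loop.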

Report Configuration and Strategy Patterns

The system stores export templates in export_biz_conf (industry/product level) and export_customized_conf (merchant level). Each template includes field lists, dimensions (order vs. item), file format, and options. The Strategy pattern is used to select among implementations based on order volume, field source (code vs. Groovy), aggregation level, and output format.
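As an illustration of strategy selection along one of these axes, the sketch below picks a row formatter from the template's file format. The interface `RowFormatter`, the two implementations, and the format keys "csv" and "tsv" are assumptions for this example; the other axes (order volume, field source, aggregation level) could be handled with the same lookup shape.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical strategy interface: one implementation per output format.
interface RowFormatter {
    String format(List<String> cells);
}

final class CsvFormatter implements RowFormatter {
    @Override
    public String format(List<String> cells) {
        return cells.stream().map(CsvFormatter::escape).collect(Collectors.joining(","));
    }

    // Quotes a cell that contains a comma, a quote, or a newline, doubling inner quotes.
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }
}

final class TsvFormatter implements RowFormatter {
    @Override
    public String format(List<String> cells) {
        return String.join("\t", cells);
    }
}

final class FormatterRegistry {
    private static final Map<String, RowFormatter> BY_FORMAT =
            Map.of("csv", new CsvFormatter(), "tsv", new TsvFormatter());

    // Looks up the strategy named by the template's file format setting.
    static RowFormatter forFormat(String fileFormat) {
        RowFormatter formatter = BY_FORMAT.get(fileFormat);
        if (formatter == null) {
            throw new IllegalArgumentException("unsupported format: " + fileFormat);
        }
        return formatter;
    }
}
```

The calling code only sees `RowFormatter`, so adding a format means registering one more implementation.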

Dynamic Groovy Scripts for Field Logic

To enable runtime addition of fields, Groovy scripts are stored in export_field_conf and referenced from the template tables. A sample Groovy script for extracting a fan's name is shown below:

[Figure: Groovy script example for fan name]

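The utility class PublicUtil simplifies script writing. Compiled scripts are cached so that each script is compiled only once; recompiling a script on every invocation would keep generating new classes and leak memory.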
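The caching idea can be sketched independently of Groovy. In the sketch below the compile step is an injected function, so the example runs with the JDK alone; in a real system that function would presumably delegate to Groovy's compiler. The class `ScriptCache` and the choice to key the cache by script source text are assumptions, not details taken from the article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache of compiled scripts, keyed by the script's source text.
// S is whatever the compile step produces (for Groovy, a compiled script object).
final class ScriptCache<S> {
    private final Map<String, S> compiled = new ConcurrentHashMap<>();
    private final Function<String, S> compiler;
    private final AtomicInteger compileCount = new AtomicInteger();

    ScriptCache(Function<String, S> compiler) {
        this.compiler = compiler;
    }

    // Returns the cached compiled form, compiling only on the first request.
    // An edited script has different source text, so it gets its own entry.
    S get(String source) {
        return compiled.computeIfAbsent(source, src -> {
            compileCount.incrementAndGet();
            return compiler.apply(src);
        });
    }

    // Number of times the compile step actually ran (useful for monitoring).
    int compileCount() {
        return compileCount.get();
    }
}
```

Keying by source text means stale entries for edited scripts stay in the map; a production cache would likely bound its size or evict old versions.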

Plugin‑Based General Export Framework

To support diverse export scenarios (e.g., distribution purchase orders), the core export flow was abstracted into a plugin architecture:

Define a plugin interface covering configuration and functionality.

Implement plugins for data retrieval (ES, HBase, API), filtering, sorting, formatting, and report generation.

Compose plugins into concrete export instances using the Template Method pattern.

For a distribution purchase order, the pipeline executes ES query → order detail plugins (buyer & supplier) → sorting → formatting → report generation.
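A minimal sketch of this composition follows, assuming a plugin is simply a function from one batch of orders to the next. The names `ExportPlugin`, `ExportTemplate`, and `PurchaseOrderExport` are hypothetical, orders are reduced to string maps, and the ES query and detail lookup are replaced by in‑memory stand‑ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract: take the current batch of orders, return the next one.
interface ExportPlugin {
    List<Map<String, String>> apply(List<Map<String, String>> orders);
}

// Template Method: run() fixes the order of steps; subclasses supply the pieces.
abstract class ExportTemplate {
    final List<String> run() {
        List<Map<String, String>> orders = search();
        for (ExportPlugin plugin : plugins()) {
            orders = plugin.apply(orders);
        }
        List<String> lines = new ArrayList<>();
        for (Map<String, String> order : orders) {
            lines.add(formatRow(order));
        }
        return lines;
    }

    protected abstract List<Map<String, String>> search();
    protected abstract List<ExportPlugin> plugins();
    protected abstract String formatRow(Map<String, String> order);
}

final class PurchaseOrderExport extends ExportTemplate {
    private final List<Map<String, String>> searchHits;   // stand-in for the ES query result
    private final Map<String, String> supplierByOrderNo;  // stand-in for the detail lookup

    PurchaseOrderExport(List<Map<String, String>> searchHits,
                        Map<String, String> supplierByOrderNo) {
        this.searchHits = searchHits;
        this.supplierByOrderNo = supplierByOrderNo;
    }

    @Override
    protected List<Map<String, String>> search() {
        return searchHits;
    }

    @Override
    protected List<ExportPlugin> plugins() {
        // Detail plugin: copies each order and attaches the supplier name.
        ExportPlugin supplierDetail = orders -> {
            List<Map<String, String>> enriched = new ArrayList<>();
            for (Map<String, String> order : orders) {
                Map<String, String> copy = new HashMap<>(order);
                copy.put("supplier", supplierByOrderNo.getOrDefault(order.get("orderNo"), ""));
                enriched.add(copy);
            }
            return enriched;
        };
        // Sorting plugin: orders rows by order number without mutating the input.
        ExportPlugin sortByOrderNo = orders -> {
            List<Map<String, String>> sorted = new ArrayList<>(orders);
            sorted.sort(Comparator.comparing((Map<String, String> o) -> o.get("orderNo")));
            return sorted;
        };
        return List.of(supplierDetail, sortByOrderNo);
    }

    @Override
    protected String formatRow(Map<String, String> order) {
        return order.get("orderNo") + "," + order.get("supplier");
    }
}
```

A different export scenario would reuse `ExportTemplate.run()` unchanged and supply its own search, plugin list, and row format.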

Quality Assurance Practices

Ensuring data correctness after extensive refactoring relies on:

Strict unit‑test coverage with all tests passing.

Code reviews by both the component owner and senior engineers.

Pre‑release automated comparison tools that validate export results against production data.

Continuous small‑scale refactors to avoid large, risky changes.

The codebase now totals ~18K lines with a duplication rate of ~1.8%.
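The article does not describe how the pre‑release comparison tool works. One plausible shape, sketched here under that caveat, is a row‑level diff between a baseline export and the export produced by the refactored code, keyed by order number. The class `ExportDiff` and its message format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class ExportDiff {
    // Compares two exports keyed by order number and returns one message per
    // discrepancy, in order-number order. An empty result means the exports match.
    static List<String> diff(Map<String, List<String>> baseline,
                             Map<String, List<String>> candidate) {
        List<String> problems = new ArrayList<>();
        TreeSet<String> orderNos = new TreeSet<>(baseline.keySet());
        orderNos.addAll(candidate.keySet());
        for (String orderNo : orderNos) {
            List<String> expected = baseline.get(orderNo);
            List<String> actual = candidate.get(orderNo);
            if (expected == null) {
                problems.add(orderNo + ": unexpected row");
            } else if (actual == null) {
                problems.add(orderNo + ": missing row");
            } else if (!expected.equals(actual)) {
                problems.add(orderNo + ": expected " + expected + " but got " + actual);
            }
        }
        return problems;
    }
}
```

Running such a diff over a sample of real merchants' exports before each release would catch field‑level regressions that unit tests miss.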

Conclusion

By moving to an ES + HBase architecture, introducing configuration‑driven field definitions, leveraging Groovy for dynamic logic, and building a plugin‑based export framework, Youzan achieved high‑performance, scalable, and maintainable order export capable of handling million‑order exports at a typical throughput of 10,000 orders per minute while preserving data integrity.
