Backend Development 10 min read

Designing Scalable Excel Import Pipelines: Four Architecture Patterns Compared

This article examines four architectural approaches for importing Excel data—synchronous, thread‑pooled with MQ, fully asynchronous with Redis locks, and an ideal high‑throughput variant—detailing their trade‑offs, performance impacts, and suitable business scenarios in modern backend systems.

JD Cloud Developers

Nov 20, 2024

Designing Scalable Excel Import Pipelines: Four Architecture Patterns Compared

1. Case Information and Design

1.1 Requirement and Background

B2BTC Phase‑2 requires an Excel import feature handling fewer than 1,000 rows per batch with low frequency. Each record must satisfy a composite unique constraint, requiring validation against multiple fields. The process involves several RPC calls for data verification and enrichment; cache misses can further degrade timeliness.

1.2 Overall Solution

Four design alternatives were explored, each with distinct trade‑offs.

1.2.1 Initial Synchronous Import

The first idea was a classic synchronous import: a single thread in one container parses the Excel file, validates each row, writes to the database, and sends a notification.

Problems:

High memory pressure on a single server when data volume grows.

Long processing time due to multiple RPC calls per row, leading to high TP99 latency and poor user experience.

Advantages:

Front‑end can obtain the import result synchronously.

1.2.2 Improved Version (Thread Pool + MQ)

Inspired by another Excel import case, this version adds a thread pool for validation and uses a message queue (MQ) to process each record asynchronously, improving efficiency over the pure synchronous approach.

Problems discovered after implementation:

The design did not fit the actual business scenario and still risked TP99 timeouts.

Advantages:

Front‑end can still receive validation results synchronously.

Thread pool and asynchronous processing increase throughput.

Applicable scenario: when the front‑end needs immediate validation feedback but the back‑end does not require unique‑key checks (e.g., batch insert without MQ).

1.2.3 Final Version (Fully Asynchronous with Redis Lock)

Since the business does not require the front‑end to obtain import or validation results synchronously, the entire pipeline—Excel parsing, validation, and message handling—is made asynchronous using JMQ. A thread pool validates data, while the main thread aggregates results. Redis locks protect uniqueness during DB operations.

Problems:

All steps run on a single server, creating notable resource pressure.

Advantages:

Thread‑pool processing dramatically reduces message‑handling time and eases server load.

A fallback strategy ensures no data loss and guarantees timely completion.

All non‑essential steps are asynchronous, keeping TP99 latency low and reliable.

Applicable scenario: workloads that require data integrity but have moderate data volume.

1.2.4 Ideal Version (High‑Throughput, Low Reliability)

Building on the third version, this design targets large data volumes and high timeliness where strict reliability is not critical. It splits import data into messages processed by multiple consumers via JMQ, tracks progress in Redis, and still uses Redis locks for uniqueness.

Problems:

No fallback mechanism; a lost or stuck message could prevent final result aggregation.

Advantages:

All steps are asynchronous, keeping TP99 latency low.

Dividing data into messages and using multiple consumers greatly shortens processing time.

JMQ’s retry mechanism improves reliability despite the relaxed guarantees.

Applicable scenario: front‑end does not need synchronous results, back‑end can handle data fully asynchronously, the business tolerates occasional data loss, and the workload involves large or frequent operations.

2. Ongoing Reflections

2.1 Reasonable Use of Middleware

Leveraging JMQ to decouple and split business logic reduces memory and CPU pressure on a single instance and raises concurrency. The MQ’s retry mechanism helps maintain availability. However, asynchronous processing can lose results; in scenarios with low reliability requirements this is acceptable, but high‑reliability cases need reconciliation and fallback mechanisms. Over‑reliance on middleware also increases service dependency and makes troubleshooting harder.

2.2 Aligning Solutions with Business Scenarios

Technical design should be driven by the specific business context—data volume, reliability needs, and operational constraints—rather than copying existing patterns blindly. The best solution is the one that best fits the current scenario, not an abstract “optimal” architecture.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Redis Lock backend-architecture Excel Import asynchronous processing

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.