Design and Evaluation of Multiple Backend Excel Import Solutions for B2BTC Local Phase II
This article presents a detailed case study of four backend design alternatives for importing Excel data—ranging from synchronous processing to asynchronous pipelines with thread pools, message queues, and Redis locks—analyzing their advantages, drawbacks, and suitable scenarios.
1.1 Case Requirement and Background
B2BTC Local Phase II needs an Excel import function that processes fewer than one thousand rows per batch with low frequency, but each row must satisfy a composite unique constraint, requiring validation against other business RPC services and handling possible cache misses.
1.2 Overall Solution
The following four solutions were considered sequentially, each with its own trade‑offs.
1.2.1 Initial Idea (Synchronous Import)
The first thought was a typical synchronous import where a single thread in one container parses, validates, imports the Excel data, and sends notification messages.
Problems:
Large data volumes put significant memory pressure on a single server.
The process is long; each row triggers multiple RPC calls, leading to high TP99 latency and poor user experience.
Advantages:
The front end can synchronously obtain the import result.
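To make the latency problem concrete, here is a minimal sketch of the synchronous flow, assuming hypothetical stand-ins for the Excel parser, the RPC validation, and the database insert (none of these names come from the original system):

```python
import time
from dataclasses import dataclass

@dataclass
class Row:
    sku: str
    vendor: str
    price: float

def parse_excel(path):
    # Stand-in for a real parser (e.g. openpyxl); returns the rows to import.
    return [Row("SKU-1", "V-1", 9.9), Row("SKU-2", "V-1", 19.9)]

def validate_via_rpc(row):
    # Each row triggers one or more remote calls; their latency adds up serially.
    time.sleep(0)  # stand-in for the network round trips
    return row.price > 0

def import_sync(path):
    """Solution One: parse, validate, and insert in a single request thread."""
    rows = parse_excel(path)
    imported, failed = [], []
    for row in rows:  # serial loop: TP99 grows linearly with row count
        if validate_via_rpc(row):
            imported.append(row)  # stand-in for a DB insert
        else:
            failed.append(row)
    return {"imported": len(imported), "failed": len(failed)}
```

Because every row's RPC round trips happen serially inside one request, the front end gets a synchronous result but the request time scales with the row count.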
1.2.2 Solution Two (Improved Version)
Based on the first approach, a thread pool was introduced for validation and a message queue (MQ) was used to process each record asynchronously, improving efficiency.
However, after implementation it was found that the design did not fit the actual scenario and posed a risk of TP99 timeouts.
Problems:
The design did not match the business: the process can in fact run fully asynchronously, with all import results delivered via messages, so returning validation results synchronously adds TP99 timeout risk without real benefit.
Advantages:
The front end can still synchronously obtain validation results.
Thread pool and asynchronous handling increase processing efficiency.
Applicable Scenario: When the front end needs synchronous import results but the backend does not need to enforce uniqueness (e.g., unique primary keys are already present), allowing batch insertion without MQ.
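A minimal sketch of Solution Two's shape, using Python's standard thread pool and an in-process queue as a stand-in for the real MQ (the validation rule and all names are illustrative, not from the original system):

```python
import queue
from concurrent.futures import ThreadPoolExecutor

def validate(row):
    # Stand-in for the RPC-backed validation.
    return row % 2 == 0  # pretend even "rows" are valid

def import_with_pool(rows, mq):
    """Solution Two: validate rows concurrently, then publish each
    valid row to the queue for asynchronous downstream processing."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # The front end can still be handed these results synchronously.
        results = list(pool.map(validate, rows))
    for row, ok in zip(rows, results):
        if ok:
            mq.put(row)  # consumed asynchronously by a separate worker
    return results

mq = queue.Queue()
results = import_with_pool([1, 2, 3, 4], mq)
```

The thread pool shortens the validation phase, but the request still blocks until validation finishes, which is where the TP99 timeout risk comes from.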
1.2.3 Solution Three (Final Version)
Since the business does not require the front end to obtain import or validation results synchronously, the entire pipeline—Excel parsing, validation, and message handling—is made asynchronous using JMQ to dispatch messages to consumers.
Data is split to a per‑row granularity, validated and processed by a thread pool, with the main thread aggregating results. Uniqueness checks and data modifications are protected by fine‑grained Redis locks.
Problems:
All parsing, validation, and message handling run on a single server, increasing its load.
Advantages:
Thread‑pool processing greatly reduces message handling time and eases server pressure.
Fallback strategies ensure no data loss and timely completion.
All non‑essential steps are asynchronous, keeping TP99 latency low.
Applicable Scenarios:
Business requiring strong data integrity.
Workloads with moderate data volume to avoid excessive pressure on a single container.
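The consumer side of Solution Three can be sketched as follows. Per-key `threading.Lock` objects emulate the fine-grained Redis locks locally (in production this would be something like `SET key NX EX`), and all names are hypothetical:

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

# Per-key locks emulate fine-grained Redis locks scoped to each unique key.
_locks = defaultdict(threading.Lock)
_store = {}

def process_row(key, value):
    """Validate and write one row under a lock scoped to its composite key."""
    with _locks[key]:  # only writers of the same key contend
        if key in _store:  # uniqueness check
            return (key, "duplicate")
        _store[key] = value  # stand-in for the DB write
        return (key, "ok")

def handle_import_message(rows):
    """Consumer: split to per-row granularity, process in a thread pool,
    and let the main thread aggregate the results."""
    summary = defaultdict(int)
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(process_row, k, v) for k, v in rows]
        for fut in as_completed(futures):
            _, status = fut.result()
            summary[status] += 1
    return dict(summary)
```

Because each lock covers a single key rather than the whole batch, unrelated rows proceed in parallel while duplicate keys are still serialized.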
1.2.4 Solution Four (Ideal Version)
To address the load issue of Solution Three, an ideal scenario is envisioned for large data volumes, lower reliability requirements, and high timeliness. The design reduces reconciliation and fallback mechanisms, still processes everything asynchronously, and uses JMQ to send messages to consumers that record progress in Redis.
Uniqueness checks, data modifications, and final Redis updates are protected by minimal‑impact Redis locks.

Problems:
Absence of a fallback strategy may cause message loss or unexpected termination, preventing final result aggregation.
Advantages:
All non‑essential steps are asynchronous, keeping TP99 latency low.
Splitting data and using multiple consumers dramatically shortens processing time.
JMQ retry mechanisms improve reliability.
Applicable Scenarios:
Front end does not need synchronous import results.
Business tolerates occasional data loss with retry mechanisms.
Only limited visibility into import results is required.
Large or frequently operated data sets.
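The progress-tracking piece of Solution Four can be sketched with an in-process counter standing in for Redis (in production this would be something like `INCR import:{task_id}:done` under the final-update lock; all names here are illustrative):

```python
import threading

class ProgressTracker:
    """Emulates the per-task progress counters each consumer would keep
    in Redis, guarded by a lock standing in for the final-update lock."""

    def __init__(self, task_id, total):
        self.task_id, self.total = task_id, total
        self.done = 0
        self.failed = 0
        self._lock = threading.Lock()

    def record(self, ok):
        # Each consumer calls this after handling one row-message.
        with self._lock:
            self.done += 1
            if not ok:
                self.failed += 1
            # The consumer that completes the last row aggregates the result.
            return self.done == self.total

tracker = ProgressTracker("import-42", total=3)
finished = [tracker.record(ok) for ok in (True, False, True)]
```

If a message is lost, `done` never reaches `total` and the final aggregation never fires, which is exactly the failure mode noted above in the absence of a fallback strategy.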
2. Continuous Reflection
2.1 Reasonable Use of Middleware
Properly leveraging JMQ to decouple and split business logic reduces memory/CPU pressure on a single server, increases concurrency, and uses MQ retry mechanisms to improve availability.
Asynchronous processing may lose results in low‑reliability scenarios; in high‑reliability cases, reconciliation and fallback mechanisms are needed (e.g., comparing parsed data with final imported data and alerting on mismatches).
Over‑reliance on middleware can make services dependent on its reliability, making troubleshooting harder; a balanced approach is required.
2.2 Adequate Business Scenario Matching
Technical designs should be tailored to specific business contexts—data volume, reliability needs, etc.—instead of blindly copying existing solutions.
Solution Two, for example, borrowed ideas without proper fit, leading to new issues.
There is no universally best technical solution; the optimal one fits the current business scenario.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.