Designing a Scalable Asynchronous Batch Processing Framework for Massive User Data
This article presents a detailed design and implementation of an asynchronous, parallel batch processing framework that tackles massive user data handling challenges by sharding databases, leveraging async HLR commands, configurable thread pools, and JMX monitoring to achieve high performance and reliability.
Background
As internet services have grown, data volumes have exploded. Telecom operators must process and analyze huge volumes of user data quickly, or they risk being pushed out of the market.
Problem Statement
The author’s company stores millions of mobile users from all provinces in a single Oracle relational table. At the start of each month, data volume spikes sharply and the traditional C++ multi‑process backend becomes a bottleneck. HLR (Home Location Register) suspend/resume commands are delayed as a result, leading to customer complaints and potential churn.
Proposed Optimization
Horizontal sharding of the user table by city to reduce I/O pressure on any single database server.
Convert synchronous HLR command dispatch to an asynchronous callback model, allowing the processing module to continue without blocking.
Parallel loading of sharded data using a custom BatchQueryLoader that leverages Guava ListenableFuture for concurrent queries.
Configurable batch processing thread pool whose parameters (corePoolSize, maxPoolSize, workQueueSize, keepAliveTime) can be tuned for peak‑month load and scaled down during normal periods.
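The asynchronous callback model in the list above can be sketched as follows. This is a minimal illustration, not the article's code: the class name AsyncDispatchSketch, the dispatch method, and the use of a plain Function as a stand-in for the real HLR call are all assumptions, and the phone number is a placeholder.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the original framework.
final class AsyncDispatchSketch {

    /**
     * Runs one HLR call on the pool and hands the outcome to the callback.
     * The caller gets a future back immediately and is never blocked.
     */
    static CompletableFuture<Boolean> dispatch(String msisdn,
                                               Function<String, Boolean> hlrCall,
                                               ExecutorService pool,
                                               BiConsumer<String, Boolean> callback) {
        return CompletableFuture
                .supplyAsync(() -> hlrCall.apply(msisdn), pool)
                .exceptionally(error -> false) // an exception counts as a failed command
                .whenComplete((ok, ignored) -> callback.accept(msisdn, ok));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Boolean> done = dispatch(
                    "13800000001",      // placeholder number
                    number -> true,     // stand-in for the real HLR call
                    pool,
                    (number, ok) -> System.out.println(number + " suspended: " + ok));
            done.join(); // only the demo waits; a batch job would keep submitting
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the submitting thread returns as soon as the task is handed to the pool, and success or failure arrives later through the callback.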
The overall component diagram is shown below:
Key Modules
BatchQueryLoader – Accepts multiple data source objects and uses Guava’s ListenableFuture to load data from each source in parallel, returning a combined result set.
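A rough sketch of the parallel-loading idea follows. To stay free of external dependencies it uses the JDK's CompletableFuture rather than Guava's ListenableFuture, so it is a substitute technique, not the article's implementation; with Guava, the same shape would typically be built from MoreExecutors.listeningDecorator and Futures.allAsList. The class name ParallelLoaderSketch and the Supplier standing in for a per-shard query are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: each Supplier stands in for a query against one shard.
final class ParallelLoaderSketch {

    /** Runs every shard query on the pool and concatenates the rows in shard order. */
    static <T> List<T> loadAll(List<Supplier<List<T>>> shardQueries, ExecutorService pool) {
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (Supplier<List<T>> query : shardQueries) {
            futures.add(CompletableFuture.supplyAsync(query, pool)); // all queries start here
        }
        List<T> combined = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            combined.addAll(future.join()); // waits for this shard only
        }
        return combined;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Supplier<List<String>>> queries = new ArrayList<>();
            queries.add(() -> List.of("fuzhou-user-1", "fuzhou-user-2"));
            queries.add(() -> List.of("xiamen-user-1"));
            System.out.println(loadAll(queries, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because all queries are submitted before any result is awaited, total load time approaches that of the slowest shard, provided the pool has at least as many threads as shards.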
BatchTaskReactor – Wraps ThreadPoolExecutor with a bounded ArrayBlockingQueue to dispatch the loaded data to asynchronous HLR tasks. It receives execution feedback via Future objects.
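The reactor described above might look roughly like this. The class name ReactorSketch, the runAll method, and the choice of CallerRunsPolicy are assumptions made for illustration; the original source does not show which rejection policy the author configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a reactor built on a bounded ThreadPoolExecutor.
final class ReactorSketch {
    private final ThreadPoolExecutor pool;

    ReactorSketch(int corePoolSize, int maxPoolSize, int workQueueSize, long keepAliveSeconds) {
        this.pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, keepAliveSeconds, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(workQueueSize),     // bounded queue caps memory use
                new ThreadPoolExecutor.CallerRunsPolicy());  // assumed policy: slow the producer down
    }

    /** Submits every task, then reads each Future and counts the tasks that returned true. */
    int runAll(List<Callable<Boolean>> tasks) {
        List<Future<Boolean>> futures = new ArrayList<>();
        for (Callable<Boolean> task : tasks) {
            futures.add(pool.submit(task));
        }
        int succeeded = 0;
        for (Future<Boolean> future : futures) {
            try {
                if (Boolean.TRUE.equals(future.get())) {
                    succeeded++;
                }
            } catch (ExecutionException e) {
                // The task threw: treat it as a failed command.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return succeeded;
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ReactorSketch reactor = new ReactorSketch(2, 4, 4, 30);
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            final int n = i;
            tasks.add(() -> n % 4 != 0); // every fourth task pretends to fail
        }
        System.out.println("succeeded: " + reactor.runAll(tasks));
        reactor.shutdown();
    }
}
```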
BatchTaskConfigurationLoader – Reads batchtask-configuration.xml to obtain thread‑pool parameters such as corePoolSize, maxPoolSize, workQueueSize, and keepAliveTime. The XML content defines the relationship between these values.
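The source does not reproduce batchtask-configuration.xml, so the element layout below is purely an assumption, as are the class name PoolConfigSketch and the choice of seconds for keepAliveTime. The sketch shows one way such a loader could read the four parameters with the JDK's DOM parser and reject combinations that ThreadPoolExecutor or ArrayBlockingQueue would refuse anyway.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: the XML layout is assumed, not taken from the article.
final class PoolConfigSketch {
    final int corePoolSize;
    final int maxPoolSize;
    final int workQueueSize;
    final long keepAliveTime; // seconds in this sketch

    PoolConfigSketch(int corePoolSize, int maxPoolSize, int workQueueSize, long keepAliveTime) {
        // Same constraints ThreadPoolExecutor and ArrayBlockingQueue enforce at construction.
        if (corePoolSize < 0 || maxPoolSize <= 0 || maxPoolSize < corePoolSize
                || workQueueSize <= 0 || keepAliveTime < 0) {
            throw new IllegalArgumentException("Inconsistent thread-pool settings");
        }
        this.corePoolSize = corePoolSize;
        this.maxPoolSize = maxPoolSize;
        this.workQueueSize = workQueueSize;
        this.keepAliveTime = keepAliveTime;
    }

    static PoolConfigSketch parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse batch task configuration", e);
        }
        return new PoolConfigSketch(
                (int) number(root, "corePoolSize"),
                (int) number(root, "maxPoolSize"),
                (int) number(root, "workQueueSize"),
                number(root, "keepAliveTime"));
    }

    private static long number(Element root, String tag) {
        Node node = root.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("Missing element: " + tag);
        }
        return Long.parseLong(node.getTextContent().trim());
    }

    public static void main(String[] args) {
        PoolConfigSketch config = parse(
                "<batchtask><corePoolSize>4</corePoolSize><maxPoolSize>16</maxPoolSize>"
                + "<workQueueSize>200</workQueueSize><keepAliveTime>60</keepAliveTime></batchtask>");
        System.out.println(config.corePoolSize + " core / " + config.maxPoolSize + " max threads");
    }
}
```

Keeping these values in a file means peak-month and normal-period settings can be swapped without recompiling.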
Thread‑Pool Policies – The framework supports the four standard ThreadPoolExecutor rejection policies: AbortPolicy, CallerRunsPolicy, DiscardPolicy, and DiscardOldestPolicy. A policy takes effect once the work queue is full and maxPoolSize threads are already busy, so corePoolSize, maxPoolSize, and workQueueSize together determine when rejection starts under load.
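To make the effect of a rejection policy concrete, the sketch below saturates a deliberately tiny pool (one thread, one queue slot) with three slow tasks. The class name RejectionPolicySketch and the saturate helper are assumptions for illustration; only the ThreadPoolExecutor classes themselves are standard JDK API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo harness, not part of the original framework.
final class RejectionPolicySketch {

    /** Submits three slow tasks to a pool with one thread and one queue slot. */
    static String saturate(RejectedExecutionHandler policy) {
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger ran = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), policy);
        Runnable slowTask = () -> {
            try {
                release.await(1, TimeUnit.SECONDS); // timeout keeps CallerRunsPolicy from hanging
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ran.incrementAndGet();
        };
        int rejected = 0;
        for (int i = 0; i < 3; i++) {
            try {
                pool.execute(slowTask); // 1st runs, 2nd queues, 3rd reaches the policy
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ran=" + ran.get() + ", rejected=" + rejected;
    }

    public static void main(String[] args) {
        System.out.println("AbortPolicy:   " + saturate(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("DiscardPolicy: " + saturate(new ThreadPoolExecutor.DiscardPolicy()));
    }
}
```

AbortPolicy surfaces the overflow as a RejectedExecutionException, while DiscardPolicy drops the third task silently, which is why a failure counter alone may not reveal discarded work. Under CallerRunsPolicy the third task would instead run on the submitting thread.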
HlrBusinessEvent – Implements an interface for sending HLR suspend/resume commands. A proxy HlrBusinessEventAdvisor measures command latency.
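One plausible way to measure command latency with a proxy is a JDK dynamic proxy, sketched below. The source does not say how HlrBusinessEventAdvisor is built, so this is an assumption throughout: the interface HlrCommandSender, its send method, and the class LatencyAdvisorSketch are hypothetical stand-ins.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the interface that HlrBusinessEvent implements.
interface HlrCommandSender {
    boolean send(String command, String msisdn);
}

// Hypothetical advisor: times every call made through the proxy it returns.
final class LatencyAdvisorSketch {
    final AtomicLong calls = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    HlrCommandSender wrap(HlrCommandSender target) {
        return (HlrCommandSender) Proxy.newProxyInstance(
                HlrCommandSender.class.getClassLoader(),
                new Class<?>[] { HlrCommandSender.class },
                (proxy, method, args) -> {
                    long start = System.nanoTime();
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow what the real sender threw
                    } finally {
                        calls.incrementAndGet();
                        totalNanos.addAndGet(System.nanoTime() - start);
                    }
                });
    }

    public static void main(String[] args) {
        LatencyAdvisorSketch advisor = new LatencyAdvisorSketch();
        // The lambda stands in for a real HLR client; the number is a placeholder.
        HlrCommandSender timed = advisor.wrap((command, msisdn) -> true);
        timed.send("SUSPEND", "13800000001");
        timed.send("RESUME", "13800000001");
        System.out.println(advisor.calls.get() + " calls, average "
                + advisor.totalNanos.get() / advisor.calls.get() + " ns");
    }
}
```

Timing in a proxy keeps the measurement out of the business code, so the sender can be swapped or tested without touching the instrumentation.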
NotifyUsersBatchTask – Orchestrates the whole flow: parallel data loading, asynchronous task submission, and result aggregation. It also records success and failure counts via NotifyTaskSuccCounter and NotifyTaskFailCounter.
Monitoring
Using Java Management Extensions (JMX), the BatchTaskMonitor MBean exposes methods such as getBatchTaskCounter(String name) to retrieve success or failure counters in real time. The author demonstrates connecting with JConsole and querying the TASKFAILCOUNTER, which reported 196 failed tasks during a test run.
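A minimal sketch of such an MBean is shown below. Only the names BatchTaskMonitor, getBatchTaskCounter, and TASKFAILCOUNTER come from the article; the wrapper class JmxMonitorSketch, the increment method, the registerAndQuery helper, and the object name batchtask.sketch:type=BatchTaskMonitor are assumptions. The query goes through the platform MBean server, which is the same path JConsole uses.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical wrapper so the sketch fits in one file.
final class JmxMonitorSketch {

    /** Standard MBean contract: the interface is public and named after the class plus "MBean". */
    public interface BatchTaskMonitorMBean {
        int getBatchTaskCounter(String name);
    }

    public static final class BatchTaskMonitor implements BatchTaskMonitorMBean {
        private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

        void increment(String name) {
            counters.computeIfAbsent(name, key -> new AtomicInteger()).incrementAndGet();
        }

        @Override
        public int getBatchTaskCounter(String name) {
            AtomicInteger counter = counters.get(name);
            return counter == null ? 0 : counter.get();
        }
    }

    /** Registers the monitor and reads one counter back through the MBean server. */
    static int registerAndQuery(BatchTaskMonitor monitor, String objectName, String counterName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // replace any earlier registration
            }
            server.registerMBean(monitor, name);
            return (Integer) server.invoke(name, "getBatchTaskCounter",
                    new Object[] { counterName }, new String[] { String.class.getName() });
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration or query failed", e);
        }
    }

    public static void main(String[] args) {
        BatchTaskMonitor monitor = new BatchTaskMonitor();
        monitor.increment("TASKFAILCOUNTER");
        System.out.println("failed tasks: " + registerAndQuery(
                monitor, "batchtask.sketch:type=BatchTaskMonitor", "TASKFAILCOUNTER"));
    }
}
```

Because getBatchTaskCounter takes an argument, JMX exposes it as an operation rather than an attribute, so in JConsole it appears under the MBean's Operations tab.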
Example Usage
The client code creates a NotifyUsersBatchTask instance, supplies database connection pools for the sharded notify_users tables, and triggers the batch process. After inserting 80 test records (e.g., for cities Fuzhou and Xiamen), the framework processes them, and the author shows screenshots of the execution results and JMX monitoring UI.
Conclusion
The presented asynchronous parallel batch framework, though concise, covers all essential components for high‑throughput, I/O‑intensive workloads. By exploiting multi‑core servers, horizontal sharding, configurable thread pools, and JMX monitoring, similar systems can achieve significant performance gains and better resource utilization.
dbaplus Community