Designing a Scalable Asynchronous Batch Processing Framework for Massive User Data

This article presents a detailed design and implementation of an asynchronous, parallel batch processing framework that tackles massive user data handling challenges by sharding databases, leveraging async HLR commands, configurable thread pools, and JMX monitoring to achieve high performance and reliability.

dbaplus Community

Background

With the rapid growth of internet technology, data volumes have exploded. Telecom operators must process and analyze huge volumes of user data quickly, or they risk being pushed out of the market.

Problem Statement

The author’s company stores millions of mobile users from multiple provinces in a single Oracle relational table. At the beginning of each month, data volume spikes dramatically and the traditional C++ multi‑process backend becomes a bottleneck. HLR (Home Location Register) suspend/resume commands are delayed, leading to customer complaints and potential churn.

Proposed Optimization

Horizontal sharding of the user table by city to reduce I/O pressure on any single database server.

Conversion of synchronous HLR command dispatch to an asynchronous callback model, so the processing module can continue without blocking.

Parallel loading of sharded data using a custom BatchQueryLoader that leverages Guava ListenableFuture for concurrent queries.

Configurable batch processing thread pool whose parameters (corePoolSize, maxPoolSize, workQueueSize, keepAliveTime) can be tuned for peak‑month load and scaled down during normal periods.
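
As a rough illustration of the first point, sharding by city comes down to a lookup from a city to the data source that holds its users. The sketch below is an assumption, not the author's code: the class name ShardRouter and the shard identifiers ds_fuzhou and ds_xiamen are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical router: maps a city name to the identifier of the shard
// (database or data source) that stores that city's users.
class ShardRouter {
    private final Map<String, String> cityToShard = new HashMap<>();

    void register(String city, String shardId) {
        cityToShard.put(city.toLowerCase(Locale.ROOT), shardId);
    }

    // Looks up the shard for a city, ignoring case; fails loudly if unknown.
    String shardFor(String city) {
        String shard = cityToShard.get(city.toLowerCase(Locale.ROOT));
        if (shard == null) {
            throw new IllegalArgumentException("No shard registered for city: " + city);
        }
        return shard;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        router.register("Fuzhou", "ds_fuzhou"); // shard names are made up
        router.register("Xiamen", "ds_xiamen");
        System.out.println(router.shardFor("Xiamen"));
    }
}
```

Failing on an unknown city, rather than falling back to a default shard, keeps a routing mistake from silently writing to the wrong database.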

The overall component diagram is shown below:

Architecture diagram

Key Modules

BatchQueryLoader – Accepts multiple data source objects and uses Guava’s ListenableFuture to load data from each source in parallel, returning a combined result set.
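
The parallel load can be sketched as follows. To stay within the JDK, this version swaps Guava's ListenableFuture for java.util.concurrent.CompletableFuture, so it is not the author's implementation. The class name ParallelLoaderSketch is hypothetical, and each Supplier stands in for one shard query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for BatchQueryLoader. The article's version uses
// Guava ListenableFuture; CompletableFuture is used here instead.
class ParallelLoaderSketch {

    static <T> List<T> loadAll(List<Supplier<List<T>>> shardQueries, ExecutorService pool) {
        // Start every shard query without waiting for the previous one.
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (Supplier<List<T>> query : shardQueries) {
            futures.add(CompletableFuture.supplyAsync(query, pool));
        }
        // Wait for each shard and merge the rows in shard order.
        List<T> combined = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            combined.addAll(future.join());
        }
        return combined;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Supplier<List<String>>> queries = new ArrayList<>();
            queries.add(() -> Arrays.asList("fuzhou-user-1", "fuzhou-user-2"));
            queries.add(() -> Arrays.asList("xiamen-user-1"));
            System.out.println(loadAll(queries, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The total wait is roughly that of the slowest shard rather than the sum of all shards, provided the pool has at least as many threads as there are shards.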

BatchTaskReactor – Wraps ThreadPoolExecutor with a bounded ArrayBlockingQueue to dispatch the loaded data to asynchronous HLR tasks. It receives execution feedback via Future objects.
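
A minimal sketch of this wrapper, assuming a constructor that takes the four pool parameters directly. The class name ReactorSketch and the runAll method are hypothetical; only ThreadPoolExecutor, ArrayBlockingQueue, and Future come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for BatchTaskReactor; names and signatures are assumed.
class ReactorSketch {
    private final ThreadPoolExecutor executor;

    ReactorSketch(int corePoolSize, int maxPoolSize, int workQueueSize, long keepAliveSeconds) {
        this.executor = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, keepAliveSeconds, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(workQueueSize)); // bounded queue caps the backlog
    }

    // Submits every task, then reads each Future and returns how many tasks
    // reported success. A task that throws is counted as a failure.
    // Assumes tasks.size() <= maxPoolSize + workQueueSize; beyond that the
    // default AbortPolicy rejects the overflow.
    int runAll(List<Callable<Boolean>> tasks) {
        List<Future<Boolean>> futures = new ArrayList<>();
        for (Callable<Boolean> task : tasks) {
            futures.add(executor.submit(task));
        }
        int succeeded = 0;
        for (Future<Boolean> future : futures) {
            try {
                if (future.get()) {
                    succeeded++;
                }
            } catch (ExecutionException e) {
                // The task threw: treat it as a failed command.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for tasks", e);
            }
        }
        return succeeded;
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        ReactorSketch reactor = new ReactorSketch(2, 4, 10, 30);
        List<Callable<Boolean>> tasks = Arrays.asList(
                () -> true,
                () -> false,
                () -> { throw new IllegalStateException("simulated HLR timeout"); });
        System.out.println("succeeded = " + reactor.runAll(tasks));
        reactor.shutdown();
    }
}
```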

BatchTaskConfigurationLoader – Reads batchtask-configuration.xml to obtain thread‑pool parameters such as corePoolSize, maxPoolSize, workQueueSize, and keepAliveTime. The XML content defines the relationship between these values.

Thread pool configuration XML
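
The original configuration file is not reproduced in this summary. The sketch below therefore assumes a hypothetical layout, with one element per parameter under a made-up batchtask root, and shows how such a file could be read and sanity-checked with the JDK's DOM parser. The class name PoolConfigSketch is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class PoolConfigSketch {
    // Hypothetical layout and values; the real batchtask-configuration.xml may differ.
    static final String SAMPLE_XML =
            "<batchtask>"
            + "<corePoolSize>4</corePoolSize>"
            + "<maxPoolSize>16</maxPoolSize>"
            + "<workQueueSize>200</workQueueSize>"
            + "<keepAliveTime>60</keepAliveTime>"
            + "</batchtask>";

    final int corePoolSize;
    final int maxPoolSize;
    final int workQueueSize;
    final long keepAliveTime;

    private PoolConfigSketch(int core, int max, int queue, long keepAlive) {
        // ThreadPoolExecutor and ArrayBlockingQueue reject these combinations
        // at construction time, so fail early with a clearer message.
        if (core < 0 || max <= 0 || max < core || queue <= 0 || keepAlive < 0) {
            throw new IllegalArgumentException("Inconsistent thread-pool settings");
        }
        this.corePoolSize = core;
        this.maxPoolSize = max;
        this.workQueueSize = queue;
        this.keepAliveTime = keepAlive;
    }

    static PoolConfigSketch parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new PoolConfigSketch(
                    Integer.parseInt(text(doc, "corePoolSize")),
                    Integer.parseInt(text(doc, "maxPoolSize")),
                    Integer.parseInt(text(doc, "workQueueSize")),
                    Long.parseLong(text(doc, "keepAliveTime")));
        } catch (IllegalArgumentException e) {
            throw e; // includes NumberFormatException from malformed numbers
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read thread-pool configuration", e);
        }
    }

    private static String text(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("Missing element: " + tag);
        }
        return node.getTextContent().trim();
    }

    public static void main(String[] args) {
        PoolConfigSketch config = parse(SAMPLE_XML);
        System.out.println(config.corePoolSize + " core / " + config.maxPoolSize + " max");
    }
}
```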

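Thread‑Pool Policies – The framework supports the four standard ThreadPoolExecutor rejection policies: AbortPolicy, CallerRunsPolicy, DiscardPolicy, and DiscardOldestPolicy. ThreadPoolExecutor starts threads beyond corePoolSize only once the work queue is full, and it applies the rejection policy only when the queue is full and maxPoolSize threads are already running. corePoolSize, maxPoolSize, and workQueueSize therefore have to be chosen together for the pool to behave predictably under load.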
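
To make the difference between the four policies concrete, the following sketch saturates a deliberately tiny pool (one thread, one queue slot) and submits one extra task. The class name RejectionPolicySketch and the helper submitOverflow are hypothetical; the policies themselves are the standard JDK ones.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class RejectionPolicySketch {

    // Saturates a pool of one thread and one queue slot, submits one more
    // task, and reports what happened to it: "rejected", "not run", or the
    // name of the thread that ended up running it.
    static String submitOverflow(RejectedExecutionHandler policy) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), policy);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("not run");
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // taken by the single worker thread
            pool.execute(blocker); // fills the single queue slot
            pool.execute(() -> outcome.set(Thread.currentThread().getName())); // overflow
        } catch (RejectedExecutionException e) {
            outcome.set("rejected");
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println("Abort:         " + submitOverflow(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("CallerRuns:    " + submitOverflow(new ThreadPoolExecutor.CallerRunsPolicy()));
        System.out.println("Discard:       " + submitOverflow(new ThreadPoolExecutor.DiscardPolicy()));
        System.out.println("DiscardOldest: " + submitOverflow(new ThreadPoolExecutor.DiscardOldestPolicy()));
    }
}
```

CallerRunsPolicy runs the overflow task on the submitting thread, which slows the producer down instead of losing work. DiscardOldestPolicy drops the queued task to make room for the new one.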

HlrBusinessEvent – Implements an interface for sending HLR suspend/resume commands. A proxy HlrBusinessEventAdvisor measures command latency.
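
The summary does not say which proxy mechanism HlrBusinessEventAdvisor uses. One way to get the same effect is a JDK dynamic proxy, sketched below. The method names suspend and resume, the LatencyStats holder, and the withTiming helper are all assumptions made for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

class LatencyProxySketch {
    // Hypothetical shape of the command interface; the real signatures are not given.
    interface HlrBusinessEvent {
        boolean suspend(String phoneNumber);
        boolean resume(String phoneNumber);
    }

    // Hypothetical holder for the measurements.
    static final class LatencyStats {
        final AtomicLong calls = new AtomicLong();
        final AtomicLong totalNanos = new AtomicLong();
    }

    // Wraps the target so that every call through the proxy is timed.
    static HlrBusinessEvent withTiming(HlrBusinessEvent target, LatencyStats stats) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                stats.calls.incrementAndGet();
                stats.totalNanos.addAndGet(System.nanoTime() - start);
            }
        };
        return (HlrBusinessEvent) Proxy.newProxyInstance(
                HlrBusinessEvent.class.getClassLoader(),
                new Class<?>[] {HlrBusinessEvent.class},
                handler);
    }

    public static void main(String[] args) {
        HlrBusinessEvent fake = new HlrBusinessEvent() {
            @Override public boolean suspend(String phoneNumber) { return true; }
            @Override public boolean resume(String phoneNumber) { return true; }
        };
        LatencyStats stats = new LatencyStats();
        HlrBusinessEvent timed = withTiming(fake, stats);
        timed.suspend("13800000000"); // made-up number
        timed.resume("13800000000");
        System.out.println(stats.calls.get() + " calls, " + stats.totalNanos.get() + " ns in total");
    }
}
```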

NotifyUsersBatchTask – Orchestrates the whole flow: parallel data loading, asynchronous task submission, and result aggregation. It also records success and failure counts via NotifyTaskSuccCounter and NotifyTaskFailCounter.
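
The submit-and-count part of this flow might look like the sketch below, which assumes CompletableFuture callbacks in place of whatever callback type the original uses. The class name NotifyFlowSketch, the process method, and the Predicate standing in for the HLR command are hypothetical. The two counters play the role of NotifyTaskSuccCounter and NotifyTaskFailCounter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

class NotifyFlowSketch {
    final AtomicLong succCounter = new AtomicLong();
    final AtomicLong failCounter = new AtomicLong();

    // Sends one command per user on the pool. A callback updates the counters
    // when each command finishes, so the submitting loop never blocks on a
    // single command; it only waits once, at the end, for the whole batch.
    void process(List<String> users, Predicate<String> sendCommand, ExecutorService pool) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String user : users) {
            CompletableFuture<Void> done = CompletableFuture
                    .supplyAsync(() -> sendCommand.test(user), pool)
                    .handle((ok, error) -> {
                        if (error == null && ok) {
                            succCounter.incrementAndGet();
                        } else {
                            failCounter.incrementAndGet(); // false result or exception
                        }
                        return null;
                    });
            pending.add(done);
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            NotifyFlowSketch flow = new NotifyFlowSketch();
            // Simulated command: fails for one made-up user id.
            flow.process(Arrays.asList("u1", "u2", "u3"), user -> !user.equals("u2"), pool);
            System.out.println("success=" + flow.succCounter.get() + " fail=" + flow.failCounter.get());
        } finally {
            pool.shutdown();
        }
    }
}
```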

Monitoring

Using Java Management Extensions (JMX), the BatchTaskMonitor MBean exposes methods such as getBatchTaskCounter(String name) to retrieve success or failure counters in real time. The author demonstrates connecting with JConsole and querying the TASKFAILCOUNTER, which reported 196 failed tasks during a test run.
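
A standard MBean along these lines can be sketched with the JDK alone. Only the names BatchTaskMonitor, getBatchTaskCounter(String name), and TASKFAILCOUNTER come from the article. The ObjectName sketch.batch:type=BatchTaskMonitor, the increment method, and the register and query helpers are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxMonitorSketch {
    // Standard MBean convention: the management interface is public and is
    // named after the implementing class plus the "MBean" suffix.
    public interface BatchTaskMonitorMBean {
        long getBatchTaskCounter(String name);
    }

    public static class BatchTaskMonitor implements BatchTaskMonitorMBean {
        private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        void increment(String name) {
            counters.computeIfAbsent(name, key -> new AtomicLong()).incrementAndGet();
        }

        // Takes a parameter, so JMX exposes it as an operation, not an attribute.
        @Override
        public long getBatchTaskCounter(String name) {
            AtomicLong counter = counters.get(name);
            return counter == null ? 0L : counter.get();
        }
    }

    static ObjectName register(BatchTaskMonitor monitor) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("sketch.batch:type=BatchTaskMonitor"); // hypothetical
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(monitor, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Cannot register MBean", e);
        }
    }

    // Calls the operation through the MBean server, as a JMX client would.
    static long query(ObjectName name, String counterName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.invoke(name, "getBatchTaskCounter",
                    new Object[] {counterName}, new String[] {String.class.getName()});
        } catch (JMException e) {
            throw new IllegalStateException("Cannot query MBean", e);
        }
    }

    public static void main(String[] args) {
        BatchTaskMonitor monitor = new BatchTaskMonitor();
        monitor.increment("TASKFAILCOUNTER");
        ObjectName name = register(monitor);
        System.out.println("TASKFAILCOUNTER = " + query(name, "TASKFAILCOUNTER"));
    }
}
```

In JConsole, a method with this signature appears under the MBean's Operations tab, where the counter name is typed in as the argument.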

Example Usage

The client code creates a NotifyUsersBatchTask instance, supplies database connection pools for the sharded notify_users tables, and triggers the batch process. After inserting 80 test records (e.g., for cities Fuzhou and Xiamen), the framework processes them, and the author shows screenshots of the execution results and JMX monitoring UI.

Conclusion

The presented asynchronous parallel batch framework, though concise, covers all essential components for high‑throughput, I/O‑intensive workloads. By exploiting multi‑core servers, horizontal sharding, configurable thread pools, and JMX monitoring, similar systems can achieve significant performance gains and better resource utilization.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
