Unlocking 10× Speed: Inside the 星图 Platform’s Modular Backend Architecture
This article details how the 星图 platform evolved from a monolithic customer‑service system into a modular, configurable backend architecture, covering its API management capabilities, process orchestration, caching strategies, JVM tuning, concurrency model, code optimizations, and the resulting performance gains and business benefits.
Introduction
The 星图 platform was built to support a rapidly expanding customer‑service system, requiring fast API integration, complaint handling, progress tracking, and feedback collection. To meet business growth, the platform evolved toward a modular and configurable architecture.
Project Background
Business interfaces are primarily connected via APIs. An API management platform was created to allow configuration personnel to select and bind interfaces without repeated development, achieving centralized management, reuse, and reduced maintenance costs.
Core Features
API Management: Supports all internal HTTP/SOA interfaces, version control, built‑in Groovy script editing, field derivation, and tenant isolation.
Process Orchestration: Allows custom workflows, parallel processing, black‑box encapsulation, and nested processes, exposing them as ordinary APIs.
Architecture Design
The system is divided into modules to reduce strong dependencies and ensure stability.
Modularization
Configuration Module: An independent service enabling rapid configuration deployment during traffic spikes.
Runtime Module: Handles business execution with minimal interference from other modules.
Execution Log: Asynchronous messaging decouples interface calls from log persistence.
Lightweight Design
Prioritizes core functions such as parameter validation, preprocessing, data assembly, and request handling, while using a custom lightweight workflow engine.
Extensibility
Plugin‑based design with automatic registration (SPI‑like) for listeners, executors, and the execution engine, enabling easy extension.
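The SPI-like automatic registration described above can be sketched as a central registry that plugins install themselves into at startup. This is a minimal illustration; the `Processor` interface and `ProcessorRegistry` names are hypothetical, since the platform's actual plugin interfaces are not shown in the source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plugin contract; the real platform registers listeners,
// executors, and engine components in a similar self-registering way.
interface Processor {
    String bizType();               // which business type this plugin handles
    String process(String input);
}

final class ProcessorRegistry {
    private static final Map<String, Processor> REGISTRY = new ConcurrentHashMap<>();

    // Called once per plugin at startup (e.g. from a ServiceLoader scan).
    static void register(Processor p) {
        REGISTRY.put(p.bizType(), p);
    }

    static Processor lookup(String bizType) {
        Processor p = REGISTRY.get(bizType);
        if (p == null) throw new IllegalArgumentException("no processor for " + bizType);
        return p;
    }
}
```

New capabilities are then added by shipping another `Processor` implementation, with no change to the engine's dispatch code.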
Challenges
During early stages the platform faced database pressure, high CPU usage, slow interface response times, performance jitter, and stability concerns due to Groovy scripts.
Performance Optimizations
Data Storage
Runtime call records are stored in Elasticsearch; hot data resides in MySQL with asynchronous persistence via MQ. Message partitioning and data merging reduce lock contention.
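Partitioning messages by record key is a common way to obtain the ordering and lock-contention benefits described above: all log messages for one call record land on one partition, so the consumer can merge and persist them in batches without cross-partition coordination. A minimal sketch of this assumed routing (the platform's actual scheme may differ):

```java
// Routes each record key deterministically to one of N partitions, so
// updates to the same record stay ordered and contention is localized.
final class PartitionRouter {
    private final int partitions;

    PartitionRouter(int partitions) {
        this.partitions = partitions;
    }

    int partitionFor(String recordKey) {
        // Math.floorMod guards against negative hashCode values.
        return Math.floorMod(recordKey.hashCode(), partitions);
    }
}
```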
Cache Strategy
Distributed Cache: Redis provides high‑performance, high‑availability caching with large capacity while preserving data consistency.
Cache Degradation: Falls back to MySQL when Redis is unavailable, protected by rate limiting and circuit breaking.
In‑Memory Cache: A local cache complements Redis, with capacity and TTL tuned to avoid OOM.
Cache Storage Optimization: Switching from JSON to the Protostuff binary format roughly halves payload size and doubles serialization speed.
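The capacity-and-TTL bounding of the local cache can be sketched with an access-ordered LinkedHashMap; a production system would more likely use a library such as Caffeine, and the size and TTL parameters here are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal local cache: bounded entry count (LRU eviction) plus per-entry
// TTL, the two knobs the text mentions for avoiding OOM and stale reads.
final class LocalCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> map;

    LocalCache(int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // access-order LinkedHashMap evicts the least-recently-used entry.
        this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAt) {
            map.remove(key);            // lazily expire on read
            return null;
        }
        return e.value;
    }
}
```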
Concurrency Model
Independent nodes without data dependencies are processed in parallel using CompletableFuture, improving throughput.
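The fan-out/join pattern described above can be sketched with CompletableFuture; the node names, pool size, and `execute` stub are illustrative stand-ins for the platform's real node execution.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Runs independent workflow nodes in parallel and collects their results.
final class ParallelNodes {
    static List<String> runAll(List<String> nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> futures = nodes.stream()
                    .map(n -> CompletableFuture.supplyAsync(() -> execute(n), pool))
                    .collect(Collectors.toList());
            // join() waits for every node; results keep submission order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    private static String execute(String node) {
        return node + ":done"; // stand-in for real node execution
    }
}
```

Only nodes without data dependencies qualify for this treatment; dependent nodes must still be chained (e.g. with `thenCompose`).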
Code Optimization
// Example method
public LinkedList&lt;AbsAutoRegisterProcessor&gt; getProcessors(String bizScope, String bizType) {
    ProcessorBO processorBO = ProcessorBO.builder()
            .bizScope(bizScope)
            .bizType(bizType)
            .build();
    ValidationUtils.validateThrowException(processorBO);
    return getAbsAutoRegisterProcessors(processorBO);
}

// ValidationUtils method: note the validator factory is rebuilt on every call
public static void validateThrowException(@Valid Object object) throws ServiceException {
    Set&lt;ConstraintViolation&lt;Object&gt;&gt; validateSet = Validation.buildDefaultValidatorFactory()
            .getValidator().validate(object);
    if (!CollectionUtils.isEmpty(validateSet)) {
        String messages = validateSet.stream()
                .map(ConstraintViolation::getMessage)
                .reduce((m1, m2) -> m1 + ";" + m2)
                .orElse("参数输入有误!"); // "Invalid input parameters!"
        throw new ServiceException(messages);
    }
}

Profiling revealed that rebuilding the validator factory on every call dominated latency; caching a single factory instance cut validation time to microseconds.
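The fix follows the standard static-field caching pattern: build the expensive object once and reuse it. A minimal, self-contained sketch, where `ExpensiveFactory` is a hypothetical stand-in for `Validation.buildDefaultValidatorFactory()` (which is specified as thread-safe and intended for reuse):

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CachedFactory {
    // Counts factory constructions so the effect of caching is observable.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Hypothetical stand-in for the costly validator factory.
    static final class ExpensiveFactory {
        ExpensiveFactory() { BUILD_COUNT.incrementAndGet(); } // costly setup in real life
        String validate(String input) { return input.isEmpty() ? "invalid" : "ok"; }
    }

    // Built exactly once; static initialization is thread-safe per the JLS.
    private static final ExpensiveFactory FACTORY = new ExpensiveFactory();

    static String validate(String input) {
        return FACTORY.validate(input);
    }
}
```

However many times `validate` is called, the factory is constructed only once.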
JVM Tuning
GC Pause Reduction: Set -XX:MaxGCPauseMillis=50 (the flag takes a plain millisecond value) so G1 targets shorter pauses and long GC stalls are mitigated.
Humongous Allocation: Adjusted -Xmx and -XX:G1HeapRegionSize so that fewer objects exceed half a region's size, lowering the frequency of humongous allocations.
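The two adjustments can be combined into a launch command like the following; the heap size, region size, and jar name are illustrative only and must be tuned per workload:

```shell
# Example G1 settings (values are assumptions, not the platform's actual config)
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=50 \
     -XX:G1HeapRegionSize=8m \
     -jar app.jar
```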
Stability Design
Node Execution Limits: Each node has a maximum execution time (e.g., 3 s), with monitoring and alerts on breaches.
Groovy Script Control: A whitelist of allowed classes/methods and sandboxed execution keep user scripts safe.
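A per-node time budget can be enforced with `Future.get` and a timeout. The 3-second figure comes from the text; the `NodeGuard` class and its fallback behavior are illustrative, and a real engine would raise an alert rather than return a sentinel value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Caps a single node's execution time so one slow node cannot stall
// the whole workflow.
final class NodeGuard {
    static String runWithLimit(Callable<String> node, long limitMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(node);
        try {
            return f.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);      // interrupt the runaway node
            return "TIMEOUT";    // real engine: fail the node and alert
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

On Java 9+, `CompletableFuture.orTimeout` achieves the same effect without blocking a caller thread.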
Project Benefits
The platform now supports over 100 business services, 400+ APIs, 70+ custom workflow APIs, and processes more than 3 million calls daily. Machine count was halved, overall response time dropped by ~66 %, and a typical workflow with 200+ nodes runs in ~500 ms versus >5 s previously—a tenfold improvement.
Business Impact
Significant reductions in development effort (30‑50 % fewer test resources), accelerated delivery, and cost savings across customer‑service tickets, intelligent dialogue, and priority routing scenarios.