Design and Implementation of the 58 Group Penalty Data Center
This article describes the design, architecture, and implementation of a unified penalty data center for 58 Group. It covers the challenges posed by heterogeneous data sources, the choice of Flink for real-time ETL, a custom DSL and LRU-based aggregation for transformation, and an MVEL-based rule engine for feature recognition, which together deliver standardized, high-performance penalty data processing.
58 Group, a leading Chinese lifestyle and classified information platform, generates massive daily data across recruitment, real estate, automotive, and other services, creating urgent needs for efficient content safety governance and penalty handling.
The existing landscape consisted of separate governance systems per business and a central risk control platform, leading to fragmented penalty data, inconsistent formats, and difficulty in unified appeal processing.
To address these issues, the Penalty Data Center was conceived to provide a standardized, centralized repository for all penalty information, supporting data retention, traceability, and unified management.
Solution Options
Two approaches were evaluated: (1) requiring all upstream services to emit standardized data, which would simplify the center but demand extensive upstream changes; (2) building the center as a base platform that ingests heterogeneous data, performs ETL, and stores standardized records, offering lower integration cost at the expense of more complex processing. The second option was chosen.
Architecture Overview
The system is divided into three layers:
Data Layer: Ingests real‑time streams, offline batches, and synchronized business data from various sources.
Service Layer: Handles data collection, rule‑based cleaning, and provides query and rollback capabilities.
Storage Layer: Utilizes the company’s custom wlist and wtable structures for high‑throughput writes and fast queries.
Data ETL Process
Data collection adapts to diverse upstream interfaces, ensuring efficient ingestion of both real‑time and batch data. After evaluating Spark, Flink, and the internal Dayu platform, Flink was selected for its sub‑second latency, flexible windowing, and strong streaming capabilities.
Transformation employs a custom DSL to map heterogeneous fields into a unified schema, using JSONPath expressions for value extraction.
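The article does not show the DSL itself, so the following is only a minimal sketch of the idea: each source gets a declarative mapping from unified field names to path expressions, and one generic function applies it. The field names, the two example mappings, and the `extract` helper (which handles just a small JSONPath subset such as `$.a.b` and `$.a[0].b`) are all assumptions made for illustration.

```python
from typing import Any


def extract(record: dict, path: str) -> Any:
    """Resolve a small JSONPath subset: '$.a.b' and '$.a[0].b'.

    Returns None when any step of the path is missing.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path}")
    current: Any = record
    # Turn '[0]' into '.0' so a single split yields both keys and indexes.
    tokens = path[1:].replace("[", ".").replace("]", "").split(".")
    for token in tokens:
        if token == "":
            continue
        if isinstance(current, list):
            if not token.isdigit() or int(token) >= len(current):
                return None
            current = current[int(token)]
        elif isinstance(current, dict):
            if token not in current:
                return None
            current = current[token]
        else:
            return None
    return current


def transform(record: dict, mapping: dict) -> dict:
    """Apply a per-source mapping to produce a unified-schema record."""
    return {field: extract(record, path) for field, path in mapping.items()}


# Hypothetical mappings: two sources with different shapes, one target schema.
RECRUITMENT_MAPPING = {
    "user_id": "$.uid",
    "penalty_type": "$.punish.type",
    "reason": "$.punish.reasons[0].text",
}
HOUSING_MAPPING = {
    "user_id": "$.account.id",
    "penalty_type": "$.action",
    "reason": "$.detail.reason",
}
```

With this shape, onboarding a new source would mostly mean writing a new mapping rather than new code, which is consistent with the low integration cost the article reports.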
Aggregation uses an LRU‑based mechanism to merge multiple penalty details into a single ticket, reducing database pressure and ensuring consistency across distributed nodes.
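The source does not spell out how the LRU mechanism works. One plausible reading, sketched below, is that penalty details are buffered in a bounded LRU cache keyed by ticket ID, and a ticket is written to storage as one merged record only when it is evicted or the stream closes. The `TicketAggregator` class, its `flush` callback, and the ticket IDs are hypothetical; the sketch runs in a single process and does not address cross-node consistency.

```python
from collections import OrderedDict
from typing import Callable


class TicketAggregator:
    """Buffer penalty details per ticket; flush a ticket when it is evicted."""

    def __init__(self, capacity: int, flush: Callable[[str, list], None]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.flush = flush  # called once per ticket with all merged details
        self._open: "OrderedDict[str, list]" = OrderedDict()

    def add(self, ticket_id: str, detail: dict) -> None:
        if ticket_id in self._open:
            # Touching a ticket makes it the most recently used entry.
            self._open.move_to_end(ticket_id)
        else:
            self._open[ticket_id] = []
        self._open[ticket_id].append(detail)
        # Evict least recently used tickets once the buffer is over capacity.
        while len(self._open) > self.capacity:
            old_id, details = self._open.popitem(last=False)
            self.flush(old_id, details)

    def close(self) -> None:
        """Flush everything still buffered, oldest first."""
        while self._open:
            old_id, details = self._open.popitem(last=False)
            self.flush(old_id, details)


# Usage: collect flushed tickets in a list instead of writing to a database.
written = []
agg = TicketAggregator(2, lambda tid, details: written.append((tid, details)))
agg.add("T1", {"rule": "spam"})
agg.add("T2", {"rule": "fraud"})
agg.add("T1", {"rule": "abuse"})  # T1 becomes the most recently used ticket
agg.add("T3", {"rule": "spam"})   # buffer is full, so T2 is evicted and flushed
agg.close()
```

Under this reading, the database sees one write per ticket instead of one per detail, which is where the reduced pressure would come from.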
Feature Recognition
After comparing Drools, Aviator, and MVEL, the team built the rule engine on MVEL, which offers expressive language features, extensibility, and acceptable performance for the project's needs.
The engine extracts business‑specific features (e.g., scenario, data source, processing type) and, when necessary, invokes supplemental enrichment services to complete missing attributes.
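The flow described here (evaluate rules, then call an enrichment service for whatever is still missing) can be sketched as follows. This is not MVEL: plain Python callables stand in for MVEL expressions, and the rule set, the record fields (`biz_line`, `operator`), and the `enrich` callback are hypothetical. Only the three feature names come from the article.

```python
from typing import Callable, Optional

# Feature names taken from the article's examples.
REQUIRED_FEATURES = ("scenario", "data_source", "processing_type")

# Each rule: (feature name, value to assign, predicate over the record).
# In the real system the predicate would be an MVEL expression string.
RULES = [
    ("scenario", "recruitment", lambda r: r.get("biz_line") == "job"),
    ("scenario", "real_estate", lambda r: r.get("biz_line") == "house"),
    ("processing_type", "manual", lambda r: r.get("operator") is not None),
    ("processing_type", "automatic", lambda r: r.get("operator") is None),
]


def recognize(
    record: dict,
    rules: list,
    enrich: Optional[Callable[[dict, list], dict]] = None,
) -> dict:
    """Return recognized features; the first matching rule per feature wins."""
    features: dict = {}
    for feature, value, predicate in rules:
        if feature not in features and predicate(record):
            features[feature] = value
    if enrich is not None:
        missing = sorted(f for f in REQUIRED_FEATURES if f not in features)
        if missing:
            # Ask the (hypothetical) enrichment service only for the gaps.
            for name, value in enrich(record, missing).items():
                if name in missing:
                    features[name] = value
    return features


def fake_enrich(record: dict, missing: list) -> dict:
    """Stand-in for a supplemental enrichment service."""
    return {name: "unknown" for name in missing}
```

Keeping rules as data rather than code is the point of using an expression engine: new business features can be added by registering another rule instead of redeploying the service.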
Performance Optimizations
During operation, frequent Full GC events were traced to excessive logging and large ESB channel objects. By reducing unnecessary log statements and upgrading the ESB client, memory usage and GC pauses were significantly lowered, improving overall throughput.
Results
The Penalty Data Center now standardizes data from 36 sources and handles over 3 million penalty records per day. Integrating a new business has dropped from eight person-days to one hour, and the platform supports real-time data cleaning while providing a unified view for user appeals and downstream analytics.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
