Design and Implementation of the 58 Group Penalty Data Center

This article presents the design, architecture, and implementation of a unified penalty data center for 58 Group, detailing the challenges of heterogeneous data sources, the selection of Flink for real‑time ETL, the use of a DSL and LRU aggregation, and the adoption of MVEL for feature recognition to achieve standardized, high‑performance penalty data processing.

58 Tech

Mar 29, 2022

58 Group, a leading Chinese lifestyle and classified information platform, generates massive volumes of data every day across recruitment, real estate, automotive, and other services. This creates an urgent need for efficient content safety governance and penalty handling.

The existing landscape consisted of separate governance systems per business and a central risk control platform, leading to fragmented penalty data, inconsistent formats, and difficulty in unified appeal processing.

To address these issues, the Penalty Data Center was conceived to provide a standardized, centralized repository for all penalty information, supporting data retention, traceability, and unified management.

Solution Options

Two approaches were evaluated: (1) requiring all upstream services to emit standardized data, which would simplify the center but demand extensive upstream changes; (2) building the center as a base platform that ingests heterogeneous data, performs ETL, and stores standardized records, offering lower integration cost at the expense of more complex processing. The second option was chosen.

Architecture Overview

The system is divided into three layers:

Data Layer: Ingests real‑time streams, offline batches, and synchronized business data from various sources.

Service Layer: Handles data collection, rule‑based cleaning, and provides query and rollback capabilities.

Storage Layer: Utilizes the company’s custom wlist and wtable structures for high‑throughput writes and fast queries.

Data ETL Process

Data collection adapts to diverse upstream interfaces, ensuring efficient ingestion of both real‑time and batch data. After evaluating Spark, Flink, and the internal Dayu platform, Flink was selected for its sub‑second latency, flexible windowing, and strong streaming capabilities.

Transformation employs a custom DSL to map heterogeneous fields into a unified schema, using JSONPath expressions for value extraction.
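The field-mapping idea can be illustrated with a minimal Python sketch. The article does not publish the DSL, so everything here is an assumption: the `MAPPINGS` table, the source names, and the field names are hypothetical, and `extract` supports only a small JSONPath subset (`$.a.b[0].c`) rather than the full language.

```python
import re
from typing import Any

# One path step: either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(doc: Any, path: str) -> Any:
    """Resolve a small JSONPath subset; return None when the path is absent."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    pos = 1
    current = doc
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {pos}: {path!r}")
        pos = match.end()
        key, index = match.group(1), match.group(2)
        try:
            current = current[key] if key is not None else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Hypothetical mapping rules: unified field name -> path in the source payload.
MAPPINGS = {
    "recruitment": {
        "ticket_id": "$.punishId",
        "user_id": "$.user.id",
        "reason": "$.reasons[0].text",
    },
    "real_estate": {
        "ticket_id": "$.id",
        "user_id": "$.ownerId",
        "reason": "$.detail.reason",
    },
}


def to_unified(source: str, payload: dict) -> dict:
    """Map one heterogeneous payload into the unified schema."""
    rules = MAPPINGS[source]
    record = {field: extract(payload, path) for field, path in rules.items()}
    record["source"] = source
    return record
```

With a table like this, onboarding a new source would mean adding one mapping entry rather than writing new parsing code, which is the kind of saving a declarative DSL is meant to provide.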

Aggregation uses an LRU‑based mechanism to merge multiple penalty details into a single ticket, reducing database pressure and ensuring consistency across distributed nodes.
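A rough sketch of LRU-based aggregation, under stated assumptions: the article does not describe the eviction policy or the write path, so the `LruAggregator` class, its `capacity` parameter, and the `flush` callback are hypothetical. The sketch buffers details per ticket in memory and writes a ticket out only when it is evicted, which is one way to turn many small writes into one. It runs in a single process and does not address consistency across nodes.

```python
from collections import OrderedDict
from typing import Callable, List


class LruAggregator:
    """Buffer penalty details per ticket; flush a ticket when it is evicted."""

    def __init__(self, capacity: int, flush: Callable[[str, List[str]], None]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._flush = flush  # hypothetical sink, e.g. a single database write
        self._tickets: "OrderedDict[str, List[str]]" = OrderedDict()

    def add(self, ticket_id: str, detail: str) -> None:
        # Merge the detail into its ticket and mark the ticket as most recent.
        self._tickets.setdefault(ticket_id, []).append(detail)
        self._tickets.move_to_end(ticket_id)
        # Evict least recently used tickets once the buffer is over capacity.
        while len(self._tickets) > self._capacity:
            old_id, old_details = self._tickets.popitem(last=False)
            self._flush(old_id, old_details)

    def drain(self) -> None:
        """Flush everything still buffered, oldest first (e.g. on shutdown)."""
        while self._tickets:
            ticket_id, details = self._tickets.popitem(last=False)
            self._flush(ticket_id, details)
```

A production version would likely also flush on a timer so that a quiet ticket does not sit in memory indefinitely; that is omitted here for brevity.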

Feature Recognition

A rule engine based on MVEL was chosen after comparing Drools, Aviator, and MVEL, as MVEL offers expressive language features, extensibility, and acceptable performance for the project’s needs.

The engine extracts business‑specific features (e.g., scenario, data source, processing type) and, when necessary, invokes supplemental enrichment services to complete missing attributes.
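The pattern of rule-driven feature extraction with an enrichment fallback can be sketched as follows. MVEL is a Java expression language, so this Python version only illustrates the control flow: plain callables stand in for MVEL expressions, and the `Rule` class, the rule set, the field names, and the `enrich` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Rule:
    feature: str                       # feature this rule can set
    value: str                         # value assigned when the rule matches
    condition: Callable[[dict], bool]  # stand-in for an MVEL expression


# Hypothetical rule set; first matching rule per feature wins.
RULES: List[Rule] = [
    Rule("scenario", "job_post", lambda r: r.get("source") == "recruitment"),
    Rule("scenario", "listing", lambda r: r.get("source") == "real_estate"),
    Rule("processing_type", "automatic", lambda r: r.get("operator") == "system"),
    Rule("processing_type", "manual",
         lambda r: r.get("operator") not in (None, "system")),
]

Enricher = Callable[[dict, Set[str]], Dict[str, str]]


def recognize(record: dict,
              rules: List[Rule] = RULES,
              enrich: Optional[Enricher] = None) -> Dict[str, str]:
    """Evaluate rules in order, then ask an enrichment service for any gaps."""
    features: Dict[str, str] = {}
    for rule in rules:
        if rule.feature not in features and rule.condition(record):
            features[rule.feature] = rule.value
    missing = {rule.feature for rule in rules} - features.keys()
    if missing and enrich is not None:
        for name, value in enrich(record, missing).items():
            if name in missing:
                features[name] = value
    return features
```

Keeping conditions as data rather than hard-coded branches is what lets rules change without redeploying the service, which is the main reason to reach for an expression engine in the first place.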

Performance Optimizations

During operation, frequent Full GC events were traced to excessive logging and large ESB channel objects. By reducing unnecessary log statements and upgrading the ESB client, memory usage and GC pauses were significantly lowered, improving overall throughput.

Results

The Penalty Data Center now standardizes data from 36 sources and handles over 3 million penalty records per day. Integrating a new business takes about one hour instead of eight person‑days. The center cleans data in real time and provides a unified view for user appeals and downstream analytics.
