
Evolution and Architecture of a Big Data‑Driven Security Portrait System at 58.com

The article details the design, multi‑stage evolution, and operational impact of a big‑data‑based security portrait platform built by 58.com, describing its data pipelines, real‑time risk tagging, strategy scheduling, configuration management, and overall architecture that enable large‑scale threat detection and mitigation.

58 Tech

To help 58.com's business units build an intelligent security defense network, the team developed an in‑house, big‑data‑based security portrait system. It is an analytical security management platform that works alongside the Hunter real‑time risk control platform and provides pre‑emptive intelligence alerts, real‑time risk identification, post‑incident case tracing, and third‑party data integration.

Evolution Process

The system began more than two years ago as a handful of service interfaces and scripts and has since grown into a full platform with a closed data loop. It now holds over 2 billion records across 10 dimensions and more than 200 tags, and it handles more than 2 billion calls per day.

First Version

The first version was designed to tag resources such as accounts, IP addresses, and phone numbers. By marking malicious resources, it raised the cost of attacks for black‑market operators. Data was extracted from purchased sources and 58.com traffic logs using SQL and scripts, then written to a tag library.
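The tag library described above can be pictured as a map from a resource (its type plus its value) to a set of risk tags. The sketch below is a minimal in‑memory illustration under that assumption; the class name `TagLibrary`, the `ingest` helper, and tag names such as `proxy` are hypothetical and not taken from 58.com's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

# Resource types named in the article; the set itself is an assumption.
RESOURCE_TYPES = {"account", "ip", "phone"}


@dataclass
class TagLibrary:
    """Hypothetical tag store: (resource_type, value) -> set of risk tags."""

    entries: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def add(self, rtype: str, value: str, tag: str) -> None:
        if rtype not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {rtype}")
        self.entries.setdefault((rtype, value), set()).add(tag)

    def tags_for(self, rtype: str, value: str) -> Set[str]:
        # Return a copy so callers cannot mutate the stored set.
        return set(self.entries.get((rtype, value), set()))


def ingest(library: TagLibrary, rows: Iterable[Tuple[str, str, str]]) -> TagLibrary:
    """Write (type, value, tag) rows, e.g. the output of an offline SQL job."""
    for rtype, value, tag in rows:
        library.add(rtype, value, tag)
    return library
```

Keying on the (type, value) pair keeps an IP and a phone number with the same string value from colliding, and storing tags as a set makes repeated ingestion of the same row idempotent.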

To support high‑concurrency queries, the team deployed a sharded MongoDB cluster (5 shards, each a 5‑node replica set) that provides more than 1 TB of storage, over 100k QPS, and an average latency below 1 ms.
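Sharding lets the query load scale because each key deterministically lands on one shard, so reads spread across all five. The sketch below illustrates that idea with a plain hash‑modulo router. This is a deliberate simplification: MongoDB's hashed sharding assigns ranges of hashed key values (chunks) to shards rather than using a modulo, and the key format `ip:…` is a hypothetical choice.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5          # from the article: 5 shards
MEMBERS_PER_SHARD = 5   # each shard is a 5-node replica set

# 5 x 5 = 25 replica-set members in total.
TOTAL_MEMBERS = NUM_SHARDS * MEMBERS_PER_SHARD


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a lookup key (e.g. "ip:1.2.3.4") to a shard index.

    Simplified illustration only: a stable hash followed by a modulo,
    not MongoDB's chunk-based hashed sharding.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Route 10,000 synthetic IP keys and show how evenly they spread.
    load = Counter(shard_for(f"ip:10.0.{i // 256}.{i % 256}") for i in range(10_000))
    print(sorted(load.items()))
```

Assuming an even spread, 100k QPS across 5 shards works out to roughly 20k QPS per shard, and replica‑set secondaries can absorb additional read traffic if reads are allowed to go to them.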

First Evolution – Platformization

As the number of tags grew, deploying strategies quickly with OS cron jobs became unsustainable. The team therefore built a distributed offline task scheduling platform with three modules: an analysis engine that executes Python, Java, and HiveSQL scripts in containers; an ingestion engine that parses results and writes tags; and a scheduling bus that tracks task states and provides a visual monitoring UI.
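The scheduling bus's job of tracking task states can be modeled as a small state machine plus dependency checks, for example an ingestion task that may only start after its analysis task succeeds. The sketch below assumes four states and a retry transition from `FAILED` back to `PENDING`; the class `SchedulingBus` and its methods are hypothetical, not the platform's actual API.

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions; FAILED -> PENDING models a retry (an assumption).
TRANSITIONS: Dict[TaskState, Set[TaskState]] = {
    TaskState.PENDING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.SUCCEEDED, TaskState.FAILED},
    TaskState.FAILED: {TaskState.PENDING},
    TaskState.SUCCEEDED: set(),
}


class SchedulingBus:
    """Hypothetical bus that tracks task states and upstream dependencies."""

    def __init__(self) -> None:
        self.states: Dict[str, TaskState] = {}
        self.deps: Dict[str, List[str]] = {}
        # (task_id, old_state, new_state) records, e.g. for a monitoring UI.
        self.history: List[Tuple[str, Optional[TaskState], TaskState]] = []

    def submit(self, task_id: str, depends_on: Optional[List[str]] = None) -> None:
        self.states[task_id] = TaskState.PENDING
        self.deps[task_id] = list(depends_on or [])
        self.history.append((task_id, None, TaskState.PENDING))

    def runnable(self) -> List[str]:
        """Pending tasks whose upstream tasks have all succeeded."""
        return [
            task for task, state in self.states.items()
            if state is TaskState.PENDING
            and all(self.states[d] is TaskState.SUCCEEDED for d in self.deps[task])
        ]

    def transition(self, task_id: str, new: TaskState) -> None:
        old = self.states[task_id]
        if new not in TRANSITIONS[old]:
            raise ValueError(f"{task_id}: {old.value} -> {new.value} not allowed")
        self.states[task_id] = new
        self.history.append((task_id, old, new))
```

For example, submitting `"analyze"` and then `"ingest"` with `depends_on=["analyze"]` makes only `"analyze"` runnable until it reaches `SUCCEEDED`. Rejecting illegal transitions keeps the recorded history trustworthy for monitoring.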

Second Evolution – Improving Data Effectiveness

To address the limited timeliness of offline tags, real‑time strategies were introduced. These required three components: a unified platform for real‑time data ingestion and standardization, a visual rule engine for configuring expressions, and a decision engine that classifies behavior into categories such as black‑market, machine, and abnormal user.
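The split between a configurable rule engine and a decision engine can be illustrated by storing rules as data (field, operator, threshold, category), matching them against an event, and then choosing one category. In the sketch below the field names, thresholds, and the priority order are hypothetical, and a fixed operator table stands in for whatever expression language the real rule editor supports.

```python
import operator
from typing import Any, Callable, Dict, List, Optional, Set

# Comparison operators a rule may use.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
}

# Hypothetical rules, in the form a visual rule editor might save them.
RULES: List[Dict[str, Any]] = [
    {"field": "reg_per_ip_1h", "op": ">=", "value": 20, "category": "black_market"},
    {"field": "req_interval_ms", "op": "<", "value": 50, "category": "machine"},
    {"field": "city_changes_24h", "op": ">", "value": 3, "category": "abnormal_user"},
]

# Decision order when several categories match (assumed: most severe first).
PRIORITY = ["black_market", "machine", "abnormal_user"]


def matched_categories(event: Dict[str, Any], rules: List[Dict[str, Any]]) -> Set[str]:
    """Rule engine: return every category whose rule matches the event."""
    hits: Set[str] = set()
    for rule in rules:
        value = event.get(rule["field"])
        # A missing field never matches, rather than raising an error.
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            hits.add(rule["category"])
    return hits


def decide(event: Dict[str, Any], rules: List[Dict[str, Any]] = RULES) -> Optional[str]:
    """Decision engine: pick the highest-priority matched category, or None."""
    hits = matched_categories(event, rules)
    return next((c for c in PRIORITY if c in hits), None)
```

Keeping rules as plain data means operators can add or tune them without redeploying code, while the decision step stays a separate, testable function.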

Multi‑dimensional analysis and graph‑based anomaly clustering were added to detect malicious traffic, and the resulting models were deployed in production to tag suspicious registrations.
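The article does not specify the clustering algorithm. As a simple stand‑in for graph‑based anomaly clustering, the sketch below links accounts that share an attribute (such as an IP or device) and uses union‑find to find connected components, flagging unusually large groups. The edge format, the attribute prefixes, and the `min_size` threshold are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def suspicious_clusters(edges: Iterable[Tuple[str, str]], min_size: int = 3) -> List[Set[str]]:
    """Group accounts connected through shared attributes.

    edges: (account_id, attribute) pairs such as ("acct1", "ip:1.2.3.4").
    Attributes carry a prefix ("ip:", "dev:") so they cannot collide
    with account IDs. Returns account groups with at least min_size members.
    """
    uf = UnionFind()
    accounts: Set[str] = set()
    for account, attribute in edges:
        accounts.add(account)
        uf.union(attribute, account)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for account in accounts:
        groups[uf.find(account)].add(account)
    return [group for group in groups.values() if len(group) >= min_size]
```

Components that connect many registrations through a few shared IPs or devices are typical of batch registration, which is why group size is used as the signal here; production systems usually weight edges and prune very common attributes such as carrier NAT IPs.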

Third Evolution – Resource Sharing

Core components were abstracted into reusable services: a generic rule engine; a data flow platform that supports Kafka, WMB, Hive, and HBase storage as well as data replay; and an open blacklist repository shared across business lines.
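An open blacklist repository lets one business line benefit from resources another has already flagged. The sketch below assumes each entry records which business reported it and expires after a time‑to‑live; `SharedBlacklist`, its fields, and the TTL policy are hypothetical. The clock is injectable so expiry can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class BlacklistEntry:
    rtype: str          # resource type, e.g. "phone" or "ip"
    value: str
    source: str         # business line that reported it (hypothetical field)
    reason: str
    expires_at: float   # epoch seconds


class SharedBlacklist:
    """Hypothetical cross-business blacklist with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], List[BlacklistEntry]] = {}

    def report(self, rtype: str, value: str, source: str, reason: str, ttl_seconds: float) -> None:
        entry = BlacklistEntry(rtype, value, source, reason, self._clock() + ttl_seconds)
        self._entries.setdefault((rtype, value), []).append(entry)

    def lookup(self, rtype: str, value: str) -> List[BlacklistEntry]:
        """Return unexpired entries reported by any business line."""
        now = self._clock()
        return [e for e in self._entries.get((rtype, value), []) if e.expires_at > now]
```

Keeping the reporting business and reason on each entry lets a consumer decide how much to trust another line's verdict, and expiry stops recycled phone numbers or reassigned IPs from staying blocked indefinitely.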

Overall Architecture

The platform is divided into five domains (Capability, Operations, Support, Common, and Base Data), each of which scales independently. Together they serve 20 billion accesses per day and provide unified metadata management and high‑performance query services.

Business Impact

Since launch, the system has contributed to 31% of anti‑porn efforts in full‑time recruitment, 79% of anti‑spam efforts in rental listings, and 99% of anti‑spam efforts for Ganji WeChat. It has intercepted roughly 200k malicious users, with an average interception rate of 22%.

Author

Lü Fang, Senior Backend Development Engineer, Security Platform Department, 58.com.
