Design and Implementation of JD Daojia Security Operations Center (SOC) Platform
This article details the challenges, design choices, deployment steps, detection model creation, data processing, visualization, and future plans of JD Daojia's security operations platform, highlighting the use of Graylog, Elasticsearch, and MongoDB to achieve scalable, real‑time threat detection and response.
1. Introduction
Security operations focus on assets and take security event management as the core process. A security operations platform builds real‑time asset risk models and performs event analysis, risk analysis, warning management, and emergency response, thereby ensuring the safe operation of enterprise systems and services.
JD Daojia faces massive volumes of malicious network attacks, and disparate alerts from individual security devices lead to missed detections and delayed responses. Building a SOC aims to collect and correlate logs from different business systems and security devices, improve detection efficiency, and achieve closed‑loop network security management.
2. Challenges
The platform must align with JD Daojia's business direction and existing resources, demanding high scalability. Log data sources must satisfy attack‑chain detection needs, and existing security devices must support data aggregation and correlation.
- Ensuring data sources meet attack‑chain detection requirements.
- Making the platform's security detection capabilities meet business requirements.
- Providing collaborative analysis capabilities.
3. Security Operations Platform Overview
The platform uses business logs as data sources to detect attacks and must handle billions of log entries. An open‑source log analysis platform was selected for further development.
3.1 Open‑Source Log Analysis Platform Selection
Three mainstream platforms—ELK, Loki, and Graylog—were compared.
| Platform | Analysis |
| --- | --- |
| ELK | Open‑source suite (Elasticsearch, Logstash, Kibana) supporting multi‑source log collection, distributed search, and a visual UI. |
| Loki | Easy to operate; does not fully index log content, instead using Prometheus‑style labels; well suited to Kubernetes pod logs. |
| Graylog | Integrated deployment; supports multi‑source log collection, field modification, TB‑scale queries, archiving, superior alerting, and a Python client library. |
Graylog was chosen because it offers better alerting and meets the platform’s requirements.
3.2 Platform Design
Based on Graylog, the platform consists of four modules: data source, data storage, detection & analysis, and visualization.
3.2.1 Data Source Module
Collects logs from infrastructure (web access, host logs) and security devices (WAF, HIDS, firewall alerts).
3.2.2 Data Storage Module
Uses Elasticsearch to store collected logs and MongoDB to store Graylog operation logs.
3.2.3 Detection & Analysis Module
Core module containing rule engine, analysis engine, and alarm engine.
- Rule Engine: matches logs against defined security detection rules.
- Analysis Engine: performs secondary analysis on matched data and stores the results.
- Alarm Engine: provides alert notification capabilities within the platform.
3.2.4 Visualization Module
Displays abnormal scenarios identified by the detection module, including alarm data flow, dashboards, and threat posture views.
4. Platform Construction
Construction includes Graylog deployment, log ingestion, detection model generation, alarm data processing, and visualization.
4.1 Graylog Deployment
Graylog forms the core infrastructure, deployed alongside Elasticsearch and MongoDB.
4.2 Log Ingestion
Infrastructure logs (e.g., web access) are forwarded via Graylog Agent to Elasticsearch. Security device logs (WAF, HIDS, firewall) are collected via device‑specific interfaces and normalized.
Unified log ingestion enables complete attack‑chain reconstruction for threat tracing.
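To make the ingestion step concrete, here is a minimal sketch of shipping a normalized log entry to a Graylog input using the GELF format (Graylog's native JSON log format, typically received on UDP port 12201). The hostname, port, and field names are illustrative assumptions, not JD Daojia's actual configuration:

```python
import json
import socket
import zlib

def build_gelf(short_message: str, source_host: str, **fields) -> dict:
    """Build a GELF 1.1 payload. GELF requires version, host, and
    short_message; custom fields must be prefixed with an underscore."""
    msg = {"version": "1.1", "host": source_host, "short_message": short_message}
    msg.update({f"_{k}": v for k, v in fields.items()})
    return msg

def send_gelf(message: dict, host: str = "graylog.example.internal", port: int = 12201) -> None:
    """Compress the payload and send it to a Graylog GELF UDP input.
    Host and port are placeholders for a real deployment."""
    payload = zlib.compress(json.dumps(message).encode("utf-8"))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

A web‑access entry could then be shipped as `send_gelf(build_gelf("GET /login 200", "web-01", src_ip="1.2.3.4", status=200))`, with the custom `_src_ip` and `_status` fields available for rule matching downstream.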
4.3 Detection Model Generation
Detection models consist of rules for various attack behaviors. Graylog’s rule engine processes massive log streams, extracting URLs, headers, bodies, status codes, etc., to reduce false positives.
Different data sources require tailored rules to avoid duplicate detection, false alarms, or missed alerts.
Web logs (NGINX) are analyzed for both network attacks and malicious business behavior (e.g., fraudulent orders). Distinguishing logged‑in vs. non‑logged‑in attacks helps prioritize response.
Security device logs (cloud WAF, local devices) are aggregated, filtered, and correlated with web logs to reduce false positives and enable joint analysis.
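The rule‑matching idea described above can be sketched in Python. The rule names, regex patterns, and field names below are illustrative examples, not JD Daojia's actual detection rules; note how the presence of a user ID distinguishes logged‑in attacks for prioritization:

```python
import re
from typing import Optional

# Illustrative detection rules; a production rule set would be far larger
# and tuned per data source to avoid duplicates and false positives.
RULES = [
    ("sql_injection", re.compile(r"(union\s+select|or\s+1=1|sleep\()", re.I)),
    ("path_traversal", re.compile(r"\.\./")),
    ("webshell_upload", re.compile(r"\.(php|jsp)\b", re.I)),
]

def match_rules(log: dict) -> Optional[dict]:
    """Return an alert dict when any rule matches the request URL or body,
    or None for benign traffic."""
    haystack = f"{log.get('url', '')} {log.get('body', '')}"
    for name, pattern in RULES:
        if pattern.search(haystack):
            return {
                "rule": name,
                "src_ip": log.get("src_ip"),
                "url": log.get("url"),
                # Logged-in attacks are flagged so they can be prioritized.
                "logged_in": bool(log.get("user_id")),
            }
    return None
```

Running `match_rules` over the normalized NGINX log stream yields the labeled alerts that the analysis engine later enriches.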
4.4 Data Processing
After rule‑engine detection, alerts are enriched and stored by the analysis engine.
Enriched alerts are pushed via Enterprise WeChat for immediate analyst response.
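The push step can be sketched against the Enterprise WeChat group‑bot webhook API, which accepts a JSON body with a `msgtype` and message content. The webhook key, alert fields, and message layout below are assumptions for illustration:

```python
import json
import urllib.request

# Placeholder webhook; a real deployment substitutes its own bot key.
WEBHOOK = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY"

def format_alert(alert: dict) -> dict:
    """Render an enriched alert as an Enterprise WeChat markdown message."""
    content = (
        f"**Security Alert: {alert['rule']}**\n"
        f"> source IP: {alert['src_ip']}\n"
        f"> URL: {alert['url']}\n"
        f"> logged in: {alert.get('logged_in', False)}"
    )
    return {"msgtype": "markdown", "markdown": {"content": content}}

def push_alert(alert: dict) -> None:
    """POST the formatted alert to the webhook (network call)."""
    data = json.dumps(format_alert(alert)).encode("utf-8")
    req = urllib.request.Request(
        WEBHOOK, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```

Separating formatting from sending keeps the message template testable without hitting the webhook.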
4.5 Visualization
Processed data is displayed through dashboards, showing web and device alerts, enabling rapid attack chain visualization.
Each alert can be queried by IP, device fingerprint, or time to trace the full attack chain.
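A trace query of this kind maps naturally onto an Elasticsearch bool query. The field names (`src_ip`, `timestamp`) assume the normalized schema described earlier and are illustrative:

```python
def trace_query(src_ip: str, start: str, end: str) -> dict:
    """Build an Elasticsearch query body that pulls every event for one
    source IP within a time window, oldest first, so the full attack
    chain can be read in order."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"src_ip": src_ip}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "sort": [{"timestamp": "asc"}],
    }
```

Swapping the `term` filter to a device‑fingerprint field gives the fingerprint‑based trace with the same structure.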
4.6 Detection Workflow
The overall workflow includes Graylog deployment, log forwarding, model detection, data enrichment, and visualization, as illustrated below.
Logs are normalized, stored, matched against attack rules, labeled, visualized, and sent as WeChat work orders for incident handling.
5. Achievements and Future Plans
5.1 Achievements
The platform enables threat management, event correlation, and work‑order management, providing comprehensive security capabilities.
- Threat management: centralized view of all threat alerts.
- Event correlation: joint analysis of business‑system and device alerts for attack tracing.
- Work‑order management: automatic WeChat notifications and a consolidated view of all alerts.
5.2 Future Plans
Future work includes automated asset discovery and integration, and automated attack‑chain identification using correlated logs, timestamps, IPs, and fingerprints.
- Asset linkage automation: script‑driven onboarding of new network assets.
- Automated attack‑chain detection: combine platform alerts with host logs to auto‑generate full attack chains.
6. Conclusion
The security operations system gives JD Daojia clear insight into its security posture, supports stable business operation, and will continue to evolve to meet emerging threats through iterative optimization.
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.