Evolution and Practice of 360 Big Data Center Platform
The article presents a comprehensive overview of 360's Big Data Center evolution, covering business background, platform‑as‑a‑service architecture, data asset management, user‑profile unification, platform milestones, technical architecture, performance optimizations, online query capabilities, future plans, and a Q&A session.
This article is based on a talk given by Xu Hao, Technical Director of the 360 Big Data Center, at the 2018 DAMS China Data Asset Management Summit, and was originally published by the DBAplus community.
Business Background: Established in 2008, the 360 data center originally handled simple, repetitive data analysis and processing tasks for various products. Rapid product expansion (security, browsers, search, IoT, video, games) led to over 70 active products, 1,000+ tables, 30,000+ fields, and a data volume approaching 1.6 EB with PB‑level daily growth, creating pressure for a more scalable solution.
Business Needs:
Secondary data processing for massive user‑behavior logs.
Fast retrieval of large‑scale security data.
Product‑level data analysis for operation and reporting.
Platform Evolution – Three Stages:
Before 2010: Product‑centric, isolated data tools.
2010‑2015: Centralized data‑processing department, building reusable capabilities.
Since 2015: Unified big‑data platform to serve all products without linear staff growth.
Key milestones include the first MR program (2010), full migration to distributed processing (2011), mobile SDK release (2011‑2015), real‑time computing (2016), and the first end‑to‑end platform version (Dec 2017).
QDAS+ Platform: Positioned as a one‑stop data‑governance, processing, and mining platform. It consists of four layers – data ingestion, basic platform (compute, storage, middleware), application platform (scheduling, collection, reporting, rule engine, permission), and external services/products.
Technical Architecture:
Eight subsystems: TITAN (data processing), QDAM (asset management), QMiner (ML), Qreport (reporting), Qprofile (knowledge), Qnote (online query), QOPS (operations), and data‑open services.
Data flows through TITAN for secondary processing, then into the unified assets managed by QDAM.
Four data layers – raw, detail, aggregation, and application.
Data Asset & User Profile: Unified asset catalog across 70+ products, visual data‑value map, and a company‑wide security grading system. A virtual user‑ID is built by linking cross‑product behavior data, enabling cross‑product user profiling.
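The talk does not detail how the virtual user‑ID is constructed. One common way to link identifiers that co‑occur in behavior logs is union‑find clustering; the sketch below is a generic illustration of that idea, not 360's actual algorithm, and all identifier names are hypothetical.

```python
class IdLinker:
    """Union-find over device/account IDs observed together in behavior logs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Path-compressing find; an unseen ID starts as its own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # halve the path
            x = self.parent[x]
        return x

    def link(self, a, b):
        # An event carrying both IDs merges their clusters.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def virtual_id(self, x):
        # The cluster root serves as the cross-product virtual user ID.
        return self.find(x)


linker = IdLinker()
linker.link("browser:dev123", "account:u42")   # e.g. a login event in the browser
linker.link("account:u42", "search:cookie9")   # same account seen in search logs
# All three identifiers now resolve to one virtual user.
```

In production this pass would run distributed over the full log history, but the clustering logic is the same.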
Data Platform Improvements:
Cross‑engine computation (Spark + Flink).
Hybrid data‑source input.
Graphical task configuration replacing rigid templates.
Performance optimizations: data skew mitigation, caching, small‑file merging.
Scenario enhancements: task debugging, exception strategies, default‑value filling.
Online Query: Provides ad‑hoc SQL‑based analysis for analysts and product engineers, complementing the data‑processing and reporting platforms. Architecture includes syntax validation, semantic translation, and multi‑language execution layers.
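To make the three layers concrete, here is a rough sketch of how such a pipeline composes, using `sqlite3` as a stand‑in executor. The table names, validation checks, and `dw_` physical‑table prefix are all hypothetical simplifications, not 360's actual design.

```python
import sqlite3

ALLOWED_TABLES = {"user_events"}  # semantic layer: only cataloged assets

def validate_syntax(sql):
    # Syntax layer: accept a single read-only statement only.
    if not sql.strip().lower().startswith("select") or ";" in sql.strip()[:-1]:
        raise ValueError("only single SELECT statements are allowed")

def translate(sql):
    # Semantic layer: map logical asset names to physical tables.
    # (A real system walks the parsed AST; string replace keeps this short.)
    for table in ALLOWED_TABLES:
        sql = sql.replace(table, f"dw_{table}")
    return sql

def run_query(conn, sql):
    # Execution layer: hand the translated statement to the engine.
    validate_syntax(sql)
    return conn.execute(translate(sql)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_user_events (product TEXT, cnt INT)")
conn.executemany("INSERT INTO dw_user_events VALUES (?, ?)",
                 [("browser", 10), ("search", 7)])
rows = run_query(conn, "SELECT product, cnt FROM user_events")
```

In the platform described in the talk, the execution layer would dispatch to multiple engines rather than a single embedded database, but the validate/translate/execute layering is the same.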
Current Status & Future Plans: The platform now serves 35 business lines with plans to migrate all legacy tasks. Ongoing work focuses on data‑lifecycle operations and delivering solution‑oriented services for cross‑product needs.
Q&A:
Small‑file merging is handled by consolidating output files based on business‑driven size thresholds.
Both Spark and Flink are retained; Spark for batch workloads, Flink for real‑time scenarios.
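The size‑threshold merge policy from the Q&A can be sketched as a simple batching plan. The 128 MB target and file names below are illustrative stand‑ins, not 360's actual configuration:

```python
TARGET_BYTES = 128 * 1024 * 1024  # e.g. one HDFS block; a business-driven knob

def plan_merges(file_sizes):
    """Group output files into merge batches of roughly TARGET_BYTES each.

    file_sizes: list of (path, size_in_bytes); returns a list of path batches,
    each of which would be rewritten as a single consolidated file.
    """
    batches, current, current_size = [], [], 0
    for path, size in sorted(file_sizes, key=lambda f: f[1]):
        if current and current_size + size > TARGET_BYTES:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches

small = [(f"part-{i:05d}", 20 * 1024 * 1024) for i in range(10)]
# Ten 20 MB files: six fit under the 128 MB target, so two batches result.
plan = plan_merges(small)
```

Merging fewer, larger files reduces NameNode metadata pressure and per‑file task overhead, which is the usual motivation for this kind of consolidation step.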