MobTech Data Governance and Security Practices: Architecture, Implementation, and Financial Industry Use Cases
This article presents MobTech’s comprehensive data governance and security practices, covering the necessity of governance, its benefits, a full‑chain governance framework, specific challenges in the financial sector, the evolution of their integrated architecture, and detailed implementations of security, model, asset, monitoring, and quality management systems.
Overview
The article shares MobTech’s practical experience in data governance and data security, outlining why governance matters, what it delivers, and the key components of a systematic, modular, engineering‑driven approach.
Data Governance Overview
Data governance spans the entire data lifecycle, from collection and integration through analysis and management, and is commonly framed as evaluate, direct, and monitor (EDM). It addresses problems such as data silos, redundancy, complex requirements, low quality, and difficulty in extracting value.
Benefits of Data Governance
Reduces enterprise costs by eliminating storage redundancy and optimizing compute resources.
Strengthens data security and compliance with regulations such as GDPR and China’s Data Security Law.
Improves data quality, ensuring timely and accurate delivery for latency‑sensitive applications.
Enhances data value through proper modeling, cleaning, and mining.
Full‑Chain Governance Process
Data collection – ensure standards, compliance, and proper de‑identification.
Data storage – guarantee security, timeliness, and integrity.
Data analysis – verify model accuracy and confirm that compute resources are sufficient.
Data output – enforce permission control and risk assessment.
Financial Industry Context
MobTech’s financial data platform processes hundreds of petabytes, with daily active users in the hundreds of millions. Financial risk assessment demands strict compliance, extreme timeliness, consistent historical data, and high accuracy.
Integrated Data Governance Architecture
MobTech’s architecture has evolved from a rudimentary, open‑source‑based setup to a mature five‑system platform covering security management, asset management, data quality, model management, and task monitoring, ensuring service‑level agreements across the data lifecycle.
Security Management System
Data desensitization from source.
Privacy masking.
Secure transmission with encryption.
Comprehensive data monitoring (level‑based, anomaly, end‑to‑end).
Permission management based on data classification.
Formal security approval workflow.
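As a rough illustration of the desensitization, masking, and classification‑based permission items above, here is a minimal Python sketch. It is not MobTech’s implementation: the key, the field names, the numeric classification levels, and the function names are all hypothetical, and a real system would load its key from a key‑management service.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; never hard-code a real key.
SECRET_KEY = b"example-only-key"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 hash.

    The same input always maps to the same token, so joins still work,
    but the raw value is not exposed downstream.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_phone(phone: str) -> str:
    """Keep the first three and last four characters and mask the rest."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]

# Hypothetical classification levels: a higher number means more sensitive.
FIELD_LEVELS = {"city": 1, "device_id": 2, "phone": 3}

def can_read(user_clearance: int, field: str) -> bool:
    """Grant access only if the clearance meets the field's level.

    Unclassified fields are denied by default.
    """
    level = FIELD_LEVELS.get(field)
    return level is not None and user_clearance >= level
```

Denying unclassified fields by default is a design choice in this sketch: a new column stays invisible until someone assigns it a level.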
Model Management System
Model creation – requirement gathering, design, and development standards.
Model validation – automated scoring against design rules.
Model review – human approval before deployment.
Model maintenance – version control and documentation.
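The validation step, automated scoring against design rules, could look roughly like the sketch below. The four rules, the layer prefixes (ods, dwd, dws, ads), and the equal weighting are assumptions made for illustration; the source does not list MobTech’s actual rules or scoring formula.

```python
import re
from dataclasses import dataclass, field

@dataclass
class TableModel:
    name: str
    comment: str = ""
    columns: dict = field(default_factory=dict)  # column name -> comment
    partition_columns: list = field(default_factory=list)

# Hypothetical naming convention: a warehouse layer prefix plus snake_case.
LAYER_PATTERN = re.compile(r"^(ods|dwd|dws|ads)_[a-z0-9_]+$")

# Each rule takes a TableModel and returns True when the model passes.
RULES = {
    "layer_prefixed_name": lambda t: bool(LAYER_PATTERN.match(t.name)),
    "table_has_comment": lambda t: bool(t.comment.strip()),
    "all_columns_commented": lambda t: bool(t.columns)
    and all(c.strip() for c in t.columns.values()),
    "has_partition_column": lambda t: len(t.partition_columns) > 0,
}

def score_model(table: TableModel):
    """Return (score out of 100, names of failed rules), weighting rules equally."""
    failed = [name for name, rule in RULES.items() if not rule(table)]
    score = round(100 * (len(RULES) - len(failed)) / len(RULES))
    return score, failed
```

A score below some agreed threshold could block the model from reaching human review, while the list of failed rules tells the author what to fix.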
Asset Management System
Asset panorama – usage statistics, cost accounting, and scoring.
Lineage – custom Hive hooks and Spark integration for full traceability.
Metadata – table and cluster information for governance decisions.
Asset registration – unified management across departments.
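Once lineage edges have been captured, a common use is impact analysis: finding every table affected by a change upstream. The sketch below assumes the edges are already available as (source, target) pairs. The table names are made up, and the hook or Spark integration that would produce the edges is not shown.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Map each source table to the set of tables derived directly from it."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return graph

def downstream(graph, table):
    """Return every table reachable from `table`, using breadth-first search."""
    seen = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, ()):
            if child not in seen and child != table:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical lineage edges for illustration.
EDGES = [
    ("ods_events", "dwd_events"),
    ("dwd_events", "dws_user_daily"),
    ("ods_users", "dws_user_daily"),
    ("dws_user_daily", "ads_risk_score"),
]
```

With these edges, `downstream(build_graph(EDGES), "dwd_events")` returns the two tables that would need reprocessing if `dwd_events` changed.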
Monitoring and Alert System
Built on Apache DolphinScheduler, the system provides task scheduling, monitoring, alerting, and governance, including long‑tail detection, resource usage analysis, and automated task scoring.
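Long‑tail detection can be approximated by comparing each task’s latest run against its own history. The sketch below works on plain duration lists and does not call any DolphinScheduler API. The 3x factor, the median baseline, and the minimum sample count are assumptions for illustration, not the rules MobTech uses.

```python
from statistics import median

def find_long_tail(durations, factor=3.0, min_samples=5):
    """Flag tasks whose latest run took more than `factor` times the median
    of their earlier runs.

    `durations` maps a task name to run times in seconds, oldest first.
    Tasks with fewer than `min_samples` runs are skipped as too noisy.
    Returns {task name: latest / baseline ratio}.
    """
    flagged = {}
    for task, history in durations.items():
        if len(history) < min_samples:
            continue
        *past, latest = history
        baseline = median(past)
        if baseline > 0 and latest > factor * baseline:
            flagged[task] = round(latest / baseline, 2)
    return flagged
```

The median is used here rather than the mean so that one earlier slow run does not inflate the baseline and hide the next one.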
Data Quality Monitoring System
The QC platform offers rule management, configuration, monitoring, and panoramic reporting, featuring automatic data‑flow throttling (circuit‑break) and multi‑channel alerts (phone, email) to ensure timely and reliable data delivery.
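One way to read the circuit‑break behavior is that some rules only raise alerts while others stop data from being published downstream. The sketch below follows that reading. The rule names, the blocking flag, and the returned structure are hypothetical, and the alert channels (phone, email) are reduced to a list of rule names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[list], bool]  # returns True when the batch passes
    blocking: bool = False         # a failed blocking rule trips the breaker

def run_checks(rows, rules):
    """Evaluate every rule against a batch of rows.

    Returns which rules should raise alerts and whether the batch may be
    published. Any failed blocking rule stops publication.
    """
    failures = [r for r in rules if not r.check(rows)]
    tripped = any(r.blocking for r in failures)
    return {"publish": not tripped, "alerts": [r.name for r in failures]}

# Hypothetical rules for illustration.
EXAMPLE_RULES = [
    Rule("non_empty", lambda rows: len(rows) > 0, blocking=True),
    Rule("user_id_not_null",
         lambda rows: all(r.get("user_id") is not None for r in rows),
         blocking=True),
    Rule("min_rows", lambda rows: len(rows) >= 3),
]
```

Separating blocking rules from warning rules lets a batch with a minor anomaly still ship on time, which matters for the timeliness requirement described earlier.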
Q&A Highlights
Financial data is stored in HBase and ClickHouse, with real‑time updates via Flink and T+1 offline processing.
MobTech handles daily data volumes of hundreds of terabytes, facing challenges such as data skew and timely processing.
Automation is feasible for stable data sources but requires manual oversight for volatile streams.
Overall, the article demonstrates how a large‑scale data company can build a comprehensive, end‑to‑end data governance framework that balances cost efficiency, security, quality, and business value.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
