How a Next‑Gen Data Management Platform Boosts Efficiency and Innovation
This article outlines the motivations, objectives, and architectural design of a next‑generation data management platform, detailing its four‑layer “four‑ization” approach; core services such as data integration, modeling, API provisioning, and componentization; and governance, security, and operational best practices.
01 Construction Background
Data is scattered across multiple storage environments and databases, requiring manual synchronization, consolidation, and merging for new business needs, leading to wasted resources and manpower. The current system architecture cannot support future data‑driven business innovation.
A new generation data management platform is needed to improve data utilization efficiency and support business development, offering capabilities for data aggregation, analysis, application, computation, management, and resource management.
02 Construction Goals
1. Improve business product development efficiency. By unifying data and services, the platform reduces duplicated development, heavy operational costs, and the need for deep framework knowledge across departments.
2. Enhance business demand response capability. Visual API development interfaces and component service libraries enable rapid development of new business features, reducing time‑to‑market.
3. Enable data to empower business innovation. Deep data value extraction through modeling supports decision‑making and intelligent products, with data models forming core assets managed via a visual interface and standardized service APIs.
03 Application Platform “Four‑ization” Construction
1. Data aggregation visualization. A visual management system replaces manual data sync tools, handling data collection and scheduling.
2. Self‑service data modeling. Users can create new data service units by integrating data from multiple sources, forming a model marketplace with risk, predictive, and marketing models.
(1) Data model marketplace
Provides shared data models for analysts and authorized users, supporting various scenarios such as risk assessment and marketing analysis.
(2) Simple analysis model building
Users select data sources, define relationships, and generate new data units through a visual interface.
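The "select sources, define relationships, generate a new data unit" flow above is essentially a join plus aggregation. A minimal sketch with pandas, using hypothetical `customers` and `orders` source tables (all names here are illustrative, not part of the platform):

```python
import pandas as pd

# Hypothetical tables pulled from two registered data sources.
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["north", "south"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [100, 50, 75]})

# "Define relationships": join on customer_id, then aggregate
# the result into a new, shareable data unit.
data_unit = (
    orders.merge(customers, on="customer_id", how="left")
          .groupby("region", as_index=False)["amount"].sum()
)
```

In the platform the same join and aggregation would be configured through the visual interface rather than written by hand.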
(3) Complex analysis model building
Data engineers log in, create projects, and request data resources.
They browse metadata, select tables or interfaces, and submit requests.
Requests may involve dozens of tables with batch or real‑time usage.
Standardized masking and encryption policies are applied, and administrators approve.
Approved resources are prepared for self‑service queries, development, configuration, and SQL orchestration.
Final self‑service reports or dashboards are delivered to users.
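The standardized masking step in this workflow can be sketched as a one‑way transform applied to sensitive columns before approval. This is an illustrative policy only; the salt, column list, and truncation length are assumptions, not the platform's actual scheme:

```python
import hashlib

def mask_value(value: str, salt: str = "demo-salt") -> str:
    """One-way masking: salted SHA-256, truncated for readability."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

row = {"user_id": "u-1001", "phone": "13800000000"}
SENSITIVE = {"phone"}  # columns the masking policy applies to (assumed)

masked = {k: mask_value(v) if k in SENSITIVE else v for k, v in row.items()}
```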
3. Data service API enablement. The platform can encapsulate any table or view as an API, supporting both simple and complex services.
(1) Building simple data service APIs
Use wizard mode to create an API, entering basic information such as name, path, and protocol.
Select data source, database, and table; the system displays schema information.
Choose input and output fields, set query conditions, pagination, and filters.
Validate and publish the API.
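The wizard steps above amount to filling out a declarative API definition from which the platform generates a query. A hedged sketch, where the field names and SQL template are illustrative assumptions:

```python
# Hypothetical output of the wizard: a declarative API definition.
api_def = {
    "name": "get_user_orders",
    "path": "/api/v1/user-orders",
    "table": "dw.orders",
    "inputs": ["user_id"],                        # query conditions
    "outputs": ["order_id", "amount", "status"],  # returned fields
    "page_size": 50,                              # pagination
}

def build_query(api: dict) -> str:
    """Render the definition into a parameterized SQL statement."""
    cols = ", ".join(api["outputs"])
    where = " AND ".join(f"{p} = :{p}" for p in api["inputs"])
    return f"SELECT {cols} FROM {api['table']} WHERE {where} LIMIT {api['page_size']}"

query = build_query(api_def)
```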
(2) Building complex data service APIs
Use script mode to create an API, similar to simple queries.
Select data source and write a full SQL statement supporting joins, aliases, and functions.
The system extracts input and output parameters for further configuration.
Validate and publish the API.
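The parameter-extraction step in script mode can be approximated by scanning the SQL for named bind variables. A minimal sketch under the assumption that inputs use `:name` placeholders (the actual platform's placeholder syntax is not specified in the source):

```python
import re

sql = """
SELECT o.order_id, o.amount, u.name AS user_name
FROM dw.orders o JOIN dw.users u ON o.user_id = u.user_id
WHERE o.status = :status AND o.created_at >= :start_date
"""

# Treat :named placeholders as the API's input parameters.
inputs = sorted(set(re.findall(r":(\w+)", sql)))
```

Output-parameter extraction would similarly parse the SELECT list, which requires a real SQL parser once aliases and functions are involved.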
4. Business service componentization. High‑overlap business functions are packaged into domain service components or micro‑services, which invoke multiple APIs and aggregate data for front‑end consumption.
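A domain component of this kind is essentially a fan-out-and-merge over several data-service APIs. A minimal sketch, with plain functions standing in for the HTTP calls (all endpoint names and payload shapes are hypothetical):

```python
def fetch_profile(user_id):
    # Would call e.g. /api/v1/user-profile (hypothetical endpoint).
    return {"user_id": user_id, "name": "demo"}

def fetch_orders(user_id):
    # Would call e.g. /api/v1/user-orders (hypothetical endpoint).
    return [{"order_id": 1, "amount": 100}]

def user_overview(user_id):
    """Domain service component: aggregate several API results
    into one payload shaped for the front end."""
    profile = fetch_profile(user_id)
    orders = fetch_orders(user_id)
    return {**profile, "orders": orders, "order_count": len(orders)}
```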
04 Basic Platform Construction
The platform must align with enterprise strategy, ensure stability and scalability, and design accurate, efficient dimensional models and data processing pipelines. A tiered architecture (ODS, DW, DM) is recommended.
Data integration consolidates data from disparate databases into the warehouse, supporting full, incremental, and change‑capture tables.
Full tables store complete data.
Incremental tables store newly added data.
Change tables store new and modified data.
Chain (zipper) tables periodically merge change tables to retain full history.
Entity tables (e.g., users, products) are typically refreshed daily in full.
Dimension tables (e.g., order status, product categories) are also refreshed daily, while static dimensions (e.g., gender, region) may remain unchanged.
Fact tables are handled differently: transactional facts (e.g., transaction logs) are incrementally loaded daily; periodic facts (e.g., order applications) follow their own schedules.
Data storage layers include:
ODS layer: raw data storage.
DWD layer: cleansed data in Hive.
DWS layer: lightly aggregated data in Hive.
ADS layer: data applications stored in Elasticsearch or MySQL.
Metadata is managed via Atlas for Hive, creating a unified metadata catalog and supporting data lineage, quality checks, and governance.
Data computation is configured by administrators, supporting offline queries (Hive jobs) and real‑time queries (Elasticsearch, Doris), with tasks orchestrated by Airflow.
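The dependency between sync, offline computation, and publishing is a simple task graph. A language-level sketch using the standard library's topological sorter (in production this would be an Airflow DAG; the task names here are invented):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on:
# sync feeds offline computation, which feeds publishing.
tasks = {
    "sync_orders": set(),
    "compute_daily_agg": {"sync_orders"},
    "publish_ads_layer": {"compute_daily_agg"},
}

order = list(TopologicalSorter(tasks).static_order())
```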
Task scheduling manages data sync, offline computation, and automatic publishing of results.
Data application development allows developers or analysts to request data resources, select sources and fields, and receive data via visual browsing or API interfaces.
Data modeling and analysis enable business users to request data themes, with pre‑built charts and formulas available in the analytics platform.
Data security is enforced through authentication (Kerberos), authorization (role‑based access), and auditing (activity logging, alerts).
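The authorization layer reduces to checking a role's grants before serving a request. A minimal sketch (roles and permission strings are invented examples; a real deployment would back this with Kerberos-authenticated identities and an external policy store):

```python
# Illustrative role → permission grants.
ROLE_GRANTS = {
    "analyst": {"dws.read", "ads.read"},
    "engineer": {"ods.read", "dwd.read", "dws.read", "dws.write"},
}

def authorize(role: str, permission: str) -> bool:
    """Role-based access check; unknown roles get nothing."""
    return permission in ROLE_GRANTS.get(role, set())
```

Every decision would also be written to the audit log so that alerting can flag anomalous access.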
05 Management Specification Construction
Standardized naming, layering, and security policies ensure data lineage clarity and prevent unauthorized DDL operations.
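A naming standard like this is easiest to enforce mechanically. A sketch of one plausible convention, `<layer>_<domain>_<entity>` with the layer prefix drawn from the tiers above (the exact pattern is an assumption, not the platform's published standard):

```python
import re

# Assumed convention: layer prefix, then domain, then entity, lowercase.
NAME_PATTERN = re.compile(r"^(ods|dwd|dws|ads)_[a-z]+_[a-z_]+$")

def is_valid_table_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))
```

A check like this can run in CI or in the platform's DDL gateway, which is also where unauthorized DDL would be rejected.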
06 Construction Experience Sharing
Key lessons include ensuring data synchronization consistency (incremental, state change, loss prevention), real‑time capture via log‑based methods, and comprehensive data quality management across modeling, platform technology, and process governance.
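One concrete loss-prevention tactic for incremental sync is to re-read a small overlap window behind the last watermark, then deduplicate downstream by primary key. A hedged sketch of that idea (the 5-minute overlap is an arbitrary illustration, not a recommendation from the source):

```python
from datetime import datetime, timedelta

def incremental_window(last_watermark: datetime,
                       overlap: timedelta = timedelta(minutes=5)) -> datetime:
    """Start the next incremental pull slightly before the last
    watermark so late-arriving rows are not silently dropped."""
    return last_watermark - overlap

wm = datetime(2024, 1, 2, 0, 0)
start = incremental_window(wm)
```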
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
