
Approaches to Building a Basic Data Platform

To handle terabytes of daily data and diverse business needs, the company built a basic data platform with three layers: collection/computation/storage, unified data management, and API-driven data services. The platform is supported by a standardized collection system, the Domino scheduling platform, and a self-service analysis tool, and is planned to evolve into a full data middle office delivering end-to-end data intelligence.

37 Interactive Technology Team

Our company generates terabytes of data daily. The main challenges are how to organize, store, and exploit this data efficiently, and how to meet diverse data needs from operations, marketing, and customer service.

The basic data platform is designed to satisfy evolving business requirements while ensuring high performance, scalability, and flexibility.

Overall Architecture

The platform consists of three major layers:

Basic Data Platform Layer: Handles data collection, computation, and storage. It collects two types of data (behavior and attributes) and performs ETL processing before storing them in various storage systems.

Data Management Layer: Provides unified management of data assets, generating business-relevant dimensions and metrics. Examples include LTV prediction services based on multivariate regression, chat moderation systems using deep learning, and intelligent SDK activity recommendation systems.

Data Service Layer: Exposes data through API services, enabling visualization products or AI-driven data products.
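The flow through the bottom layer can be sketched as a minimal extract/transform/load pipeline. This is an illustrative sketch only; the field names (`user_id`, `action`, `ts`) and the list-backed store are assumptions, not the platform's actual schema or storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class BehaviorEvent:
    # Hypothetical cleaned record for a "behavior" data point.
    user_id: str
    action: str
    ts: datetime

def extract(raw_rows):
    """Extract: yield raw dicts as they arrive from collection."""
    yield from raw_rows

def transform(raw):
    """Transform: drop malformed rows and normalize the timestamp to UTC."""
    for row in raw:
        if not row.get("user_id") or not row.get("action"):
            continue  # skip bad rows instead of failing the whole batch
        yield BehaviorEvent(
            user_id=row["user_id"],
            action=row["action"],
            ts=datetime.fromtimestamp(row["ts"], tz=timezone.utc),
        )

def load(events, store):
    """Load: append cleaned events to a storage backend (a list here)."""
    for ev in events:
        store.append(ev)
    return store

store = []
raw = [
    {"user_id": "u1", "action": "login", "ts": 1700000000},
    {"user_id": "", "action": "login", "ts": 1700000001},  # malformed, dropped
]
load(transform(extract(raw)), store)
```

In the real platform the load step would write to Hive, HBase, or another storage system rather than an in-memory list.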

Key Sub‑systems

1. Unified Collection Platform

Background: Historical data collection suffered from inconsistent interfaces, undocumented fields, and lack of standards.

Solution: A unified, standardized, and systematic collection platform (version 1.0) was built, featuring an API Gateway entry point and a custom two-stage pipeline (Lua to Redis, then Lua to Kafka) to ensure high performance, high availability, and data integrity.

Results (as of March 1): Over 1 billion total data entries, more than 20 million daily entries, and an average response time under 20 ms.
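The gateway's job is to validate an incoming event against the standardized schema and enqueue it quickly, leaving downstream delivery to the Lua/Kafka stages. A minimal sketch of that entry point, with an in-memory deque standing in for the Redis list and a hypothetical required-field set (the article does not specify the schema):

```python
import json
from collections import deque

# Stand-in for the Redis list; in the real pipeline a Lua consumer
# drains this buffer into Kafka.
buffer = deque()

# Hypothetical standardized schema; the actual required fields are not
# specified in the article.
REQUIRED_FIELDS = {"event", "app_id", "ts"}

def collect(payload: str) -> bool:
    """Gateway entry point: validate the event, then enqueue it.

    Returns True if accepted, False if rejected, so the caller can
    respond immediately and keep latency low.
    """
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not REQUIRED_FIELDS.issubset(event):
        return False
    buffer.append(event)  # LPUSH to Redis in the real system
    return True
```

Rejecting malformed payloads at the gateway is what enforces the "standardized" part: undocumented or inconsistent fields never reach storage.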

2. Domino Scheduling Platform

Pain points of traditional Crontab-based scheduling: jobs are chained by estimated run times rather than explicit dependencies, there is no concurrency control, dependency management is missing, and monitoring is chaotic.

Solution: A dedicated scheduling platform that supports multiple component types (Hive, PHP, Java, Shell, MapReduce), custom upstream/downstream dependencies, task timing, and history management.

Effect: Handles more than 10,000 jobs per day, provides a user-friendly interface for job result management, and standardizes daily development workflows.
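The core idea behind replacing Crontab time offsets with explicit upstream/downstream dependencies is topological ordering of a job graph. A minimal sketch using Python's standard-library `graphlib` (the job names are invented for illustration; Domino's actual component model is not described at this level of detail):

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of its upstream
# dependencies, replacing fragile time-based Crontab offsets.
jobs = {
    "extract_logs": set(),
    "clean_logs": {"extract_logs"},
    "daily_report": {"clean_logs"},
    "ltv_features": {"clean_logs"},
}

def run_schedule(graph, runner):
    """Run jobs in dependency order.

    Jobs with no path between them (daily_report and ltv_features here)
    could run concurrently; this sketch runs them sequentially.
    """
    order = []
    for job in TopologicalSorter(graph).static_order():
        runner(job)
        order.append(job)
    return order
```

A real scheduler would also persist run history and retry failed jobs, which is what the platform's task timing and history management cover.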

3. Self‑Service Analysis Platform

Pain points: Diverse business lines generate numerous ad‑hoc data analysis requests.

Solution: A self‑service analysis platform that enables business users to create visualizations in three steps—drag dimensions/metrics to the design area, select a chart type, and query.

The system is currently in gray release (a staged rollout to a subset of users), allowing business users to perform independent data analysis and reducing the burden on developers.
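Under the hood, the three-step flow amounts to translating the user's dragged dimensions and metrics into a GROUP BY query. A minimal sketch, assuming a simple column-to-aggregate mapping (the table and column names are hypothetical; the article does not describe the platform's actual query builder):

```python
def build_query(table, dimensions, metrics):
    """Translate a drag-and-drop selection into a GROUP BY query.

    `dimensions` are the columns dragged into the row area;
    `metrics` maps a column name to its aggregate function.
    """
    select_parts = list(dimensions) + [
        f"{func}({col}) AS {func.lower()}_{col}" for col, func in metrics.items()
    ]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql
```

For example, dragging a `game_id` dimension and a count-of-users metric would generate `SELECT game_id, COUNT(user_id) AS count_user_id FROM dws_login GROUP BY game_id`; the chosen chart type then only affects how the result set is rendered.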

Future Planning

The basic data platform aims to become the company’s “data middle‑office,” offering end‑to‑end services such as data collection, modeling, computation, governance, asset management, and data services, ultimately forming a comprehensive data intelligence platform.

Tags: big data, data platform, scheduling, data integration, data architecture, self-service analytics