
Approaches to Building a Basic Data Platform

To handle terabytes of daily data and diverse business needs, the company built a basic data platform with three layers: collection/computation/storage, unified data management, and API-driven data services. The platform is supported by a standardized collection system, the Domino scheduling platform, and a self-service analysis tool, and is planned to evolve into a full data middle office delivering end-to-end data intelligence.

37 Interactive Technology Team

Our company generates terabytes of data daily. The main challenges are how to organize, store, and exploit this data efficiently, and how to meet diverse data needs from operations, marketing, and customer service.

The basic data platform is designed to satisfy evolving business requirements while ensuring high performance, scalability, and flexibility.

Overall Architecture

The platform consists of three major layers:

Basic Data Platform Layer: Handles data collection, computation, and storage. It collects two types of data (behavior and attributes) and performs ETL processing before storing them in various storage systems.

Data Management Layer: Provides unified management of data assets, generating business-relevant dimensions and metrics. Examples include LTV prediction services based on multivariate regression, chat moderation systems using deep learning, and intelligent SDK activity recommendation systems.

Data Service Layer: Exposes data through API services, enabling visualization products or AI-driven data products.
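The flow through the bottom layer can be sketched as a minimal extract/transform/load pipeline. This is an illustrative sketch only; the field names (`user_id`, `action`, `ts`) and the list-backed store are assumptions, not the platform's actual schema or storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class BehaviorEvent:
    # Hypothetical cleaned record for a "behavior" data point.
    user_id: str
    action: str
    ts: datetime

def extract(raw_rows):
    """Extract: yield raw dicts as they arrive from collection."""
    yield from raw_rows

def transform(raw):
    """Transform: drop malformed rows and normalize the timestamp to UTC."""
    for row in raw:
        if not row.get("user_id") or not row.get("action"):
            continue  # skip bad rows instead of failing the whole batch
        yield BehaviorEvent(
            user_id=row["user_id"],
            action=row["action"],
            ts=datetime.fromtimestamp(row["ts"], tz=timezone.utc),
        )

def load(events, store):
    """Load: append cleaned events to a storage backend (a list here)."""
    for ev in events:
        store.append(ev)
    return store

store = []
raw = [
    {"user_id": "u1", "action": "login", "ts": 1700000000},
    {"user_id": "", "action": "login", "ts": 1700000001},  # malformed, dropped
]
load(transform(extract(raw)), store)
```

In the real platform the load step would write to Hive, HBase, or another storage system rather than an in-memory list.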

Key Sub‑systems

1. Unified Collection Platform

Background: Historical data collection suffered from inconsistent interfaces, undocumented fields, and lack of standards.

Solution: A unified, standardized, and systematic collection platform (version 1.0) was built, featuring an API Gateway entry point and a custom two-stage pipeline (Lua to Redis, then Lua to Kafka) to ensure high performance, high availability, and data integrity.

Results (as of March 1): Over 1 billion total data entries, more than 20 million daily entries, and an average response time under 20 ms.
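The gateway's job is to validate an incoming event against the standardized schema and enqueue it quickly, leaving downstream delivery to the Lua/Kafka stages. A minimal sketch of that entry point, with an in-memory deque standing in for the Redis list and a hypothetical required-field set (the article does not specify the schema):

```python
import json
from collections import deque

# Stand-in for the Redis list; in the real pipeline a Lua consumer
# drains this buffer into Kafka.
buffer = deque()

# Hypothetical standardized schema; the actual required fields are not
# specified in the article.
REQUIRED_FIELDS = {"event", "app_id", "ts"}

def collect(payload: str) -> bool:
    """Gateway entry point: validate the event, then enqueue it.

    Returns True if accepted, False if rejected, so the caller can
    respond immediately and keep latency low.
    """
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not REQUIRED_FIELDS.issubset(event):
        return False
    buffer.append(event)  # LPUSH to Redis in the real system
    return True
```

Rejecting malformed payloads at the gateway is what enforces the "standardized" part: undocumented or inconsistent fields never reach storage.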

2. Domino Scheduling Platform

Pain points of traditional Crontab-based scheduling: jobs are chained by estimated run times rather than explicit dependencies, there is no concurrency control, dependency management is missing, and monitoring is chaotic.

Solution: A dedicated scheduling platform that supports multiple component types (Hive, PHP, Java, Shell, MapReduce), custom upstream/downstream dependencies, task timing, and history management.

Effect: Handles more than 10,000 jobs per day, provides a user-friendly interface for job result management, and standardizes daily development workflows.
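The core idea behind replacing Crontab time offsets with explicit upstream/downstream dependencies is topological ordering of a job graph. A minimal sketch using Python's standard-library `graphlib` (the job names are invented for illustration; Domino's actual component model is not described at this level of detail):

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of its upstream
# dependencies, replacing fragile time-based Crontab offsets.
jobs = {
    "extract_logs": set(),
    "clean_logs": {"extract_logs"},
    "daily_report": {"clean_logs"},
    "ltv_features": {"clean_logs"},
}

def run_schedule(graph, runner):
    """Run jobs in dependency order.

    Jobs with no path between them (daily_report and ltv_features here)
    could run concurrently; this sketch runs them sequentially.
    """
    order = []
    for job in TopologicalSorter(graph).static_order():
        runner(job)
        order.append(job)
    return order
```

A real scheduler would also persist run history and retry failed jobs, which is what the platform's task timing and history management cover.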

3. Self‑Service Analysis Platform

Pain points: Diverse business lines generate numerous ad‑hoc data analysis requests.

Solution: A self‑service analysis platform that enables business users to create visualizations in three steps—drag dimensions/metrics to the design area, select a chart type, and query.

The system is currently in gray release (a staged rollout to a subset of users), allowing business users to perform independent data analysis and reducing the burden on developers.
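Under the hood, the three-step flow amounts to translating the user's dragged dimensions and metrics into a GROUP BY query. A minimal sketch, assuming a simple column-to-aggregate mapping (the table and column names are hypothetical; the article does not describe the platform's actual query builder):

```python
def build_query(table, dimensions, metrics):
    """Translate a drag-and-drop selection into a GROUP BY query.

    `dimensions` are the columns dragged into the row area;
    `metrics` maps a column name to its aggregate function.
    """
    select_parts = list(dimensions) + [
        f"{func}({col}) AS {func.lower()}_{col}" for col, func in metrics.items()
    ]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql
```

For example, dragging a `game_id` dimension and a count-of-users metric would generate `SELECT game_id, COUNT(user_id) AS count_user_id FROM dws_login GROUP BY game_id`; the chosen chart type then only affects how the result set is rendered.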

Future Planning

The basic data platform aims to become the company’s “data middle‑office,” offering end‑to‑end services such as data collection, modeling, computation, governance, asset management, and data services, ultimately forming a comprehensive data intelligence platform.

Tags: big data, data platform, scheduling, data integration, data architecture, self-service analytics