How Turing Data Finder Transforms Growth Analysis with a Unified Data Platform
The article provides a detailed technical overview of the Turing Data Finder (TDF) platform, describing its background, core components, data schema, ingestion workflow, and a suite of growth‑analysis features such as event, retention, funnel, path, component, distribution, and attribution analysis, while also outlining performance‑optimisation techniques and future development directions.
Platform Background
In the digital era, enterprises need deep user‑behavior insights to drive growth. The legacy Baidu MEG big‑data products suffered from fragmented platforms, inconsistent quality, and poor usability, leading to high development overhead and slow business response. TDF (Turing Data Finder) was built to address these shortcomings by offering a unified, end‑to‑end data‑insight solution.
Core Ecosystem (Turing 3.0)
TDF consists of three main modules:
TDE (Turing Data Engine): the compute engine built on Spark and ClickHouse.
TDS (Turing Data Studio): a one‑stop data development and governance platform.
TDA (Turing Data Analysis): the next‑generation visual BI product focused on growth analysis.
These modules replace the older, scattered MEG components and provide a consistent data lifecycle from ingestion to visualization.
Data Schema
event_day (Date) – day‑level partition, required, e.g., 2024-06-01.
event_hour (Int8) – hour‑level partition, required, e.g., 1.
event (LowCardinality(String)) – event name, required, e.g., AppStart.
properties (Map(String, String)) – event attributes, required, JSON‑like, e.g., {"event_type":"click","app_id":10001}.
timestamp (DateTime) – event timestamp, required, e.g., 2024-06-01 00:00:00.
distinct_id (String) – unique user identifier, required.
person_properties (Map(String, String)) – user attributes, required.
topic (LowCardinality(String)) – business line, required.
appname (LowCardinality(String)) – product line, required.
distinct_map (UInt64) – optimized mapping for retention calculations.
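To make the schema concrete, here is a minimal sketch of one event record and a presence check for its required fields. The field names and types come from the schema above; the record values, the `validate` helper, and the `REQUIRED_FIELDS` set are illustrative assumptions, not part of TDF itself.

```python
from datetime import date, datetime

# Hypothetical record matching the TDF event schema; values are
# illustrative only (TDF stores these rows in ClickHouse).
event_record = {
    "event_day": date(2024, 6, 1),                  # Date partition
    "event_hour": 1,                                # Int8 partition
    "event": "AppStart",                            # LowCardinality(String)
    "properties": {"event_type": "click", "app_id": "10001"},
    "timestamp": datetime(2024, 6, 1, 0, 0, 0),     # DateTime
    "distinct_id": "user_42",                       # unique user id
    "person_properties": {"device_brand": "example"},
    "topic": "example_topic",                       # business line
    "appname": "example_app",                       # product line
    "distinct_map": 42,                             # UInt64 retention mapping
}

# Fields the schema marks as required (distinct_map is an optimization).
REQUIRED_FIELDS = {
    "event_day", "event_hour", "event", "properties", "timestamp",
    "distinct_id", "person_properties", "topic", "appname",
}

def validate(record: dict) -> bool:
    """Check that every required schema field is present in the record."""
    return REQUIRED_FIELDS.issubset(record)
```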
Data Ingestion Workflow
For logs from the central log platform, the workflow is:
User selects pages to sync in TDF.
TDF periodically syncs the corresponding event metadata.
TDF outputs the synced metadata to the data‑RD team.
Data‑RD processes the raw logs and writes formatted records to ClickHouse.
For non‑log‑platform sources, users must provide a fixed‑format meta file.
Growth‑Analysis Features
Event Analysis
Supports attribute filtering, grouping, multi‑metric calculation (PV, UV, per‑user count, distinct counts, sums), and a variety of chart types (line, bar, area, pie). Users can also view cohort performance across events.
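The core event-analysis metrics can be sketched over an in-memory event list; in TDF these run as ClickHouse queries, so the function name, sample data, and structure below are illustrative assumptions.

```python
from collections import defaultdict

def event_metrics(events, event_name):
    """Compute PV (total count), UV (distinct users), and per-user
    count for one event name, mirroring the metrics listed above."""
    per_user = defaultdict(int)
    for e in events:
        if e["event"] == event_name:
            per_user[e["distinct_id"]] += 1
    pv = sum(per_user.values())                 # page views / total events
    uv = len(per_user)                          # unique users
    per_user_avg = pv / uv if uv else 0.0       # average events per user
    return {"pv": pv, "uv": uv, "per_user": per_user_avg}

sample = [
    {"event": "AppStart", "distinct_id": "u1"},
    {"event": "AppStart", "distinct_id": "u1"},
    {"event": "AppStart", "distinct_id": "u2"},
    {"event": "Search",   "distinct_id": "u2"},
]
```

Attribute filtering and grouping would apply the same aggregation after first filtering or partitioning `events` by keys in `properties`.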
Retention Analysis
Measures user stickiness over n days, with options for daily or cumulative retention, and supports custom start and return events.
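An n-day retention rate reduces to a set intersection between day-0 start-event users and day-n return-event users. Plain Python sets stand in here for the bitmap operations TDF actually uses; all names and data are illustrative.

```python
def retention_rate(start_users: set, return_users_by_day: dict, day: int) -> float:
    """Fraction of day-0 start-event users who fired the return event
    on the given day (0.0 if the start cohort is empty)."""
    if not start_users:
        return 0.0
    returned = start_users & return_users_by_day.get(day, set())
    return len(returned) / len(start_users)

# Illustrative cohort: 4 users started; who came back on day 1 and day 7.
start = {"u1", "u2", "u3", "u4"}
returns = {1: {"u1", "u2", "u5"}, 7: {"u1"}}
```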
Funnel Analysis
Allows definition of ordered events, attribute filters, and grouping to view conversion steps and trends, with options to compare against previous periods.
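An ordered funnel counts, for each step, how many users progressed that far in sequence. The sketch below walks each user's time-ordered events and advances a step pointer; the function and sample data are illustrative assumptions, not TDF's implementation.

```python
def funnel_counts(user_events: dict, steps: list) -> list:
    """For each funnel step, count users who reached it in order.
    user_events maps distinct_id -> time-ordered list of event names."""
    counts = [0] * len(steps)
    for events in user_events.values():
        i = 0
        for ev in events:                        # events are time-ordered
            if i < len(steps) and ev == steps[i]:
                i += 1                           # user advanced one step
        for step in range(i):                    # credit all reached steps
            counts[step] += 1
    return counts

users = {
    "u1": ["view", "cart", "pay"],
    "u2": ["view", "cart"],
    "u3": ["view"],
}
```

Step-to-step conversion rates then follow as `counts[i+1] / counts[i]`.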
User Path
Visualizes user flow across selected events, showing transition probabilities and identifying bottlenecks.
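The transition probabilities behind such a path view can be sketched as conditional frequencies P(next event | current event) over observed paths; the helper below is an illustrative assumption, not TDF's query.

```python
from collections import Counter

def transition_probs(paths):
    """For each observed (from, to) event pair across all user paths,
    return P(to | from) as pair-count / from-count."""
    pair_counts = Counter()
    from_counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):   # consecutive event pairs
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {pair: c / from_counts[pair[0]] for pair, c in pair_counts.items()}

paths = [["home", "search", "detail"], ["home", "detail"]]
```

Low-probability edges out of a high-traffic node are the bottlenecks such a view surfaces.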
Component Analysis
Displays attribute distribution (e.g., device brand, age) for a target cohort, supporting up to five attributes and optional contrast groups.
Distribution Analysis
Shows user count distribution across custom or algorithm‑generated intervals (e.g., Sturges), with flexible binning and metric selection.
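Sturges' rule, mentioned above as one algorithmic binning option, picks k = ⌈log2(n)⌉ + 1 bins for n observations and splits the value range into equal widths. The helper below is a minimal sketch of that rule only.

```python
import math

def sturges_bins(values):
    """Return (bin count, bin edges) per Sturges' rule:
    k = ceil(log2(n)) + 1 equal-width bins over [min, max]."""
    n = len(values)
    k = math.ceil(math.log2(n)) + 1
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    return k, edges
```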
Attribution Analysis
Evaluates the contribution of multiple source events to a target conversion event using Pearson correlation, offering first‑touch, last‑touch, and linear attribution models with adjustable windows.
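The Pearson correlation used to score how strongly a source-event series tracks the target-conversion series can be computed as below; this plain-Python version is a sketch of the standard formula, not TDF's implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series:
    covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A coefficient near +1 suggests the source event moves with conversions inside the attribution window; first-touch, last-touch, and linear models then decide how conversion credit is split among correlated sources.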
Performance Optimisation
Data Model Simplification: pre‑join user and event details during ingestion; partition data by business line.
Query Logic Optimisation: parallelise sub‑queries at the smallest granularity; use RoaringBitmap for fast set intersections in retention calculations.
Materialisation & Caching: materialise high‑frequency map attributes; combine trigger‑based and scheduled caches to accelerate frequent reports.
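The bitmap idea: map each user to an integer bit position (the schema's distinct_map field serves this role), so a day's active users become one bitmap and retention is a bitwise AND plus a popcount. Plain Python integers stand in for roaring bitmaps in this sketch; the helpers are illustrative.

```python
def to_bitmap(user_ids):
    """Pack integer user ids into one bitmap (bit i set = user i active)."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm

def retained_count(day0_bm: int, dayn_bm: int) -> int:
    """Users active on both days = popcount of the AND of the bitmaps."""
    return bin(day0_bm & dayn_bm).count("1")
```

Real roaring bitmaps add compression for sparse id ranges, which is what makes this cheap at the scale the article describes.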
Future Outlook
Planned enhancements include AI‑driven interactive insights, tighter integration with TDA dashboards and other platforms (e.g., human‑machine), expanded data source connectors, new analysis scenarios such as LTV and aha‑moment analysis, and continued performance improvements.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.