
How Turing Data Finder Transforms Growth Analysis with a Unified Data Platform

This article gives a technical overview of the Turing Data Finder (TDF) platform: its background, core components, data schema, and ingestion workflow, along with a suite of growth‑analysis features covering event, retention, funnel, path, component, distribution, and attribution analysis. It also outlines performance‑optimisation techniques and future development directions.

Baidu Geek Talk

Platform Background

In the digital era, enterprises need deep user‑behavior insights to drive growth. The legacy Baidu MEG big‑data products suffered from fragmented platforms, inconsistent quality, and poor usability, leading to high development overhead and slow business response. TDF (Turing Data Finder) was built to address these shortcomings by offering a unified, end‑to‑end data‑insight solution.

Core Ecosystem (Turing 3.0)

TDF consists of three main modules:

TDE (Turing Data Engine): the compute engine built on Spark and ClickHouse.

TDS (Turing Data Studio): a one‑stop data development and governance platform.

TDA (Turing Data Analysis): the next‑generation visual BI product focused on growth analysis.

These modules replace the older, scattered MEG components and provide a consistent data lifecycle from ingestion to visualization.

Data Schema

event_day (Date) – day‑level partition, required, e.g., 2024-06-01.

event_hour (Int8) – hour‑level partition, required, e.g., 1.

event (LowCardinality(String)) – event name, required, e.g., AppStart.

properties (Map(String, String)) – event attributes, required, JSON‑like, e.g., {"event_type":"click","app_id":10001}.

timestamp (DateTime) – event timestamp, required, e.g., 2024-06-01 00:00:00.

distinct_id (String) – unique user identifier, required.

person_properties (Map(String, String)) – user attributes, required.

topic (LowCardinality(String)) – business line, required.

appname (LowCardinality(String)) – product line, required.

distinct_map (UInt64) – optimized user‑ID mapping used for retention calculations.
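
The schema above can be illustrated with a minimal event record. The field names match the table; the concrete values (and the person attribute shown) are hypothetical:

```python
# A sample event row matching the TDF schema described above.
# All values are illustrative only.
event_row = {
    "event_day": "2024-06-01",        # Date, day-level partition
    "event_hour": 0,                  # Int8, hour-level partition
    "event": "AppStart",              # event name
    "properties": {"event_type": "click", "app_id": "10001"},
    "timestamp": "2024-06-01 00:00:00",
    "distinct_id": "user_42",         # unique user identifier
    "person_properties": {"city": "Beijing"},  # hypothetical attribute
    "topic": "search",                # business line (hypothetical)
    "appname": "baidu_app",           # product line (hypothetical)
    "distinct_map": 42,               # UInt64 mapping used for retention
}

# Fields the schema marks as required.
REQUIRED = {"event_day", "event_hour", "event", "properties",
            "timestamp", "distinct_id", "person_properties",
            "topic", "appname"}

def validate(row: dict) -> bool:
    """Check that all required schema fields are present in a row."""
    return REQUIRED.issubset(row)
```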

Data Ingestion Workflow

For logs from the central log platform, the workflow is:

User selects pages to sync in TDF.

TDF periodically syncs the corresponding event metadata.

TDF outputs the synced metadata to the data‑RD team.

Data‑RD processes the raw logs and writes formatted records to ClickHouse.

For sources outside the log platform, users must provide a metadata file in a fixed format.
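
Step 4 of the workflow (formatting raw logs into schema rows) can be sketched as a small transformation pass. This assumes raw logs arrive as JSON lines; the raw field names (`ts`, `uid`, `props`) are hypothetical:

```python
import json
from datetime import datetime

def format_raw_log(line: str, topic: str, appname: str) -> dict:
    """Convert one raw JSON log line into a row matching the TDF schema.
    Raw field names here are assumptions, not the real log format."""
    raw = json.loads(line)
    ts = datetime.strptime(raw["ts"], "%Y-%m-%d %H:%M:%S")
    return {
        "event_day": ts.strftime("%Y-%m-%d"),
        "event_hour": ts.hour,
        "event": raw["event"],
        # ClickHouse Map(String, String): stringify all property values.
        "properties": {k: str(v) for k, v in raw.get("props", {}).items()},
        "timestamp": raw["ts"],
        "distinct_id": raw["uid"],
        "person_properties": {},
        "topic": topic,
        "appname": appname,
    }

row = format_raw_log(
    '{"ts": "2024-06-01 09:30:00", "event": "AppStart",'
    ' "uid": "u1", "props": {"app_id": 10001}}',
    topic="search", appname="baidu_app",
)
```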

Growth‑Analysis Features

Event Analysis

Supports attribute filtering, grouping, multi‑metric calculation (PV, UV, per‑user count, distinct counts, sums), and a variety of chart types (line, bar, area, pie). Users can also view cohort performance across events.
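
The core metrics can be sketched in a few lines; this is a simplified in-memory model of what the platform computes over ClickHouse, with hypothetical sample events:

```python
def event_metrics(events, name):
    """PV, UV, and average events per user for one event name."""
    hits = [e for e in events if e["event"] == name]
    pv = len(hits)                                  # page views: raw count
    users = {e["distinct_id"] for e in hits}
    uv = len(users)                                 # unique visitors
    per_user = pv / uv if uv else 0.0               # events per user
    return {"pv": pv, "uv": uv, "per_user": per_user}

events = [
    {"event": "AppStart", "distinct_id": "u1"},
    {"event": "AppStart", "distinct_id": "u1"},
    {"event": "AppStart", "distinct_id": "u2"},
    {"event": "Search",   "distinct_id": "u2"},
]
metrics = event_metrics(events, "AppStart")
```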

Retention Analysis

Measures user stickiness over n days, with options for daily or cumulative retention, and supports custom start and return events.
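
The n‑day retention calculation can be sketched as a set intersection between a start cohort and a return cohort. This is a simplified model with hypothetical events, not the platform's ClickHouse implementation:

```python
from datetime import date, timedelta

def n_day_retention(events, start_event, return_event, start_day, n):
    """Fraction of users who did start_event on start_day and
    return_event exactly n days later."""
    cohort = {e["distinct_id"] for e in events
              if e["event"] == start_event and e["day"] == start_day}
    target_day = start_day + timedelta(days=n)
    returned = {e["distinct_id"] for e in events
                if e["event"] == return_event and e["day"] == target_day}
    return len(cohort & returned) / len(cohort) if cohort else 0.0

events = [
    {"distinct_id": "u1", "event": "AppStart", "day": date(2024, 6, 1)},
    {"distinct_id": "u2", "event": "AppStart", "day": date(2024, 6, 1)},
    {"distinct_id": "u1", "event": "AppStart", "day": date(2024, 6, 2)},
]
day1 = n_day_retention(events, "AppStart", "AppStart", date(2024, 6, 1), 1)
```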

Funnel Analysis

Allows definition of ordered events, attribute filters, and grouping to view conversion steps and trends, with options to compare against previous periods.
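
The ordered-step matching at the heart of funnel analysis can be sketched per user: walk the user's time-ordered events and advance through the funnel whenever the next expected step appears. The sample data is hypothetical:

```python
def funnel_depth(user_events, steps):
    """How many funnel steps this user completed, in order."""
    i = 0
    for ev in user_events:            # events assumed time-ordered
        if i < len(steps) and ev == steps[i]:
            i += 1
    return i

def funnel_counts(events_by_user, steps):
    """Number of users reaching each step of the funnel."""
    counts = [0] * len(steps)
    for evs in events_by_user.values():
        for s in range(funnel_depth(evs, steps)):
            counts[s] += 1
    return counts

events_by_user = {
    "u1": ["AppStart", "Search", "Click"],
    "u2": ["AppStart", "Click"],      # skipped Search, stops at step 1
    "u3": ["Search"],                 # never entered the funnel
}
counts = funnel_counts(events_by_user, ["AppStart", "Search", "Click"])
```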

User Path

Visualizes user flow across selected events, showing transition probabilities and identifying bottlenecks.
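
The transition probabilities behind the path view can be sketched by counting adjacent event pairs per session and normalising per source event. The sessions shown are hypothetical:

```python
from collections import defaultdict

def transition_probabilities(sessions):
    """Per-event transition probabilities across user sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):   # adjacent event pairs
            counts[a][b] += 1
    probs = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        probs[a] = {b: c / total for b, c in nexts.items()}
    return probs

sessions = [
    ["AppStart", "Search", "Click"],
    ["AppStart", "Search", "Exit"],
    ["AppStart", "Exit"],
]
probs = transition_probabilities(sessions)
```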

Component Analysis

Displays attribute distribution (e.g., device brand, age) for a target cohort, supporting up to five attributes and optional contrast groups.
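
A single-attribute distribution over a cohort can be sketched as a normalised counter over one person attribute; the cohort and attribute values here are hypothetical:

```python
from collections import Counter

def attribute_distribution(users, attribute):
    """Share of a cohort per value of one person attribute."""
    counts = Counter(u["person_properties"].get(attribute, "unknown")
                     for u in users)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}

cohort = [
    {"person_properties": {"device_brand": "Huawei"}},
    {"person_properties": {"device_brand": "Huawei"}},
    {"person_properties": {"device_brand": "Xiaomi"}},
    {"person_properties": {"device_brand": "Apple"}},
]
dist = attribute_distribution(cohort, "device_brand")
```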

Distribution Analysis

Shows user count distribution across custom or algorithm‑generated intervals (e.g., Sturges), with flexible binning and metric selection.

Attribution Analysis

Evaluates the contribution of multiple source events to a target conversion event using Pearson correlation, offering first‑touch, last‑touch, and linear attribution models with adjustable windows.
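
The Pearson-correlation part of attribution can be sketched directly from its definition: correlate a source event's daily counts against daily conversions. The series below are hypothetical:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Daily counts of a hypothetical source event vs. the target conversion.
source_counts = [10, 12, 9, 15, 20]
conversions   = [3, 4, 3, 5, 7]
r = pearson(source_counts, conversions)   # strongly correlated source
```

A high r suggests the source event moves with the conversion; the first-touch, last-touch, and linear models then decide how to split credit among correlated sources within the attribution window.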

Performance Optimisation

Data Model Simplification : pre‑join user and event details during ingestion; partition data by business line.

Query Logic Optimisation : parallelise sub‑queries at the smallest granularity; use RoaringBitmap for fast set intersections in retention calculations.

Materialisation & Caching : materialise high‑frequency map attributes; combine trigger‑based and scheduled caches to accelerate frequent reports.
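
The bitmap trick in the retention optimisation can be sketched with plain Python integers standing in for RoaringBitmap: each bit position is a mapped integer user ID (the role distinct_map plays in the schema), and retention becomes a bitwise AND plus a popcount:

```python
def to_bitmap(user_ids):
    """Pack a set of integer user IDs into one big int used as a bitset."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm

def popcount(bm):
    """Number of set bits, i.e. number of users in the bitmap."""
    return bin(bm).count("1")

day0 = to_bitmap({1, 2, 3, 7})   # users active on the start day
day1 = to_bitmap({2, 3, 9})      # users active one day later

retained = popcount(day0 & day1)           # fast set intersection
retention = retained / popcount(day0)
```

RoaringBitmap keeps the same AND/popcount semantics but compresses sparse ID ranges, which is what makes the intersection cheap at production scale.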

Future Outlook

Planned enhancements include AI‑driven interactive insights, tighter integration with TDA dashboards and other platforms (e.g., human‑machine), expanded data source connectors, new analysis scenarios such as LTV (lifetime value) and aha‑moment analysis, and continued performance improvements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: data engineering, Big Data, Data Platform, SQL Optimization, user behavior analysis, growth analytics, Turing Data Finder