Design and Implementation of the Compass Flow Analysis Platform at Beike

This article describes the background, challenges, overall architecture, data ingestion, processing, storage, and analytical capabilities of Beike's Compass platform, which leverages Spark, ClickHouse, and custom algorithms to provide real‑time and offline big‑data analytics for multiple business lines.

Beike Product & Technology

Li Shichang, a senior big data development engineer at Beike, explains the motivation behind building the Compass flow analysis platform to provide unified, authoritative traffic data for the group and its business units.

Background: With the establishment of a growth line and increasing demand for data‑driven decision making, various business units need detailed traffic metrics, while senior management requires aggregated indicators such as monthly active users, conversion, retention, and channel performance.

Problems Faced: 1) Inconsistent logging standards across teams; 2) Divergent statistical definitions preventing unified reporting; 3) Daily terabyte‑scale data ingestion requiring both detailed storage and analytical processing.

Overall Design: The architecture is divided vertically into five stages (data demand, data ingestion, data processing, data storage, and data analysis), while the horizontal direction shows how data flows through each stage.

Data Demand: Establish a company‑wide standard for event logging, supported by an event‑management module that handles log requests, generates documentation, and assists business teams in implementing standardized instrumentation.

Data Ingestion: The Dig service receives logs from the app, the PC site, and the mobile site, writes them to Kafka via a Lua program, and decompresses the log files that mobile clients upload in batches.
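The decompression step for batch uploads can be illustrated with a minimal Python sketch. The article does not specify the compression format or the payload layout, so gzip compression and one JSON object per line are assumptions here, and `unpack_batch` is a hypothetical helper name rather than part of Dig (which the article says uses a Lua program).

```python
import gzip
import json
from typing import Dict, List


def unpack_batch(payload: bytes) -> List[Dict]:
    """Decompress one batch upload and parse it into event dicts.

    Assumption: the payload is gzip-compressed, newline-delimited JSON.
    Lines that fail to parse are skipped instead of aborting the batch.
    """
    text = gzip.decompress(payload).decode("utf-8")
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real service would count or quarantine these lines
        if isinstance(record, dict):
            events.append(record)
    return events


# Build a sample batch the way a client might, then unpack it.
sample_batch = gzip.compress(
    b'{"event": "page_view", "uid": "u1"}\n'
    b"not json\n"
    b'{"event": "click", "uid": "u2"}\n'
)
unpacked = unpack_batch(sample_batch)
```

Skipping malformed lines keeps one bad record from discarding a whole upload; whether to drop, count, or quarantine such lines is a design choice the article does not describe.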

Data Processing: Spark jobs consume Kafka streams to clean data, convert legacy formats, parse fields, and enrich dimensions (e.g., device model). Cleaned data is written to ClickHouse for real‑time analytics and to HDFS for offline processing; custom Hive SQL handles specialized queries, and offline channel data is merged daily into ClickHouse, forming a Lambda architecture.
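As a rough illustration of the per-record cleaning described above (legacy format conversion, field parsing, dimension enrichment), the sketch below uses plain Python. It is not Beike's Spark job: the field names, the legacy mapping, and the device lookup table are all hypothetical, chosen only to make the steps concrete.

```python
from typing import Dict, Optional

# Hypothetical mapping from legacy field names to the standard ones.
LEGACY_FIELDS = {"userId": "uid", "evt": "event", "ts": "timestamp"}

# Hypothetical lookup table used to enrich the raw device code.
DEVICE_MODELS = {"dev-a1": "Model A", "dev-b2": "Model B"}

REQUIRED_FIELDS = ("uid", "event", "timestamp")


def clean_event(raw: Dict) -> Optional[Dict]:
    """Return a normalized event, or None if the record should be dropped."""
    # 1. Convert legacy field names to the current standard.
    event = {LEGACY_FIELDS.get(key, key): value for key, value in raw.items()}

    # 2. Drop records whose required fields are missing or empty.
    if not all(event.get(field) for field in REQUIRED_FIELDS):
        return None

    # 3. Parse fields into typed values.
    try:
        event["timestamp"] = int(event["timestamp"])
    except (TypeError, ValueError):
        return None

    # 4. Enrich with a dimension derived from a raw field.
    event["device_model"] = DEVICE_MODELS.get(event.get("device"), "unknown")
    return event


cleaned = clean_event(
    {"userId": "u1", "evt": "click", "ts": "1700000000", "device": "dev-a1"}
)
```

A function of this shape could be applied to each record in a streaming job, with `None` results filtered out before the write to storage.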

Data Storage: After evaluating Spark, Kylin, Druid, Kudu+Impala, and ClickHouse, ClickHouse was selected for its rich analytical functions, SQL support, high query performance, strong compression, and proven use cases in companies like Sina Weibo and Guazi.

Data Analysis: Five major analytical capabilities are provided:

Data Overview – visual dashboards for core traffic metrics.

Event Analysis – flexible dimension‑metric combinations using complex aggregation SQL.

Funnel Analysis – custom ClickHouse function with a sliding‑window subsequence algorithm to compute conversion rates across steps.

Retention Analysis – daily user retention and churn metrics across dimensions.

Path Analysis – session‑level user journey reconstruction using ClickHouse’s groupArray function.
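The article does not publish the custom funnel function, so the following is only a sketch of one common sliding-window approach, similar in spirit to ClickHouse's built-in `windowFunnel` aggregate. The names `funnel_depth` and `funnel_counts`, the `(timestamp, event_name)` tuple layout, and the sample data are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event name)


def funnel_depth(events: Sequence[Event], steps: Sequence[str], window: int) -> int:
    """Return how many funnel steps one user completed, in order,
    within `window` seconds of the first step of the same chain."""
    # start_times[i] is the start timestamp of the most recent chain
    # that has reached step i, or None if no chain has reached it yet.
    start_times: List[Optional[int]] = [None] * len(steps)
    best = 0
    for ts, name in sorted(events):
        # Walk steps in reverse so one event cannot advance a chain twice.
        for i in range(len(steps) - 1, -1, -1):
            if name != steps[i]:
                continue
            if i == 0:
                start_times[0] = ts
            else:
                prev_start = start_times[i - 1]
                if prev_start is None or ts - prev_start > window:
                    continue
                start_times[i] = prev_start
            best = max(best, i + 1)
    return best


def funnel_counts(
    events_by_user: Dict[str, Sequence[Event]], steps: Sequence[str], window: int
) -> List[int]:
    """Count the users who reached each step of the funnel."""
    counts = [0] * len(steps)
    for events in events_by_user.values():
        for i in range(funnel_depth(events, steps, window)):
            counts[i] += 1
    return counts


STEPS = ["view", "click", "pay"]
sample_users = {
    "u1": [(0, "view"), (10, "click"), (100, "view"), (130, "click"), (150, "pay")],
    "u2": [(0, "view"), (5, "click")],
    "u3": [(0, "view")],
    "u4": [(0, "click")],  # never entered the funnel at step 1
}
reached = funnel_counts(sample_users, STEPS, window=60)
# Step-to-step conversion rate; 0.0 when no user reached the previous step.
conversion = [
    reached[i] / reached[i - 1] if reached[i - 1] else 0.0
    for i in range(1, len(STEPS))
]
```

Tracking only the most recent chain start per step keeps the pass linear in the number of events per user: a later start always leaves at least as much of the window remaining, so older chains never need to be revisited.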

Event Detection: Real‑time Kafka consumption provides immediate feedback on log format compliance, data volume, error rates, and detailed error information, supporting QA and PM validation of instrumentation.
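The compliance check behind event detection might look like the sketch below, which validates each event against a registered specification and summarizes volume, error rate, and error details. The spec format, the event names, and the functions `check_event` and `summarize` are hypothetical; the article does not describe how the event-management module stores its definitions.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical registered specs: event name -> required fields and their types.
EVENT_SPECS = {
    "page_view": {"uid": str, "page": str},
    "click": {"uid": str, "element_id": str},
}


def check_event(event: Dict) -> List[str]:
    """Return a list of problems for one event; an empty list means compliant."""
    name = event.get("event")
    spec = EVENT_SPECS.get(name)
    if spec is None:
        return [f"unregistered event: {name!r}"]
    problems = []
    for field, expected in spec.items():
        if field not in event:
            problems.append(f"{name}: missing field {field!r}")
        elif not isinstance(event[field], expected):
            problems.append(f"{name}: field {field!r} should be {expected.__name__}")
    return problems


def summarize(events: Iterable[Dict]) -> Dict:
    """Aggregate data volume, error rate, and per-problem counts."""
    total = 0
    bad = 0
    details: Counter = Counter()
    for event in events:
        total += 1
        problems = check_event(event)
        if problems:
            bad += 1
            details.update(problems)
    return {
        "total": total,
        "error_rate": bad / total if total else 0.0,
        "details": dict(details),
    }


report = summarize(
    [
        {"event": "page_view", "uid": "u1", "page": "/home"},
        {"event": "click", "uid": "u2"},
        {"event": "foo", "uid": "u3"},
        {"event": "page_view", "uid": 42, "page": "/list"},
    ]
)
```

In a streaming setting the same summary could be computed per time window, so that QA and PM see feedback shortly after a test device emits events.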

Conclusion: The Compass platform now serves over ten business lines, ingesting more than 600 million events daily with sub‑second query latency.

Outlook: Future work includes simplifying data onboarding for new business units, optimizing storage and query performance at larger scales, extending advanced analytics such as user segmentation, and packaging ClickHouse as a shared service for broader enterprise use.

