Big Data 14 min read

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

DataFunTalk

Beike, a technology‑driven housing service platform, operates a large‑scale data architecture team responsible for storage, compute, and real‑time data stream platforms, supporting over a thousand product engineers and handling massive log and event data streams.

The streaming platform primarily uses Spark Streaming and Flink to provide a unified service for business teams, avoiding the overhead of each team managing its own client and job lifecycle. It runs on YARN for resource scheduling and leverages community‑edition Flink with extended SQL capabilities, offering templates for common real‑time processing tasks.

Data sources include Kafka streams, MySQL/TiDB binlogs, and front‑end telemetry (named "Dig"). Ingestion is managed through a private cloud: users submit collection requests, and once operations approves a request, rsyslog is configured automatically to publish the logs to Kafka topics.

Metadata extraction parses incoming JSON or tab‑separated data to generate schemas and DDL automatically, simplifying task creation. Users can develop tasks either by submitting compiled JARs or by writing SQL in the platform's web IDE, which also supports UDF registration and dimension table joins using TiDB or HBase.
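The schema-extraction step can be illustrated with a minimal sketch. It assumes one sample record per topic, a hypothetical type mapping, and positional column names for tab-separated data; the function names `infer_schema` and `build_ddl` are illustrative and are not taken from Beike's platform.

```python
import json

# Hypothetical mapping from Python value types to SQL column types.
_SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


def infer_schema(sample_line, delimiter="\t"):
    """Return a list of (column_name, sql_type) pairs for one sample record."""
    try:
        record = json.loads(sample_line)
    except json.JSONDecodeError:
        record = None
    if isinstance(record, dict):
        # Nested objects, arrays, and nulls fall back to VARCHAR in this sketch.
        return [(key, _SQL_TYPES.get(type(value), "VARCHAR"))
                for key, value in record.items()]
    # Not a JSON object: treat the line as delimited text with positional names.
    fields = sample_line.rstrip("\n").split(delimiter)
    return [(f"col{i}", "VARCHAR") for i in range(len(fields))]


def build_ddl(table_name, schema):
    """Render a generic CREATE TABLE statement from an inferred schema."""
    columns = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in schema)
    return f"CREATE TABLE `{table_name}` (\n{columns}\n)"


if __name__ == "__main__":
    sample = '{"uid": 7, "city": "bj", "price": 1.5, "vip": true}'
    print(build_ddl("dig_events", infer_schema(sample)))
```

A production version would likely sample many records and merge their types rather than trust a single line, and would append the connector properties that the target SQL dialect requires.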

Task management supports Spark 2.3 and Flink 1.8/1.9 in both per‑job and session modes, with resource tuning and comprehensive monitoring of system and job metrics. Custom reporters push metrics to Kafka, from which they flow into Elasticsearch and Druid for visualization in Grafana, enabling alerts on latency, failures, and resource usage.
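The alerting side of this pipeline can be sketched in plain Python. The JSON record layout, the `AlertRule` type, and the metric names below are assumptions made for illustration; the real reporters and alert rules are not described in the source, and the Kafka transport is left out so the example stays self-contained.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # metric name to watch (hypothetical naming)
    threshold: float  # alert when the sampled value exceeds this
    message: str      # text attached to the alert


def encode_metric(job_id, name, value, timestamp_ms):
    """Serialize one metric sample as the JSON a reporter might publish."""
    return json.dumps({"job": job_id, "metric": name,
                       "value": value, "ts": timestamp_ms})


def evaluate(encoded_samples, rules):
    """Return (job, message, value) for every sample that breaches a rule."""
    by_metric = {rule.metric: rule for rule in rules}
    alerts = []
    for raw in encoded_samples:
        sample = json.loads(raw)
        rule = by_metric.get(sample["metric"])
        if rule is not None and sample["value"] > rule.threshold:
            alerts.append((sample["job"], rule.message, sample["value"]))
    return alerts


if __name__ == "__main__":
    rules = [AlertRule("consumer_lag", 10000, "Kafka consumer lag too high")]
    samples = [encode_metric("job-a", "consumer_lag", 25000, 1),
               encode_metric("job-b", "consumer_lag", 120, 2)]
    for alert in evaluate(samples, rules):
        print(alert)
```

Keeping the rule evaluation separate from the transport means the same check could run against records read back from Kafka or queried from Elasticsearch.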

Storage layers include hot and warm Elasticsearch nodes for recent logs, with older data offloaded to HDFS and re‑indexed into Druid or Elasticsearch as needed. The platform also integrates with Gobblin for offline backups.
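The tiering decision can be sketched as a function of index age. The cutoffs `HOT_DAYS` and `WARM_DAYS` are hypothetical values chosen for illustration; the source does not state the actual retention periods.

```python
from datetime import date

HOT_DAYS = 3    # assumed: indices younger than this stay on hot nodes
WARM_DAYS = 14  # assumed: indices younger than this stay on warm nodes


def storage_tier(index_day, today):
    """Pick a storage tier for a daily index based on its age in days."""
    age_days = (today - index_day).days
    if age_days < 0:
        raise ValueError("index_day is in the future")
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "hdfs"  # offloaded; re-indexed on demand


if __name__ == "__main__":
    today = date(2020, 1, 31)
    for day in (date(2020, 1, 30), date(2020, 1, 21), date(2019, 12, 1)):
        print(day.isoformat(), storage_tier(day, today))
```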

Use cases span real‑time BI dashboards, log analysis, service stability monitoring, and AIOps. The FAST monitoring platform provides alerting via multiple channels (WeChat, SMS, email, callbacks) and supports anomaly detection with Flink CEP.
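To make the pattern-detection idea concrete, here is a plain-Python stand-in for the kind of rule a CEP engine evaluates: "N error events within a time window". It does not use the Flink CEP API, and the event shape, the `count` and `window_s` parameters, and the function name are assumptions for illustration.

```python
from collections import deque


def detect_bursts(events, count=3, window_s=60):
    """Return timestamps at which `count` ERROR events fell within `window_s`.

    `events` is an iterable of (timestamp_seconds, level) sorted by time.
    """
    recent = deque()  # timestamps of ERROR events still inside the window
    hits = []
    for ts, level in events:
        if level != "ERROR":
            continue
        recent.append(ts)
        # Drop errors that are older than the window relative to this event.
        while recent and ts - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= count:
            hits.append(ts)
            recent.clear()  # start fresh so one burst yields one alert
    return hits


if __name__ == "__main__":
    stream = [(0, "ERROR"), (10, "INFO"), (20, "ERROR"),
              (30, "ERROR"), (200, "ERROR"), (400, "ERROR")]
    print(detect_bursts(stream))
```

A real CEP job would additionally handle out-of-order events and keyed state per service, which this sketch ignores.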

Future plans focus on enhancing the SQL parser and debugging tools, implementing dynamic resource allocation based on actual utilization, improving real‑time task diagnostics, and advancing AI‑driven monitoring for traffic prediction and root‑cause analysis.
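One way to picture utilization-based allocation is a sizing rule that derives a memory request from observed usage. This is only a sketch under stated assumptions: the headroom factor, the rounding step, and the minimum size are hypothetical parameters, and the source does not describe the actual policy.

```python
import math


def recommend_memory_mb(used_samples_mb, headroom=0.2, step_mb=512, min_mb=1024):
    """Suggest a container memory size from observed usage samples (in MB).

    Takes the peak observed usage, adds a safety headroom, and rounds up
    to the next multiple of `step_mb`, never going below `min_mb`.
    """
    if not used_samples_mb:
        raise ValueError("at least one usage sample is required")
    target = max(used_samples_mb) * (1 + headroom)
    rounded = math.ceil(target / step_mb) * step_mb
    return max(min_mb, rounded)


if __name__ == "__main__":
    print(recommend_memory_mb([1500, 1800, 2000]))
```

Sizing from the peak rather than the average is the conservative choice for streaming jobs, where a short memory spike can kill a container; a percentile could be substituted if occasional spikes are tolerable.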
