
Evolution of 58.com Real-Time Computing Platform and the One‑Stop Streaming Platform Wstream Built on Flink

This article details the technical evolution of 58.com’s real‑time computing platform, describing the shift from Storm and Spark Streaming to Apache Flink, the design of the one‑stop Wstream platform, its large‑scale deployment, stability measures, SQL streaming capabilities, task migration, diagnostics, optimizations, and future plans.

58 Tech

The article introduces the evolution of 58.com’s real‑time computing platform and the one‑stop streaming platform Wstream built on Apache Flink, sharing practical experience, insights, and methodology.

58.com operates a wide range of services (recruitment, real estate, automotive, finance, etc.) and generates massive user data daily, requiring efficient, stable, distributed real‑time computation to support analytics and decision‑making.

Key real‑time scenarios include: (1) real‑time ETL from Kafka, (2) real‑time data warehousing for metric calculation, (3) real‑time monitoring of system and user behavior, and (4) real‑time analysis such as feature platforms, user profiling, and personalized recommendation.
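As an illustration of the first scenario, a real‑time ETL step typically consumes raw Kafka records, drops malformed or irrelevant events, and projects only the fields downstream consumers need. A minimal sketch in plain Python (the record format and field names are illustrative assumptions, not 58.com's actual schema):

```python
import json

def etl_transform(raw_record: bytes):
    """Parse a raw Kafka message, drop malformed or unwanted events,
    and project the fields downstream consumers need.
    The event_type/user_id fields are illustrative, not a real schema."""
    try:
        event = json.loads(raw_record)
    except json.JSONDecodeError:
        return None  # malformed records are dropped
    if event.get("event_type") not in {"click", "view"}:
        return None  # keep only the event types this job cares about
    return {
        "user_id": event["user_id"],
        "event_type": event["event_type"],
        "ts": event["ts"],
    }

good = etl_transform(b'{"user_id": 42, "event_type": "click", "ts": 1700000000, "extra": "x"}')
bad = etl_transform(b'not json')
```

In a real Flink job this function would be the body of a `map`/`flatMap` operator between a Kafka source and sink; the cleaning logic itself is the same.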

The platform’s technical stack evolved from Storm to Spark Streaming and finally to Flink, driven by the need for higher throughput, lower latency, state management, flexible windows, and exactly‑once guarantees.
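Flexible windowing was one of the drivers behind the move to Flink. The core idea of a tumbling window can be sketched without Flink itself: events are bucketed by aligning their timestamp to a window boundary and aggregating per bucket (the window size and event shape here are illustrative):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms=60_000):
    """Count events per tumbling window; each event is (timestamp_ms, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "a"), (59_999, "a"), (60_000, "a"), (61_000, "b")]
result = tumbling_window_counts(events)
# window [0, 60000) sees "a" twice; window [60000, 120000) sees "a" and "b" once each
```

Flink generalizes this pattern with sliding and session windows, event‑time semantics, and fault‑tolerant state, which is what Storm and Spark Streaming lacked at the time.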

Currently the platform runs on more than 500 machines and processes over 6 trillion records per day; after a year of development, Flink handles roughly 50% of the workload.

To ensure Flink cluster stability, the system employs task isolation, high‑availability architecture, YARN deployment, HDFS federation for checkpoint storage, node‑label based resource isolation, and cgroup CPU isolation.
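Concretely, the checkpoint and isolation settings described above might look roughly like the following `flink-conf.yaml` fragment; the paths, label name, and interval are illustrative assumptions, not 58.com's actual configuration:

```yaml
# Checkpoints on a dedicated HDFS federation namespace (illustrative path)
state.backend: rocksdb
state.checkpoints.dir: hdfs://checkpoint-ns/flink/checkpoints
execution.checkpointing.interval: 60s

# Pin this job's containers to a labeled YARN partition (hypothetical label)
yarn.application.node-label: flink-realtime
```

Routing checkpoints to their own namespace keeps checkpoint I/O from competing with other HDFS traffic, while node labels plus cgroups keep streaming jobs from contending with batch workloads for CPU.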

Wstream provides a SQL‑based streaming analysis capability, offering DDL for sources/sinks and dimension tables, support for UDF/UDAF/UDTF, and multiple execution modes (FlinkJar, Stream SQL, Flink‑Storm) along with debugging, monitoring, and lifecycle management tools.

Stream SQL extensions include custom DDL syntax, custom function definitions, and support for stream‑dimension table joins and dual‑stream joins.
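A stream‑to‑dimension‑table join in such a dialect might look like the following sketch; the DDL keywords, table names, and connector properties are illustrative of the pattern, not Wstream's actual syntax:

```sql
-- Source stream from Kafka (illustrative DDL)
CREATE TABLE click_stream (
    user_id BIGINT,
    page_id BIGINT,
    ts TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'clicks'
);

-- Dimension table backed by an external store (illustrative)
CREATE TABLE user_dim (
    user_id BIGINT,
    city STRING
) WITH (
    'connector' = 'redis'
);

-- Enrich each stream record with dimension attributes
SELECT s.user_id, s.page_id, d.city
FROM click_stream AS s
JOIN user_dim AS d
  ON s.user_id = d.user_id;
```

The value of the custom DDL is that source, sink, and dimension tables are all declared in SQL, so analysts can build enrichment pipelines without writing Flink jobs in Java or Scala.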

The team also migrated existing Storm tasks to Flink using the Flink‑Storm compatibility layer, adding YARN support, at‑least‑once semantics, and tick‑tuple handling, achieving over 40% resource savings and simplifying operations.

Task diagnostics are enhanced with Prometheus‑based metrics collection, Grafana visualization, latency tracking (using Kafka offset vs. logsize), and centralized log aggregation to Elasticsearch via Kafka agents.
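The offset‑vs‑logsize latency check mentioned above reduces to a per‑partition difference: consumer lag is the broker's latest offset (logsize) minus the group's committed offset, summed over partitions. A minimal sketch (the data shapes are illustrative):

```python
def consumer_lag(logsize_by_partition, committed_by_partition):
    """Total Kafka consumer lag: sum over partitions of
    (latest broker offset - committed consumer offset).
    Partitions with no committed offset count as fully lagging."""
    return sum(
        logsize_by_partition[p] - committed_by_partition.get(p, 0)
        for p in logsize_by_partition
    )

lag = consumer_lag({0: 1000, 1: 2000}, {0: 990, 1: 1985})
# partition 0 lags by 10, partition 1 by 15
```

Exporting this number as a Prometheus gauge per job is enough to drive the Grafana latency dashboards and alerting described above.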

Additional Flink optimizations cover: implementing checkpoint‑based ack for Storm‑to‑Flink migration, slot‑aware resource requests to avoid over‑provisioning, Kafka connector enhancements, support for external JARs and configuration files, and extending BucketingSink to LZO and Parquet formats.
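The slot‑aware resource request boils down to asking YARN for only as many TaskManager containers as the job's parallelism actually requires, rather than over‑provisioning containers whose slots go unused. A sketch of the arithmetic (function and parameter names are illustrative):

```python
import math

def containers_needed(parallelism, slots_per_taskmanager):
    """Number of TaskManager containers to request from YARN so that
    total slots >= parallelism, without rounding up per-operator."""
    return math.ceil(parallelism / slots_per_taskmanager)

# e.g. parallelism 10 with 4 slots per TaskManager needs 3 containers
# (12 slots total), not 10 single-slot containers
n = containers_needed(10, 4)
```

The same calculation underlies Flink's own slot sharing: sizing the container request from total required slots rather than per‑task avoids idle memory and CPU on the cluster.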

Future work focuses on completing Storm‑to‑Flink migration, further improving Flink capabilities, and expanding into real‑time rule engines and streaming machine‑learning applications.

Authors: Feng Haitao and Wan Shikang, responsible for building 58.com's real‑time computing platform.

Tags: Big Data, Flink, Real-time Streaming, Task Migration, Streaming SQL, Wstream
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.