Big Data 12 min read

How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)

The article details how 58.com designed and evolved its one‑stop real‑time computation platform Wstream, migrating from Storm and Spark Streaming to Apache Flink, and describes the architecture, task isolation, stream‑SQL features, monitoring, and ongoing optimizations that enable processing of over 600 billion records daily.

dbaplus Community
dbaplus Community
dbaplus Community
How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)

Background and Real‑Time Scenarios

58.com operates a massive lifestyle service platform covering recruitment, housing, automotive, finance, and local services. The diverse business lines generate huge volumes of user data that require low‑latency analysis. Real‑time needs include ETL from Kafka, real‑time data warehouses, monitoring, and personalized recommendation.

Platform Evolution

The platform progressed from Storm to Spark Streaming and finally to Apache Flink. Early engines could not meet the demand for sub‑second latency or high throughput, prompting a shift to Flink for its high‑throughput, low‑latency design, exactly‑once semantics, and native SQL support.

Scale and Stability

Today the platform runs on more than 500 machines, processing over 6000 billion events per day, with Flink handling roughly 50 % of the workload. Stability is ensured through task isolation and a high‑availability YARN‑deployed Flink cluster.

Task Isolation

Isolation is achieved at the business‑line level by assigning dedicated queues and machines, preventing a single noisy job from affecting the whole cluster.

Cluster Architecture

Flink runs in ON‑YARN mode with a dedicated HDFS namespace for checkpoints. Node labels and cgroups provide resource and CPU isolation, while RocksDB is used for state storage.

Wstream Platform Management

Wstream is a one‑stop, high‑performance real‑time big‑data platform built on Flink. It offers SQL‑based stream processing, DDL for sources/sinks and dimension tables, and supports UDF/UDAF/UDTF. Users can develop jobs via a CLI or an integrated web editor with syntax highlighting and validation.

Stream‑SQL Capability

Custom DDL syntax for source, sink, and dimension tables

Custom UDF/UDTF/UDAF support

Join between streams and dimension tables, including dual‑stream joins

The platform also provides a wizard‑driven configuration to hide complex parameters, allowing users to focus on business logic.

Storm‑to‑Flink Migration

A migration program moved existing Storm jobs to Flink, adding at‑least‑once guarantees, tick‑tuple compatibility, and YARN deployment for Storm tasks. The migration reduced resource consumption by over 40 % and eliminated the need for separate Storm clusters.

Task Diagnostics

Flink’s Web UI supplies runtime metrics, but historical data is captured via Prometheus and visualized in Grafana. Latency tracking uses the latencyTrackingInterval parameter, and task delay is measured by comparing Kafka topic log size with consumer offsets.

Log collection is centralized by rotating log4j files daily, cleaning expired logs, and forwarding logs from all nodes to Kafka, then to Elasticsearch for searchable access.

Flink Optimizations

Implemented a custom KafkaSpout to preserve at‑least‑once semantics during Storm‑to‑Flink migration.

Enhanced resource allocation by requesting slots from the SlotManager before launching new TaskManagers, aligning requested resources with actual consumption.

Extended the Kafka connector to support automatic line breaks and meaningful client IDs.

Modified Flink startup scripts to allow external JARs and configuration files, enabling third‑party dependencies.

Added LZO and Parquet support to BucketingSink for high‑performance HDFS writes.

Future Plans

The platform continues to complete Storm‑to‑Flink migration and will further expand Flink capabilities, targeting real‑time rule engines and online machine‑learning use cases.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataFlinkReal-time StreamingTask MigrationStream SQLWstream
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.