How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)
The article details how 58.com designed and evolved its one‑stop real‑time computation platform Wstream, migrating from Storm and Spark Streaming to Apache Flink, and describes the architecture, task isolation, stream‑SQL features, monitoring, and ongoing optimizations that enable processing of over 600 billion records daily.
Background and Real‑Time Scenarios
58.com operates a massive lifestyle service platform covering recruitment, housing, automotive, finance, and local services. The diverse business lines generate huge volumes of user data that require low‑latency analysis. Real‑time needs include ETL from Kafka, real‑time data warehouses, monitoring, and personalized recommendation.
Platform Evolution
The platform progressed from Storm to Spark Streaming and finally to Apache Flink. Early engines could not meet the demand for sub‑second latency or high throughput, prompting a shift to Flink for its high‑throughput, low‑latency design, exactly‑once semantics, and native SQL support.
Scale and Stability
Today the platform runs on more than 500 machines, processing over 6000 billion events per day, with Flink handling roughly 50 % of the workload. Stability is ensured through task isolation and a high‑availability YARN‑deployed Flink cluster.
Task Isolation
Isolation is achieved at the business‑line level by assigning dedicated queues and machines, preventing a single noisy job from affecting the whole cluster.
Cluster Architecture
Flink runs in ON‑YARN mode with a dedicated HDFS namespace for checkpoints. Node labels and cgroups provide resource and CPU isolation, while RocksDB is used for state storage.
Wstream Platform Management
Wstream is a one‑stop, high‑performance real‑time big‑data platform built on Flink. It offers SQL‑based stream processing, DDL for sources/sinks and dimension tables, and supports UDF/UDAF/UDTF. Users can develop jobs via a CLI or an integrated web editor with syntax highlighting and validation.
Stream‑SQL Capability
Custom DDL syntax for source, sink, and dimension tables
Custom UDF/UDTF/UDAF support
Join between streams and dimension tables, including dual‑stream joins
The platform also provides a wizard‑driven configuration to hide complex parameters, allowing users to focus on business logic.
Storm‑to‑Flink Migration
A migration program moved existing Storm jobs to Flink, adding at‑least‑once guarantees, tick‑tuple compatibility, and YARN deployment for Storm tasks. The migration reduced resource consumption by over 40 % and eliminated the need for separate Storm clusters.
Task Diagnostics
Flink’s Web UI supplies runtime metrics, but historical data is captured via Prometheus and visualized in Grafana. Latency tracking uses the latencyTrackingInterval parameter, and task delay is measured by comparing Kafka topic log size with consumer offsets.
Log collection is centralized by rotating log4j files daily, cleaning expired logs, and forwarding logs from all nodes to Kafka, then to Elasticsearch for searchable access.
Flink Optimizations
Implemented a custom KafkaSpout to preserve at‑least‑once semantics during Storm‑to‑Flink migration.
Enhanced resource allocation by requesting slots from the SlotManager before launching new TaskManagers, aligning requested resources with actual consumption.
Extended the Kafka connector to support automatic line breaks and meaningful client IDs.
Modified Flink startup scripts to allow external JARs and configuration files, enabling third‑party dependencies.
Added LZO and Parquet support to BucketingSink for high‑performance HDFS writes.
Future Plans
The platform continues to complete Storm‑to‑Flink migration and will further expand Flink capabilities, targeting real‑time rule engines and online machine‑learning use cases.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
