
Why Suning.com Sticks with Hadoop: Insights into China’s Big Data Platform Choices

Amid declining Hadoop usage reports, Suning.com’s 2018‑2020 big‑data platform case study reveals why the retailer still relies on Hadoop’s mature ecosystem, how it integrates HDFS, HBase, YARN, Hive, Spark, Flink and emerging tools, and what future resource‑management plans it envisions.

ITPUB

Background

In 2018 the KDnuggets data‑science and machine‑learning tools survey reported a 35% drop in Hadoop usage among its respondents, most of whom are based in North America and Europe. This article examines whether that trend threatens Hadoop’s de‑facto status in China, where data volumes are larger and Hadoop adoption remains high.

2018 KDnuggets report showing Hadoop usage decline 35%

Suning.com big‑data platform

Suning.com (a major B2C e‑commerce platform) built its data platform on Hadoop starting in 2013. The selection criteria were:

Maturity and stability: Hadoop had been production‑ready for years.

Cost‑effectiveness: Open‑source licensing eliminates software fees; community support (≈7.3 K GitHub stars) reduces maintenance effort.

Core Hadoop‑based architecture

The platform uses the following components, each with a specific role:

HDFS – distributed file system for petabyte‑scale data storage.

HBase – column‑family store providing real‑time read/write access to tables.

YARN – unified resource manager that schedules both batch and streaming jobs.

Hive / SparkSQL – primary engines for offline SQL analytics.

MapReduce and Spark – supplemental compute for workloads that cannot be expressed in SQL.

SparkStreaming – near‑real‑time processing (micro‑batch model).

SparkMLlib – machine‑learning library that underpins Suning’s ML platform.

Limitations of the classic Hadoop stack

While Hadoop excels at massive storage and batch analytics, it is not optimized for:

sub‑second OLAP queries (requires specialized real‑time engines),

millisecond‑level streaming (micro‑batch model introduces latency).

No single platform currently satisfies both high‑throughput batch and ultra‑low‑latency workloads.
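
To make the micro‑batch latency point concrete, here is a minimal Python sketch. It is not Spark code: the function name micro_batch_latencies, the 500 ms interval, and the arrival times are all assumptions chosen for illustration, and processing time is ignored. The model only captures the fact that an event must wait until its batch interval closes before it can be processed.

```python
def micro_batch_latencies(arrival_ms, batch_interval_ms):
    """Return the wait (ms) each event incurs before its batch fires.

    Hypothetical model: a batch covers [k*interval, (k+1)*interval) and is
    handed to the engine the moment the interval closes.
    """
    latencies = []
    for t in arrival_ms:
        # First batch boundary strictly after the arrival time.
        batch_close = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_close - t)
    return latencies


arrivals = [0, 120, 499, 500, 730]            # illustrative arrival times (ms)
waits = micro_batch_latencies(arrivals, 500)  # 500 ms batch interval (assumed)
print(waits)  # [500, 380, 1, 500, 270]
```

In this model the worst‑case wait equals the batch interval, so shrinking latency means shrinking the interval, which in turn raises scheduling overhead. A per‑event engine avoids that trade‑off by design.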

Component‑level competition

Suning observes intense competition among ecosystem components:

Spark – in‑memory compute engine; Spark and SparkSQL have largely replaced MapReduce for most workloads.

Flink – native streaming framework with unbounded data handling, event‑time semantics, exactly‑once guarantees, and asynchronous checkpointing.

Containers – Docker Swarm and Kubernetes are emerging as alternatives to YARN/Mesos for resource orchestration.

Other storage/KV options – Redis and other in‑memory stores are used for specific caching scenarios, but HBase remains dominant for GB‑TB scale key‑value data.
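
The exactly‑once guarantee and checkpointing listed for Flink above can be illustrated with a toy model. The sketch below is plain Python, not the Flink API, and the class CountingJob is hypothetical. It also takes snapshots synchronously, whereas Flink does so asynchronously, and it covers operator state only, not external sinks. The idea it shows is that the read position and the operator state are saved together, so after a failure both roll back and the replayed events are counted once.

```python
import copy


class CountingJob:
    """Toy stateful job: counts events per key over a replayable source."""

    def __init__(self):
        self.offset = 0              # next position to read from the source
        self.counts = {}             # operator state
        self._checkpoint = (0, {})   # last consistent (offset, state) snapshot

    def process(self, source, upto):
        # Consume events from the current offset up to (not including) `upto`.
        while self.offset < upto:
            key = source[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self):
        # Snapshot the read position and the state as one consistent unit.
        self._checkpoint = (self.offset, copy.deepcopy(self.counts))

    def recover(self):
        # Roll both back together; events after the snapshot get replayed.
        self.offset, saved = self._checkpoint
        self.counts = copy.deepcopy(saved)


source = ["a", "b", "a", "c", "a"]
job = CountingJob()
job.process(source, 2)            # reads "a", "b"
job.checkpoint()
job.process(source, 4)            # reads "a", "c", then the job "crashes"
job.recover()                     # offset and state return to the snapshot
job.process(source, len(source))  # replay from offset 2 to the end
print(job.counts)  # {'a': 3, 'b': 1, 'c': 1}
```

Had only the state been restored, or only the offset, the two events read before the crash would have been lost or counted twice.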

Current strategic direction

Suning plans to retain Hadoop as the foundational layer while augmenting it with specialized tools:

Continue using Spark as the primary compute engine.

Store data on HDFS, object stores such as S3, or distributed object systems like Ceph.

Launch a unified resource‑management project in the second half of the year that will abstract batch, streaming, and container workloads (YARN, Mesos, Kubernetes). The project is expected to reduce machine‑hardware costs by roughly 30%.

Adopt Flink 1.5 (≈3.7 K GitHub stars) for native stream processing, aiming to replace the legacy Storm stack.

Introduce real‑time OLAP engines such as Druid and search‑oriented stores like Elasticsearch to cover scenarios where Hadoop alone is insufficient.
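
The article does not explain where the expected hardware saving from unified resource management would come from. One plausible mechanism is peak shifting: workloads that peak at different times can share one pool sized for the combined peak instead of separate clusters each sized for its own peak. The Python sketch below works through that arithmetic. The demand figures and function names are invented for illustration and are not Suning’s data.

```python
def machines_needed_separate(demands):
    # Each workload gets its own cluster, sized for its own peak.
    return sum(max(series) for series in demands.values())


def machines_needed_shared(demands):
    # One pool, sized for the peak of the combined demand per time slot.
    return max(sum(slot) for slot in zip(*demands.values()))


# Hypothetical demand (machines) in four slots: night, morning, afternoon, evening.
demands = {
    "batch":     [80, 30, 20, 40],
    "streaming": [20, 50, 60, 70],
}

separate = machines_needed_separate(demands)  # 80 + 70
shared = machines_needed_shared(demands)      # max(100, 80, 80, 110)
saving = 1 - shared / separate
print(separate, shared, round(saving, 2))  # 150 110 0.27
```

With these made‑up numbers the shared pool needs about 27% fewer machines. The real figure depends entirely on how well the workloads’ peaks interleave.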

Flink adoption details

Flink 1.5 adds:

SQL and Table API enhancements for unified batch/stream programming.

Improved network stack for lower latency.

Full support for event‑time processing and exactly‑once semantics.

Flink releases follow a roughly five‑month cadence, and the project is backed by an active community (≈3.7 K stars on GitHub).
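
Event‑time processing means results are grouped by when an event happened rather than when it arrived. The following is a simplified Python model of tumbling event‑time windows with a watermark. It is not the Flink API: the function event_time_windows, its parameters, and the sample events are assumptions made for this sketch.

```python
from collections import defaultdict


def event_time_windows(events, window_ms, max_out_of_order_ms):
    """Count events per tumbling event-time window.

    `events` is a list of (event_time_ms, key) in ARRIVAL order.
    A window [start, start + window_ms) is emitted once the watermark
    (largest event time seen minus max_out_of_order_ms) reaches its end.
    Events that arrive for an already emitted window are reported as late.
    """
    open_windows = defaultdict(int)   # window start -> event count
    emitted, late = [], []
    watermark = float("-inf")
    for ts, key in events:
        start = ts - ts % window_ms
        if start + window_ms <= watermark:
            late.append((ts, key))    # its window has already fired
        else:
            open_windows[start] += 1
        watermark = max(watermark, ts - max_out_of_order_ms)
        # Fire every open window whose end the watermark has passed.
        ready = sorted(s for s in open_windows if s + window_ms <= watermark)
        for s in ready:
            emitted.append((s, open_windows.pop(s)))
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        emitted.append((s, open_windows[s]))
    return emitted, late


events = [(1000, "a"), (1200, "b"), (900, "c"), (2100, "d"),
          (1900, "e"), (3500, "f"), (1500, "g")]
emitted, late = event_time_windows(events, window_ms=1000, max_out_of_order_ms=300)
print(emitted)  # [(0, 1), (1000, 3), (2000, 1), (3000, 1)]
print(late)     # [(1500, 'g')]
```

The events at 900 and 1900 arrive out of order but still land in the correct windows because the watermark tolerates 300 ms of disorder. The event at 1500 arrives after its window has fired and is treated as late.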

Interpretation of external reports

Suning reads Gartner’s “Hadoop is dying” statement as applying narrowly to the original HDFS + MapReduce stack. MapReduce usage is declining in favor of Spark and Flink, but the broader Hadoop ecosystem (storage, resource management, and other mature components) remains essential for large‑scale data processing in China.

Conclusion

Suning does not intend a disruptive overhaul of its data platform. Hadoop will stay at the core, complemented by Spark, Flink, Druid, Elasticsearch, and a forthcoming unified resource‑management layer. This hybrid approach balances the proven stability of Hadoop with the performance advantages of newer compute and real‑time analytics frameworks.

Suning big‑data platform diagram
Illustration of Hadoop component landscape