
From MapReduce to Ray: The Evolution of Big Data Computing Engines and Career Opportunities

This article traces the history of big‑data computing engines—from early MapReduce and Hadoop through Spark, Storm, Flink, and the newer Ray—explaining their technical advances, real‑world applications in AI and finance, and why graduates should consider a career in this rapidly evolving field.

AntTech

Mar 23, 2021


At the start of each recruitment season, many students wonder whether big‑data technologies, popular for over a decade, still have a promising future. The article addresses the question through an interview with Zhou Jiaying, a senior technical expert at Ant Group, who explains the evolution and prospects of the core computing engines.

The piece explains that a computing engine is a program dedicated to data processing, originally integrated with storage in traditional databases, but later split into separate compute and storage engines to handle massive data volumes, ushering in the big‑data era.

It outlines the historical milestones: Google’s papers on GFS (2003), MapReduce (2004), and Bigtable (2006); Doug Cutting’s 2006 creation of Hadoop, drawing on those ideas; the rise of Spark in 2012 as a faster alternative to MapReduce; the emergence of Storm for stream processing; and Flink’s 2014 entry as a unified batch‑and‑stream engine.

Ray, introduced in 2017 by UC Berkeley’s RISELab, is presented as a new distributed‑computing engine designed to meet the exploding compute demands of modern AI, offering a simple API that abstracts away the complexities of distributed systems.

Ray’s core concepts—Task, Object, and Actor—map directly to familiar programming constructs (functions, variables, classes), enabling developers to turn single‑machine programs into distributed applications with minimal changes.

Practical use cases at Ant Group are highlighted, such as fusing stream, graph, and machine‑learning workloads (e.g., real‑time fraud detection and Ant Forest interactions) and building end‑to‑end online learning pipelines that integrate data ingestion, feature engineering, model training, and deployment in real time.

The article also discusses the broader industry landscape: the early adoption of Google’s ideas by Chinese companies (e.g., Alibaba’s TFS and JStorm, and its acquisition of the company founded by Flink’s creators), and Ant’s significant contributions to the Ray open‑source community.

Finally, it offers career advice: graduates are encouraged to master computer‑science fundamentals, study the architectures and source code of the major engines, and consider roles in distributed‑system development, SQL optimization, or scheduling. The field remains challenging, but it is rewarding for those interested in low‑level systems engineering.




