From MapReduce to Ray: The Evolution of Big Data Computing Engines and Career Opportunities
This article traces the history of big-data computing engines, from early MapReduce and Hadoop through Spark, Storm, and Flink to the newer Ray. It explains their technical advances, their real-world applications in AI and finance, and why graduates should consider a career in this rapidly evolving field.
At the start of each recruitment season, many students wonder whether big-data technologies, which have been popular for over a decade, still have a promising future. The article answers this question through an interview with Zhou Jiaying, a senior technical expert at Ant Group, who explains the evolution and prospects of the core computing engines.
The piece explains that a computing engine is a program dedicated to data processing. In traditional databases it was integrated with storage; the two were later split into separate compute and storage engines to handle massive data volumes, which ushered in the big-data era.
It outlines the historical milestones: Google's papers introducing GFS (2003), MapReduce (2004), and BigTable (2006); Doug Cutting's 2006 creation of Hadoop based on those ideas; the rise of Spark in 2012 as a faster alternative to MapReduce; the emergence of Storm for stream processing; and Flink's 2014 entry as a unified batch-and-stream engine.
Ray, introduced in 2017 by UC Berkeley’s RISELab, is presented as a new distributed‑computing engine designed to meet the exploding compute demands of modern AI, offering a simple API that abstracts away the complexities of distributed systems.
Ray's three core concepts (Task, Object, and Actor) map directly to familiar programming constructs (functions, variables, and classes, respectively), enabling developers to turn single-machine programs into distributed applications with minimal changes.
Practical use cases at Ant Group are highlighted, such as fusing stream, graph, and machine‑learning workloads (e.g., real‑time fraud detection and Ant Forest interactions) and building end‑to‑end online learning pipelines that integrate data ingestion, feature engineering, model training, and deployment in real time.
The article also discusses the broader industry landscape: the early adoption of Google's ideas by Chinese companies (for example, Alibaba's TFS and JStorm, and its acquisition of the company behind Flink), and Ant's significant contributions to the Ray open-source community.
Finally, it offers career advice, encouraging graduates to master fundamental computer-science topics, study the architectures and source code of the major engines, and consider roles in distributed-system development, SQL optimization, or scheduling. It notes that the field remains challenging but is rewarding for those interested in low-level system engineering.
AntTech
Technology is the core driver of Ant's future creation.