
The Evolution of Data Platforms: From Early Computing to the Modern Big Data Stack

This article reviews the history of data platforms, from the first general-purpose computers and early database systems through traditional BI, agile BI, and big-data technologies such as Hadoop, Spark, and Flink, to today's cloud-native modern data stack and its outlook.

DataFunSummit

Writing at the end of 2022, the author reflects on a challenging year and offers a technical summary of data platforms together with an outlook on where they are heading.

A data platform is defined as software that collects, stores, processes, computes, and analyzes data, provides security and SLA guarantees, and serves as the core for data‑driven decision making in enterprises.

Data Platform 1.0 – Traditional BI traces the origins of data platforms from the ENIAC computer through early database systems (IBM's hierarchical IMS, followed by Codd's relational model and SQL) to the first generation of business-intelligence tools such as Cognos, Hyperion, Business Objects, SAS, and SPSS. These tools were typically delivered as large-scale, waterfall-style projects with long implementation cycles.

Data Platform 2.0 – Agile BI and Big Data describes the shift in the 2000s to more agile, self-service BI tools (Tableau, Sisense, Qlik) and the rise of big-data technologies. It covers the emergence of distributed processing (HPCC, MapReduce), the open-source Hadoop ecosystem (HDFS, MapReduce, YARN), and later, faster engines such as Spark for in-memory processing and Flink for stream processing. It also covers the Lambda architecture, which combines a batch layer with a streaming layer, and the Kappa architecture, which handles all processing as streams.
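To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions chosen for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a distributed framework this grouping happens between the map
    and reduce stages; here it is a plain in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input splits, standing in for blocks of a large file.
    splits = ["big data big platforms", "data platforms evolve"]
    print(word_count(splits))
```

The point of the model is that map_phase and reduce_phase contain no coordination logic, so a framework is free to distribute them; only the shuffle requires moving data between workers.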

Data Platform 3.0 – Modern Data Stack explains how public-cloud services transformed data platform construction. Cloud data warehouses (Redshift, Snowflake, BigQuery) enabled ELT pipelines, in which raw data is loaded first and transformed inside the warehouse. Around them, integration tools (Fivetran, Stitch, Airbyte), modeling tools (dbt, QuickTable), orchestration platforms (Airflow, Dagster, Prefect), BI/visualization products (Looker, Mode, ThoughtSpot), reverse-ETL solutions (Census, Hightouch), and data catalog, governance, and observability systems (Atlan, Amundsen, Monte Carlo) together form a modular, SaaS-first stack that democratizes data access.
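The ELT pattern described above can be sketched with Python's built-in sqlite3 module standing in for a cloud warehouse. This is an illustration under stated assumptions, not how any of the named products work internally: the table names (raw_orders, customer_revenue), the column layout, and the sample rows are all hypothetical.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untransformed, then transform them with SQL in place."""
    # An in-memory SQLite database plays the role of the warehouse here.
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land the source records as-is in a raw table.
    conn.execute(
        "CREATE TABLE raw_orders ("
        "order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_rows)

    # Transform: build a modeled table with SQL inside the warehouse,
    # which is the step a modeling tool would typically manage.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )

    rows = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical source records, as an integration tool might deliver them.
    source = [
        (1, "acme", 1250, "paid"),
        (2, "acme", 750, "paid"),
        (3, "globex", 500, "refunded"),
        (4, "globex", 300, "paid"),
    ]
    print(run_elt(source))
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data again, which is the main practical difference from a transform-before-load ETL pipeline.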

The author concludes that data platforms have become the indispensable foundation for data‑driven enterprises, and predicts continued democratization and broader adoption across organizations of all sizes in the next decade.

Author Bio

Yan Zhitiao is a Peking University graduate and co-founder of Beijing Kuaiyong Cloud Technology Co. He was an early team member of the big-data unicorn TalkingData, where he served as R&D VP and CTO, and was previously a senior architect at IBM and Oracle.

Tags: Big Data, Flink, data platform, data warehouse, Spark, Hadoop, modern data stack