The Evolution of Data Platforms: From Early Computing to the Modern Big Data Stack
This article reviews the history of data platforms, from the first general‑purpose computers and early database systems through traditional BI, agile BI, and big‑data technologies such as Hadoop, Spark, and Flink, to today’s cloud‑native modern data stack, and closes with an outlook on what comes next.
Writing at the end of 2022, the author reflects on a challenging year and offers a summary of data platforms, and an outlook for them, from a data‑technology perspective.
A data platform is defined as software that collects, stores, processes, computes, and analyzes data, provides security and SLA guarantees, and serves as the core for data‑driven decision making in enterprises.
The section on Data Platform 1.0 (Traditional BI) traces the origins of data platforms from the ENIAC computer through early database systems (IBM’s hierarchical IMS, Codd’s relational model, and SQL) to the first generation of business‑intelligence tools such as Cognos, Hyperion, Business Objects, SAS, and SPSS. These tools were typically rolled out as large‑scale, waterfall‑style projects with long implementation cycles.
The section on Data Platform 2.0 (Agile BI and Big Data) describes the shift in the 2000s to more agile, self‑service BI tools (Tableau, Sisense, Qlik) and the rise of big‑data technologies. It covers the emergence of distributed processing (HPCC, MapReduce), the open‑source Hadoop ecosystem (HDFS, MapReduce, YARN), and later, faster engines such as Spark and Flink. It also covers the Lambda architecture, which combines a batch layer with a stream layer, and the Kappa architecture, which relies on stream processing alone.
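The difference between the Lambda and Kappa architectures can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the event log, the `BATCH_CUTOFF` value, and all function names are hypothetical, and in-memory counters stand in for real batch and stream engines.

```python
from collections import Counter

# Hypothetical event log: (timestamp, page) pairs, for illustration only.
EVENTS = [(1, "home"), (2, "pricing"), (3, "home"), (4, "docs"), (5, "home")]
BATCH_CUTOFF = 3  # assumed: events with timestamp <= cutoff were batch-processed


def batch_view(events, cutoff):
    """Batch layer: recompute counts from the full history up to the cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: incrementally count only events newer than the cutoff."""
    view = Counter()
    for ts, page in events:
        if ts > cutoff:
            view[page] += 1
    return view


def lambda_query(events, cutoff):
    """Serving layer (Lambda): merge the batch view and the speed view."""
    return batch_view(events, cutoff) + speed_view(events, cutoff)


def kappa_query(events):
    """Kappa: a single streaming code path that replays the whole log."""
    view = Counter()
    for _, page in events:
        view[page] += 1
    return view


# Both designs should answer the same query identically.
print(lambda_query(EVENTS, BATCH_CUTOFF) == kappa_query(EVENTS))
```

The sketch shows the trade‑off in miniature: Lambda maintains two code paths whose results must be merged, while Kappa keeps one path and handles reprocessing by replaying the log.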
The section on Data Platform 3.0 (Modern Data Stack) explains how public‑cloud services transformed the way data platforms are built. Cloud data warehouses (Redshift, Snowflake, BigQuery) enabled ELT (extract, load, transform) pipelines, in which raw data is loaded first and transformed inside the warehouse. Around the warehouse sit integration tools (Fivetran, Stitch, Airbyte), modeling tools (dbt, QuickTable), orchestration platforms (Airflow, Dagster, Prefect), BI and visualization products (Looker, Mode, ThoughtSpot), reverse‑ETL solutions (Census, Hightouch), and data‑governance and observability systems (Atlan, Amundsen, Monte Carlo). Together they form a modular, SaaS‑first stack that democratizes data access.
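The ELT pattern described here can be illustrated with a small Python sketch. It is an assumption‑laden stand‑in rather than anything from the article: an in‑memory SQLite database plays the role of the cloud warehouse, and the `raw_orders` and `customer_revenue` tables and the sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw records, as an integration tool might extract them.
RAW_ORDERS = [
    ("o1", "alice", "19.99"),
    ("o2", "bob", "5.00"),
    ("o3", "alice", "30.01"),
]

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Extract + Load: land the data untransformed, amounts still stored as text.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

# Transform: build a modeled table with SQL inside the warehouse itself.
conn.execute(
    """
    CREATE TABLE customer_revenue AS
    SELECT customer, ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
    FROM raw_orders
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(rows)
```

The point of the ordering is that the raw table stays available in the warehouse, so a model can be redefined and rebuilt with SQL alone, without extracting the data again.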
The author concludes that data platforms have become the indispensable foundation for data‑driven enterprises, and predicts continued democratization and broader adoption across organizations of all sizes in the next decade.
Author Bio
Yan Zhitiao is a Peking University graduate, co‑founder of Beijing Kuaiyong Cloud Technology Co., an early team member of the big‑data unicorn TalkingData (where he served as R&D VP and CTO), and a former senior architect at IBM and Oracle.