Why the Big Data Era Is Over
The article argues that the era of big data is ending: most organizations store only modest amounts of data, the cost of retaining data often outweighs its value, and modern cloud and analytics tools handle typical workloads efficiently without distributed big-data systems.
As cloud computing matures, the notion of "big data" as a problem has faded; most real‑world workloads involve storing data rather than processing massive volumes, and the cost of keeping data often outweighs its value.
The author, a former founding engineer of Google BigQuery, reflects on a decade of advocating for big‑data solutions, noting that even his biggest customers typically store less than 1 TB, and many only a few hundred gigabytes.
He illustrates how the classic "big data" sales slide—warning that data growth will overwhelm existing systems—has become outdated, with traditional relational databases (SQLite, PostgreSQL, MySQL) gaining traction while NoSQL/NewSQL growth stalls.
Analysis of storage patterns shows that data size follows a power‑law distribution: a few customers have petabyte‑scale stores, but the median storage per customer is well under 100 GB, and most data is rarely accessed after a short period.
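The power-law claim can be illustrated with a small simulation. The sketch below draws hypothetical per-customer storage sizes from a Pareto distribution; the scale and shape parameters are invented for illustration, not taken from the article. The point is structural: under a power law, a handful of huge customers pull the mean far above the median, so "average data size" statistics overstate what a typical customer stores.

```python
import random
import statistics

random.seed(0)

# Hypothetical illustration: per-customer storage (GB) drawn from a Pareto
# (power-law) distribution. Shape 1.16 gives a rough "80/20" skew; both
# parameters are arbitrary assumptions, not measured values.
sizes_gb = [10 * random.paretovariate(1.16) for _ in range(10_000)]

median = statistics.median(sizes_gb)
mean = statistics.mean(sizes_gb)
top_1pct_share = sum(sorted(sizes_gb)[-100:]) / sum(sizes_gb)

print(f"median storage: {median:.0f} GB")   # the typical customer is small
print(f"mean storage:   {mean:.0f} GB")     # pulled up by a few giants
print(f"share held by top 1% of customers: {top_1pct_share:.0%}")
```

Running this shows the median landing far below the mean, with the top 1% of customers holding a large share of all bytes, which is exactly the shape the article describes: petabyte-scale outliers coexisting with a sub-100 GB median.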
Modern cloud data platforms separate storage and compute so each can scale independently; in practice, storage grows faster than compute, so organizations accumulate ever-larger data footprints without a proportional rise in processing demand.
Typical analytical workloads query only a small fraction of the data; 90 % of expensive queries process less than 100 MB, and large‑scale scans are the exception rather than the rule.
Data that is older than a week sees dramatically reduced query frequency, with 99 % of accesses targeting just 30 % of the stored data, reinforcing that most stored data is effectively a liability.
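The skew between stored data and queried data follows from simple arithmetic about access decay. The sketch below assumes data arrives at a constant rate and the query rate against a record falls off exponentially with its age; the one-year retention window and seven-day half-life are arbitrary assumptions chosen to make the effect visible, not figures from the article.

```python
import math

# Hypothetical illustration: query rate against a record decays exponentially
# with age (half-life assumed at 7 days), over a store retaining 365 days of
# constant-rate ingest. Both parameters are invented for illustration.
half_life_days = 7
decay = math.log(2) / half_life_days
store_age_days = 365

def access_share(newest_days: float) -> float:
    """Fraction of all accesses hitting data newer than `newest_days`,
    integrating the exponential access weight exp(-decay * t) over age."""
    total = 1 - math.exp(-decay * store_age_days)
    return (1 - math.exp(-decay * newest_days)) / total

# Find the smallest age-prefix of the store that captures 99% of accesses.
age = 0.0
while access_share(age) < 0.99:
    age += 0.5

print(f"{age:.0f} days of data ({age / store_age_days:.0%} of the store) "
      f"absorb 99% of accesses")
```

With these assumed parameters, roughly the newest 13% of the store absorbs 99% of accesses, and the remaining cold majority of the data is, as the article puts it, closer to a liability than an asset.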
By the classic definition—data that cannot be handled on a single machine—the set of workloads that truly require "big data" solutions is shrinking as hardware capabilities and cloud instances grow.
The author warns that retaining data simply because storage is cheap can create legal and operational liabilities, citing GDPR/CCPA compliance, potential litigation, and the challenges of maintaining legacy data schemas.
He concludes with a self‑assessment checklist to help readers determine whether they truly belong to the "top 1 %" of big‑data users, encouraging the adoption of modern tools that match actual data volumes rather than imagined future growth.