Why Big Data May Not Be the Gold Mine You Expect: Insights and Pitfalls
The article examines what big data really means, its core 4 V characteristics, current limitations in China, the overhyped value of data, the importance of business‑driven applications, and why starting from small, relevant data is essential for true predictive power.
Introduction
Many people talk about big data without really understanding what it is or how it relates to them, often driven by curiosity about new technology rather than practical experience.
1. What Is Big Data
The concept of the "big data era" was first popularized by McKinsey, which described data as a new production factor permeating every industry.
“Data has infiltrated every industry and business function, becoming an important production factor. Mining and using massive data heralds a new wave of productivity growth and consumer surplus.”
IBM summarized big data with four V’s:
Volume : massive scale, measured in petabytes (PB), exabytes (EB), or zettabytes (ZB).
Variety : diverse types such as logs, video, images, and geolocation.
Value : low density but high commercial potential.
Velocity : fast processing speed, distinct from traditional data mining.
These V’s do not capture all characteristics; a diagram (see below) illustrates additional traits.
Viktor Mayer‑Schönberger emphasized that in the big data era, we must use big‑data thinking to uncover its hidden value.
Examples of predictive use include Google’s flu‑trend forecasts, Amazon’s recommendation engine, and Farecast’s airline‑ticket timing predictions.
The core of big data is prediction. This leads to three mindset shifts:
From random samples to full‑population data.
From insisting on exactness to tolerating messy, mixed‑quality data.
From causal to correlational analysis.
2. Current Situation in China
According to the 2014 national economic census, there were 10.86 million legal entities in the secondary and tertiary sectors, representing 95.6% of all enterprises, with an average of 32.8 employees per unit.
Most Chinese companies are small‑to‑medium enterprises, so few possess massive data volumes. Website traffic data for typical Chinese firms (e.g., Yonyou (用友), Neusoft (东软), NSFOCUS (绿盟)) shows daily page views in the tens of thousands, which yields data at the gigabyte level, far short of terabyte or petabyte scales.
At this scale, a decent PC server can handle most workloads, often requiring only one or two machines for reliability.
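As a rough check on the scale claim above, the arithmetic can be sketched in a few lines of Python. Every input here is an assumption chosen for illustration, not a measurement: 50,000 page views per day, about 10 logged requests per page view, and roughly 500 bytes per log line.

```python
# Back-of-envelope log volume estimate. All inputs are illustrative
# assumptions, not figures from any of the companies mentioned.

def estimate_log_bytes(page_views_per_day, requests_per_view, bytes_per_line, days):
    """Return the approximate raw log size in bytes over the given period."""
    return page_views_per_day * requests_per_view * bytes_per_line * days


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


daily = estimate_log_bytes(50_000, 10, 500, days=1)
yearly = estimate_log_bytes(50_000, 10, 500, days=365)
print("per day: ", human_readable(daily))
print("per year:", human_readable(yearly))
```

Under these assumptions a full year of raw logs stays on the order of 100 GB, which fits comfortably on a single server's disk.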
Conclusion: The majority of Chinese companies do not have large amounts of data.
3. Core Value of Big Data
Although big‑data technologies (Hadoop, Spark, Storm, HBase, Hive) are widely discussed, the reality is that data often only validates the present and cannot reliably predict the future.
For example, analysis of historical data suggested that a market crash would be followed by a rebound, but the market did not behave as expected.
Any technology that does not start from solving a business problem is merely a gimmick. Learning big‑data tools without a concrete use case offers little value, because knowledge fades when it is not applied.
4. Is Data Really Valuable?
Data is often cheap, especially when easily scraped from the internet. Web crawlers can be built quickly with Python or off‑the‑shelf tools, lowering the barrier to collection.
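To give a sense of how low that barrier is, here is a minimal sketch of a link extractor that uses only the Python standard library. The class and function names are made up for this example, the URLs are placeholders, and the page is assumed to be UTF-8. A real crawler would also need rate limiting, robots.txt handling, and error handling.

```python
# Minimal link extractor using only the standard library.
# A sketch for illustration, not a production crawler.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the links it contains."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch(url, timeout=10):
    """Download a page and decode it as UTF-8 (an assumed encoding)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Demonstrate on an inline snippet so the example runs without network access.
sample = '<p><a href="/news">News</a> <a href="https://example.org/x">X</a></p>'
links = extract_links(sample, "https://example.com/")
print(links)
```

Passing the result of `fetch(url)` to `extract_links` turns this into a one-page crawler, which is the point: collecting data takes very little effort.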
Because data is highly reproducible, especially unstructured data, its raw cost is low; the real value lies in how the data is utilized.
For decision‑makers, the key questions are: Is there a problem? What is it? Are there new insights? What actions are needed?
5. The Big Data Bubble
Professor Michael Jordan warns that current big‑data results lack reliability; applying them prematurely is like building a bridge without civil‑engineering knowledge, leading to shoddy “tofu‑dreg” structures.
Issues include a high rate of false positives, limited progress in computer vision, and fundamental differences between artificial neural networks and the human brain.
Media analogies create misunderstandings and hype.
Big data is not yet a rigorous science; misuse can cause disastrous outcomes.
Premature enthusiasm may quickly turn to disillusionment if short‑term results are lacking.
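One common mechanism behind the false positives mentioned above is testing many hypotheses against the same data. The simulation below is a sketch of that mechanism, not necessarily the specific case the warning had in mind. The feature count, sample size, and threshold are assumptions chosen for illustration. Every feature is pure noise, so any "significant" correlation it finds is spurious.

```python
# Simulate the multiple-comparisons problem: correlate many pure-noise
# features with a pure-noise target and count how many look "significant".
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_features, num_samples, threshold, seed=0):
    """Count noise features whose |r| with a noise target exceeds threshold."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    for _ in range(num_features):
        feature = [rng.gauss(0, 1) for _ in range(num_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


# For 50 samples, |r| > 0.279 corresponds roughly to p < 0.05 (two-sided).
spurious = count_spurious(num_features=1000, num_samples=50, threshold=0.279)
print(f"{spurious} of 1000 noise features look significant")
```

At a 5% significance level, roughly 50 of the 1,000 noise features are expected to pass the threshold by chance alone. The more variables a dataset has, the more such accidental findings it produces.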
6. Start From Small Data
“Small data” refers to individualized, digitized information—such as personal health observations—that, while not massive, is highly relevant to the individual.
Many enterprises struggle not with big data, but with effectively using their existing small data before scaling up.
Big data should evolve naturally from small data, forming an ecosystem rather than a sudden leap.
Steps to grow data value:
Identify core business data specific to the enterprise.
Gradually incorporate related internal data (second layer).
Add structured external data (third layer).
Integrate social and unstructured data (fourth layer).
By layering data thoughtfully, organizations can unlock actionable insights.
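The layering steps above can be sketched as repeated enrichment of core records by a shared key. This is only an illustration of the idea: the function name, field names, and values are all hypothetical, and the fourth layer (social and unstructured data) is left out because it would first need to be turned into structured fields.

```python
# Sketch of layered enrichment: start from core records and merge in
# additional layers by a shared key. All names and values are hypothetical.

def add_layer(records, layer, key):
    """Return new records with fields from `layer` merged in where keys match.

    `layer` maps a key value to a dict of extra fields. Records with no
    match are kept unchanged, so a sparse outer layer never drops core data.
    """
    enriched = []
    for record in records:
        extra = layer.get(record[key], {})
        # Core fields win on conflict, so outer layers cannot overwrite them.
        enriched.append({**extra, **record})
    return enriched


# Layer 1: core business data (hypothetical orders).
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
]
# Layer 2: related internal data (hypothetical support tickets).
support = {1: {"open_tickets": 2}}
# Layer 3: structured external data (hypothetical regional statistics).
regions = {1: {"region": "north"}, 2: {"region": "south"}}

result = add_layer(add_layer(orders, support, "customer_id"), regions, "customer_id")
for row in result:
    print(row)
```

Keeping the core records authoritative and treating each outer layer as optional mirrors the text's advice: the outer layers add context, but the analysis still works if they are missing.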
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.