Why SQL Still Rules Big Data—and How NoSQL & NewSQL Fit In
The article explores the evolution of data processing from Hadoop and Spark to modern SQL, NoSQL, and NewSQL solutions, comparing their architectures, performance trade‑offs, and use‑cases, while illustrating concepts with examples like MapReduce, Hive, Impala, and streaming platforms such as Storm.
SQL, NoSQL, and NewSQL in the Big Data Era
With the rapid development of Hadoop and Spark, big‑data analysis platforms have become mainstream, offering strong performance, fault tolerance, and flexible scheduling.
SQL, as the de‑facto standard for databases, provides lower technical barriers, mature tooling, and easier migration compared with raw API programming, making it a primary choice for large‑scale data analysis.
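MapReduce introduced a simple two-stage model (Map and Reduce) for processing massive datasets. However, each job writes its intermediate results to disk, and a complex analysis has to be expressed as a chain of separate Map and Reduce jobs. This heavy I/O and rigid structure led to second-generation engines such as Tez and Spark, which generalize the Map/Reduce paradigm into graphs of operators and, in Spark's case, keep intermediate data in memory to speed it up.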
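The two-stage model can be sketched in a few lines of plain Python. This is only a single-process illustration of the idea, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real framework would run the map and reduce calls on many machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Stage 1: every line is mapped independently (parallelizable).
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Stage 2: every key is reduced independently (parallelizable).
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big sql", "sql rules"]))
```

Because each map call sees only one line and each reduce call sees only one key, both stages can be spread across machines; the cost is the grouping step in between, which is where much of the I/O goes.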
Higher-level languages such as Pig and Hive translate script-style or SQL queries into MapReduce jobs, so analysts, who often lack deep programming skills, can write concise queries. A word-count example can be expressed in a few lines of SQL instead of dozens of lines of MapReduce code.
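To make the contrast concrete, here is a minimal sketch of word count as a single `GROUP BY` query. It uses Python's built-in `sqlite3` module as a stand-in for a big-data SQL engine, which is an assumption made purely so the example runs anywhere; the table name `words` and the helper `word_count_sql` are hypothetical. In Hive, the tokenizing step would typically be written inside the query itself (for example with `split` and `explode`) rather than in Python.

```python
import sqlite3


def word_count_sql(lines):
    """Count words with one declarative query instead of hand-written stages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    # Tokenize in Python only because SQLite has no built-in split function.
    conn.executemany(
        "INSERT INTO words (word) VALUES (?)",
        [(w.lower(),) for line in lines for w in line.split()],
    )
    rows = conn.execute(
        "SELECT word, COUNT(*) AS cnt "
        "FROM words "
        "GROUP BY word "
        "ORDER BY cnt DESC, word"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for word, cnt in word_count_sql(["big data big sql", "sql rules"]):
        print(word, cnt)
```

The query states what result is wanted; how the grouping is distributed and executed is left to the engine.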
Because Hive on MapReduce proved too slow for interactive workloads, lighter-weight engines such as Impala, Presto, and Drill were created. They bypass MapReduce, claim resources more aggressively, and apply SQL-specific optimizations to return query results faster.
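When even these engines cannot meet sub-minute latency requirements, streaming computation (e.g., Storm) processes each record as it arrives instead of querying data at rest. The trade-off is flexibility: the computation has to be defined before the data flows in, so ad hoc questions are harder to ask.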
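The difference from batch queries can be sketched with a Python generator that updates its result on every incoming event. This is a toy illustration under stated assumptions, not Storm's programming model or API: the event source is an ordinary list, and `running_counts` is a hypothetical name.

```python
from collections import Counter


def running_counts(events):
    """Consume events one at a time and emit the updated count immediately.

    The logic (count per event type) is fixed before any data arrives,
    which is the flexibility a streaming system gives up.
    """
    counts = Counter()
    for event in events:
        counts[event] += 1
        # The result is available as soon as the event is seen,
        # without waiting for the rest of the input.
        yield event, counts[event]


if __name__ == "__main__":
    for event, count in running_counts(["click", "view", "click"]):
        print(event, count)
```

Each output is produced without rereading earlier data, which keeps latency low but means a new question requires a new computation wired into the stream.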
NoSQL emerged to address the scalability limits of traditional relational databases. It covers key-value stores, document databases, and graph databases. Guided by the CAP theorem and the BASE model, these systems prioritize horizontal scaling and high availability, and often accept eventual consistency in place of strict ACID guarantees.
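Horizontal scaling in a key-value store usually rests on partitioning keys across nodes. The sketch below shows the simplest variant, hash-modulo sharding, with in-memory dictionaries standing in for nodes; the class `ShardedKV` is hypothetical and does not correspond to any particular product. Production systems commonly use consistent hashing or range partitioning instead, because plain modulo reassigns most keys whenever the shard count changes.

```python
import hashlib


class ShardedKV:
    """Toy key-value store that spreads keys over a fixed number of shards."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each dict stands in for a separate storage node.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


if __name__ == "__main__":
    store = ShardedKV(num_shards=4)
    for i in range(10):
        store.put(f"user:{i}", {"id": i})
    print(store.get("user:3"))
    print([len(shard) for shard in store.shards])
```

Because a lookup touches only the shard that owns the key, adding nodes adds capacity; operations that span many keys, such as joins or multi-key transactions, are what become hard.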
NewSQL combines the scalability of NoSQL with the ACID guarantees and SQL interface of relational systems. Examples include distributed databases such as Clustrix and VoltDB, and the category is sometimes extended to cloud database services such as Amazon RDS and Azure SQL.
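The guarantee that NewSQL systems aim to preserve while scaling out is the ordinary transactional one. The sketch below shows atomicity with Python's built-in `sqlite3` module. SQLite is a single-node embedded database, not a NewSQL system; it is used here only as an assumption-free way to run the example, and the `accounts` table and `transfer` helper are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)
    except ValueError:
        pass  # the failed transfer left no partial update behind
    print(balances(conn))
```

On one machine this is routine; the engineering challenge NewSQL addresses is keeping the same all-or-nothing behavior when the two rows live on different nodes.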
Product examples illustrate these concepts. Inceptor, a Spark-based engine, supports both SQL and data-mining models, while Hyperbase and Stream target NoSQL and streaming use cases, respectively. Together they form an “all-in-one” big-data platform.
The big‑data ecosystem can be likened to a kitchen with various tools—each suited to different dishes—requiring the right combination of technologies for each workload.
Reference: Shen Derong (申德荣) et al., “Survey on NoSQL for Management of Big Data” (支持大数据管理的NoSQL系统研究综述), Journal of Software (软件学报), 2013.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]
