Overview of Apache Cassandra: History, Features, and Use Cases
This article provides a comprehensive overview of Apache Cassandra, covering its origins at Facebook, key architectural features such as distributed design, elastic scalability, tunable consistency, and high availability, as well as typical deployment scenarios and notable companies that use it.
Why Apache Cassandra Was Created
Facebook started the Cassandra project in 2007 to solve its Inbox Search problem, which called for a system that could handle very large data volumes and random reads and writes, and that could scale as message replicas and reverse indexes grew. The project was led by Jeff Hammerbacher, with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik as key engineers. The code was open‑sourced on Google Code in July 2008, entered the Apache Incubator in March 2009, and became a top‑level Apache project in February 2010.
Name Origin
The name "Cassandra" comes from Greek mythology: Cassandra was the daughter of King Priam of Troy, able to foresee the future but cursed so that no one would believe her predictions. The name is often read as an allusion to a cursed oracle.
Apache Cassandra Features
Distributed and Decentralized
Cassandra runs on multiple machines as a single logical system, using a peer‑to‑peer protocol and gossip to track node health, eliminating any single point of failure and enabling multi‑data‑center deployments.
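As a rough illustration of how gossip lets every node learn about its peers without a central coordinator, the Python sketch below has nodes swap heartbeat counters pairwise and keep the highest value seen for each node. The node names, the `merge_views` and `gossip_exchange` helpers, and the plain integer heartbeats are assumptions made for this example; Cassandra's real gossip protocol carries more state and includes failure detection.

```python
def merge_views(local, remote):
    """Return a view that keeps the highest heartbeat seen for every node."""
    merged = dict(local)
    for node, heartbeat in remote.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged


def gossip_exchange(views, a, b):
    """Simulate one exchange: nodes a and b swap views and both adopt the merge."""
    merged = merge_views(views[a], views[b])
    views[a] = dict(merged)
    views[b] = dict(merged)


# Hypothetical three-node cluster: each node starts out knowing only itself.
views = {
    "node-a": {"node-a": 5},
    "node-b": {"node-b": 3},
    "node-c": {"node-c": 7},
}

# Three pairwise exchanges are enough here for every node to hear about every other.
gossip_exchange(views, "node-a", "node-b")
gossip_exchange(views, "node-b", "node-c")
gossip_exchange(views, "node-c", "node-a")

for name, view in views.items():
    print(name, view)
```

No node plays a special role in the exchange, which is the property the peer‑to‑peer design relies on.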
Elastic Scalability
Horizontal scaling is achieved by simply adding new nodes; Cassandra automatically discovers and integrates them without service interruption, allowing seamless growth or shrinkage of the cluster.
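The text above does not spell out why adding a node is cheap. One mechanism commonly used for this, and the one Cassandra's token ring is based on, is consistent hashing: each node owns a range of hash values, so a new node takes over only part of one range. The following is a minimal Python sketch under simplifying assumptions: one token per node, MD5 as a stand‑in hash function, and a hypothetical `TokenRing` class and node names that do not correspond to any Cassandra API.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Deterministic 128-bit token for a string (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class TokenRing:
    """Simplified ring: a key belongs to the first node token at or after the key's token."""

    def __init__(self, nodes=()):
        self._tokens = []   # sorted node tokens
        self._owners = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def owner(self, key: str) -> str:
        i = bisect.bisect_left(self._tokens, token(key))
        # Wrap around to the first token when the key hashes past the last one.
        return self._owners[self._tokens[i % len(self._tokens)]]


ring = TokenRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

# Keys that changed owner all moved to the new node; every other placement is untouched.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
print("all moved keys now on node-d:", all(after[k] == "node-d" for k in moved))
```

Because only the keys in the new node's range change owner, the rest of the cluster keeps serving its existing data while the new node is brought in.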
High Availability and Fault Tolerance
The system tolerates hardware or network failures, supports automatic failover, and can replicate data across data centers to maintain service continuity during disasters.
Tunable Consistency
Clients choose a consistency level for each read or write, while the replication factor is configured per keyspace. Together these settings balance consistency against availability in the sense of the CAP theorem: lower consistency levels allow writes to succeed even when some replicas are down.
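The trade‑off can be made concrete with a little arithmetic. The Python sketch below maps a few consistency level names to the number of replica acknowledgements they need, and checks the usual overlap rule: a read is guaranteed to see the latest write when read replicas plus write replicas exceed the replication factor. The function names are hypothetical helpers for this example, and only a subset of consistency levels is modeled.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def write_succeeds(level: str, replication_factor: int, live_replicas: int) -> bool:
    """A write succeeds when enough replicas are alive to acknowledge it."""
    return live_replicas >= replicas_required(level, replication_factor)


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when R + W > RF, so every read set shares a replica with every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


rf = 3
print(write_succeeds("ONE", rf, live_replicas=1))      # True: one live replica is enough
print(write_succeeds("QUORUM", rf, live_replicas=1))   # False: QUORUM needs 2 of 3
print(read_overlaps_write("QUORUM", "QUORUM", rf))     # True: 2 + 2 > 3
print(read_overlaps_write("ONE", "ONE", rf))           # False: 1 + 1 <= 3
```

With a replication factor of 3, writing at ONE stays available with two replicas down, at the cost of reads at ONE possibly returning stale data.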
Row‑Oriented Storage
Although often described as column‑oriented, Cassandra is better understood as row‑oriented: it stores data as a sparse multi‑dimensional hash table in which each row has a unique key and rows need not share the same set of columns, which enables schema‑optional designs.
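To make "sparse multi‑dimensional hash table" more tangible, here is a small Python model: an outer map from row key to an inner map of column name to value, where each row stores only the columns it actually has. The `SparseTable` class and the sample rows are hypothetical and say nothing about Cassandra's on‑disk format.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> {column name -> value}; absent columns take no space."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Use .get on the outer map so that reads never create empty rows.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


table = SparseTable()
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.com")
table.put("user:2", "name", "Lin")
table.put("user:2", "last_login", "2024-01-01")

print(table.columns("user:1"))        # ['email', 'name']
print(table.columns("user:2"))        # ['last_login', 'name']
print(table.get("user:2", "email"))   # None: this row simply has no such column
```

The two rows share a key space but not a column set, which is what "flexible columns" means in the paragraph above.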
Flexible Schema
Early versions allowed schema‑free data; later versions introduced CQL (Cassandra Query Language) for defining schemas, supporting collections (list, set, map) and JSON storage while still permitting dynamic column addition.
High Performance
Cassandra is optimized for multi‑core processors and can sustain high write throughput across hundreds of nodes, scaling linearly without sacrificing latency.
Typical Application Scenarios
Large‑Scale Deployments
When an application requires dozens or hundreds of nodes for traffic and storage, Cassandra’s distributed architecture provides the necessary capacity and resilience.
Write‑Intensive and Analytics Workloads
Cassandra's high write throughput makes it well suited to user activity streams, social feeds, recommendation engines, and time‑series data.
Geographically Distributed Data
Cassandra can replicate data across multiple data centers, reducing latency for global users.
Evolving Applications
Start‑up or rapidly changing services benefit from its flexible schema, allowing on‑the‑fly addition or removal of columns without downtime.
Who Uses Cassandra
Major companies leveraging Cassandra include:
Apple – 75,000 nodes storing 10 PB of data
Netflix – 2,500 nodes storing 420 TB
Easou – 270 nodes storing 300 TB
eBay – 100 nodes storing 250 TB
360 – 1,500 nodes
Ele.me – 100 nodes
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
