Overview of Apache Cassandra: History, Features, and Use Cases

This article provides a comprehensive overview of Apache Cassandra, covering its origins at Facebook, key architectural features such as distributed design, elastic scalability, tunable consistency, and high availability, as well as typical deployment scenarios and notable companies that use it.

Big Data Technology & Architecture

Sep 24, 2019

Why Apache Cassandra Was Created

In 2007 Facebook launched the Cassandra project to solve its Inbox Search problem. The team needed a system that could handle very large data volumes and random reads and writes, and that could scale out while storing copies of messages and their reverse indexes. The project was led by Jeff Hammerbacher, with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik as key engineers. The code was open‑sourced on Google Code in July 2008, moved to the Apache Incubator in March 2009, and became a top‑level Apache project in February 2010.

Name Origin

The name "Cassandra" comes from Greek mythology: Cassandra, a daughter of King Priam of Troy, could foresee the future but was cursed so that no one would believe her predictions. The name is commonly read as an allusion to a cursed oracle.

Apache Cassandra Features

Distributed and Decentralized

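Cassandra runs across multiple machines but appears to clients as a single logical system. Every node plays the same role (there is no master), and nodes use a peer‑to‑peer gossip protocol to exchange state and track each other's health. This removes any single point of failure and makes multi‑data‑center deployments possible.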
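The state exchange behind gossip can be illustrated with a toy model. The sketch below is not Cassandra's implementation (which also tracks generation numbers and uses a phi accrual failure detector); the `Node` class and its methods are hypothetical names, and the only idea shown is that two peers merge what they know by keeping the newest heartbeat per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy cluster member that tracks the newest heartbeat seen per peer."""
    name: str
    heartbeats: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every node starts out knowing only about itself.
        self.heartbeats.setdefault(self.name, 0)

    def tick(self):
        # A live node periodically bumps its own heartbeat counter.
        self.heartbeats[self.name] += 1

    def gossip_with(self, peer):
        # Exchange state in both directions, keeping the newest value per node.
        merged = dict(self.heartbeats)
        for name, beat in peer.heartbeats.items():
            merged[name] = max(merged.get(name, -1), beat)
        self.heartbeats = dict(merged)
        peer.heartbeats = dict(merged)


def run_demo():
    a, b, c = Node("a"), Node("b"), Node("c")
    for node in (a, b, c):
        node.tick()
    c.tick()          # c is now one heartbeat ahead of the others
    a.gossip_with(b)  # a and b learn about each other
    b.gossip_with(c)  # c learns about a indirectly, through b
    return a.heartbeats, b.heartbeats, c.heartbeats
```

In this run node `c` learns about node `a` without ever contacting it, which is why no central registry of members is needed.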

Elastic Scalability

Horizontal scaling is achieved by simply adding new nodes; Cassandra automatically discovers and integrates them without service interruption, allowing seamless growth or shrinkage of the cluster.
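One reason a node can join without disrupting the rest of the cluster is that data is placed on a hash ring, so a new node takes over only a slice of the key space. The following is a minimal sketch of that idea under stated assumptions: `Ring`, `token`, and the node names are hypothetical, MD5 stands in for Cassandra's default Murmur3 partitioner, and replication is ignored.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner is Murmur3.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """A toy token ring in which each node owns several virtual-node tokens."""

    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # The owner is the first token at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self.tokens, (token(key), ""))
        return self.tokens[idx % len(self.tokens)][1]


def demo():
    keys = [f"user-{i}" for i in range(1000)]
    ring = Ring(["node1", "node2", "node3"])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node4")
    after = {k: ring.owner(k) for k in keys}
    # Map each relocated key to its (old owner, new owner) pair.
    moved = {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
    return moved, len(keys)
```

Every key that changes owner in `demo()` moves to the new node; keys are never shuffled between the existing nodes, so the cost of growing the cluster is limited to the data the new node takes over.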

High Availability and Fault Tolerance

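The system tolerates hardware and network failures. Because data is replicated and any node can serve requests, the remaining replicas keep answering when a node goes down, and failed nodes can be replaced without downtime. Data can also be replicated across data centers to maintain service continuity during disasters.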

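Tunable Consistency

The replication factor is set per keyspace, and clients choose a consistency level for each read or write. Together these settings balance consistency against availability, the trade‑off described by the CAP theorem: lower consistency levels allow writes to succeed even when some replicas are down, while higher levels require more replicas to respond.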
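The arithmetic behind this trade‑off is short enough to write out. The sketch below uses hypothetical helper names and covers only the ONE, TWO, THREE, QUORUM, and ALL levels within a single data center. It relies on the usual rule that a read is guaranteed to overlap the latest write when the read and write replica counts add up to more than the replication factor.

```python
def quorum(replication_factor: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    # Number of replicas that must acknowledge an operation at a given level.
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    needed = levels[level]
    if needed > replication_factor:
        raise ValueError(f"{level} needs more replicas than RF={replication_factor}")
    return needed


def reads_see_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    # Read and write replica sets must overlap in at least one replica: R + W > RF.
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def write_succeeds(level: str, rf: int, replicas_down: int) -> bool:
    # A write succeeds if enough live replicas remain to acknowledge it.
    return rf - replicas_down >= replicas_required(level, rf)
```

With a replication factor of 3, QUORUM reads paired with QUORUM writes overlap (2 + 2 > 3), while ONE paired with ONE does not. A write at ONE still succeeds with two replicas down, whereas a QUORUM write does not.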

Row‑Oriented Storage

Although often described as column‑oriented, Cassandra stores data as a sparse multi‑dimensional hash table where each row has a unique key and flexible columns, enabling schema‑optional designs.
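The "sparse multi‑dimensional hash table" description can be made concrete with nested dictionaries. This is only a conceptual model with hypothetical names (`SparseTable`, `put`, `get`); it says nothing about how Cassandra lays data out on disk.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> {column name -> value}, with no fixed column set."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Writing to a row that does not exist yet creates it.
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Missing rows or columns simply yield the default; nothing is stored for them.
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))


def demo():
    table = SparseTable()
    table.put("user:1", "name", "Ada")
    table.put("user:1", "email", "ada@example.com")
    table.put("user:2", "name", "Linus")  # this row has no email column at all
    return table
```

Each row carries only the columns that were actually written, which is what "sparse" means here: `user:2` takes no space for an `email` value it never had.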

Flexible Schema

Early versions allowed schema‑free data; later versions introduced CQL (Cassandra Query Language) for defining schemas, supporting collections (list, set, map) and JSON storage while still permitting dynamic column addition.

High Performance

Cassandra is designed to take advantage of multi‑core, multi‑processor machines and to sustain high write throughput across clusters of hundreds of nodes, with throughput growing roughly in proportion to the number of nodes.

Typical Application Scenarios

Large‑Scale Deployments

When an application requires dozens or hundreds of nodes for traffic and storage, Cassandra’s distributed architecture provides the necessary capacity and resilience.

Write‑Intensive and Analytics Workloads

Its high write throughput makes it ideal for user activity streams, social feeds, recommendation engines, and time‑series data.

Geographically Distributed Data

Cassandra can replicate data across multiple data centers, reducing latency for global users.
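In a multi‑data‑center cluster, the latency benefit depends on how many replicas must acknowledge each request and where they are located. The sketch below computes those counts for the QUORUM, LOCAL_QUORUM, and EACH_QUORUM consistency levels; the function names and the data center names `dc_east` and `dc_west` are hypothetical, and the replication map stands for a keyspace with a separate replication factor per data center.

```python
def quorum(replica_count: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replica_count // 2 + 1


def cluster_quorum(replication: dict) -> int:
    # QUORUM counts replicas across all data centers together.
    return quorum(sum(replication.values()))


def local_quorum(replication: dict, local_dc: str) -> int:
    # LOCAL_QUORUM only waits for replicas in the coordinator's own data center.
    return quorum(replication[local_dc])


def each_quorum(replication: dict) -> dict:
    # EACH_QUORUM requires a majority in every data center.
    return {dc: quorum(rf) for dc, rf in replication.items()}


# Hypothetical keyspace replicated three times in each of two data centers.
REPLICATION = {"dc_east": 3, "dc_west": 3}
```

With three replicas per data center, LOCAL_QUORUM needs two acknowledgements from the nearby data center, while QUORUM needs four out of six and therefore always waits on at least one remote replica. This is why local levels are a common choice for latency‑sensitive global applications.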

Evolving Applications

Start‑up or rapidly changing services benefit from its flexible schema, allowing on‑the‑fly addition or removal of columns without downtime.

Who Uses Cassandra

Major companies leveraging Cassandra include:

Apple – 75,000 nodes storing 10 PB of data

Netflix – 2,500 nodes storing 420 TB

Easou – 270 nodes storing 300 TB

eBay – 100 nodes storing 250 TB

360 – 1,500 nodes

Ele.me – 100 nodes


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
