
Cassandra: Past, Present, and Future – History, Architecture, Features, and Use Cases

This article summarizes a Cassandra meetup presentation that traces the database's origins in BigTable and Dynamo, outlines its key milestones, explains its peer‑to‑peer and LSM architecture, and highlights current features, real‑world deployments, and performance advantages before previewing the upcoming 4.0 release and community projects.

DataFunTalk

The talk, delivered by Chen Jiang (Alibaba Cloud Senior Expert) at a Cassandra Meetup and organized by DataFunTalk, introduced the theme "Cassandra's Past, Present, and Future" and provided a comprehensive overview of the database.

Origin : Cassandra was inspired by Google’s BigTable and Amazon’s Dynamo. From BigTable it adopted the LSM‑based single‑node engine concepts such as Column Families, Memtables, and SSTables, while Dynamo contributed the distributed design, cluster management, and fault‑tolerance techniques.

Milestones :

July 2008 – Facebook open‑sourced Cassandra (often abbreviated C*).

2009 – Became an Apache incubator project.

2010 – Graduated to a top‑level Apache project.

2011 – 1.0 released with leveled compaction.

2013 – Introduced CAS and triggers.

2015 – 3.0 released.

2019 – 4.0‑alpha released (the final 4.0 release followed in July 2021).

Cassandra milestones

Database Ranking : According to DB‑Engines, Cassandra has consistently ranked first among wide‑column NoSQL stores, well ahead of HBase; at the time of the talk its popularity score was above 100, compared with roughly 50 for HBase.

DB‑Engines ranking

Current Feature Overview :

Peer‑to‑peer nodes enable easy horizontal scaling.

LSM engine provides high‑throughput writes.

High availability and fault tolerance via replication.

Tunable consistency levels.

CQL (Cassandra Query Language), an SQL‑like query language, plus JDBC‑style client drivers.

Elastic data storage and straightforward data distribution.
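Tunable consistency comes down to simple arithmetic. With replication factor RF, a read that consults R replicas is guaranteed to reach at least one replica that acknowledged a successful write of W replicas when R + W > RF. The sketch below assumes Cassandra's usual definition QUORUM = floor(RF/2) + 1 and checks a few common read/write combinations; the function and dictionary names are illustrative only and are not part of any driver API.

```python
def quorum(rf: int) -> int:
    """Majority of replicas, as used by Cassandra's QUORUM level."""
    return rf // 2 + 1

# Replicas contacted at each (simplified) consistency level.
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}

def overlaps(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write set (R + W > RF)."""
    r = LEVELS[read_level](rf)
    w = LEVELS[write_level](rf)
    return r + w > rf

if __name__ == "__main__":
    rf = 3
    for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(f"RF={rf} read={read_cl} write={write_cl} -> overlap={overlaps(read_cl, write_cl, rf)}")
```

Choosing QUORUM for both reads and writes is the common middle ground: it tolerates one replica being down at RF=3 while still guaranteeing overlap.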

Cassandra features

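Consistent Hashing & Gossip : Cassandra partitions data with consistent hashing. Each partition key is hashed to a token, and every node owns one or more ranges of tokens on a ring, so no master node is needed to route requests. Nodes exchange cluster metadata (membership, liveness, token ownership) through a peer‑to‑peer gossip protocol. This keeps the metadata lightweight and makes it eventually consistent across the cluster.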
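Both ideas fit in a few lines. The sketch below is a simplified, hypothetical model, not Cassandra's code. `TokenRing` gives each node a single token and places replicas on the next nodes clockwise, roughly like SimpleStrategy. It uses MD5 as a stand‑in hash, which Cassandra's legacy RandomPartitioner also used; the default partitioner today uses Murmur3. `merge_gossip` models each node's view as endpoint → (generation, version, state) and keeps whichever entry has the higher (generation, version) pair. This mirrors how gossip lets newer information win, but the tuple layout is an assumption.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # MD5 as a stand-in hash (Cassandra's legacy RandomPartitioner also used MD5;
    # the default Murmur3Partitioner uses Murmur3 instead).
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

class TokenRing:
    """Hypothetical ring: one token per node, SimpleStrategy-style placement."""

    def __init__(self, node_tokens: dict[str, int]):
        self.ring = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf: int) -> list[str]:
        # A node with token T owns (previous token, T]: the primary replica is the
        # first node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Further replicas are the next nodes clockwise around the ring.
        count = min(rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

def merge_gossip(local, remote):
    """Merge per-endpoint state, keeping the entry with the higher (generation, version)."""
    merged = dict(local)
    for endpoint, entry in remote.items():
        if endpoint not in merged or entry[:2] > merged[endpoint][:2]:
            merged[endpoint] = entry
    return merged

if __name__ == "__main__":
    ring = TokenRing({"node-a": 2**126, "node-b": 2**127, "node-c": 3 * 2**126})
    print(ring.replicas("user:42", rf=2))
    print(merge_gossip({"node-a": (1, 10, "UP"), "node-b": (1, 4, "UP")},
                       {"node-b": (1, 7, "DOWN"), "node-c": (2, 1, "UP")}))
```

Because every node can compute the same replica list from the same ring, any node can coordinate a request, which is what removes the need for a master.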

Hash and gossip diagram

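LSM Engine Details : Every write is first appended to the commit log (Cassandra's write‑ahead log) and then applied to an in‑memory Memtable. When a Memtable reaches its size threshold, it is flushed to disk as an immutable, sorted SSTable. Reads merge data from the Memtable and the SSTables. Background compaction periodically merges SSTables, using one of several strategies: size‑tiered, leveled, or time‑window compaction.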
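The write path can be modelled as a toy in‑memory store. This is a minimal sketch under heavy assumptions. The "commit log" is a Python list rather than a durable file, and flushing is triggered by a hypothetical key‑count threshold rather than Cassandra's memory‑based limits. `compact` simply merges every SSTable into one, which is closer to a major compaction than to any of the strategies named above. Tombstones (deletes) are omitted.

```python
from bisect import bisect_left
from typing import Optional

class TinyLSM:
    """Toy LSM store: commit log + memtable + immutable sorted SSTables."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log: list[tuple[str, str]] = []  # stands in for the on-disk commit log
        self.memtable: dict[str, str] = {}
        self.sstables: list[list[tuple[str, str]]] = []  # oldest first
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the log (simulated durability)
        self.memtable[key] = value            # 2. apply to the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush once the memtable is "full"

    def flush(self) -> None:
        # Persist the memtable as a sorted, immutable SSTable, then start fresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # log entries covered by the flush are no longer needed

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            keys = [k for k, _ in table]
            i = bisect_left(keys, key)
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        # Merge all SSTables into one, keeping the newest value for each key.
        merged: dict[str, str] = {}
        for table in self.sstables:  # oldest to newest, so later values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []

if __name__ == "__main__":
    db = TinyLSM(flush_threshold=2)
    for k, v in [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1")]:
        db.write(k, v)
    print(db.read("a"), len(db.sstables))
    db.compact()
    print(db.sstables)
```

Writes never modify existing files, which is why the LSM design sustains high write throughput. The cost is shifted to reads, which may check several SSTables, and to compaction, which reclaims that overhead.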

LSM write path

Adoption : Major companies use Cassandra, including Facebook (original creator), Apple (100k+ nodes), 360, Ele.me, Reddit, Discord, and many others for large‑scale workloads.

Company usage

Value Propositions :

Always on: masterless replication in which any replica can accept writes, combined with tunable consistency.

Linear scalability simplifies operations; adding a node automatically balances data.

Multi‑DC deployment reduces latency and provides geographic disaster recovery.

Rich client drivers for Python, C++, Go, Node.js, PHP, etc.

Strong performance: in the benchmarks presented in the talk, lower latency and higher throughput than HBase.
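The "adding a node automatically balances data" claim follows from consistent hashing. When a new node inserts its tokens into the ring, only keys in the ranges it takes over change owner, and all of them move to the new node. The sketch below checks this with virtual nodes. It rests on several assumptions: each physical node draws 16 random tokens (in a real cluster this is the `num_tokens` setting in cassandra.yaml, and newer versions can allocate tokens more evenly than random), and a truncated SHA‑256 stands in for Murmur3. The helper names are hypothetical.

```python
import bisect
import hashlib
import random

RING_SIZE = 2**64  # toy 64-bit token space

def key_token(key: str) -> int:
    # Stand-in 64-bit hash (first 8 bytes of SHA-256); Cassandra's default
    # Murmur3Partitioner uses Murmur3 instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def build_ring(nodes, vnodes, rng):
    # Each physical node gets `vnodes` random tokens (virtual nodes).
    return sorted((rng.randrange(RING_SIZE), node) for node in nodes for _ in range(vnodes))

def owner(ring, key):
    # The owner is the node holding the first token >= the key's token, wrapping around.
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, key_token(key)) % len(ring)
    return ring[i][1]

def rebalance(num_keys=2000, vnodes=16, seed=7):
    rng = random.Random(seed)
    old_ring = build_ring(["node-1", "node-2", "node-3", "node-4"], vnodes, rng)
    # Existing tokens stay where they are; the new node only adds its own tokens.
    new_ring = sorted(old_ring + build_ring(["node-5"], vnodes, rng))
    keys = [f"key-{i}" for i in range(num_keys)]
    return old_ring, new_ring, keys

if __name__ == "__main__":
    old_ring, new_ring, keys = rebalance()
    moved = [k for k in keys if owner(old_ring, k) != owner(new_ring, k)]
    print(f"moved {len(moved)}/{len(keys)} keys ({len(moved) / len(keys):.1%}), "
          f"all to node-5: {all(owner(new_ring, k) == 'node-5' for k in moved)}")
```

Going from four nodes to five, roughly a fifth of the keys should move, and none move between the existing nodes. That bounded data movement is what makes scaling out close to linear.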

Performance comparison

Typical Use Cases :

Risk‑control systems (user profiles, fraud detection, order data).

Personalized recommendation engines (behavior analysis, real‑time processing).

Big‑data pipelines.

Social feeds (e.g., Instagram, Weibo‑like timelines).

Time‑series and IoT data ingestion with massive concurrent writers.
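For time‑series and IoT workloads, a common Cassandra modelling pattern is to bucket each series by time. The partition key is then (device, time bucket) and the timestamp is a clustering column, so no single partition grows without bound. The sketch below assumes a hypothetical `readings` table bucketed by UTC day and shows the bucketing logic in Python. The CQL in the comment is illustrative, and the function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical CQL table this sketch mirrors (shown for reference, not executed):
#   CREATE TABLE readings (
#       sensor_id text, day date, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);

def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    # Bucket by UTC calendar day; expects a timezone-aware datetime.
    return sensor_id, ts.astimezone(timezone.utc).date().isoformat()

def group_by_partition(readings):
    # readings: iterable of (sensor_id, ts, value) tuples
    partitions = defaultdict(list)
    for sensor_id, ts, value in readings:
        partitions[partition_key(sensor_id, ts)].append((ts, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0], reverse=True)  # mirrors ts DESC clustering
    return dict(partitions)

if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        ("s1", datetime(2019, 6, 1, 23, 30, tzinfo=utc), 20.5),
        ("s1", datetime(2019, 6, 2, 0, 15, tzinfo=utc), 21.0),
        ("s2", datetime(2019, 6, 1, 12, 0, tzinfo=utc), 5.0),
    ]
    for key, rows in group_by_partition(sample).items():
        print(key, [value for _, value in rows])
```

Many concurrent writers spread naturally across partitions this way. Reading "the latest readings for a sensor today" touches a single partition, already sorted newest first.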

Use case diagram

Future Roadmap (Cassandra 4.0‑alpha) :

Fix incremental repair bugs; before 4.0, incremental repair had to be used with caution and full repair was the safer choice.

Replace custom node‑to‑node communication with Netty for higher efficiency.

Add built‑in time functions and arithmetic operators.

Mark SASI indexes and materialized views as experimental features, disabled by default.
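Repair, whether full or incremental, is Cassandra's anti‑entropy mechanism. Replicas build Merkle (hash) trees over their token ranges and compare them, so only ranges whose hashes differ need to be streamed between nodes. The sketch below is a simplified, hypothetical illustration of that idea, not Cassandra's implementation. It splits a 32‑bit hash space into a fixed number of contiguous buckets that stand in for token sub‑ranges, hashes each bucket into a leaf, and compares the roots. On a mismatch it compares the leaves directly rather than descending the tree level by level. All function names are assumptions.

```python
import hashlib

def bucket_of(key: str, buckets: int) -> int:
    # Map a 32-bit hash to one of `buckets` contiguous sub-ranges of the hash space,
    # standing in for the token sub-ranges that Merkle tree leaves cover.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return (h * buckets) >> 32

def leaf_hashes(data: dict[str, str], buckets: int) -> list[bytes]:
    groups = [[] for _ in range(buckets)]
    for key in sorted(data):  # deterministic order inside each leaf
        groups[bucket_of(key, buckets)].append(f"{key}={data[key]}")
    return [hashlib.sha256("|".join(g).encode()).digest() for g in groups]

def merkle_root(leaves: list[bytes]) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def ranges_to_repair(a: dict[str, str], b: dict[str, str], buckets: int = 8) -> list[int]:
    la, lb = leaf_hashes(a, buckets), leaf_hashes(b, buckets)
    if merkle_root(la) == merkle_root(lb):
        return []  # roots match: the replicas agree on every range
    # Simplification: compare leaves directly instead of walking down the tree.
    return [i for i, (x, y) in enumerate(zip(la, lb)) if x != y]

if __name__ == "__main__":
    replica_1 = {"k1": "v1", "k2": "v2", "k3": "v3"}
    replica_2 = dict(replica_1, k2="stale")
    print(ranges_to_repair(replica_1, replica_2))
```

Only the sub‑range containing the divergent key is flagged, so only that slice of data would need to be streamed. Incremental repair narrows the work further by skipping data already marked as repaired.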

4.0 release highlights

Community Projects (NGCC 2019) :

Pluggable storage engine supporting RocksDB to reduce JVM GC pressure.

Sidecar – a one‑stop operations platform for bootstrapping, data movement, configuration upgrades, monitoring, backup/restore, and repair.

ScyllaDB improvements for more efficient data repair.

Next‑generation compaction strategies beyond leveled compaction.

Community roadmap

Rocksandra : An Instagram‑driven effort to replace Cassandra's storage engine with RocksDB, delivering lower GC overhead, reduced tail latency, and higher throughput.

Rocksandra architecture

Sidecar Details :

Handles bootstrap and data migration.

Integrates common fault‑tolerance and operational commands.

Provides configuration upgrades, monitoring, metrics, and enterprise‑grade backup/restore dashboards.

Offers repair and optimization utilities.

Sidecar UI

Overall, Cassandra remains a leading wide‑column NoSQL solution, distinguished by its master‑less architecture, linear scalability, multi‑DC capabilities, extensive language drivers, and a vibrant community driving continuous innovation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Big Data, Scalability, distributed database, NoSQL, LSM, Gossip Protocol, cassandra
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
