Mastering Cluster Terminology and Database Cluster Architectures

This article explains core cluster concepts and the benefits of building database clusters, classifies common cluster types, and compares scalable architectures such as Oracle RAC, MySQL Cluster, and sharding. It also covers CAP/BASE theory and cross-database transaction strategies for high availability and performance.

IT Architects Alliance

Jun 27, 2022

Cluster Terminology

Service hardware refers to the physical machines that provide compute services, such as PCs or servers.

Service entity usually denotes the combination of service software and hardware.

A node is an independent host running the Heartbeat process; each node runs an operating system and the Heartbeat services, and together the nodes form the core of a high-availability (HA) cluster.

A resource is an entity that a node can control, such as a disk partition, file system, IP address, application service, or shared storage. When a failure occurs, other nodes can take over the resource.

An event is anything that can happen in a cluster and trigger resource migration, such as a node OS failure, a network outage, a NIC failure, or an application failure.
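
The relationship between nodes, resources, and events can be sketched in a few lines of Python. This is a minimal illustration, not how the Heartbeat software is implemented: the Node class, the fail_over function, the 3-second timeout, and the resource names are all assumptions made for the example. A missed heartbeat plays the role of the event, and the failed node's resources are handed to the least-loaded healthy node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float                      # time the last heartbeat was seen
    resources: set = field(default_factory=set)


def fail_over(nodes, now, timeout=3.0):
    """Move the resources of every silent node to a healthy node.

    A node counts as failed when its last heartbeat is older than
    `timeout` seconds (hypothetical threshold chosen for this sketch).
    Returns the names of the nodes that were declared failed.
    """
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    healthy = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if failed and not healthy:
        raise RuntimeError("no healthy node left to take over resources")
    for dead in failed:
        for resource in sorted(dead.resources):
            # Pick the healthy node that currently controls the fewest resources.
            target = min(healthy, key=lambda n: len(n.resources))
            target.resources.add(resource)
        dead.resources.clear()
    return [n.name for n in failed]


nodes = [
    Node("node-a", last_heartbeat=10.0, resources={"vip:10.0.0.5", "fs:/data"}),
    Node("node-b", last_heartbeat=14.5, resources={"svc:mysqld"}),
]
failed_names = fail_over(nodes, now=15.0)
```

Here node-a has been silent for 5 seconds, so its virtual IP and file system end up on node-b. A production HA stack would also need fencing to make sure the failed node has really released its resources, which this sketch leaves out.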

What Is a Cluster?

A cluster is a group of computers that work together as a single system to provide network resources to users; each computer is a node. Clusters deliver four key properties:

Scalability – performance is not limited to a single service entity; new nodes can be added dynamically.

High availability – redundant service entities keep clients from seeing "out of service" warnings; if one node fails, its applications are automatically taken over by another node.

Load balancing – tasks are evenly distributed across compute and network resources, improving throughput.

Error recovery – when a node becomes unavailable, its resources and applications transparently migrate to a healthy node.
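
Load balancing and error recovery often work together: requests are spread across nodes, and a node that has been marked as failed is simply skipped. The following is a minimal sketch under that assumption; the RoundRobinBalancer class and the node names are hypothetical, and real balancers usually add health probes and weighting.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out nodes in rotation, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Try each node at most once per call so a fully failed
        # cluster raises instead of looping forever.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy node available")


lb = RoundRobinBalancer(["db1", "db2", "db3"])
first_round = [lb.pick() for _ in range(3)]
lb.mark_down("db2")
after_failure = [lb.pick() for _ in range(4)]
```

After db2 is marked down, its share of the requests is absorbed by db1 and db3 without the caller changing anything.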

Differences between distributed systems and clusters:

Distributed systems spread different workloads across different locations.

Clusters concentrate several servers to serve the same workload.

Each node of a distributed system can itself be built as a cluster, but a cluster is not necessarily distributed. Clusters are more tightly organized, and because several servers handle the same workload, they can tolerate the failure of a single node.

Clusters are typically classified into three categories: HA (High‑Availability Cluster), LBC (Load‑Balance Cluster), and HPC (High‑Performance Computing Cluster).

Why Build a Database Cluster?

Rapid business growth leads to explosive increases in user count and data volume, challenging database performance, availability, and security. Building a database cluster addresses four main concerns:

Scalability – as traffic grows, a single server cannot handle the load; clustering allows horizontal scaling by adding more servers.

Cost efficiency – instead of repeatedly upgrading hardware, a few modest servers can be combined to achieve load balancing and continuous expansion.

High availability – if a node fails, the cluster automatically detects the failure and transfers the workload to another node, so the service stays available.

Data safety – clustering provides redundancy and backup, protecting critical business data from loss.

Database Cluster Classifications

Database clustering technologies unite multiple servers to achieve better overall performance and lower investment costs. They fall into two families:

Engine-based clustering (e.g., Oracle RAC, Microsoft MSCS, IBM DB2 UDB, Sybase ASE).

Middleware‑based clustering (e.g., ICX‑UDS).

Cluster software typically targets three problem areas:

Load‑balance clusters (LBC) – focus on horizontal scaling and performance.

High‑availability clusters (HAC) – ensure continuous application availability.

High‑security clusters (HSC) – address disaster recovery.

Only Oracle RAC claims to cover all three aspects.

Scalable Distributed Database Architectures

1. Oracle RAC

Oracle Real Application Clusters (RAC) use a shared-storage architecture in which all nodes access a common storage device over a high-speed network. RAC offers load balancing and Transparent Application Failover (TAF) without requiring application changes. However, scalability is limited by the I/O capacity of the shared storage, which makes RAC more suitable for scale-up scenarios. Oracle's MAA (Maximum Availability Architecture) uses ASM (Automatic Storage Management) to aggregate multiple storage devices, so that storage can scale roughly linearly as devices are added.

As node count grows, inter‑node communication overhead can become a bottleneck, especially for OLTP workloads. Recommendations:

Use high‑speed interconnects between nodes.

Distribute different applications across separate nodes to reduce contention.

RAC performs well in Decision Support System (DSS) environments where workloads can be spread across nodes, while for OLTP it is primarily used for availability rather than scaling.

2. MySQL Cluster

MySQL Cluster follows a shared‑nothing architecture composed of management nodes (ndb_mgmd), data nodes (ndbd), and SQL nodes (mysqld). Data is stored in memory via the NDB engine, partitioned across data nodes, with optional replication for redundancy. Advantages include linear scalability, no single point of failure, and high availability, making it suitable for OLTP.

Limitations:

The NDB engine was designed to keep all data in memory. Newer versions allow non-indexed columns to be stored on disk, but indexed columns and the indexes themselves must still reside in memory.

Queries not based on primary keys may need to scan all data nodes, and write operations must replicate data to multiple nodes, demanding high network bandwidth.
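
The difference between a primary-key lookup and a non-key query can be illustrated with a small hash-partitioned table. This is only a sketch of the general idea: the PartitionedTable class is hypothetical, and zlib.crc32 is a stand-in for whatever hash function the NDB engine actually uses. The nodes_contacted counter shows how many data nodes each kind of query has to touch.

```python
import zlib


class PartitionedTable:
    """Rows are spread across data nodes by a hash of the primary key."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self.nodes_contacted = 0

    def _node_for(self, pk):
        # Stand-in hash; deterministic so the same key always maps
        # to the same node.
        return zlib.crc32(str(pk).encode("utf-8")) % len(self.nodes)

    def insert(self, pk, row):
        self.nodes[self._node_for(pk)][pk] = row

    def get_by_pk(self, pk):
        # The key alone identifies the owning node: one node is contacted.
        self.nodes_contacted += 1
        return self.nodes[self._node_for(pk)].get(pk)

    def find_where(self, predicate):
        # Without the key, every node has to be scanned.
        matches = []
        for node in self.nodes:
            self.nodes_contacted += 1
            matches.extend(row for row in node.values() if predicate(row))
        return matches


table = PartitionedTable(node_count=4)
for i in range(10):
    table.insert(i, {"id": i, "city": "Oslo" if i % 2 else "Lima"})

table.nodes_contacted = 0
row = table.get_by_pk(7)
pk_cost = table.nodes_contacted

table.nodes_contacted = 0
oslo_rows = table.find_where(lambda r: r["city"] == "Oslo")
scan_cost = table.nodes_contacted
```

With four data nodes, the key lookup touches one node while the filter on city touches all four, which is why access patterns matter so much when designing a schema for a partitioned store.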

3. Distributed Database Architecture (Sharding)

Sharding (horizontal partitioning) splits data across multiple nodes to overcome single‑node I/O limits. It enables near‑linear scaling and high availability, but may introduce complexity for transactional workloads. Read‑write separation is a common pattern: a master handles writes, while multiple slaves serve reads, improving read throughput and fault tolerance.

Read-write separation can be implemented with MySQL Replication, Oracle Active Data Guard (an active standby database), or other log-based replication technologies. However, both master and slave must store the complete data set, so very large datasets can still be constrained by the storage of an individual node.
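
At the application or middleware layer, read-write separation usually comes down to a routing decision per statement. The sketch below assumes a very simple rule, namely that plain SELECT statements go to a replica and everything else goes to the master. The ReadWriteRouter class and the server names are hypothetical, and real routers also have to deal with transactions and replication lag.

```python
from itertools import cycle


class ReadWriteRouter:
    """Sends writes to the master and rotates reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # SELECT ... FOR UPDATE takes locks, so it must run on the master.
        locking = "FOR UPDATE" in sql.upper()
        if verb == "SELECT" and not locking and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica1", "replica2"])
statements = [
    "SELECT * FROM orders",
    "INSERT INTO orders VALUES (1)",
    "select 1",
    "UPDATE orders SET paid = 1",
    "SELECT * FROM orders FOR UPDATE",
]
targets = [router.route(sql) for sql in statements]
```

The two plain reads are spread across the replicas, while the insert, the update, and the locking read all land on the master.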

4. CAP and BASE Theories

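The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time. Because network partitions cannot be ruled out in practice, the real choice is usually between consistency and availability while a partition lasts. Architects must make that trade-off based on application needs.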

Relational databases follow the ACID model (Atomicity, Consistency, Isolation, Durability), offering strong consistency but limited partition tolerance. NoSQL systems adopt the BASE model (Basically Available, Soft state, Eventual consistency) to achieve higher availability and scalability at the cost of immediate consistency.
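
The "soft state" and "eventual consistency" parts of BASE can be seen in a toy model with one primary copy and one asynchronously updated replica. This is an illustrative sketch only: the EventuallyConsistentStore class is hypothetical, and the explicit replicate() call stands in for a background replication process.

```python
from collections import deque


class EventuallyConsistentStore:
    """Writes hit the primary at once and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()      # changes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged before the replica has seen it,
        # which keeps the system available but allows stale reads.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        """Apply all pending changes in order; returns how many were applied."""
        applied = 0
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read_replica("balance")    # replica has not caught up yet
store.replicate()
fresh = store.read_replica("balance")    # replica has converged
```

The read before replication returns no value, and the read after it returns 100. An ACID system would instead hold the write until every copy agreed, trading availability for immediate consistency.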

5. Cross‑Database Transactions

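Two-Phase Commit (2PC) is widely considered an anti-scalability pattern for relational databases: every participant holds its locks from the prepare phase until the coordinator announces the outcome, so throughput is bounded by the slowest participant and a coordinator failure can block the whole transaction. Modern distributed systems often rely on sharding and read-write separation rather than global transactions to maintain performance and availability.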
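
For reference, the control flow of two-phase commit is short enough to sketch in memory. The Participant class, the two_phase_commit function, and the database names below are hypothetical, and the sketch omits the durable logging, timeouts, and recovery that a real coordinator needs.

```python
class Participant:
    """One database taking part in a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real participant would acquire locks and write a log record
        # here, then keep the locks until commit() or rollback().
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if the vote was unanimous, otherwise roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok_group = [Participant("orders_db"), Participant("billing_db")]
ok_result = two_phase_commit(ok_group)

bad_group = [Participant("orders_db"), Participant("billing_db", can_commit=False)]
bad_result = two_phase_commit(bad_group)
```

A single "no" vote aborts the transaction on every participant, and each extra participant adds another round trip during which locks are held. That cost is the reason many designs try to keep each transaction within one shard.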

Overall, choosing a clustering solution depends on workload characteristics (DSS vs. OLTP), scalability requirements, budget, and operational expertise.
