Introduction to MySQL Cluster: Architecture, Scalability, and 200 Million QPS Benchmark
This article introduces MySQL Cluster, an in‑memory, real‑time, ACID‑compliant distributed database, explains its multi‑master architecture, data, application and management nodes, transparent partitioning, NoSQL APIs, and presents a benchmark achieving 200 million queries per second on up to 32 nodes.
This article introduces MySQL Cluster, an in-memory, real-time, scalable, and highly available version of MySQL, and reviews its background and architecture to explain how it can reach the goal of handling 200 million queries per second.
MySQL Cluster Overview
MySQL Cluster is a distributed, multi‑master, ACID‑compliant transactional database that combines 99.999% availability with low total cost of ownership. It eliminates single points of failure, scales horizontally on commodity hardware, automatically partitions data for read‑ and write‑intensive workloads, and provides both SQL and NoSQL interfaces.
Originally designed as an embedded telecom database for carrier‑grade availability and real‑time performance, MySQL Cluster has evolved to support a wide range of use cases, including large‑scale OLTP, e‑commerce, inventory management, shopping carts, payment processing, order tracking, online gaming, financial transactions, fraud detection, mobile micro‑payments, session management, caching, data streaming, analytics, content delivery, communication services, and subscription/user configuration management.
MySQL Cluster Architecture Overview
The architecture consists of three node types that deliver services to applications. In a simple example deployment, twelve Data Nodes are divided into six node groups.
Data Nodes
Data Nodes are the primary storage nodes. They provide in‑memory and disk‑based storage, automatic table partitioning, data replication, transaction handling, automatic failover, and self‑healing synchronization after failures.
Tables are automatically partitioned across multiple Data Nodes, and each node acts as a write target, allowing write‑intensive workloads to be distributed across many commodity servers while remaining transparent to the application.
By using a shared‑nothing architecture with at least one replica, MySQL Cluster ensures that if a Data Node fails, another node with the same data takes over, rolling back and re‑executing any incomplete transactions.
Data can be stored entirely in memory or partially on disk (non‑index data). In‑memory storage benefits frequently changing (hot) data, with periodic checkpoints to disk and coordination across all Data Nodes for full system recovery. Disk‑based data is used for larger, less performance‑critical datasets, with page‑cache mechanisms to keep frequently accessed disk data in memory.
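The page-cache behavior described for disk-based data can be illustrated with a minimal LRU cache. This is purely a conceptual sketch: NDB's actual disk page buffer (sized by the `DiskPageBufferMemory` parameter) is more sophisticated, and all names here are illustrative.

```python
from collections import OrderedDict

class PageCache:
    """Minimal LRU page cache: hot disk pages stay in memory, cold ones are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page contents

    def get(self, page_id, read_from_disk):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # cache hit: mark as recently used
            return self.pages[page_id]
        data = read_from_disk(page_id)        # cache miss: fetch from disk
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return data

disk_reads = []
cache = PageCache(capacity=2)
fetch = lambda pid: disk_reads.append(pid) or f"page-{pid}"
cache.get(1, fetch); cache.get(2, fetch); cache.get(1, fetch)  # second read of 1 hits cache
cache.get(3, fetch)                                            # evicts cold page 2
assert disk_reads == [1, 2, 3]  # page 1 was read from disk only once
```

The takeaway is the same as in the text: frequently accessed ("hot") disk pages are served from memory, so only cold data pays the disk-latency cost.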
Application Nodes
Application Nodes provide the connection between application logic and Data Nodes. Applications can access the cluster via standard MySQL Server SQL interfaces or through the high‑performance NDB API (C++), which also enables NoSQL access. Supported APIs include Java, JPA, Memcached, JavaScript/Node.js, and HTTP/REST via an Apache module. All Application Nodes can read from any Data Node, ensuring no service loss during node failures.
Management Nodes
Management Nodes distribute configuration to all cluster nodes during startup, node joins, and reconfiguration. They can pause and restart nodes without affecting ongoing operations and provide arbitration services to resolve split‑brain or network partition scenarios.
Scalability Through Transparent Partitioning
Rows from any table are transparently split into multiple partitions, each hosted on a Data Node with a partner node that holds a replica. A two‑phase commit protocol ensures that committed transactions are stored on both nodes.
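The two-phase commit between a fragment's primary node and its replica can be sketched conceptually as below. The real NDB protocol is considerably more involved (non-blocking, with heartbeats and takeover logic); the class and message names here are illustrative only.

```python
class Node:
    """Toy participant holding one copy of a table fragment."""
    def __init__(self, name):
        self.name = name
        self.committed = {}   # durable row versions
        self.pending = {}     # staged writes, per transaction

    def prepare(self, txid, writes):
        # Phase 1: stage the writes; vote yes if staging succeeds.
        self.pending[txid] = writes
        return True

    def commit(self, txid):
        # Phase 2: make the staged writes durable.
        self.committed.update(self.pending.pop(txid))

    def abort(self, txid):
        self.pending.pop(txid, None)

def two_phase_commit(writes, primary, replica):
    txid = "tx1"
    # Phase 1: both copies must vote yes before anything becomes visible.
    if primary.prepare(txid, writes) and replica.prepare(txid, writes):
        primary.commit(txid)   # Phase 2: apply on both copies
        replica.commit(txid)
        return True
    primary.abort(txid)
    replica.abort(txid)
    return False

primary, replica = Node("data-node-1"), Node("data-node-2")
ok = two_phase_commit({"row:42": "new-value"}, primary, replica)
assert ok and primary.committed == replica.committed
```

Because a transaction commits only after both copies have accepted it, either node can serve the data unchanged if its partner fails.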
By default, the primary key serves as the partition key, hashed with MD5 to determine the target fragment. When a transaction or query spans multiple nodes, one node acts as the coordinator, distributes work, aggregates results, and returns them to the application. This multi‑node access capability distinguishes MySQL Cluster from typical NoSQL systems.
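The hash-based routing described above can be sketched as follows. The fragment count and key format are illustrative, not the actual NDB kernel implementation; only the idea (MD5-hash the partition key, map it to a fragment) follows the text.

```python
import hashlib

NUM_FRAGMENTS = 8  # illustrative; real clusters derive this from node and replica config

def fragment_for(partition_key: str) -> int:
    """Map a partition key to a fragment via an MD5-based hash (conceptual sketch)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % NUM_FRAGMENTS

# The same key always routes to the same fragment, so a transaction touching
# only this key can be coordinated and executed on a single Data Node.
print(fragment_for("user-42"))
```

Queries whose keys hash to different fragments are the case where a coordinator node fans the work out and aggregates the results.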
For linear scaling, high‑intensity queries/transactions should run on a single Data Node to minimize inter‑node latency. Administrators can define partitioning schemes on any column; for example, using a composite primary key of user ID and service name allows all rows for a given user to reside in the same fragment across tables, enabling queries to be processed on a single node.
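As a sketch of this co-location idea (table and column names hypothetical): if only the user-ID column feeds the hash, every table sharing that partition key places a given user's rows in the same fragment, even though the full primary key also includes the service name.

```python
import hashlib

FRAGMENTS = 8  # illustrative fragment count

def route(partition_key: str) -> int:
    # MD5-based hash of the partition key only, as the article describes.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % FRAGMENTS

# Two hypothetical tables share a composite primary key (user_id, service),
# but both declare user_id alone as the partition key.
subscriptions = [("alice", "sms"), ("alice", "voip"), ("bob", "sms")]
sessions      = [("alice", "web"), ("bob", "mobile")]

placement = {(table, row): route(row[0])
             for table, rows in (("subscriptions", subscriptions),
                                 ("sessions", sessions))
             for row in rows}

# All of alice's rows, across both tables, map to one fragment, so a query
# over her data can be processed entirely on a single Data Node.
alice_fragments = {frag for (table, row), frag in placement.items() if row[0] == "alice"}
assert len(alice_fragments) == 1
```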
Accelerating Data Access with NoSQL APIs
MySQL Cluster offers several access methods: the standard SQL interface and native APIs for C++, Java, JPA, JavaScript/Node.js, HTTP, and Memcached, allowing applications to bypass the MySQL Server layer for lower latency and greater flexibility.
Benchmark Goal: 200 Million Queries per Second
MySQL Cluster targets two workload types: OLTP (memory-optimized tables delivering sub-millisecond latency and extreme concurrency) and ad hoc search (parallel scans of non-indexed columns). The flexAsynch benchmark is used to measure how performance scales as more Data Nodes are added.
The benchmark runs each Data Node on an Intel Xeon E5-2697 v3 (Haswell) server with 56 hardware threads. Throughput scales almost linearly from 2 to 32 Data Nodes (the current maximum is 48), reaching 200 million NoSQL queries per second at 32 nodes.
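As a back-of-envelope check on what near-linear scaling implies, the headline figure works out to roughly 6.25 million queries per second per Data Node:

```python
total_qps = 200_000_000  # headline benchmark result
nodes = 32               # Data Nodes at peak

per_node = total_qps / nodes
print(f"{per_node:,.0f} QPS per node")  # prints "6,250,000 QPS per node"
```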
For more details, see the MySQL Cluster benchmark page and the MySQL Cluster 7.4 release information, as well as the recorded webinar.
English source: NetworkWorld
Translation source: 51CTO
Translation link: http://database.51cto.com/art/201505/476994_all.htm
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
