Boost Ceph Performance: Mastering Cache Pool Configuration and Management
This guide explains how Ceph cache pools work, compares read‑only and write‑back cache types, and provides step‑by‑step commands for creating, configuring, and safely removing cache pools, including CRUSH rule adjustments to ensure data lands on SSDs.
In the Ceph distributed storage system, a cache pool is a special storage pool used to improve data access speed by caching hot data on faster devices such as SSDs.
1. How cache pools work
A cache pool creates a logical layer that moves frequently accessed data from slower media (e.g., HDD) to faster media (e.g., SSD). When a client requests data, the system first checks the cache pool; a hit returns the data immediately, otherwise the data is fetched from the backend pool and written into the cache for future accesses.
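The read path described above can be sketched as a toy shell script that uses two local directories in place of the two tiers. This is purely illustrative (the directory names and `read_object` helper are invented for the example, not Ceph internals):

```shell
#!/bin/sh
# Toy sketch of the cache-tier read path: a "cache" directory fronting a
# "base" directory. Names are illustrative only.
rm -rf /tmp/cache_tier_demo
CACHE_DIR=/tmp/cache_tier_demo/cache
BASE_DIR=/tmp/cache_tier_demo/base
mkdir -p "$CACHE_DIR" "$BASE_DIR"
echo "hot data" > "$BASE_DIR/obj1"   # the object initially lives on the slow tier

read_object() {
    name=$1
    if [ -f "$CACHE_DIR/$name" ]; then
        echo "cache hit"
        cat "$CACHE_DIR/$name"
    else
        echo "cache miss: promoting from base tier"
        cp "$BASE_DIR/$name" "$CACHE_DIR/$name"   # promote into the cache
        cat "$CACHE_DIR/$name"
    fi
}

read_object obj1   # first read: miss, object is promoted
read_object obj1   # second read: served from the cache
```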
Due to long‑term lack of maintenance, cache tiering has been deprecated in the Reef release.
2. Types of cache pools
Read Cache
Features:
Cache type: Read‑only cache accelerates read operations by storing read data in the cache pool.
Data consistency: All writes go directly to the primary pool; the cache does not affect consistency, which is guaranteed by the primary pool.
Use cases: Suitable for read‑heavy, write‑light scenarios such as video on demand or static content distribution.
Operation:
When a client issues a read request, the cache pool is checked first.
On a cache hit, the data is served directly from the cache; on a miss, it is read from the backend pool and promoted into the cache for subsequent reads.
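A read-only tier is attached with the same tiering commands shown later for write‑back, just with the cache mode set to readonly. As a sketch (the pool names base_pool and cache_pool are illustrative; the confirmation flag is required because readonly mode does not guarantee consistency for recently modified objects):

```shell
# Illustrative pool names; adjust to your cluster.
ceph osd tier add base_pool cache_pool
ceph osd tier cache-mode cache_pool readonly --yes-i-really-mean-it
```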
Write‑back Cache
Features:
Cache type: Write‑back cache speeds up write operations by first writing to the cache pool, then asynchronously flushing to the primary pool.
Data consistency: Because writes initially reside in the cache, the cache may temporarily diverge from the primary pool; eventual write‑back ensures consistency.
Use cases: Ideal for write‑heavy, read‑light workloads such as log ingestion or high‑throughput databases.
Operation:
When a client issues a write request, the data is written to the cache pool and the client receives an immediate success response.
The cache data is flushed to the primary pool asynchronously in the background.
Subsequent read requests still return the latest data, because the cache holds the most recent copy.
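To observe this flushing behaviour on a running cluster, a few standard status commands help (cache_pool follows the naming used in this guide):

```shell
# Watch cache-tier usage; "DIRTY" counts objects not yet written back.
ceph df detail                  # per-pool usage, including DIRTY for cache tiers
rados df                        # object and usage counts for every pool
ceph osd pool stats cache_pool  # live client I/O rates for the cache pool
```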
3. Configuring a cache pool
Creating and linking a cache pool to a backend pool involves the following steps:
1. Create the cache pool
<code>ceph osd pool create cache_pool 128</code>
2. Add the cache pool as a tier to the base pool and set the cache mode to write‑back
<code>ceph osd tier add libvirt-pool cache_pool
ceph osd tier cache-mode cache_pool writeback</code>
3. Associate the cache pool with the backend pool
This command redirects client I/O to the cache pool.
<code>ceph osd tier set-overlay libvirt-pool cache_pool</code>
4. (Optional) Enable additional cache parameters
<code>ceph osd pool set cache_pool hit_set_type bloom
ceph osd pool set cache_pool hit_set_count 1
ceph osd pool set cache_pool hit_set_period 3600
ceph osd pool set cache_pool target_max_bytes 10737418240
ceph osd pool set cache_pool target_max_objects 10000
ceph osd pool set cache_pool min_read_recency_for_promote 1
ceph osd pool set cache_pool min_write_recency_for_promote 1
ceph osd pool set cache_pool cache_target_dirty_ratio 0.4
ceph osd pool set cache_pool cache_target_dirty_high_ratio 0.6
ceph osd pool set cache_pool cache_target_full_ratio 0.8</code>
Configuring the CRUSH device class
After creating a cache pool, data is not automatically placed on SSD OSDs; you must modify the CRUSH map to direct the cache pool to SSD devices.
Ensure a CRUSH rule exists that stores the cache pool on SSDs, or create a new one.
View existing CRUSH rules
<code>ceph osd crush rule dump</code>
Create a new CRUSH rule (assuming SSD devices are already labeled)
<code>ceph osd getcrushmap -o crushmapdump
crushtool -d crushmapdump -o crushmapdump-decompiled</code>
Edit the decompiled map to change the class of OSDs 20‑24 to ssd, and add rules for HDD and SSD.
<code>vim crushmapdump-decompiled
# ... modify device lines ...
rule replicated_hdd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_ssd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
</code>
Recompile the edited map and inject it back into the cluster:
<code>crushtool -c crushmapdump-decompiled -o crushmapdump-compiled
ceph osd setcrushmap -i crushmapdump-compiled</code>
Or create the rules directly:
<code>ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd crush rule create-replicated replicated_hdd default host hdd</code>
Bind the cache pool to the new CRUSH rule
<code>ceph osd pool set cache_pool crush_rule replicated_ssd
ceph osd pool set libvirt-pool crush_rule replicated_hdd</code>
Verify that the cache pool and the data pool use different rules, confirming that cache data resides on SSDs.
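One way to verify, as a sketch using the pool and rule names from the steps above:

```shell
ceph osd pool get cache_pool crush_rule    # should report replicated_ssd
ceph osd pool get libvirt-pool crush_rule  # should report replicated_hdd
ceph pg ls-by-pool cache_pool              # PGs should map to SSD-class OSDs
```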
4. Deleting a cache pool
The removal process differs for read‑only and write‑back caches.
Delete a read‑only cache
Since no data is modified, you can disable and delete the cache without data loss.
<code>ceph osd tier cache-mode cache_pool none</code>
Unlink the cache from the data pool:
<code>ceph osd tier remove libvirt-pool cache_pool</code>
Delete a write‑back cache
Because data may have been modified, first switch the cache mode to proxy so that new and modified objects are flushed to the backend pool.
<code>ceph osd tier cache-mode cache_pool proxy</code>
Ensure all objects are flushed:
<code>rados -p cache_pool ls</code>
If objects remain, flush them manually:
<code>rados -p cache_pool cache-flush-evict-all</code>
Remove the overlay so clients no longer route traffic to the cache pool:
<code>ceph osd tier remove-overlay libvirt-pool</code>
Unlink the cache pool from the data pool and delete it:
<code>ceph osd tier remove libvirt-pool cache_pool
ceph osd pool delete cache_pool cache_pool --yes-i-really-really-mean-it</code>
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.