How InfiniFS Optimizes Metadata Access with Optimistic Cache and Lazy Invalidation
This article explains InfiniFS's cache organization for directory metadata, its optimistic cache usage, and the lazy invalidation mechanism that broadcasts rename updates to a few metadata servers, enabling scalable and efficient metadata services in large‑scale distributed file systems.
This article is the third part of the InfiniFS paper study, focusing on the optimistic metadata cache technique used in the system architecture.
1. How to Organize Cache? (Cache Organization)
InfiniFS caches only directory access metadata (name, ID, and permissions) on the client side. A cache hit eliminates queries to near‑root directories, avoiding hotspot issues and ensuring scalable path resolution.
The cache entries are organized as a tree that mirrors the filesystem hierarchy, with the leaf nodes linked in Least‑Recently‑Used (LRU) order. When replacement occurs, the LRU leaf node is evicted. Because only leaves are eviction candidates, directories close to the root are the last to be evicted and tend to stay cached.
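The tree-plus-LRU-leaves layout can be sketched roughly as follows. This is a minimal illustration rather than InfiniFS code: the names `Node`, `DirCache`, `lookup`, and `insert` are hypothetical, the capacity counts cached directories, and placing a parent that has just become a leaf at the most-recently-used end is an assumption the source does not specify.

```python
from collections import OrderedDict


class Node:
    def __init__(self, name, parent=None, meta=None):
        self.name = name
        self.parent = parent
        self.meta = meta          # access metadata, e.g. ID and permissions
        self.children = {}        # child name -> Node


class DirCache:
    """Hypothetical tree-shaped cache in which only leaves can be evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.root = Node("/")
        self.size = 0                  # cached entries, excluding the root
        self.leaves = OrderedDict()    # leaf nodes, least recently used first

    def _touch(self, node):
        # Only non-root leaves take part in the LRU order.
        if node is not self.root and not node.children:
            self.leaves[node] = None
            self.leaves.move_to_end(node)

    def lookup(self, path):
        """Return (deepest cached node, path components still unresolved)."""
        node = self.root
        parts = [p for p in path.split("/") if p]
        for i, part in enumerate(parts):
            child = node.children.get(part)
            if child is None:
                self._touch(node)
                return node, parts[i:]
            node = child
        self._touch(node)
        return node, []

    def insert(self, path, meta):
        """Cache a directory; its parent must already be cached (else KeyError)."""
        parts = [p for p in path.split("/") if p]
        parent = self.root
        for part in parts[:-1]:
            parent = parent.children[part]
        name = parts[-1]
        if name in parent.children:
            parent.children[name].meta = meta
            self._touch(parent.children[name])
            return
        self.leaves.pop(parent, None)      # the parent stops being a leaf
        node = Node(name, parent, meta)
        parent.children[name] = node
        self.size += 1
        self._touch(node)
        while self.size > self.capacity:
            self._evict()

    def _evict(self):
        victim, _ = self.leaves.popitem(last=False)   # LRU leaf
        parent = victim.parent
        del parent.children[victim.name]
        self.size -= 1
        # Assumption: a parent that just became a leaf is treated as recent.
        self._touch(parent)
```

In this sketch an interior directory such as `/a` cannot be evicted while any of its children is cached, which is why near-root entries outlive deeper ones.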
2. Lazy Invalidation
After directory rename or permission changes, many cache entries become stale. Invalidating these entries on every client is impractical because the number of clients far exceeds the number of metadata servers, and client membership is hard to manage.
The lazy invalidation mechanism instead broadcasts invalidation information to the metadata servers, which are far fewer than the clients. Each server then lazily validates a client's cached path when it handles that client's request.
Specifically, a rename operation first contacts a "rename coordinator" to prevent orphan loops, then broadcasts the rename information to all metadata servers. Since rename operations are rare (about 0.0083% in measurements), a single coordinator suffices.
The rename handling flow is:
Send the rename request to the rename coordinator, which checks for orphan loops.
Lock the target directory to serialize the rename with other operations, broadcast the rename information and its version to all metadata servers, and wait for acknowledgments. Servers store the rename info in an "invalidation list" sorted by version.
InfiniFS migrates the directory’s access metadata from the source server to the target server and updates the rename‑list (RL) and back‑pointer (BP) on the first rename.
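The rename flow above can be sketched as follows. This is a simplified illustration under several assumptions: the class and method names are hypothetical, the coordinator is assumed to assign the version number, the loop check only rejects moving a directory into its own subtree, the lock is a plain in-memory set, and the metadata migration step is left as a comment.

```python
class MetadataServer:
    def __init__(self):
        # Entries are (version, old_path, new_path), kept sorted by version.
        self.invalidation_list = []

    def apply_rename(self, version, old_path, new_path):
        self.invalidation_list.append((version, old_path, new_path))
        self.invalidation_list.sort()
        return True   # acknowledgment back to the coordinator


def is_within(path, ancestor):
    """True if path equals ancestor or lies inside its subtree."""
    return path == ancestor or path.startswith(ancestor.rstrip("/") + "/")


class RenameCoordinator:
    def __init__(self, servers):
        self.servers = servers
        self.version = 0      # assumption: the coordinator numbers renames
        self.locked = set()

    def rename(self, old_path, new_path):
        # Step 1: simplified loop check, reject a move into the own subtree.
        if is_within(new_path, old_path):
            raise ValueError("rename would create an orphan loop")
        # Step 2: lock the directory, broadcast, and wait for all acks.
        if old_path in self.locked:
            raise RuntimeError("directory is locked by another rename")
        self.locked.add(old_path)
        try:
            self.version += 1
            acks = [s.apply_rename(self.version, old_path, new_path)
                    for s in self.servers]
            if not all(acks):
                raise RuntimeError("missing acknowledgment")
            # Step 3 (omitted): migrate the directory's access metadata
            # from the source server to the target server.
        finally:
            self.locked.discard(old_path)
        return self.version
```

The broadcast fans out to the servers only, so its cost grows with the number of metadata servers rather than with the number of clients.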
Server‑side lazy validation works as follows:
Clients are unaware of expiration and optimistically use their local cache for path resolution, maintaining a local version number that reflects all rename operations up to that version.
When a client contacts a metadata server, it sends the pathname and its local version. The server compares the pathname against the invalidation list, checking only the entries whose versions fall between the client's version and the latest rename.
If the path is still valid, the server processes the request and returns the result. If the path is invalid, the server aborts the request and returns the new rename information; the client then updates its cache and retries.
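A rough sketch of this validate-and-retry exchange is shown below. It is an illustration, not the InfiniFS protocol: `Server`, `handle`, `client_access`, and the reply format are hypothetical, and the client's cache update is modeled simply as rewriting the path it asked for. The sketch also assumes the server returns every rename newer than the client's version, so the client can advance its version to the latest one.

```python
import bisect


def rewrite(path, old, new):
    """Apply one rename (old prefix -> new prefix) to a path."""
    if path == old:
        return new
    if path.startswith(old + "/"):
        return new + path[len(old):]
    return path


class Server:
    def __init__(self):
        self.versions = []   # rename versions, ascending
        self.renames = []    # (old_prefix, new_prefix), parallel to versions
        self.dirs = {}       # current path -> metadata

    def record_rename(self, version, old, new):
        self.versions.append(version)
        self.renames.append((old, new))
        for p in list(self.dirs):
            q = rewrite(p, old, new)
            if q != p:
                self.dirs[q] = self.dirs.pop(p)

    def handle(self, path, client_version):
        # Look only at renames newer than the client's version.
        start = bisect.bisect_right(self.versions, client_version)
        newer = list(zip(self.versions[start:], self.renames[start:]))
        for _, (old, _new) in newer:
            if path == old or path.startswith(old + "/"):
                return {"ok": False, "renames": newer}   # abort, send updates
        return {"ok": True, "result": self.dirs.get(path)}


def client_access(server, path, version, max_retries=3):
    """Optimistically send the cached path; on rejection, update and retry."""
    for _ in range(max_retries):
        reply = server.handle(path, version)
        if reply["ok"]:
            return reply["result"], path, version
        for v, (old, new) in reply["renames"]:
            path = rewrite(path, old, new)
            version = v
    raise RuntimeError("too many retries")
```

For example, a client still at version 0 that asks for `/a/x` after `/a` was renamed to `/b` is rejected once, rewrites the path to `/b/x`, and succeeds on the retry at version 1.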
3. Analysis and Thoughts
Design Philosophy: Optimistic vs. Pessimistic
InfiniFS adopts Optimistic Concurrency Control:
Clients assume the cache is correct and do not proactively check for expiration, effectively "betting" it is fresh.
Servers validate only upon receiving a request; if validation fails, they fall back to a retry.
This contrasts with traditional pessimistic mechanisms (e.g., NFSv4 leases) that assume caches may expire, enforcing periodic invalidation or renewal. With many clients, lease renewal storms near the root become a bottleneck.
Scalability Key: Broadcast vs. Unicast
InfiniFS broadcasts invalidation information to a small number of servers instead of a large number of clients.
This resembles server‑side callbacks or a publish‑subscribe model, dramatically reducing coordination overhead.
Version Number Mechanism: Lightweight Consistency
Each client maintains a simple version number, avoiding the need to track the state of every cache entry.
Servers only compare rename operations within the version interval, eliminating full‑list comparisons.
This "version number + interval check" provides a lightweight consistency protocol.
Big Data Technology Tribe
