Summary
Coda builds on AFS to improve availability by leveraging the file caches on clients. While a client is connected to the server, Coda behaves like AFS and so retains AFS's high scalability.
Coda introduces two new concepts. The first is disconnected operation: when a client is disconnected from the server, it can still perform operations on its local cache. These operations are logged locally and sent to the server when the connection is restored, which is how Coda achieves high availability. The second is the volume storage group (VSG): the set of replication sites of a volume (i.e. the servers that hold a copy of the volume).
One major difference between Coda and AFS is that, instead of assigning each file a single server responsible for all updates to that file, Coda allows a replicated file to be updated at more than one server, and the changes are then propagated to the other replicas. During a disconnection, a client may be able to reach only a subset of a VSG, termed the accessible VSG (AVSG). In Coda, the client first sends modifications to the AVSG, and they are eventually propagated to the missing VSG sites. The paper does not mention what happens in the presence of concurrent updates to different servers.
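The update path described above can be sketched as follows. This is only an illustration: the function names, the dictionary-based replicas, and the explicit propagation step are assumptions made for the example, since the summary does not say how the missing VSG sites are eventually brought up to date.

```python
def write_to_avsg(vsg, reachable, path, data):
    """Apply an update to every reachable replica (the AVSG).

    vsg: dict mapping a server name to its replica, modeled as {path: data}.
    reachable: set of server names the client can currently contact.
    Returns the names of the VSG members that missed the update.
    """
    missed = []
    for name, replica in vsg.items():
        if name in reachable:
            replica[path] = data      # member of the AVSG: update immediately
        else:
            missed.append(name)       # outside the AVSG: must catch up later
    return missed


def propagate(vsg, missed, path, source):
    """Hypothetical catch-up step: copy the update from an up-to-date replica."""
    for name in missed:
        vsg[name][path] = vsg[source][path]
```

For example, with three replicas of which only two are reachable, write_to_avsg updates those two and reports the third as missed; a later propagate call brings it up to date.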
When disconnected from the servers, Coda relies on the locally cached copies of files for updates. At any moment a Coda client is in one of three states: the hoarding state, where the client is connected to the server and behaves like AFS; the emulation state, where the client is disconnected from the server; and the reintegration state, where the connection has been restored and the client needs to resynchronize with the servers in the AVSG.
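The three states form a small state machine, which can be sketched as below. The event names ("disconnect", "reconnect", "done") are hypothetical labels chosen for this example, and only the transitions described in this summary are modeled.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "reintegration"


# Transitions keyed by (current state, event). Event names are illustrative.
TRANSITIONS = {
    (State.HOARDING, "disconnect"): State.EMULATION,
    (State.EMULATION, "reconnect"): State.REINTEGRATION,
    (State.REINTEGRATION, "done"): State.HOARDING,
}


def next_state(state, event):
    """Return the next state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Starting from State.HOARDING, the events "disconnect", "reconnect" and "done" walk the client through emulation and reintegration and back to hoarding.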
During the hoarding state, Coda behaves like AFS in that it performs all updates on the local cache and sends them to the server on close. However, since disconnection can happen at any time, we want to make sure that the files most likely to be used during a sudden disconnection are cached locally. Coda therefore manages the cache with a prioritized algorithm. The key idea is that users can assign each file a hoard priority, and the algorithm computes the current priority of a cached object from its hoard priority and its recent usage. Because a file’s current priority changes over time (it depends on recent usage), a cached file can end up with a lower priority than a file that is not cached. For example, suppose File A has a hoard priority of 5 and is not used, while File B has a hoard priority of 0 but was used recently: File B has the higher priority right after its use, but File A gradually overtakes it. To compensate for this, the client periodically performs a hoard walk, which reevaluates the priority of each file and replaces cache entries accordingly. Finally, opening a cached file still requires path resolution on the client, so the parent directories of a cached file cannot be evicted before the file itself is evicted.
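The File A / File B example can be made concrete with a small sketch. The priority formula here (hoard priority plus a usage bonus that decays linearly with time since last use) is an assumption chosen for illustration, not the formula Coda uses; the names Entry, current_priority and hoard_walk are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    hoard_priority: int
    last_used: float  # time of the most recent reference


def current_priority(entry, now, bonus=10.0, decay=1.0):
    """Assumed formula: hoard priority plus a linearly decaying usage bonus."""
    age = now - entry.last_used
    return entry.hoard_priority + max(0.0, bonus - decay * age)


def hoard_walk(candidates, capacity, now):
    """Re-rank all candidates and return the names that should be cached."""
    ranked = sorted(candidates,
                    key=lambda e: current_priority(e, now),
                    reverse=True)
    return [e.name for e in ranked[:capacity]]


# File A: hoard priority 5, not used recently. File B: priority 0, just used.
file_a = Entry("A", hoard_priority=5, last_used=-100.0)
file_b = Entry("B", hoard_priority=0, last_used=0.0)
```

With room for a single file, a hoard walk shortly after B is used (now=1) keeps B, while a later walk (now=8) keeps A, because B's usage bonus has decayed below A's hoard priority.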
When the client is disconnected, it enters the emulation state, during which it performs actions normally handled by the servers. This includes creating new files, for which it generates dummy file identifiers (fids). During the emulation state, all requests are served from the locally cached copy instead of going to the Coda server, so requests for a file that is not cached locally will fail. The Coda client keeps a local replay log that contains enough information to replay the updates on the server. Lastly, since the remote copy is no longer reachable, all modified files are kept in the cache with the highest priority and are flushed more frequently.
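A minimal sketch of emulation-state behavior, assuming an in-memory cache and a simple tuple-based log. The class name, the "tmp-N" fid format and the log record layout are all invented for this example and do not reflect Coda's actual data structures.

```python
import itertools


class EmulatingClient:
    """Toy model of a disconnected client serving requests from its cache."""

    def __init__(self, cache):
        self.cache = dict(cache)          # path -> file contents
        self.replay_log = []              # operations to replay on the server
        self._counter = itertools.count(1)

    def read(self, path):
        if path not in self.cache:
            # A cache miss cannot be serviced while disconnected.
            raise FileNotFoundError(path)
        return self.cache[path]

    def create(self, path):
        fid = f"tmp-{next(self._counter)}"   # dummy fid, replaced on reconnect
        self.cache[path] = b""
        self.replay_log.append(("create", path, fid))
        return fid

    def write(self, path, data):
        if path not in self.cache:
            raise FileNotFoundError(path)
        self.cache[path] = data
        self.replay_log.append(("store", path))
```

Reads and writes on cached paths succeed and are logged, while any access to an uncached path raises FileNotFoundError, mirroring the failure described above.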
When the connection is restored, the client enters the reintegration state. The client first requests fids for newly created files from the Coda server, then sends the replay log to each server in the AVSG in parallel. Upon receiving the replay log, the Coda server runs the replay algorithm one volume at a time. For each volume, it first parses the replay log and locks all referenced objects, then validates and executes all operations in the log. Next, the actual data are transferred from the client to the server (a.k.a. back-fetching). Finally, the server commits the updates in one transaction (per volume) and releases all locks. However, replay may fail because another update to the same file occurred during the disconnection. Coda asks the client to resolve any such update conflicts locally.
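The per-volume replay steps can be sketched as below. This is a simplified illustration under stated assumptions: conflict detection by comparing version numbers is a stand-in chosen for the example, locking is reduced to a placeholder set, and the record layout and the fetch_data callback are hypothetical.

```python
def replay_volume(server_files, log, fetch_data):
    """Replay one volume's log; return True on commit, False on conflict.

    server_files: dict fid -> {"version": int, "data": bytes}
    log: list of records {"fid": str, "base_version": int}, where
         base_version is the version the client cached before disconnecting.
    fetch_data: callable fid -> bytes, standing in for back-fetching.
    """
    locked = {rec["fid"] for rec in log}      # 1. parse log, lock objects
    try:
        staged = {}
        for rec in log:                       # 2. validate and execute
            fid = rec["fid"]
            current = server_files.get(fid, {"version": 0})["version"]
            if current != rec["base_version"]:
                return False                  # updated elsewhere: conflict
            staged[fid] = current + 1
        # 3. back-fetch the file contents from the client
        data = {fid: fetch_data(fid) for fid in staged}
        # 4. commit all updates for this volume together
        for fid, version in staged.items():
            server_files[fid] = {"version": version, "data": data[fid]}
        return True
    finally:
        locked.clear()                        # 5. release all locks
```

Because nothing is written to server_files until every record has been validated, a conflict on any object leaves the whole volume unchanged, which mirrors the one-transaction-per-volume commit described above.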
Strengths
- Coda keeps files available during disconnection at a low cost.
- Similar to AFS, Coda is scalable since most operations on the critical path are performed locally.
Weaknesses
- Similar to AFS, Coda is not good at sharing.
- Coda caches objects at the granularity of a whole file. This is unfriendly to huge files and to clients that only need a small portion of a file.
- Coda does not degrade gracefully during the emulation state if the client's disk space is exhausted.
Satyanarayanan, Mahadev, et al. “Coda: A highly available file system for a distributed workstation environment.” IEEE Transactions on Computers 39.4 (1990): 447-459.