Identify and Eliminate Unscalable Operations with Lazy GC #53
Labels
development
Standard development
r&d:polykey:core activity 1
Secret Vault Sharing and Secret History Management
Specification
Some operations are "unscalable". For example to delete a file, one has to iterate over the entire file blocks and batch up all the delete operations and then execute them. This is due to the structure of how we are storing each file block as a separate key value entry in a sublevel.
If the file is large, this can take a long time. Furthermore deletion of files should be a lot faster than this. Real filesystems do not need to read the entire file just to delete it.
In order to solve this problem one way is to use lazy garbage collection. When deleting a file, it is marked as no longer accessible. The is already possible since file accessibility is determined by hardlinks which are persisted in the directory inodes.
However what is going to actually delete the key value entries? Real filesystems can simply mark filespace as reusable. Unfortunately this is not possible with a key value structure, and it would just get quite complex.
Therefore, we may instead of a garbage collection process that runs as a stream/loop and quietly deletes un-used entries in the background continuously as the EFS is being used. Right now we have something similar to this in order to remove zombie inodes in our gc sublevel.
Recommend researching the garbage collection techniques like in https://github.com/kenfox/gc-viz and seeing how we can apply it.
Deleting a file isn't the only cause where this can occur. I'm sure there may be other operations that are unscalable. Identify all of these in this issue.
Additional context
Tasks
setTimeout
yields (implemented as sleep promises).The text was updated successfully, but these errors were encountered: