Encapsulation, access control, and fixity #14
Comments
While standardizing the hierarchy by itself may be interesting for other use cases, the details of the encapsulation are essential to achieving the two goals that motivated the creation of WACZ. It needs to be a single file so that it can be shared easily, and that single file needs to be constructed carefully (not just any generic container format) in order to allow incremental loading without downloading or reading the entire collection.
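To make the incremental-loading point concrete, here is a minimal sketch using Python's standard `zipfile` module. It assumes an in-memory archive; the member names (`archive/data.warc`, `pages/pages.jsonl`) and the `CountingReader` helper are illustrative, not taken from the draft spec. It only shows that a Zip reader can pull one small member via the central directory without reading the large one.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """BytesIO that records how many bytes were actually read (hypothetical helper)."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archive():
    """Build a toy archive: one large member and one small member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", b"W" * 1_000_000)
        zf.writestr("pages/pages.jsonl", b'{"url": "https://example.com/"}\n')
    return buf.getvalue()


def read_one_member(archive_bytes, name):
    """Read a single member and report how many bytes the reader touched."""
    reader = CountingReader(archive_bytes)
    with zipfile.ZipFile(reader) as zf:
        # Opening the archive reads the end-of-archive records and central directory.
        payload = zf.read(name)  # Seeks to the member and reads only its bytes.
    return payload, reader.bytes_read
```

Calling `read_one_member(build_archive(), "pages/pages.jsonl")` returns the small member while touching only a small fraction of the archive. Over a network, the same access pattern would presumably map to ranged reads, though that part is not shown here.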
Edit: I confused atomotic and jcahill as the same person. My bad!
My impression is that OCFL is designed specifically around the need to store multiple versions of data and their digests. I suppose BagIt may be a better fit, but that wouldn't address the random-access requirement, for which the Zip bundling is still necessary.
Maybe there should be a separate WAC directory layout, with the Z part covering how to pack it up as a single Zip file. But are users going to open the expanded file, or just use it as a sort of black box, e.g. the way a .docx file generally is? I suppose the expanded layout could be useful if a collection is being actively edited, though it's not designed as an edit-in-place format.
Sorry, I have a precarious connection on the train and mistakenly deleted my previous comment. Got the point: the OCFL design is not useful here. BagIt, instead, could be zipped uncompressed.
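As a rough illustration of why zipping uncompressed matters for random access: when a member is stored rather than deflated, its bytes sit verbatim inside the archive, so a reader can address them by byte range. The sketch below uses Python's `zipfile` and `struct`; the function name `stored_member_range` is hypothetical, and the layout arithmetic assumes the standard Zip local file header (30 fixed bytes followed by the file name and extra field).

```python
import io
import struct
import zipfile


def stored_member_range(archive_bytes, name):
    """Return (start, end) byte offsets of a stored member's payload."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed; its bytes are not addressable as-is")
    # The local header holds the name and extra-field lengths at offsets 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", archive_bytes, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    return start, start + info.file_size
```

Slicing the archive with the returned offsets should give back the member's original bytes, which is what would let a client fetch just that range from a remote file.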
One issue that seems to arise from the current draft spec is loss of separation of concerns with respect to accessing and modifying components of the collection. This speaks to both (a) the need for different parties to have differing levels of access to distinct materials and (b) the need to be confident that the underlying capture data has not changed and is being given a wide berth.
Some scenarios that come to mind:
wacz_new from a subset of wacz_orig.
Some of these issues could be solvable with some scoping of when exactly the encapsulation is expected to occur in relation to content changes. If the wacz spec is to be seen more as a sort of collection layout convention than an archive file format, compression could itself remain optional, only needing to come into play as a storage-mode consideration, i.e. when collections aren't in a state of heavy development. BagIt's evolution comes to mind. Wikipedia:
The outer zip container is effectively a glorified suitcase for the data and metadata here (wacz draft), so it stands to reason that it might not always be strictly necessary. The hierarchy's hammering down of certain conventions for pairing of web archival data files and their sidecar metadata files strikes me as much more important.
The most important question for me, then, lies in how to effectively reason about contents already in wacz hierarchies, especially for the purposes of aggregating and disaggregating them.
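One way to reason about disaggregation with fixity in mind, sketched under assumptions: derive a new archive from a subset of an original, then compare per-member SHA-256 digests to confirm the copied capture data is unchanged. The function names (`member_digests`, `extract_subset`, `changed_members`) are hypothetical, the choice of SHA-256 is an assumption, and none of this is prescribed by the draft spec.

```python
import hashlib
import io
import zipfile


def member_digests(archive_bytes):
    """Map each member name to the SHA-256 hex digest of its contents."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def extract_subset(archive_bytes, names):
    """Build a new stored (uncompressed) archive holding only the named members."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src:
        with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
            for name in names:
                dst.writestr(name, src.read(name))
    return out.getvalue()


def changed_members(orig_bytes, new_bytes):
    """List members of the new archive whose digest differs from (or is absent in) the original."""
    orig = member_digests(orig_bytes)
    new = member_digests(new_bytes)
    return sorted(name for name, digest in new.items() if orig.get(name) != digest)
```

An empty result from `changed_members(wacz_orig, wacz_new)` would indicate that every member carried over is byte-identical to its source; anything listed was either modified or did not exist in the original.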