Discussion of Postcard Model #42
(Tagging: @anarchivist @no-reply @escowles @cmh2166 @tpendragon @mjgiarlo) |
Two concerns; I'll put them in separate comments for discussion:
Postcard 1 has an image representation of the whole (a thumbnail), which is a FileSet referenced via hasRelatedObject.
The front of postcard 1 has a FileSet, linked via pcdm:hasMember, which is representative of the whole object. These two patterns seem to conflict, and they work against the flexibility to have, for instance, a collection of all the front pages of postcards. Do we need to pick a pattern? (Always hasRelatedObject? Are hasRelatedObject's semantics right for "representative FileSet"? Is there something that should be on that FileSet to identify it as a representation...? Do we need a new predicate?) |
Second concern: Why is the front of the postcard a |
Re related vs member, I agree. We're confusing representationOf and partOf ... leading to putting representations in And re Object vs Work ... I was taking the approach that a Work is a bound or otherwise coherent thing. If the back of a postcard is a Work, then is the verso of page 214 of a book also a Work? |
I'm leaning this way too.
When CC uses the "Part" stuff, it will be in Plum at least. And I think that's right. |
If it's a Work, what's the difference between a Work and an Object? Can't we just use Object all the time? |
Only if you let Works have Files. |
Have Files with hasFile, or have FileSets with (hasMember/hasRelatedObject/hasRepresentation)? |
The reason there's a Work is that there are restrictions on it that don't exist on Objects (I think). Of course, restrictions like that aren't present in RDF without something like SHACL, so we could always just say "if your Object hasFile something, and we're in Works land, we're not gonna do anything with it" |
Okay, so those restrictions should define the difference :) What are the restrictions? |
Re related vs member: I'm thinking that we are talking about hasRepresentation here as what relates a PCDM object to the digital representation, not as a possible inverse to edm:isRepresentationOf (which relates an object to a RWO, or another resource to the repository object). Aren't we implying in some way that [Object hasMember FileSet hasFile File] is that hasRepresentation relationship, made more complicated to support multiple versions of that file/representation? So...
Differentiating hasRepresentation from hasMember, +1 (so as not to use hasRelatedObject for odd outliers), but in the meantime, the thumbnail doesn't seem, to me, to fall under the 'is not a component part' portion of the hasRelatedObject comment. But I'm open to either. Works/Objects comment forthcoming. Hope that helps, or at least doesn't hinder. |
Looking at the comments here: https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/models/concerns/work_behavior.rb Is it saying that a pcdmw:Work could not pcdm:hasMember a pcdm:Object that isn't a pcdmw:Fileset? |
Agree with @tpendragon that Two other things worth calling out:
why not combine the FileSets:
If we want a scenario that demonstrates two FileSets under a Work, maybe this would be better:
|
So there is no situation in which you would use pcdm:Object, only ever FileSet (for a bundle of files) and Work (for everything else)? If there's no other distinction between Object and Work, I don't see the point of having the subclass. Just have an application profile that says: don't use hasFile from Object if you're using FileSets.
No one would ever use "Work" to describe a page of a book given this definition. A Chapter (e.g. a Range, subClassOf Object, not Work) is more of a Work than a physical page as it's the logical or textual structure that came from the intellectual/creative effort, not the printing of the physical Item. I understood Work as "An Object that represents a coherent and complete intellectual entity which should be separately discoverable from other Works"
There isn't a predicate yet for the preferred representation though.
Because .txt is derived from .tei, and FileSets have a single master and a single media type. The TEI is not derived from the Image. |
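A small Python sketch of why the two FileSets stay separate, assuming each derivative records the single master it was generated from. The `derived_from` mapping, the function, and the file names are hypothetical. Grouping by master yields one FileSet for the image and another for the TEI, because nothing links the TEI to the image.

```python
from typing import Dict, List


def group_into_filesets(masters: List[str], derived_from: Dict[str, str]) -> Dict[str, List[str]]:
    """Return one FileSet per master file, as master -> list of its derivatives.

    Assumes every derivative was generated directly from one master; a
    derivative whose source is not a listed master raises KeyError.
    """
    filesets: Dict[str, List[str]] = {master: [] for master in masters}
    for derivative, source in derived_from.items():
        filesets[source].append(derivative)
    return filesets


masters = ["front.tif", "transcription.tei"]
derived_from = {
    "front_thumbnail.jpg": "front.tif",
    "transcription.txt": "transcription.tei",  # derived from the TEI, not the image
}
print(group_into_filesets(masters, derived_from))
# {'front.tif': ['front_thumbnail.jpg'], 'transcription.tei': ['transcription.txt']}
```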
Granted that the current description of Work is inadequate. But if you digitize an atlas, creating a Work for the atlas as a whole, and an Object for each page/map, what happens when you want to add one of the maps to a Collection of maps? Do you upgrade that Object to a Work at that point? Or do you create all the pages as Works? |
That seems an oddly specific instance of Pages, one where they are Works as defined by what Rob points out (they're complete maps as well as pages). This is evidenced by the 'page/map' part of your response, right? There we're describing a combination of a carrier part and some kind of complete work. In that same atlas example, you could have an atlas that has a single map that spans pages. I'd propose that the atlas is a Work, the map is a Work, and the pages are Objects. I hadn't thought about this before, but I agree with Rob's critique of everything that isn't a Collection or a FileSet seemingly becoming a Work by default, and we probably don't want that to happen (or if we do, it changes how we are defining Work, and the documentation should be updated). What was the original intention of Work in this context? (I'm sorry to say I honestly don't know, as I'm a recent interloper.) Was it just to add some functional aspects to generic Objects vis-a-vis the HydraWorks/PCDMWorks gem and the LDP specification? Or to create a PCDM extension ontology that allows for the concept of Works, and attach functionality to those concepts? While I would have said before that the postcard sides are Works (my original question in that Twitter thread), I'd agree with Rob's response to that in this thread: in his given example, they should just be Objects. They are "Parts" that have no complete Work aspect to them (as far as we know). But I'd leave the option open that someone dealing with particular kinds of postcards may run into an instance where a side could be a Work (if a side is a Map, say). Hope I haven't misunderstood the question or sidetracked the discussion. |
If the map has its own intellectual value separate from the atlas, then I would create it as a Work from the outset. However, I don't see why that would prevent it from being in a Collection if it were just an Object? I can have a Collection of pages that mention Paris, regardless of whether the page is a Work or an Object. If the depositor doesn't say that they're Works so they get created as Objects, and someone later wants to change that ... then I don't see a problem with that either? In the separate resource for metadata model, I would associate edm:isRepresentationOf with Work rather than Object. If you want to have metadata beyond a basic label for a PCDM object, then it's a Work. If you're okay with it just being a constituent member, then it's an Object. In the postcard case, if it was important to describe the artwork on the front of the postcard, or the writing on the back, then make it a Work with its own metadata. If it's just a "page" without meaningful differentiation, then leave it as an Object. |
I really don't see why there needs to be a distinction. I'm pretty sure I prefer all works or all objects, because it makes the user experience easier to figure out. If the question is, "should there be Works at all, rather than Objects with restrictions", I think that's valid, and I'm not sure what best practice is there. But using the two because they exist to draw lines where there doesn't need to be any seems too complex. |
The atlas/map example is obviously contrived, but I think the basic point is that whether something is a work or a part is largely contextual. This is something we've talked about from the first meetings in Portland, all the way up to LDCX. I doubt we can know what the future context of our objects will be (selecting maps to go in the Collection could happen after the atlas was created in the repository), so I would rather create it as the type of resource that can stand alone, even if we don't expect that it will. I would be fine with getting rid of the Work class and using Object for everything that's not a FileSet or a Collection. And having an application profile say that Files should be attached to FileSets only seems reasonable. |
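The application-profile rule suggested above (Files attached only to FileSets) could be checked with something like the following Python sketch. It assumes the graph is available as plain (subject, predicate, object) string tuples and that the set of FileSet nodes is already known; the function and node names are hypothetical. A shape language such as SHACL, mentioned earlier in the thread, would be the more natural home for this rule.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def has_file_violations(triples: List[Triple], filesets: Set[str]) -> List[Triple]:
    """Return every pcdm:hasFile triple whose subject is not a known FileSet."""
    return [t for t in triples if t[1] == "pcdm:hasFile" and t[0] not in filesets]


triples = [
    ("postcard1", "pcdm:hasMember", "front"),
    ("front", "pcdm:hasMember", "front_fs"),
    ("front_fs", "pcdm:hasFile", "front.tif"),
    ("back", "pcdm:hasFile", "back.tif"),  # File hung directly on an Object
]
print(has_file_violations(triples, {"front_fs"}))
# [('back', 'pcdm:hasFile', 'back.tif')]
```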
I think the "Postcard Thumbnail Image" FileSet should be _:tn2fs1 not _:fn2fs1, see:
|
I'm happy to just have Object and Fileset. Will update the document this afternoon. |
Changes:
Thoughts? |
Can more than one Object hasMember / hasFileSet any given FileSet? As it seems like a wrapper around hasFile for grouping, I would expect not? Along with filesets not being ordered, it seems like a direct container would do the trick in LDP without the need for proxies. |
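To make the cardinality question concrete, here is a rough Python sketch, assuming membership is recorded as (aggregation, FileSet) pairs. It reports every FileSet gathered by more than one aggregation, which a one-and-only-one rule would forbid. The function name and node labels are illustrative only, not part of PCDM or Hydra::Works.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


def shared_filesets(memberships: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each FileSet gathered by more than one aggregation, with its parents."""
    parents: DefaultDict[str, Set[str]] = defaultdict(set)
    for aggregation, fileset in memberships:
        parents[fileset].add(aggregation)
    return {fs: ps for fs, ps in parents.items() if len(ps) > 1}


memberships = [
    ("conference_object", "presentation_fs"),
    ("conference_object", "poster_fs"),
    ("my_presentations", "presentation_fs"),
]
# presentation_fs is reported with both of its parents; poster_fs is not shared
print(shared_filesets(memberships))
```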
I have been following the conversation and I want to be clear about where we landed. Are you saying...
Cardinality Options
Object has... Option 1: an Object can have one-and-only-one FileSet.
FileSets can be in... Option 1: a FileSet can belong to one-and-only-one Object. (I think you are saying this.)
Ordering
Sets of Objects (gathered by an Object or a Collection) can have one (or more) orders applied to the set. The first order has first and last directly in the gathering Object or Collection. Additional orders are applied by having an in-between OrderObject for each additional order.
Sets of FileSets (gathered by an Object) cannot have order.
I am hesitant on two points (assuming I interpreted things correctly). Note: since both Objects and Collections can have Objects as members, I will refer to them generically as aggregations.
Lynette edited by Rob to make the github UI not collapse all the content down to nothing |
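The ordering arrangement described above can be approximated as a chain of proxies whose first and last ends sit directly on the aggregation. This is a simplified Python sketch, assuming a single order with forward links only; the class and method names are made up here, and the in-between OrderObject for additional orders (and any backward links) is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Proxy:
    """Stands in for one member at one position in the order."""
    proxy_for: str
    next: Optional["Proxy"] = None


@dataclass
class Aggregation:
    """An Object or Collection: unordered members plus first/last of one order."""
    members: Set[str] = field(default_factory=set)
    first: Optional[Proxy] = None
    last: Optional[Proxy] = None

    def append_ordered(self, member: str) -> None:
        self.members.add(member)  # membership itself stays unordered
        proxy = Proxy(member)
        if self.last is None:
            self.first = proxy  # first proxy in an empty order
        else:
            self.last.next = proxy  # link the old tail to the new proxy
        self.last = proxy

    def in_order(self) -> List[str]:
        """Walk from first along next links and return members in order."""
        result: List[str] = []
        node = self.first
        while node is not None:
            result.append(node.proxy_for)
            node = node.next
        return result


postcard = Aggregation()
postcard.append_ordered("front")
postcard.append_ordered("back")
print(postcard.in_order())  # ['front', 'back']
```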
My opinion:
|
A few more questions:
Lynette |
Each FileSet has a single master file, with derivatives. You likely want to associate multiple master files, each with its own derivatives, with a single Object. For example:
You can't do that at the moment. Is there a real use case for this?
|
If FileSets are for grouping an original file together with its derivatives, then I think they also should have metadata about creation (creator, date, software, etc.). In the postcard image/transcription example, you might want to know who did the transcribing, or when. Maybe that's not descriptive metadata, but it's broader than what I've been thinking of as technical metadata. |
Good point! We should be clearer about the distinctions. I think there's at least file characterization (associated with the File), the provenance of the file and derivatives (associated with the File and/or FileSet?), and then the broader descriptive information about the Object. I'll update the example with some provenance on the FileSet. |
I've understood FileSets as a grouping of Files that allows assertion of descriptive metadata about said grouping (and/or, arguably, the Part of the problem of course is that the categories of metadata used by cultural heritage organizations are... not cut and dried, and feel distinctly pre-RDF to me. Put otherwise, the boundary between descMD and techMD (etc.) is permeable and fuzzy at best. If I run FITS and it returns an author (as extracted from embedded file metadata), is it techMD or descMD? Better question... should we care? |
If we don't care, and people put dcterms:creator on the File sometimes and on the FileSet other times, there seems to be a usage cost -- In order to find out if there's a creator, you need to check multiple locations. I think it's easier to associate that sort of information with the FileSet than with the File. |
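To illustrate the usage cost described above: if dcterms:creator may sit on either the FileSet or one of its Files, every reader has to search both places, roughly as in this hypothetical Python helper (the dictionary layout and sample values are assumptions for illustration). Agreeing to always put it on the FileSet would reduce this to the first lookup.

```python
from typing import Dict, List, Optional


def find_creator(fileset: Dict[str, str], files: List[Dict[str, str]]) -> Optional[str]:
    """Return dcterms:creator from the FileSet, falling back to each File in turn."""
    if "dcterms:creator" in fileset:
        return fileset["dcterms:creator"]
    for f in files:
        if "dcterms:creator" in f:
            return f["dcterms:creator"]
    return None


transcription_fs = {"label": "Back of postcard 1, transcription"}
transcription_files = [
    {"name": "back.tei", "dcterms:creator": "Transcriber A"},
    {"name": "back.txt"},
]
print(find_creator(transcription_fs, transcription_files))  # Transcriber A
```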
I'm 🆒 with that. |
RE: Use case for sharing FileSet with other aggregations. First an observation: In your examples, all the files in the FileSet are variants of the same 'object' (e.g. different resolution, scanned at different times, scanned using different techniques, etc.). In this case, it makes sense that, if you want to share the 'object' with other aggregations, you would share all the FileSets, which equates to sharing the Object holding all the FileSet variants of the 'object'. Self-deposit IR use case: There is no enforced restriction preventing a user from gathering FileSets for different 'objects' in a single Object. For example, a user might put a presentation, poster, and paper presented at a conference in the same Object. The user might also have a collection for My Presentations. The only FileSet from the original Object that the user wants to include in the My Presentations collection is the presentation FileSet. |
@elrayle, I think the new model would be implemented by creating a pcdm:Object for each file a user uploads. The Object would have a single FileSet, which would contain the file, plus extracted text, thumbnail, etc. |
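A hedged sketch of the one-Object-per-upload structure just described, in Python. The class names echo PCDM terms, but the classes themselves, the `object_for_upload` helper, and the derivative file names are hypothetical placeholders; real derivative generation (text extraction, thumbnailing) is out of scope.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List


@dataclass
class FileSet:
    files: List[str]  # the uploaded file plus its generated derivatives


@dataclass
class PcdmObject:
    title: str
    file_sets: List[FileSet] = field(default_factory=list)


def object_for_upload(filename: str) -> PcdmObject:
    """Wrap one uploaded file in its own Object holding a single FileSet."""
    stem = PurePosixPath(filename).stem
    derivatives = [f"{stem}_extracted.txt", f"{stem}_thumbnail.jpg"]  # placeholder names
    return PcdmObject(title=filename, file_sets=[FileSet([filename] + derivatives)])


part = object_for_upload("presentation.ppt")
print(len(part.file_sets), part.file_sets[0].files)
# 1 ['presentation.ppt', 'presentation_extracted.txt', 'presentation_thumbnail.jpg']
```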
@escowles If I understand correctly, your only adjustment to the displayed model is pcdm:hasFileSet is 1:1 instead of m:m? I believe @azaroth42 earlier stated that there can be 1:m (one Object with many FileSet) so that the Object can hold uploaded variants (e.g. scans from different times/techniques, etc.) of the same 'object' each in its own FileSet and the FileSet holds generated derivatives of the 'object'. |
@elrayle Yes, that's my understanding. The use cases for multiple FileSets attached to an Object are things like multiple digitizations, adding TEI transcriptions, etc. I wouldn't expect to have those very often in an IR, but maybe there are examples I'm not thinking of. |
Is this still the application profile for PCDM... https://github.com/duraspace/pcdm/wiki? Are the changes described in this thread affecting PCDM proper or are they extensions? |
@elrayle I suspect this will result in a new version of PCDM proper, refining the PCDM Works extension and pulling it into PCDM. |
I think there are use cases for aggregating FileSets in multiple aggregations, and definitely use cases for not wanting to pull in all the FileSets under an Object. You're going to have digitization folks uploading a batch of stuff (Preservation Images of Postcard Archives Box 24; Infrared Scans of Postcard Archives) for whom the batch is the actual thing. You're going to need to be able to aggregate them under objects later (Bahamas Postcard) for context. The relationship of those FileSets to the aggregating object is going to be contextual. So while I totally agree with needing multiple FileSets per object, and I totally agree with needing to have an Object that represents Parts, I think we're going down a questionable path as far as cardinality of aggregation and prescribed containment. |
Are you using postcards as an example use case for working through modeling issues? Or is hybox planning to support models for several well defined common use cases? |
I was borrowing the example context of postcards from Rob (not the literal example he used, just the pretend-concrete type of things), but:
|
@elrayle Our top two priorities for content types now are multi-file works (modeling a flexible need for "traditional" repository deposits, if you will) and photographs. See this blog post for more on that. Postcards are a slightly more complex use case than photographs, so we were satisfying our modeling needs for photographs and simultaneously checking to see if that model could be built upon for more complex models. |
Re Postcards ... they're a good stand in for a "real" CH image based object. They have text, artwork, multiple sides and thus order, and so forth. You can easily extrapolate forwards to many ordered sides from two, or backwards from two to just one. And as the UCSB postcard example shows, it can be much more complex than that ... on the same order of magnitude as an atlas of maps. Regarding batch as the "thing" ... I disagree that it's something the repository should care about at a core object level. If the batch is important then create an identity for it, and link the fileset to it... but the use of the fileset is associated with the object, not the workflow in which the image was created. That said, the existence of the FileSet being dependent on the existence of the object that the FS is related to is a valid point. We shouldn't presume that the structure will exist before the content does in the repository. |
Right: I don't want to reify batchness as a separate type of object; but I want to recognize that a single CH repository has core audiences that reckon object contexts for FileSets differently, and that this is expressed in the workflow- hence a need for reaggregation. This in some ways looks back longingly at AdministrativeSet, I know... |
My concern is in building certain assumptions into the model. I don't think you can make an assumption for the general case that all FileSets that are in the same PCDM:Object are different representations of the same 'object'. How will Sufia enforce such an assumption for self-deposit users who may put a presentation, poster, and paper all in the same Hydra:Work? |
And making the restriction that a FileSet can live in one-and-only-one PCDM:Object means that the user who added the presentation, poster, and paper in the same Hydra:Work cannot then put just the presentation in another Work they are using to hold all their presentations. |
@elrayle, I think what's being proposed here is that Sufia would create a structure like this for a single uploaded file:
If you added another representation of the same presentation (e.g., an audio recording of it), you might add that to the same Part as a second FileSet:
I agree that supporting a files-first workflow means that FileSets should probably not be directly contained, and that opens up the possibility of them being members of multiple Objects. If the user who uploaded the presentation and audio recording above wanted to add them to another work, I would expect the Sufia UI to let them select either the Part or the Work, but not the FileSets. |
@escowles Is Sufia going to automatically insert the Presentation Part (pcdm:Object) or is the user going to need to build this structure? Here is my understanding of what users can currently do in Sufia...
After completion, there will be one Work with X members that are all FileSets (where X is the number of files uploaded). Steps: (if automatic)
Steps: (if manual)
Questions:
|
@elrayle I would expect the Part to be required and created automatically (but @mjgiarlo can chime in on that). I expect sharing FileSets or Parts between Works to be deferred until after Sufia 7.0, and probably require some use case building, UI work, etc. before implementation. The "work that holds all my presentations" (maybe a CV, bibliography, or something like that?) sounds like a good scenario to explore, in addition to the files-first batch workflow scenario. |
@escowles If automatic, what is the process for a user to add a second file to the part? E.g. user uploaded presentation ppt. Now how do they make the presentation video another FileSet in the same presentation ppt's Part? |
@escowles just a reminder that Sufia is Sufia, and HyBox is 50% Sufia, but the changes being discussed here are in Hydra::Works and PCDM; that's what gets me jumpy. Sufia's use cases are more clearly delineated and predictable (I think). My expectation is that more generic collections apps will be operating on a superset of Sufia content and using Hydra::Works and PCDM to navigate all of it. |
@barmintor, it's good to bring that up and be careful about these things. I'm mostly thinking about the Hydra::Works and CurationConcerns part of this. I have an incomplete notion of how Sufia will take advantage of these features, and an even fuzzier idea of how HyBox will build on top of that. @elrayle, I think the answer is that we would need to figure out how to add a FileSet to a Part, how the UI should work for that, what it's called, etc. I think the distinction between adding a Part to a Work, vs. adding a FileSet to a Part, vs. adding a File to an existing FileSet is going to be challenging to make clear. So the first task is to be clear on what they mean, and then decide which of those Sufia needs to support. It could be that only adding a Part to an existing Work is supported, so Sufia would need to be updated to create the intermediate Part object, but would otherwise not need to be changed. |
I would like to see all the proposed changes in an Application Profile. There are at least a few organizations who are building apps on PCDM now and these changes will have an impact. It will be necessary to have clear documentation of the proposed changes and behavioral assumptions, some of which are not enforceable by the code. |
@barmintor raises a good point. This thread has (understandably) morphed a bit from "let's define a model Hydra-in-a-Box can use" to "let's discuss making changes to PCDM and hydra-works." I suggest we might want to have that latter discussion elsewhere, like the PCDM google group. |
At: https://github.com/hybox/models/blob/master/notes/usecase.md