Discussion of Postcard Model #42

Open
azaroth42 opened this issue Apr 15, 2016 · 57 comments

Comments

@azaroth42
Contributor

azaroth42 commented Apr 15, 2016

At: https://github.com/hybox/models/blob/master/notes/usecase.md

@azaroth42
Contributor Author

(Tagging: @anarchivist @no-reply @escowles @cmh2166 @tpendragon @mjgiarlo)

@tpendragon

Two concerns, I'll put them in separate comments for discussion:

_:pc1 a pcdmw:Work ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1 ;
  pcdm:hasRelatedObject _:tn2 .

Postcard 1 has an image representation of the whole (a thumbnail), which is a FileSet referenced via pcdm:hasRelatedObject.

_:front1 a pcdm:Object ;
  rdfs:label "Front of Postcard" ;
  pcdm:hasMember _:frontfs1 .

The front of postcard 1 has a FileSet, representative of the whole object, linked via pcdm:hasMember.

These two patterns seem conflicting, and work against the flexibility to have a collection of all the front pages of postcards, for instance. Do we need to pick a pattern (always hasRelatedObject? Is hasRelatedObject's semantics right for "representative fileset"? Is there something that should be on that FileSet to identify it as a representation...? Do we need a new predicate?)

@tpendragon

Second concern:

Why is the front of the postcard a pcdm:Object and not a pcdmw:Work? In order to support the use case of "a collection of all the fronts of postcards", for instance, each layer of the hierarchy would have to look the same. I think they both act the same (or, at least, SHOULD), and the goal here seems to be to identify it as a Part - but it's only a Part in the context of its parent. Outside that context it's a standalone object - I think?

@azaroth42
Contributor Author

@tpendragon

Re related vs member, I agree. We're confusing representationOf and partOf ... leading to putting representations in hasRelatedObject when there are parts, and in hasMember when there aren't. If we had hasRepresentation separate from hasMember, we would solve the issue. And likely also Stefano and Adam W's concerns at the same time? Before pcdm-works this wasn't an issue, as representations would have gone directly in hasFile -- now there's an intervening Object subclass.
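A minimal sketch of that split, in the same Turtle style as the snippets above. `ex:hasRepresentation` is a hypothetical predicate in a placeholder namespace (no such term exists in PCDM); parts stay on `pcdm:hasMember`, while both the thumbnail of the whole and the front's image move to the new predicate, so the two patterns stop conflicting.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix edm:   <http://www.europeana.eu/schemas/edm/> .
@prefix ex:    <http://example.org/ns#> .   # hypothetical namespace

_:pc1 a pcdmw:Work ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1 ;   # parts only
  ex:hasRepresentation _:tn2 .         # representation of the whole (was hasRelatedObject)

_:front1 a pcdm:Object ;
  rdfs:label "Front of Postcard" ;
  ex:hasRepresentation _:frontfs1 .    # was pcdm:hasMember
```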

And re Object vs Work ... I was taking the approach that a Work is a bound or otherwise coherent thing. If the back of a postcard is a Work, then is the verso of page 214 of a book also a Work?
I'm happy to drop Work completely and just use Object, or to come to a clearer definition of when to use each.

@tpendragon

If we had hasRepresentation separate from hasMember, we would solve the issue.

I'm leaning this way too.

is the verso of page 214 of a book also a Work?

When CC uses the "Part" stuff, it will be in Plum at least. And I think that's right.

@azaroth42
Contributor Author

If it's a Work, what's the difference between a Work and an Object? Can't we just use Object all the time?

@tpendragon

If it's a Work, what's the difference between a Work and an Object? Can't we just use Object all the time?

Only if you let Works have Files.

@azaroth42
Contributor Author

Have Files with hasFile, or have FileSets with (hasMember/hasRelatedObject/hasRepresentation)?
But either way, Objects can have Files and FileSets, so we can still drop Work?

@tpendragon

The reason there's a Work is because there are restrictions that don't exist on Objects (I think.) Of course, restrictions like that aren't present in RDF without something like SHACL, so we could always just say "if your Object hasFile something, and we're in Works land, we're not gonna do anything with it"

@azaroth42
Contributor Author

The reason there's a Work is because there are restrictions that don't exist on Objects

Okay, so those restrictions should define the difference :) What are the restrictions?

@tpendragon

What are the restrictions?

I'll need backup from @escowles, @jcoyne, @mjgiarlo, @cmh2166 and company, but the one I know of is "Works can't hasFile"
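One way to state "Works can't hasFile" outside application code is a SHACL shape. This is a sketch only: the `ex:` shape namespace and the message text are illustrative, and it assumes `pcdmw:` is the PCDM Works namespace `http://pcdm.org/works#`.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix ex:    <http://example.org/shapes#> .   # hypothetical namespace

# Every pcdmw:Work must have zero pcdm:hasFile values.
ex:WorkHasNoFilesShape a sh:NodeShape ;
  sh:targetClass pcdmw:Work ;
  sh:property [
    sh:path pcdm:hasFile ;
    sh:maxCount 0 ;
    sh:message "Works can't hasFile; attach Files to a FileSet instead."
  ] .
```

Validating a data graph against this shape reports a violation for any Work with a direct `pcdm:hasFile`, which is the kind of restriction that otherwise only exists as a convention.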

@cmharlow

cmharlow commented Apr 15, 2016

Re related vs member:

Thinking that we are talking about hasRepresentation here as what relates a PCDM object to the digital representation, not as a possible inverse to edm:isRepresentationOf (which relates an object to a RWO or another resource to the repository object).

Are we not intimating in some way that [Object hasMember Fileset hasFile File] is that hasRepresentation relationship, made more complicated to support multiple versions of that file/representation? So...

_:pc1 a pcdmw:Work ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1, _:tn2 .

_:tn2 a pcdmw:FileSet ;
  rdfs:label "Postcard Thumbnail" ;
  # a ??:Thumbnail ;  (would we want to add some kind of type vocab here?)
  pcdm:hasFile <thumbnail.jpg> ;
  pcdm:hasFile <thumbnail.tiff> .

Differentiating hasRepresentation from hasMember: +1 (so as not to use hasRelatedObject for odd outliers). But in the meantime, the thumbnail doesn't seem, to me, to fall under the 'is not a component part' portion of the hasRelatedObject comment. I'm open to either, though.

Works/Objects comment forthcoming.

Hope that helps, and at least, doesn't hinder.

@cmharlow

Looking at the comments here: https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/models/concerns/work_behavior.rb

Is it saying that pcdmw:Work could not pcdm:hasMember pcdm:Object that isn't a pcdmw:Fileset?

@escowles

Agree with @tpendragon that _:front1 and _:back1 should be Works instead of Objects (probably all the Objects in the example should be Works). When we said at LDCX that we were using Part for convenience, I understood that to mean it was short for "Work that is part of another Work in some context". I also agree that the limitation of not having Files is an important limitation of Works that doesn't apply to Objects.

Two other things worth calling out:

  1. _:pc1 pcdm:hasRelatedObject _:tn2 seems fine, but only if _:tn2 is a purpose-made thumbnail image (maybe a composite showing the front and back of the postcard in a single image?). But I would typically expect it to use one of the child objects as the preferred representation (see the mailing list discussion).
  2. Instead of:
_:back1   pcdm:hasMember _:backfs1, _:backfs2 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  pcdm:hasFile </backfs1/files/back.jp2>, </backfs1/files/back.jpg> .

_:backfs2 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Transcription" ;
  pcdm:hasFile </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

why not combine the FileSets:

_:back1   pcdm:hasMember _:backfs1 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard" ;
  pcdm:hasFile </backfs1/files/back.jp2>, </backfs1/files/back.jpg>,
    </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

If we want a scenario that demonstrates two FileSets under a Work, maybe this would be better:

_:back1   pcdm:hasMember _:backfs1, _:backfs2 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  dc:date "1999" ;
  pcdm:hasFile </backfs1/files/back.jpg> .

_:backfs2 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  dc:date "2014" ;
  pcdm:hasFile </backfs2/files/back.jp2>, </backfs2/files/back.jpg>,
    </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

@azaroth42
Contributor Author

probably all the Objects in the example should be Works

So there is no situation in which you would use pcdm:Object, only ever FileSet (for a bundle of files) and Work (for everything else)? If there's no other distinction between Object and Work, I don't see the point of having the subclass. Just have an application profile that says "Don't use hasFile from Object if you're using FileSets."

Work: A work or intellectual entity, such as a book, film, dissertation, etc.

No one would ever use "Work" to describe a page of a book given this definition. A Chapter (e.g. a Range, subClassOf Object, not Work) is more of a Work than a physical page as it's the logical or textual structure that came from the intellectual/creative effort, not the printing of the physical Item.

I understood Work as "An Object that represents a coherent and complete intellectual entity which should be separately discoverable from other Works"

preferred representation

There isn't a predicate yet for the preferred representation though.

why not combine the FileSets:

Because .txt is derived from .tei, and FileSets have a single master and a single media type. The TEI is not derived from the Image.

@escowles

Granted that the current description of Work is inadequate. But if you digitize an atlas, creating a Work for the atlas as a whole, and an Object for each page/map, what happens when you want to add one of the maps to a Collection of maps? Do you upgrade that Object to a Work at that point? Or do you create all the pages as Works?

@cmharlow

That seems an oddly specific instance of Pages, one where they are also Works as defined by what Rob points out (they're complete maps as well as pages). This is evidenced by the 'page/map' part of your response, right? What we're describing there is the combination of a carrier part and some kind of complete work.

In that same atlas example, you could have an atlas that has a single map that spans pages. I'd propose that the atlas is a work, the map is a work, and the pages are objects.

I didn't think about this before, but I agree with Rob's critique of everything that isn't a Collection or a Fileset becoming a Work seemingly by default, and we probably don't want that to happen (or if we do, it changes how we are defining Work, and the documentation should be updated). What was the original intention of Work in this context? (I'm sorry to say I honestly don't know as I'm a recent interloper) To just add some functional aspects to generic Objects vis-a-vis the HydraWorks/PCDMWorks gem and LDP specification? Or to create a PCDM extension ontology that allows for the concept of Works, and attach functionalities to those concepts?

While I would have said before that the postcard sides are Works (my original question in that Twitter thread), I'd agree with Rob's response to that in this thread: in his given example, they should just be Objects. They are "Parts" that have no complete Work aspect to them (as far as we know). But I'd leave the option open that someone dealing with particular kinds of postcards may run into an instance where a side could be a Work (if a side is a Map, say).

Hope I haven't misunderstood the question nor sidetracked the discussion.

@azaroth42
Contributor Author

If the map has its own intellectual value separate from the atlas, then I would create it as a Work from the outset. However, I don't see why that would prevent it from being in a Collection if it was just an Object? I can have a Collection of pages that mention Paris, regardless of whether the page is a Work or an Object. If the depositor doesn't say that they're Works so they get created as Objects, and someone later wants to change that ... then I don't see a problem with that either?

In the separate resource for metadata model, I would associate edm:isRepresentationOf with Work rather than Object. If you want to have metadata beyond a basic label for a PCDM object, then it's a Work. If you're okay with it just being a constituent member, then it's an Object. In the postcard case, if it was important to describe the artwork on the front of the postcard, or the writing on the back, then make it a Work with its own metadata. If it's just a "page" without meaningful differentiation, then leave it as an Object.
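Under that rule of thumb, the postcard might look like the following sketch, where only the front gets its own metadata. `_:rwofront1` is a hypothetical blank node standing in for the separate descriptive metadata resource.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix edm:   <http://www.europeana.eu/schemas/edm/> .

# The artwork on the front is worth describing: make it a Work
# that points at its own descriptive metadata resource.
_:front1 a pcdmw:Work ;
  rdfs:label "Front of Postcard" ;
  edm:isRepresentationOf _:rwofront1 .

# The back is just a "page": a plain Object with a label.
_:back1 a pcdm:Object ;
  rdfs:label "Back of Postcard" .
```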

@tpendragon

I really don't see why there needs to be a distinction. I'm pretty sure I prefer all works or all objects, because it makes the user experience easier to figure out.

If the question is, "should there be Works at all, rather than Objects with restrictions", I think that's valid, and I'm not sure what best practice is there. But using the two because they exist to draw lines where there doesn't need to be any seems too complex.

@escowles

The atlas/map example is obviously contrived, but I think the basic point is that whether something is a work or a part is largely contextual. This is something we've talked about from the first meetings in Portland all the way up to LDCX. I doubt we can know what the future context of our objects will be (selecting maps to go in the Collection could happen after the atlas was created in the repository), so I would rather create it as the type of resource that can stand alone, even if we don't expect that it will.

I would be fine with getting rid of the Work class and using Object for everything that's not a FileSet or a Collection. And having an application profile say that Files should be attached to FileSets only seems reasonable.

@eocarragain

I think the "Postcard Thumbnail Image" FileSet should be _:tn2fs1 not _:fn2fs1, see:

_:tn2 a pcdm:Object ;
  rdfs:label "Postcard Thumbnail" ;
  pcdm:hasMember _:tn2fs1 .

_:fn2fs1 a pcdmw:FileSet ;
  rdfs:label "Postcard Thumbnail Image" ;
  pcdm:hasFile </fn2fs1/files/thumbnail.jpg> .

@azaroth42
Contributor Author

I'm happy to just have Object and FileSet. Will update the document this afternoon.

@azaroth42
Contributor Author

Changes:

  • No more Work class, just Object
  • Added a hasFileSet predicate (sub hasMember) to distinguish filesets from members/related objects
  • Added a hasMaster predicate (sub hasFile) to distinguish the master file from the derived files
  • Removed extraneous objects that only sat between other objects/collections and the thumbnail filesets
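Roughly, the revised postcard could be expressed as the sketch below. It assumes the new predicates are minted in the `pcdm:` namespace as `pcdm:hasFileSet` and `pcdm:hasMaster` (names follow the list above, but they are not part of published PCDM), that FileSet keeps its `pcdmw:` type, and that the front's file paths are illustrative.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix edm:   <http://www.europeana.eu/schemas/edm/> .

# Proposed predicates (assumed names, not in published PCDM).
pcdm:hasFileSet rdfs:subPropertyOf pcdm:hasMember .
pcdm:hasMaster  rdfs:subPropertyOf pcdm:hasFile .

# No Work class: the postcard and its sides are all Objects.
_:pc1 a pcdm:Object ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1 ;
  pcdm:hasFileSet _:tn2fs1 .          # thumbnail FileSet, no wrapper Object

_:front1 a pcdm:Object ;
  rdfs:label "Front of Postcard" ;
  pcdm:hasFileSet _:frontfs1 .

_:frontfs1 a pcdmw:FileSet ;
  rdfs:label "Front of Postcard Image" ;
  pcdm:hasMaster </frontfs1/files/front.tiff> ;   # the master file
  pcdm:hasFile   </frontfs1/files/front.jpg> .    # a derivative
```

Because both new predicates are subproperties, a consumer that only understands `pcdm:hasMember` and `pcdm:hasFile` can still traverse the graph with RDFS inference.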

Thoughts?

@azaroth42
Contributor Author

Can more than one Object hasMember / hasFileSet any given FileSet? As it seems like a wrapper around hasFile for grouping, I would expect not?

Along with filesets not being ordered, it seems like a direct container would do the trick in LDP without the need for proxies.
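For illustration, an LDP DirectContainer for one Object's FileSets might look like this. The container and resource URIs are hypothetical, and `pcdm:hasFileSet` is the proposed predicate. The server asserts one membership triple per contained FileSet, with the Object as subject, so no ore:Proxy resources are involved.

```turtle
@prefix ldp:  <http://www.w3.org/ns/ldp#> .
@prefix pcdm: <http://pcdm.org/models#> .

# Hypothetical container holding the FileSets of one Object.
</objects/front1/filesets/> a ldp:DirectContainer ;
  ldp:membershipResource </objects/front1> ;
  ldp:hasMemberRelation pcdm:hasFileSet ;
  ldp:contains </objects/front1/filesets/fs1> .

# Membership triple the server then exposes on </objects/front1>:
#   </objects/front1> pcdm:hasFileSet </objects/front1/filesets/fs1> .
```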

@elrayle

elrayle commented Apr 19, 2016

I have been following the conversation and I want to be clear about where we landed. Are you saying...

Cardinality Options

Object has...

Option 1: an Object can have one-and-only-one FileSet
Option 2: an Object can have many FileSets (I think you are saying this.)

FileSets can be in...

Option 1: a FileSet can belong to one-and-only-one Object (I think you are saying this.)
Option 2: a FileSet can belong to any number of Objects

Ordering

Sets of Objects (gathered by an Object or a Collection) can have one (or more) orders applied to the set. The first order has first and last directly in the gathering Object or Collection. Additional orders are applied by having an in between OrderObject for each additional order.

Sets of FileSets (gathered by an Object) cannot have order.
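As a sketch of the first (direct) order, using the ORE proxy and IANA link-relation pattern from PCDM ordering, front before back. Blank-node names are illustrative.

```turtle
@prefix pcdm: <http://pcdm.org/models#> .
@prefix ore:  <http://www.openarchives.org/ore/terms/> .
@prefix iana: <http://www.iana.org/assignments/relation/> .

# first/last sit directly on the gathering Object.
_:pc1 pcdm:hasMember _:front1, _:back1 ;
  iana:first _:p1 ;
  iana:last  _:p2 .

_:p1 a ore:Proxy ;
  ore:proxyFor _:front1 ;
  ore:proxyIn  _:pc1 ;
  iana:next    _:p2 .

_:p2 a ore:Proxy ;
  ore:proxyFor _:back1 ;
  ore:proxyIn  _:pc1 ;
  iana:prev    _:p1 .
```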

I am hesitant on two points (assuming I interpreted things correctly). Note since both Objects and Collections can have Objects as members, I will refer to them generically as aggregations.

  • If a FileSet can belong to one and only one Object, then for multiple aggregations to share a File represented in a FileSet, they have to share the Object that has the FileSet as a member. This would lead me to the assumption that an Object can have one-and-only-one FileSet since the Object is the container that allows the FileSet to be part of other aggregations.
  • If an Object can have many FileSets, why can't the FileSets be ordered?

Lynette

edited by Rob to make the github UI not collapse all the content down to nothing

@azaroth42
Contributor Author

My opinion:

  • An Object can have many FileSets. A FileSet can be a member of only one Object. Thus a Direct Container is appropriate for the LDP projection, if there's a desire to separate Objects from FileSets.
  • FileSets cannot be ordered, only Objects. Files within FileSets cannot be ordered.
  • FileSets cannot have descriptive metadata of their own, beyond label, as they're just a convenience for grouping binaries together.
  • I don't know of a use case for when the same FileSet would be part of multiple Objects/Collections. If there is one, let me know! However, I think this also follows from FileSets being just a grouping mechanism for files, which can only be part of one Object/Collection. Before FileSets, you still had to aggregate an Object not a File ... and the introduction of FileSet doesn't change that.
  • I also don't know of a use case for ordering groupings of files that shouldn't actually be applied to an object. You couldn't order Files before, so you can't order FileSets.
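The cardinality and no-ordering rules above could be checked with SHACL inverse paths. A sketch, assuming the proposed `pcdm:hasFileSet` predicate and a placeholder `ex:` shape namespace:

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix ore:   <http://www.openarchives.org/ore/terms/> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix ex:    <http://example.org/shapes#> .   # hypothetical namespace

ex:FileSetShape a sh:NodeShape ;
  sh:targetClass pcdmw:FileSet ;
  # At most one Object may point at a given FileSet.
  sh:property [
    sh:path [ sh:inversePath pcdm:hasFileSet ] ;
    sh:maxCount 1
  ] ;
  # No ordering proxy may point at a FileSet.
  sh:property [
    sh:path [ sh:inversePath ore:proxyFor ] ;
    sh:maxCount 0
  ] .
```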

@elrayle

elrayle commented Apr 19, 2016

A few more questions:

  • Given that the only way for a FileSet to be shared with multiple aggregations is for it to be a member of an Object and then make that Object a member of another aggregation, what is the use case for multiple FileSets in an Object?
  • What is the path for a FileSet that is added as part of several FileSets in a single Object to be shared by itself with another aggregation?
  • Which of these continue to hold?
    • Collection can haveMember Collection
    • Collection can haveMember Object
    • Collection can NOT haveMember FileSet
    • Object can haveMember Object
    • Object can haveMember FileSet
    • Object can haveFile File
  • Are there any restrictions that would limit Object from haveMember Objects and FileSets in the same Object?

Lynette


@azaroth42
Contributor Author

Given that the only way for a FileSet to be shared with multiple aggregations is for it to be a member of an Object and then make that Object a member of another aggregation, what is the use case for multiple FileSets in an Object?

Each FileSet has a single master file, with derivatives. You likely want to associate multiple master files, each with its own derivatives, with a single Object. For example:

  • Multiple digitizations of the same thing, at different times
  • Multiple representations, e.g. scanned image (with derivs), TEI (with derivs), Audio (with derivs)

What is the path for a FileSet that is added as part of several FileSets in a single Object to be shared by itself with another aggregation?

You can't do that at the moment. Is there a real use case for this?

Which of these continue to hold?

  • Collection can haveMember Collection // Yes
  • Collection can haveMember Object // Yes
  • Collection can NOT haveMember FileSet // I would prefer a different predicate and Container for FileSets from hasMember, but Collections must be able to have FileSets somehow.
  • Object can haveMember Object
  • Object can haveMember FileSet // Same as Collection
  • Object can haveFile File // I think the application profile should say that Objects SHOULD NOT have Files directly, even if there's only a single File, it should be in a FileSet.
  • Are there any restrictions that would limit Object from haveMember Objects and FileSets in the same Object? // As above, I would prefer different predicates to link to FileSets.

@escowles

FileSets cannot have descriptive metadata of their own, beyond label, as they're just a convenience for grouping binaries together.

If FileSets are for grouping an original file together with its derivatives, then I think they also should have metadata about creation (creator, date, software, etc.). In the postcard image/transcription example, you might want to know who did the transcribing, or when. Maybe that's not descriptive metadata, but it's broader than I've been thinking of technical metadata.

@azaroth42
Contributor Author

Good point! We should be clearer about the distinctions. I think there's at least file characterization (associated with the File), the provenance of the file and derivatives (associated with the File and/or FileSet?), and then the broader descriptive information about the Object.

I'll update the example with some provenance on the FileSet.

@mjgiarlo
Member

mjgiarlo commented Apr 21, 2016

I've understood FileSets as a grouping of Files that allows assertion of descriptive metadata about said grouping (and/or, arguably, the original_file), including a label but possibly including much more.

Part of the problem of course is that the categories of metadata used by cultural heritage organizations are... not cut and dried, and feel distinctly pre-RDF to me. Put otherwise, the boundary between descMD and techMD (etc.) is permeable and fuzzy at best. If I run FITS and it returns an author (as extracted from embedded file metadata), is it techMD or descMD? Better question... should we care?

@azaroth42
Contributor Author

If we don't care, and people put dcterms:creator on the File sometimes and on the FileSet other times, there seems to be a usage cost -- In order to find out if there's a creator, you need to check multiple locations. I think it's easier to associate that sort of information with the FileSet than with the File.
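A sketch of provenance kept on the FileSet for the transcription example; the creator node and the year are illustrative values, not taken from the use case document.

```turtle
@prefix pcdm:    <http://pcdm.org/models#> .
@prefix pcdmw:   <http://pcdm.org/works#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Who made the transcription, and when, is recorded once on the
# FileSet rather than on each of its Files.
_:backfs2 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Transcription" ;
  dcterms:creator _:transcriber ;
  dcterms:created "2014"^^xsd:gYear ;
  pcdm:hasFile </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .
```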

@mjgiarlo
Member

I'm 🆒 with that.

@elrayle

elrayle commented Apr 22, 2016

RE: Use case for sharing FileSet with other aggregations.

First an observation: In your examples, all the files in the FileSet are variants of the same 'object' (e.g. different resolution, scanned at different times, scanned using different techniques, etc.) In this case, it makes sense that if you want to share the 'object' with other aggregations that you would share all the FileSets which equates to sharing the Object holding all the FileSet variants of the 'object'.

Self-deposit IR use case: There is no enforced restriction preventing a user from gathering FileSets for different 'objects' in a single Object. For example, a user might put a presentation, poster, and paper presented at a conference in the same Object. The user might also have a collection for My Presentations. The only FileSet from the original Object that the user wants to include in the My Presentations collection is the presentation FileSet.

@elrayle

elrayle commented Apr 22, 2016

Does this express the model being proposed? FileSets become a first class model in PCDM?

[Diagram: pcdm_model-with-filesets]

@escowles

@elrayle, I think the new model would be implemented by creating a pcdm:Object for each file a user uploads. The Object would have a single FileSet, which would contain the file, plus extracted text, thumbnail, etc.

@elrayle

elrayle commented Apr 22, 2016

@escowles If I understand correctly, your only adjustment to the displayed model is pcdm:hasFileSet is 1:1 instead of m:m? I believe @azaroth42 earlier stated that there can be 1:m (one Object with many FileSet) so that the Object can hold uploaded variants (e.g. scans from different times/techniques, etc.) of the same 'object' each in its own FileSet and the FileSet holds generated derivatives of the 'object'.

@escowles

@elrayle Yes, that's my understanding. The use cases for multiple FileSets attached to an Object are things like multiple digitizations, adding TEI transcriptions, etc. I wouldn't expect to have those very often in an IR, but maybe there are examples I'm not thinking of.

@elrayle

elrayle commented Apr 25, 2016

Is this still the application profile for PCDM... https://github.com/duraspace/pcdm/wiki?

Are the changes described in this thread affecting PCDM proper, or are they extensions?

@mjgiarlo
Member

@elrayle I suspect this will result in a new version of PCDM proper, refining the PCDM Works extension and pulling it into PCDM.

@barmintor

I think there are use cases for aggregating FileSets in multiple aggregations, and definitely use cases for not wanting to pull in all the FileSets under an Object. You're going to have digitization folks uploading a batch of stuff (Preservation Images of Postcard Archives Box 24; Infrared Scans of Postcard Archives) for whom the batch is the actual thing. You're going to need to be able to aggregate them under objects later (Bahamas Postcard) for context. The relationship of those FileSets to the aggregating object is going to be contextual. So while I totally agree with needing multiple FileSets per object, and I totally agree with needing to have an Object that represents Parts, I think we're going down a questionable path as far as cardinality of aggregation and prescribed containment.

@elrayle

elrayle commented Apr 25, 2016

Are you using postcards as an example use case for working through modeling issues? Or is hybox planning to support models for several well defined common use cases?

@barmintor

I was borrowing the example context of postcards from Rob (not the literal example he used, just the pretend-concrete type of things), but:

  1. I'm under the impression that cultural heritage images are also a HyBox use case
  2. PCDM has a broad scope, as does hydra-works as the PCDM implementation of record for Hydra.

@mjgiarlo
Member

@elrayle Our top two priorities for content types now are multi-file works (modeling a flexible need for "traditional" repository deposits, if you will) and photographs. See this blog post for more on that.

Postcards are a slightly more complex use case than photographs, so we were satisfying our modeling needs for photographs and simultaneously checking to see if that model could be built upon for more complex models.

@azaroth42
Contributor Author

Re Postcards ... they're a good stand in for a "real" CH image based object. They have text, artwork, multiple sides and thus order, and so forth. You can easily extrapolate forwards to many ordered sides from two, or backwards from two to just one. And as the UCSB postcard example shows, it can be much more complex than that ... on the same order of magnitude as an atlas of maps.

Regarding batch as the "thing" ... I disagree that it's something the repository should care about at a core object level. If the batch is important then create an identity for it, and link the fileset to it... but the use of the fileset is associated with the object, not the workflow in which the image was created. That said, the existence of the FileSet being dependent on the existence of the object that the FS is related to is a valid point. We shouldn't presume that the structure will exist before the content does in the repository.

@barmintor

Right: I don't want to reify batchness as a separate type of object; but I want to recognize that a single CH repository has core audiences that reckon object contexts for FileSets differently, and that this is expressed in the workflow, hence a need for reaggregation. This in some ways looks back longingly at AdministrativeSet, I know...

@elrayle

elrayle commented Apr 26, 2016

My concern is in building certain assumptions into the model. I don't think you can make an assumption for the general case that all FileSets that are in the same PCDM:Object are different representations of the same 'object'.

How will Sufia enforce such an assumption for self-deposit users who may put a presentation, poster, and paper all in the same Hydra:Work?

@elrayle

elrayle commented Apr 26, 2016

And making the restriction that a FileSet can live in one-and-only-one PCDM:Object means that the user who added the presentation, poster, and paper in the same Hydra:Work cannot then put just the presentation in another Work they are using to hold all their presentations.

@escowles

escowles commented Apr 26, 2016

@elrayle , I think what's being proposed here is that Sufia would create a structure like this for a single uploaded file:

  • Presentation Work (pcdm:Object)
    • Presentation Part (pcdm:Object)
      • Presentation FileSet (pcdm:FileSet)
        • Presentation File (pcdm:File)

If you added another representation of the same presentation (e.g., an audio recording of it), you might add that to the same Part as a second FileSet:

  • Presentation Work (pcdm:Object)
    • Presentation Part (pcdm:Object)
      • Presentation FileSet (pcdm:FileSet)
        • Presentation File (pcdm:File)
      • Audio Recording FileSet (pcdm:FileSet)
        • Audio Recording File (pcdm:File)
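The second structure, written as triples. This sketch assumes the proposed `pcdm:hasFileSet` predicate and the `pcdmw:FileSet` type; the labels follow the list, but the file names are hypothetical.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

_:work a pcdm:Object ;
  rdfs:label "Presentation Work" ;
  pcdm:hasMember _:part .

# Two representations of the same presentation share one Part.
_:part a pcdm:Object ;
  rdfs:label "Presentation Part" ;
  pcdm:hasFileSet _:slidesfs, _:audiofs .

_:slidesfs a pcdmw:FileSet ;
  rdfs:label "Presentation FileSet" ;
  pcdm:hasFile </slidesfs/files/presentation.pptx> .

_:audiofs a pcdmw:FileSet ;
  rdfs:label "Audio Recording FileSet" ;
  pcdm:hasFile </audiofs/files/recording.mp3> .
```

Sharing the presentation with another Work would then mean adding `_:part` as a member there, not either FileSet.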

I agree that supporting a files-first workflow means that FileSets should probably not be directly contained, and that opens up the possibility of them being members of multiple Objects.

If the user who uploaded the presentation and audio recording above wanted to add them to another work, I would expect the Sufia UI to let them select either the Part or the Work, but not the FileSets.

@elrayle

elrayle commented Apr 26, 2016

@escowles Is Sufia going to automatically insert the Presentation Part (pcdm:Object) or is the user going to need to build this structure?

Here is my understanding of what users can currently do in Sufia...

  • New Work (user step)
  • Select one or more files for upload (user step)
  • Upload starts (automatically) and for each file
    • create a FileSet and add as FileSet (pcdm:hasMember) of new work
    • upload file creating a pcdm:File and add as master file (pcdm:hasFile) of FileSet
    • run derivatives creating pcdm:File(s) and add as file (pcdm:hasFile) of FileSet

After completion, there will be one Work with X members that are all FileSets (where X=number of files uploaded.)
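For contrast with the proposed structures, the graph current Sufia produces after uploading two files to one Work might look roughly like this sketch. The label and file names are hypothetical, and derivatives are shown simply as extra `pcdm:hasFile` values.

```turtle
@prefix pcdm:  <http://pcdm.org/models#> .
@prefix pcdmw: <http://pcdm.org/works#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

# Every member of the Work is a FileSet, linked with plain hasMember.
_:work a pcdmw:Work ;
  rdfs:label "Conference materials" ;
  pcdm:hasMember _:fs1, _:fs2 .

_:fs1 a pcdmw:FileSet ;
  pcdm:hasFile </fs1/files/paper.pdf>, </fs1/files/paper-thumbnail.jpg> .

_:fs2 a pcdmw:FileSet ;
  pcdm:hasFile </fs2/files/poster.pdf>, </fs2/files/poster-thumbnail.jpg> .
```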

Steps: (if automatic)

  • New Work (user step)
  • Select File for upload (user step)
  • Upload starts (automatically)
    • create presentation part (pcdm:Object) and add as member (pcdm:hasMember) of new work
    • create FileSet (pcdm:FileSet) and add as fileset (pcdm:hasFileSet) of presentation part
    • upload file creating a pcdm:File and add as master file (pcdm:hasMaster) of FileSet
    • run derivatives creating pcdm:File(s) and add as file (pcdm:hasFile) of FileSet
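A sketch of the automatic variant, under similar assumptions: hypothetical classes, a placeholder derivative name, and fields `file_sets` and `master` that loosely mirror the `pcdm:hasFileSet` and `pcdm:hasMaster` relations listed above. The only structural change from the current flow is the intermediate Part created for each file.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for pcdm:Object / pcdm:FileSet; not Hydra::Works classes.
@dataclass
class FileSet:
    label: str
    master: Optional[str] = None                     # the uploaded original
    files: List[str] = field(default_factory=list)   # derivatives (pcdm:hasFile)

@dataclass
class PcdmObject:
    label: str
    members: list = field(default_factory=list)      # Parts (pcdm:hasMember)
    file_sets: List[FileSet] = field(default_factory=list)

def upload_with_auto_part(work: PcdmObject, filenames: List[str]) -> None:
    """Create one intermediate Part per uploaded file, then its FileSet."""
    for name in filenames:
        part = PcdmObject(f"Part for {name}")
        work.members.append(part)              # Part is a member of the Work
        fs = FileSet(name, master=name)        # FileSet holds the uploaded file
        part.file_sets.append(fs)
        fs.files.append(f"{name}.thumbnail")   # placeholder derivative name

work = PcdmObject("Presentation Work")
upload_with_auto_part(work, ["talk.ppt", "talk.mp3"])
```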

Steps (if manual):

  • New Work (user step)
  • New Part (user step)
  • Select File for upload into the part (user step)
  • Upload starts (automatically)
    • create FileSet (pcdm:FileSet) and add as fileset (pcdm:hasFileSet) of presentation part
    • upload file creating a pcdm:File and add as master file (pcdm:hasMaster) of FileSet
    • run derivatives creating pcdm:File(s) and add as file (pcdm:hasFile) of FileSet
  • Repeat from New Part for each file (or select an existing part if same 'object')
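The manual variant differs in that the user creates the Part and chooses where each upload lands. A minimal sketch, with hypothetical helper names `new_part` and `upload_into_part` and the same kind of hypothetical classes:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical classes; not Sufia or Hydra::Works APIs.
@dataclass
class FileSet:
    label: str
    master: Optional[str] = None
    files: List[str] = field(default_factory=list)

@dataclass
class PcdmObject:
    label: str
    members: list = field(default_factory=list)
    file_sets: List[FileSet] = field(default_factory=list)

def new_part(work: PcdmObject, label: str) -> PcdmObject:
    """User step: create a Part and attach it to the Work."""
    part = PcdmObject(label)
    work.members.append(part)
    return part

def upload_into_part(part: PcdmObject, name: str) -> FileSet:
    """Automatic step after the user picks a file and a target Part."""
    fs = FileSet(name, master=name)
    part.file_sets.append(fs)
    fs.files.append(f"{name}.thumbnail")  # placeholder derivative
    return fs

work = PcdmObject("Presentation Work")
talk = new_part(work, "Presentation Part")
upload_into_part(talk, "talk.ppt")
upload_into_part(talk, "talk.mp4")        # reuse the existing Part
```

Nothing in this sketch checks that files uploaded into one Part actually belong to the same "object", which is the gap the second question below points at.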

Questions:

  • Is creation of a Part required? Or can a user upload a FileSet directly to a Work?
  • If manual, what prevents the user from creating a Part and uploading all files to that Part even if they aren't the same 'object'?
  • Can Parts be shared with another work? For example, if I have a Part for a presentation that has two FileSets (presentation ppt, presentation video), can I share that Part with a "My Presentations" work that holds all my presentations?

@escowles

@elrayle I would expect the Part to be required and created automatically (but @mjgiarlo can chime in on that).

I expect sharing FileSets or Parts between Works to be deferred until after Sufia 7.0, and probably require some use case building, UI work, etc. before implementation. The "work that holds all my presentations" (maybe a CV, bibliography, or something like that?) sounds like a good scenario to explore, in addition to the files-first batch workflow scenario.

@elrayle

elrayle commented Apr 26, 2016

@escowles If it's automatic, what is the process for a user to add a second file to the Part? E.g., the user uploaded the presentation ppt. How do they now make the presentation video another FileSet in the same Part as the ppt?

@barmintor

@escowles, just a reminder that Sufia is Sufia, and HyBox is 50% Sufia, but the changes being discussed here are in Hydra::Works and PCDM; that's what makes me jumpy. Sufia's use cases are more clearly delineated and predictable (I think). My expectation is that more generic collections apps will operate on a superset of Sufia content and use Hydra::Works and PCDM to navigate all of it.

@escowles
Copy link

@barmintor, it's good to bring that up and to be careful about these things. I'm mostly thinking about the Hydra::Works and CurationConcerns part of this. I have an incomplete notion of how Sufia will take advantage of these features, and an even fuzzier idea of how HyBox will build on top of that.

@elrayle, I think the answer is that we would need to figure out how to add a FileSet to a Part, how the UI should work for that, what it's called, etc. I think the distinction between adding a Part to a Work, vs. adding a FileSet to a Part, vs. adding a File to an existing FileSet is going to be challenging to make clear. So the first task is to be clear on what they mean, and then decide which of those Sufia needs to support. It could be that only adding a Part to an existing Work is supported, so Sufia would need to be updated to create the intermediate Part object, but would otherwise not need to be changed.
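One way to keep the three operations distinct is to give each its own entry point. A minimal sketch with hypothetical classes and function names (`add_part_to_work`, `add_file_set_to_part`, `add_file_to_file_set`); which of these Sufia would expose is exactly the open question.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes, one per level of the Work > Part > FileSet > File hierarchy.
@dataclass
class FileSet:
    label: str
    files: List[str] = field(default_factory=list)

@dataclass
class Part:
    label: str
    file_sets: List[FileSet] = field(default_factory=list)

@dataclass
class Work:
    label: str
    parts: List[Part] = field(default_factory=list)

def add_part_to_work(work: Work, part: Part) -> None:
    work.parts.append(part)            # a new component of the work

def add_file_set_to_part(part: Part, fs: FileSet) -> None:
    part.file_sets.append(fs)          # another representation of the same component

def add_file_to_file_set(fs: FileSet, filename: str) -> None:
    fs.files.append(filename)          # another file for the same representation

work = Work("Presentation Work")
part = Part("Presentation Part", [FileSet("Presentation FileSet", ["talk.ppt"])])
add_part_to_work(work, part)
add_file_set_to_part(part, FileSet("Video FileSet", ["talk.mp4"]))
add_file_to_file_set(part.file_sets[0], "talk.pdf")   # e.g. a derivative of the ppt
```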

@elrayle

elrayle commented Apr 26, 2016

I would like to see all the proposed changes in an Application Profile. There are at least a few organizations who are building apps on PCDM now and these changes will have an impact. It will be necessary to have clear documentation of the proposed changes and behavioral assumptions, some of which are not enforceable by the code.

@mjgiarlo

Quoting @escowles:

  "@elrayle I would expect the Part to be required and created automatically (but @mjgiarlo can chime in on that)."

Yes, I would too.

@mjgiarlo

mjgiarlo commented Apr 26, 2016

@barmintor raises a good point. This thread has (understandably) morphed a bit from "let's define a model Hydra-in-a-Box can use" to "let's discuss making changes to PCDM and hydra-works." I suggest we have that latter discussion elsewhere, like the PCDM Google group.


8 participants