difference between a work's children (fileset vs. another work) should be more transparent #2432

Closed
hackartisan opened this issue Aug 9, 2016 · 30 comments

Comments

@hackartisan
Contributor

I'd like to argue that whether a work's child is a fileset or another work is a distinction that hinders usability.

E.g.
As a rare books curator I want to model pages full of text as filesets but plates or engravings as works so that I can give them more metadata. I'd then like to combine all of these pages on a single 'book' work and order them as needed, intermingled. As a researcher using this book I'd like all the pages to be in a single ordered list.

Currently contained works will be in the 'relationships' area but contained files will be in the 'items' area.

If we loosen the distinction in the UI we would be able to include drag-and-drop ordering in the current 'files' tab of the edit form, meaning depositors wouldn't have to click through to a separate screen to do ordering.
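
A minimal Python sketch of the single-list idea, assuming hypothetical `Work` and `FileSet` classes and helper names (`display_items`, `move_member`). These are illustrative stand-ins, not the Sufia or Curation Concerns API:

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class FileSet:
    label: str


@dataclass
class Work:
    title: str
    metadata: dict = field(default_factory=dict)
    # One ordered list holding both kinds of children.
    members: List[Union["Work", FileSet]] = field(default_factory=list)


def display_items(work):
    """Return (position, kind, label) rows for a single merged 'Items' list."""
    rows = []
    for position, child in enumerate(work.members, start=1):
        if isinstance(child, Work):
            rows.append((position, "work", child.title))
        else:
            rows.append((position, "fileset", child.label))
    return rows


def move_member(work, old_index, new_index):
    """Reorder in place, roughly as a drag-and-drop handler might."""
    child = work.members.pop(old_index)
    work.members.insert(new_index, child)


book = Work(title="Rare book")
book.members = [
    FileSet("page 1 (text)"),
    FileSet("page 2 (text)"),
    Work(title="Plate I", metadata={"engraver": "unknown"}),
]
move_member(book, 2, 1)  # put the plate between the two text pages
```

Calling `display_items(book)` afterwards lists the plate in second position, between the two text pages, with a kind marker on each row in case the UI needs to distinguish them.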

Related work

See screenshots on #2243

@mjgiarlo
Member

mjgiarlo commented Aug 9, 2016

This issue touches on UI/UX and modeling, so since this issue is tagged with needs feedback, I'm tagging some folks to chime in:

@hackartisan
Contributor Author

Heh, I think I'm saying I'd like to differentiate less, i.e. lump them together more.

@mjgiarlo
Member

mjgiarlo commented Aug 9, 2016

Ha. I removed my editorial remark so as not to confuse the issue, then. Sorry!

@mjgiarlo mjgiarlo added this to the 7.2.0 milestone Aug 9, 2016
@newmanld

newmanld commented Aug 9, 2016

I’d like to read what the modeling-interested folks have to say, but I agree with @HackMasterA that the distinction hinders usability. @HackMasterA gives a relevant use case where an intermingled list of files and works would be desirable, and where the contained works should have the same visibility as the contained files – not be in a Relationships area that may seem secondary. I think users would find it intuitive that Items listed everything contained by the work.

The screen shots that are shown in #2243 seem to demonstrate that this could affect not only the listings of child works but of parent works. The Relationships area gives us a place to try to explain what is ‘parent’ and what is ‘child’, or ‘In Works’ and ‘Has child works’.
To loosen the distinction and provide the functionality that @HackMasterA suggests, could we
a) list child works (not parent works) in the Items area intermingled with files, while also maintaining the listing of parent and child works in the Relationships area?
Or
b) List child works in the Items area intermingled with files, and list parent works in the Relationships area?

I lean toward the second option, reserving the Relationships area for ‘In works’ and ‘In Collections’. The Items list could possibly have a designator for Work if we find that the difference between a Work and a File needs to be evident in that list, but I’m not sure that it does.

Different issue, but do we support a peer relationship, a simple Related Works between two works (not parent or child)? This could be in the Relationships area if we are supporting it.

@cmharlow

Apologies if I cover something already discussed or misunderstand something the Sufia 7 community (or others) has decided. I'm recapping my limited understanding.

From the original question:

"As a rare books curator I want to model pages full of text as filesets but plates or engravings as works so that I can give them more metadata. I'd then like to combine all of these pages on a single 'book' work and order them as needed, intermingled."

So in this example, one predicates how to model pages (Work or Fileset) based on if pages are 'full of text' (so become Filesets) or if the pages have 'plates or engravings' (i.e. images, so become Works)...?

The way I understand this is that pages (and other parts, when differentiated) are "Works" regardless of primary content type (text or images or other). Any Work (book, page, image, other) can then have member Filesets, and a Fileset contains Files/non-RDF resources (text, image, whatever) that are that Work's digital surrogates from particular digitization efforts or upload activity. This is coming from my very small lurking involvement in Hybox conversations (as well as someone working with internally-managed/curated digital collections, not self-deposit or heavily faculty-curated items). See here for more discussion by better modelers than myself on this topic: hybox/models#17 (comment)

If that's so (each page has a Work), then you'd always be ordering 'Works'. Re-arranging Items/Files/Filesets would perhaps (maybe?) be more about which digitization subset + from that subset, which file (the png versus jpeg, this resolution or that, etc.) you choose as a preference. So, for Anna's use case, there'd always be a 'Work' abstraction in the relationships area to order. You wouldn't have only a Fileset here for one page, and a Work there for another page - not if you wanted to perform page ordering. (I'm still thinking through the UI ease of loading a bunch of files to one "work", the user not making a bunch of new works, then requiring ordering.)

The sometimes scare quotes on 'Work' come from the lack of distinction landed on in this discussion re: Work / Part / Object / Fileset: hybox/models#42 (comment). When I say 'Work', I mean a non-Fileset Object that cannot hasFile pcdm:File.

I'm sorry though if I entirely missed the point(s).

@hackartisan
Contributor Author

hackartisan commented Aug 10, 2016

@cmh2166 thanks so much for this response.

I think your approach here is from a pcdm2 perspective, of which i have only passing understanding. I understand that filesets will become more formalized. Currently, though, an organization could definitely choose to do it the way I've suggested, and there is incentive to do so:

  • Catalogers don't need to be presented with a bunch of metadata fields to upload a page
  • Catalogers can upload it right onto the work without having to create a new work and link them up
  • I don't have to generate a new curation concern for 'page' just to create a stripped-down form.

Will this way of modeling a book or other work as having children that are a mixture of works and filesets become disallowed in some way?

Or, just as importantly, is there community agreement that this is for some reason not a good way to model, even in pcdm1 which is what we currently have? If so, why not? Besides the UI limitations :)

@hackartisan
Contributor Author

BTW, this is on today's Hydra tech call agenda. Please do join!

@escowles
Contributor

I want to second @cmh2166's comment above and say that it matches my understanding of the outcome of the modeling discussions over the last several months, which has been characterized as the way that Islandora handles compound objects.

But I also take @HackMasterA's point that this is what PCDM 2.0 is all about, and there are real issues in the current (PCDM 1.0) implementation that need to be worked out. I think it's fine to model some pages as FileSets and others as child Works, and that the distinction would revolve around whether the page was seen as useful outside the context of the parent Work. If the drag-and-drop ordering is brought in, then the user could decide what order they went in, including putting all the child Works at the beginning and all the child FileSets at the end if that was the best way to view them.

@hackartisan
Contributor Author

@cmh2166, @escowles, please let me know where to find best practices for modeling books and other works if such things have been agreed upon!

@hackartisan
Contributor Author

hackartisan commented Aug 10, 2016

the distinction would revolve around whether the page was seen as useful outside the context of the parent Work. - @escowles

Yes, that's another thing i wanted to say; the idea of a 'page' as a 'work' is conceptually difficult to accept. @cmh2166, @escowles

@escowles
Contributor

@HackMasterA I agree, but I think it's useful to say that all pages/parts/components should be pcdm:Objects, since they might be viewed as a Work in another context (maps/plates are the typical use case). So the idea is that the typical page is a trivial Object, but they can be enriched at any time if it's useful to do so, without having to remodel them. That also makes the ordering question easier, because all the children are Objects, so the Object/FileSet ordering problem never comes up.
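
A small Python sketch of the "trivial Object that can be enriched later" idea, assuming a hypothetical `PageObject` class (not part of PCDM tooling or Sufia) whose metadata starts out empty:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileSet:
    files: List[str]


@dataclass
class PageObject:
    """Hypothetical stand-in for a minimal pcdm:Object representing one page."""
    file_sets: List[FileSet] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def is_trivial(self) -> bool:
        # A page with no descriptive metadata is just a wrapper for its files.
        return not self.metadata


# Every page starts as a trivial object wrapping its files.
pages = [PageObject([FileSet([f"page{n}.tif"])]) for n in (1, 2, 3)]

# Later, page 2 turns out to be a plate worth describing: add metadata in
# place, without changing its class or its position in the list.
pages[1].metadata.update({"title": "Plate I", "type": "engraving"})

trivial_flags = [page.is_trivial() for page in pages]
```

Because every child has the same type, ordering code only ever deals with one kind of member, whether or not a given page has been enriched.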

@hackartisan
Contributor Author

@escowles i hear that. but i don't love the idea of front loading a lot of work for my catalogers 'just in case'. making every page a Work is way harder for them than making them filesets. And the work I would need to do to make it easier for them is not on the horizon for me, time-wise.

@hackartisan
Contributor Author

isn't a sufia fileset already a pcdm:object?

@simosacchi

This is a very interesting and useful discussion. I think that at the root of the issue lies the 'overuse' of the work class and the entire work extension of pcdm. Conceptually, I think, a 'work' is a specialized pcdm:object carrying a specific intellectual identity (i.e. something for which we can possibly identify independent intellectual responsibility and unity). I have to agree that a general page should not be considered a work in itself (this is not to say that a picture ON a page is not a work, it is!) so we should probably roll back and consider either the use of a less specified pcdm:object or come up with a modeling construct for parts (I highly advise against the latter). @escowles is perfectly right in his last comment. So you would have a work entity at the book level, possibly work entities at chapter level, and pcdm:objects at the page level (as an example); then each page would be related to one or more filesets depending on the specific context.

@escowles
Contributor

@HackMasterA yes, the PCDM 1.0 FileSets are also pcdm:Objects. So the PCDM 2.0 proposal is basically to split that in half and have the Object part represent the (possibly minimal) aspect of the page and the FileSet part represent the bundle-of-files aspect of the page.

And I completely understand why you wouldn't want to create child Works for every page now. It would be a ton of work, for little benefit. With PCDM 2.0, then you would just upload a file like you do for a FileSet now, but Sufia would always create a child Work and attach a FileSet to it.
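
A rough Python sketch of that upload behaviour, under the assumption of a hypothetical `upload_file` hook and illustrative class names (nothing here is an actual Sufia method):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSet:
    files: List[str]


@dataclass
class ChildObject:
    label: str
    file_sets: List[FileSet] = field(default_factory=list)


@dataclass
class ParentWork:
    title: str
    ordered_children: List[ChildObject] = field(default_factory=list)


def upload_file(parent: ParentWork, filename: str) -> ChildObject:
    """Hypothetical upload hook: the depositor supplies only a file, and the
    intermediate child object is created behind the scenes."""
    child = ChildObject(label=filename, file_sets=[FileSet([filename])])
    parent.ordered_children.append(child)
    return child


book = ParentWork("Rare book")
for name in ["p001.tif", "p002.tif"]:
    upload_file(book, name)
```

From the depositor's side this is still a plain file upload; the extra layer only shows up in the stored structure.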

@simosacchi

Also, attaching filesets directly to the work for a digitized book and mixing them at the same level with other works or pcdm:objects at the page level would make the modeling strategy less consistent and would not allow one, for example, to claim that the same page has been digitized twice (two masters) while still being the same page.

@newmanld

I agree that it is an interesting discussion. But have we strayed from the original thrust of @HackMasterA, which to me was the lack of usability of separating filesets and child works in two lists in Sufia, when both are contained by a work? Even with best practices agreed upon, and clarity in the online help, and modeling distinctions maintained, in regard to what should be a work and what should be a file, wouldn’t there be a usability advantage to @HackMasterA’s original point about intermingling? As a user, do I want to see all objects contained by a work in one list?

@escowles
Contributor

I am +1 to modeling some pages as child Works and others as child FileSets, and showing them in a single list. Bringing over the CC ordering functionality would help make that more usable and support mixing them or keeping them separate (and of course each app can override this based on their local needs).

@simosacchi

@escowles @HackMasterA I see the practical difficulties of the modeling overhead in adding intermediate entities, but I still believe that fileset should keep their functionality and not be "directly" used to model "parts". I think the usability issue can be solved at the application and UI level when the creation of the intermediary pcdm:object is done in the background when multiple parts of a work are uploaded (I know this is not there yet, or at least not entirely).

@barmintor
Contributor

It might be worth noting that even in the page-is-a-fileset scenario, there's probably a phantom work in the form of a proxy, which from this perspective is a UI for addressing the overhead of a very light single member work.


@cmharlow

Sorry, breaking a few things out to respond to as separate comments. Here are things that we can maybe follow up on elsewhere (with links to elsewhere):

1: Parts As Works

Yes, that's another thing i wanted to say; the idea of a 'page' as a 'work' is conceptually difficult to accept. - @HackMasterA

I agree, but I think it's useful to say that all pages/parts/components should be pcdm:Objects, since they might be viewed as a Work in another context (maps/plates are the typical use case). - @escowles

I think that at the root of the issue lies the 'overuse' of the work class and the entire work extension of pcdm. Conceptually, I think, a 'work' is a specialized pcdm:object carrying a specific intellectual identity... - @simosacchi

Yes, this dilutes the idea of a "Work" (something already possibly ambiguous regardless). See my original comment on this, which has a link to some really good/pertinent discussions on the topic:

The sometimes scare quotes on 'Work' come from the lack of distinction landed on in this discussion re: Work / Part / Object / Fileset: hybox/models#42 (comment). When I say 'Work', I mean a non-Fileset Object that cannot hasFile pcdm:File.

And also see @azaroth42's email to PCDM listserv about this from a few months ago that came from those discussions: https://groups.google.com/forum/#!topic/pcdm/qymzKAv0uoA

However, for the sake of this discussion, I'm thinking of "Works" in the "it's an Object that isn't a Fileset" sense. I'll use just generic PCDM:Object from now on (with explicit "that is not a PCDM:Fileset" where needed). That "Works being diluted" issue can perhaps be discussed further on the PCDM listserv (or even Rob's thread, which got no responses).

2: PCDM 1.0 versus 2.0

I think your approach here is from a pcdm2 perspective, of which i have only passing understanding. I understand that filesets will become more formalized. - @HackMasterA

The more formalized Filesets definition may fall to PCDM 1 versus PCDM 2, but I'd be a bit hesitant to make that claim (seems more about which ingest path/gem/version you use...?). My understanding was "PCDM 2" is where we're trying to pull some of the Hydra-Works/PCDM-Works models (as is? with improvements?) into PCDM, and let Hydra-Works/PCDM-Works become not about defining new classes, but more about application profile/setting behaviours.

There is a really good discussion about pinning down Filesets definitions going on here: https://groups.google.com/forum/#!topic/pcdm/8xVAWuczaxQ And this work would probably be part of a PCDM 2 version, if only because it's happening now on the listserv.

Discussions of PCDM 1 versus 2 should probably go to the PCDM listserv as well, especially if we are thinking these distinct enough to cause compatibility issues. For this discussion, I'll try to stick with "PCDM 1" core and Hydra-Works/PCDM-Works understandings (in which, we're still generating those intermediate resources at MPOW).

@cmharlow

Do we want to model some Parts as Filesets and other Parts as "Works"/Objects that aren't Filesets and don't contain/have members Files...

mmm. I'm -1 to that, sorry. This conflates models and makes metadata profiling/resource management that much more complicated. I agree with @simosacchi re: keeping Filesets as where you manage non-RDF resources, and I think @barmintor makes a good point that once you bring in ordering, you're making these "intermediate" resources (here, proxies) regardless. I like @escowles's comment:

So the idea is that the typical page is a trivial Object, but they can be enriched at any time if it's useful to do so, without having to remodel them. - @escowles

So, in my limited and likely to change opinion, I'd rather aim for generation of all those Parts as "Works"/Objects that aren't Filesets when you're ingesting a digital object/collection that you know may require those distinctions (for ordering, additional metadata, etc.) - i.e. the CC functionalities @escowles mentioned. I'm a bit curious when you'd have Catalogers manually creating all those Parts too (would you have a Cataloger manually create a Book and all its relevant Pages in Sufia, versus finding a way to batch load?) Does this also indicate maybe a need for some batch ingest / basic metadata generation tooling around Parts? Just thinking out loud.

If for functionality sake, an institution doesn't want to support PCDM:Objects that aren't Filesets for all Parts, then I'd rather the institution makes all the Parts (not just some) Filesets + update the metadata profile for those Filesets to be consistently available for expanded descriptive + other metadata (and then can consistently update UI for other options). This is me speaking firmly with my metadata munger/migration lackey hat on though, and I'm still thinking this through.

Sorry for the long messages. Thanks for the discussion, and hope this helps.

@hackartisan
Contributor Author

hackartisan commented Aug 10, 2016

Thank you for all the links, @cmh2166, I really appreciate that.

If for functionality sake, an institution doesn't want to support PCDM:Objects that aren't Filesets for all Parts, then I'd rather the institution makes all the Parts (not just some) Filesets + update the metadata profile for those Filesets to be consistently available for expanded descriptive + other metadata (and then can consistently update UI for other options). This is me speaking firmly with my metadata munger/migration lackey hat on though, and I'm still thinking this through. - @cmh2166

Please say more about why you think this is easier? It seems harder to me. If at some point I'm going to migrate to some Works + some PCDM:Objects I would think it would be easier to move from some Works + some FileSets than from all Works (which I would have to sift through somehow).

@hackartisan
Contributor Author

hackartisan commented Aug 10, 2016

@cmh2166 So I should add that when I say 'Work' and 'FileSet' I mean the objects in Curation Concerns / Sufia . It's therefore definitely helpful to use pcdm:object instead of 'work' when talking about a non-Work non-FileSet object!

@hackartisan
Contributor Author

Copying from private conversation with @cmh2166 with permission:

From an implementation viewpoint, I’d be careful about making some Parts pcdm:Objects (“Works” or not) and some Parts pcdm-works:Filesets, if only from a data management viewpoint down the line. Not that you couldn’t, but then what metadata/data profiles get attached to which, and how can you differentiate later on for batch processes, etc.? So, let's say it's a book with maps and text. Text you just make filesets. Maps you make child works. If I want to pull all pages for a batch update with something related to the book (add or update page numbers), I’ve got to be aware of what “level” the data is at. Not impossible, maybe not even hard, but something to consider.
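
To make the "level" concern concrete, here is a short Python sketch with hypothetical `FileSet` and `ChildWork` classes and a made-up `number_pages` helper. It assumes, purely for illustration, that both kinds of child carry a simple metadata dict:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class FileSet:
    label: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ChildWork:
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)
    file_sets: List[FileSet] = field(default_factory=list)


def number_pages(children: List[Union[FileSet, ChildWork]]) -> List[str]:
    """Write page numbers onto each child and report which level received them."""
    levels = []
    for number, child in enumerate(children, start=1):
        child.metadata["page_number"] = str(number)
        levels.append("work" if isinstance(child, ChildWork) else "fileset")
    return levels


children = [
    FileSet("text page"),
    ChildWork("Map of the coast", file_sets=[FileSet("map scan")]),
    FileSet("text page"),
]
levels = number_pages(children)
```

For text pages the number lands on the fileset itself, while for the map it lands on the child work and the map's own fileset is left untouched, so any later batch job has to know to look in both places.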

@simosacchi

@HackMasterA : I agree completely with @Cam156 perspective on the modeling issues that might come up when modeling entities "at the same level" differently. That speaks directly to my comment above about not abusing the fileset construct, and plan ahead to reduce lack of flexibility moving forward (see the example of multiple digitization masters of the "SAME" page).

Unless people see issues of performance by adding layers of indirection, my suggestion is still to model all pages as pcdm:objects and not filesets. I hope there is a way to make the process automatic/invisible for cataloguers at the application level, and with PCDM2.0 in Sufia, according to @escowles it seems the logic is already moving in that direction.

@hackartisan
Contributor Author

@simosacchi yes, I appreciated that use case from @cmh2166 (not @Cam156 !). It's definitely food for thought but it still seems like my best option today may be to use Works and FileSets with an eye to migrating when pcdm2 and related infrastructure are available. It seems like it will be easier to move all filesets into new 'object' containers than to sift through Works and figure out what would be made more generic.
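
A minimal Python sketch of that migration direction, assuming a hypothetical `GenericObject` container and `wrap_file_sets` helper (illustrative only, not an existing migration tool):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSet:
    label: str


@dataclass
class Work:
    title: str


@dataclass
class GenericObject:
    """Hypothetical lightweight container (a plain pcdm:Object stand-in)."""
    label: str
    file_sets: List[FileSet] = field(default_factory=list)


def wrap_file_sets(members):
    """Wrap each page-level FileSet in a new container, preserving order
    and leaving existing child Works alone."""
    migrated = []
    for member in members:
        if isinstance(member, FileSet):
            migrated.append(GenericObject(label=member.label, file_sets=[member]))
        else:
            migrated.append(member)
    return migrated


before = [FileSet("page 1"), Work("Map of the coast"), FileSet("page 2")]
after = wrap_file_sets(before)
kinds = [type(member).__name__ for member in after]
```

The type of each child is enough to decide what to do with it, which is the sense in which this direction needs no sifting through Works.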

@hackartisan
Contributor Author

Either way, I think we have to allow for the possibility that today there may be a work with both types of children. Given that, I'm hearing support for integrating their placement in the UI.

@simosacchi

@HackMasterA : Oops, sorry @cmh2166, I was tricked by the autocompletion... I totally see use cases where you have both types of children. One example that would not violate the assumptions above: you have pcdm:objects for pages (with a fileset attached to each of them holding TIFs and derivatives) and also a fileset with a PDF file for the entire work (i.e. the book) directly attached to the parent object (regardless of it being a work or "just" a pcdm:object).
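
That arrangement could be sketched in Python roughly as follows, with hypothetical class and attribute names chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSet:
    files: List[str]


@dataclass
class PageObject:
    label: str
    file_sets: List[FileSet] = field(default_factory=list)


@dataclass
class BookWork:
    title: str
    pages: List[PageObject] = field(default_factory=list)    # ordered page objects
    file_sets: List[FileSet] = field(default_factory=list)   # files for the whole book


book = BookWork(
    title="Rare book",
    pages=[
        PageObject("p1", [FileSet(["p1.tif", "p1.jpg"])]),
        PageObject("p2", [FileSet(["p2.tif", "p2.jpg"])]),
    ],
    file_sets=[FileSet(["book.pdf"])],
)

page_order = [page.label for page in book.pages]
whole_book_files = [name for fs in book.file_sets for name in fs.files]
```

Here the parent has both kinds of children, but the fileset on the parent describes the book as a whole and never stands in for a page.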

@hackartisan
Contributor Author

thanks to everyone for this very illuminating discussion. it seems to have petered out so I move to close the issue. I've created #2459 to reflect the consensus we were able to reach, ui-wise.
