Find better names for entities in the publishing/versioning model #321

Open
RickMoynihan opened this issue Jan 10, 2024 · 1 comment

RickMoynihan commented Jan 10, 2024

We currently have

(catalogue) -> [dataset series] -> [release] -> [revision] -> [commit]

The entities inside [ ] are the ones in scope for renaming.

Release and Revision are a little too close in meaning, and it doesn't help that they both begin with "Re".

Dataset Series (or just Series) may also be a little unfamiliar.

We may also wish to consider a new name for commit, perhaps to avoid confusion with git terminology. In git, a commit identifies the state of the whole repository, i.e. it references everything prior to it. For us it doesn't: a commit is more like a delta with metadata/message, and it is the revision that identifies the state of the whole repository.
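To make the distinction concrete, here is a minimal sketch, assuming a commit carries only a set of appended and retracted rows plus a message. The field names `appends` and `retractions` and the `state()` method are hypothetical and chosen for illustration; they are not taken from the actual model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """A delta plus a message. It does not reference any earlier history."""
    message: str
    appends: frozenset = frozenset()      # hypothetical: rows added by this delta
    retractions: frozenset = frozenset()  # hypothetical: rows removed by this delta


@dataclass
class Revision:
    """Identifies the whole state: the ordered log of commits applied so far."""
    commits: list = field(default_factory=list)

    def state(self) -> frozenset:
        # Replay every delta in order to recover the full state.
        rows = set()
        for commit in self.commits:
            rows -= commit.retractions
            rows |= commit.appends
        return frozenset(rows)


revision = Revision()
revision.commits.append(Commit("initial load", appends=frozenset({"a", "b"})))
revision.commits.append(
    Commit("fix row b", appends=frozenset({"b2"}), retractions=frozenset({"b"}))
)
```

Here the second commit on its own only says "remove b, add b2"; only the revision, by replaying the log, can say that the full state is `a` and `b2`. A git commit would be closer to the revision in this sketch.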

Some options (unchanged entities are in ()):

1. (catalogue) -> (dataset series) -> (release) -> [edition] -> (commit)

Like what we have at the moment, except that revision is renamed to edition.

2.1 (catalogue) -> [publication] -> [dataset] -> [edition] -> (commit)
2.2 (catalogue) -> [publication] -> [dataset] -> [edition] -> [delta]

Make publication the catalogued entry, dataset the stable release with a schema, and edition a precise version of that with amendments.
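As a rough sketch of how the names in option 2.2 would nest, assuming a simple containment hierarchy. All class and field names below are hypothetical and only mirror the proposed terms; this is not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """A single change with its message (called "commit" in option 2.1)."""
    message: str


@dataclass
class Edition:
    """A precise version of a dataset, including any amendments."""
    label: str
    deltas: list = field(default_factory=list)


@dataclass
class Dataset:
    """The stable release, carrying the schema."""
    title: str
    schema: str
    editions: list = field(default_factory=list)


@dataclass
class Publication:
    """The catalogued entry that groups related datasets."""
    title: str
    datasets: list = field(default_factory=list)


def path(publication: Publication, dataset: Dataset, edition: Edition) -> str:
    # Render the chain of names from the catalogue down to one edition.
    return " -> ".join(["catalogue", publication.title, dataset.title, edition.label])


edition = Edition("edition-1", deltas=[Delta("initial load")])
dataset = Dataset("2023 release", schema="schema-v1", editions=[edition])
publication = Publication("Example statistics", datasets=[dataset])
```

Reading the types aloud ("a publication has datasets, a dataset has editions, an edition has deltas") is a quick way to check whether the proposed names sound natural at each level.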

3.1 (catalogue) -> [dataset] -> [publication] -> [revision] -> (commit)
3.2 (catalogue) -> [dataset] -> [publication] -> [edition] -> (commit)

An inversion of 2: make dataset the catalogue entry, and publication the packaged/stable release of it with schema/methodology. That then has a revision or edition for locking any amendments, which gives you access to the commit/delta log.

Discussion

Of the above, I think there are arguments for 2 and 3.2. Initially I leaned towards 3.2 because we've always thought of a publication as a collection of related resources that should be grouped with the stable package, i.e. the methodology/schema should be stable within a publication.

However, on reflection I think 2.2 might be the better choice: there is no reason a dataset can't have supporting methodology/schema, and a publication could represent the series. The only thing I dislike is that publication sounds like an artifact rather than a series, though arguably dataset has this problem too.

ricroberts commented Jan 10, 2024

I think I prefer 2.1 or 2.2.

Happy to think of an alternative name for publication (but dunno what)!
