-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add versioning guidelines for products in Translator #21
base: master
Are you sure you want to change the base?
Conversation
@cbizon and @cmungall (cc: @deepakunni3 )
I wonder whether allowing Minor releases of the Biolink Model to be incompatible is such a good idea. I suspect not. It is actually hard to imagine what a "Major" release for the Biolink Model would mean except some radically structurally different structure to the syntactic structure (would modularization qualify?) although if the semantics remains intact, even such remodelling of the YAML may be relatively invisible. On the other hand, model changes which are semantically backwards incompatible could have significant consequences in data processing. Perhaps we should remove this data model versioning exception (i.e. rather assert that MINOR releases must be backwards compatible, i.e. likely just involve additions to the model) and allow ourselves "MAJOR" releases for breaking changes, OR perhaps we can introduce an additional level of release numbering, e.g. MAJOR, INTERIM, MINOR, PATCH where "INTERIM" are releases which break the Semantic Model in a "minor" way but are not MAJOR changes to the model, and MINOR releases are backwards compatible, i.e. mainly additions of new types, classes, slots, mappings, etc., while PATCH fix any mistakes in the MINOR releases. Which such a system, release 1.4.0.0 would be the new 1.4.0 since the new partitioned mappings, and some other changes, were 'breaking' changes to the Semantic use of the model, but since that time, many PR's have resulted in model additions or (in some cases) model replacements with explicit Deprecation of older elements, but keeping them around until the next non-MINOR release. A four level numbering scheme would allow us rapid additions to the model (as are being now in Translator) but ensure that software systems and knowledge graphs using a MAJOR.INTERIM level of compatibility need not do anything to accommodate the latest models at their leisure. If we are truly reluctant to add the concept of an INTERIM release, then here is my question: how often we'll have a MAJOR release of the Biolink Model? If "almost never" then why not simply willing to increment the MAJOR release as often as necessary for models lacking full backwards incompatibility? In that sense, the Biolink Model release 1.4 ought to have been called the 2.0 release, unless we had opted to keep the removed/renamed old elements deprecated in the model, e.g. the old "mappings" slots in all of the (predicate) slots and classes, which we didn't and had we, would have probably really confused people! |
@patrickkwang, @vdancik, (cc: @deepakunni3, @cbizon, @mbrush) please see my comments above about Data Modelling versioning. Specifications versioning (e.g. for TRAPI) is probably a less frequent event and the "compliance" community currently small, so perhaps the issue of backwards incompatibility in MINOR versions is not a great issue (e.g. TRAPI 1.1 is slightly backwards incompatible with TRAPI 1.0, right?). Nonetheless, I wonder if any of my thoughts about Data Modelling versioning also apply to the Specification (and Metadata?) sphere as well(?). |
@mellybelly, I'm going to drag you into this conversation to comment about Versioning of Data Products. Essentially, should MINOR releases which have removed data, really be considered "MINOR" in the SemVer sense, in that computations will change based on the absence of such missing data, thus the results won't somehow be "backwards compatible". Should MINOR releases only Add new data, not remove or relabel any. Here too (read above) would it be useful to introduce a new "INTERIM" release level in between MAJOR and MINOR levels, which allow for deletion and some relabelling of data (hence losses of some older results are a given) while leaving MINOR only for (meta)data additions? PATCH releases could always be allowed to fix errors in the MINOR releases. Not sure exactly what "maintenance" releases mean in the Data versioning context, but I sense that the above four level distinctions make the issue moot (could remove that exceptional rule from the versioning policy). |
I think it's important to understand that semantic versioning was not made for things like the Biolink model and TRAPI. It was made for services - libraries, languages, public web APIs, etc. - rather than specifications like the Biolink model, TRAPI, or the OpenAPI specification itself. "Backwards compatibility" does not exist in the same way for the second class of things as it does for the first. When a service undergoes a minor version change, the client can continue treating it the same way as before without expecting a change in behavior. A specification, though, has no "behavior" and hence no sense of "compatibility". We can still apply version numbers to specifications, they are just not "semantic" versions. Things get fuzzy because we can have a service that implements TRAPI, and the service as a whole can be versioned in a semantic way. It's also confusing that TRAPI looks like a service and we haven't really clarified what it means as a specification - and OpenAPI schema is meant to describe what is, not what is allowed to be. |
Thanks @patrickkwang for your keen observations. I guess I've got a more simplistic view of things, of the following four levels (as noted above, rephrased again):
For Specifications, if everyone in the target community commits to upgrade at the same time (e.g. Translator KPs and ARSs updating to TRAPI 1.0, which they all collectively revised and voted for acceptance together), then version management is pretty trivial. You guys can call the shots there. I do think, though, that we need to do a double take on the Biolink Model, since we don't really have any of the level 1 releases (not sure what that would represent), but we've had several level 2 (model breaking changes), and frequent level 3 version changes (new slots, classes and mappings added), plus a few small level 3 "PATCH" bugs fixed. I also suspect that DATA revisions have a similar dynamic, although one often simply sees time stamped data releases with simple numbering (eg. "release ##" dated 13-Jan-2021). I've pinged Melissa to comment on that one. Anyhow, I gave my 2 cents about this and will simply now abide by the collective wisdom of the Translator Mob.. er.. Community, LOL. Anyhow, I'll leave it at that. |
This PR adds versioning guidelines as documented here: https://docs.google.com/document/d/1Ds_UREeepaSrO64IRuBLmLWgIPrdkpmwfv3xeRE4PjE/edit#
Henceforth, all votes, discussions, comments, and feedback should happen on this PR.