Avoid using DOI URLs for node identifiers #197
Comments
Thanks @datadavev, this seems challenging. It looks like a side effect of overloading content negotiation to perform multiple roles in different contexts (e.g., for Crossref, as a way to trigger a metadata API request about an item rather than redirecting to the item itself). But the proposal to not use DOIs as node identifiers seems problematic if the DOI is meant to be the long-term resolvable URI for the dataset. If the DOI is the only stable URI for a dataset, and represents the preferred identifier for the dataset, what should the node identifier be set to?
The DOI resolver service (actually any identifier resolver) should respond with a resolver service metadata content representation only when specifically requested. RFC 8288 provides a mechanism for advertising the availability of related resources, including alternate representations of a resource. The resolver can advertise an alternate representation of a resource by including such information in the HTTP response Link header. At a minimum such information should be included when a resolver returns an alternate representation of a resource not specifically presented by the resource authority. Ideally, the DOI resolver should only respond with information about the location of the requested resource. So the resolver returns the original representation by default and advertises availability of an alternate representation through a link header in the response. For example the DOI resolver could respond similarly to the following example, with the redirect response including a link to the location of the resolver metadata (fake example):
Note however that link header handling in redirect responses would only be available to programmatic clients. Intermediate communication details are not exposed to web browser based clients. DataCite is aware of the issue, but a change in their resolving behavior may have other side effects that need to be considered.
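For programmatic clients, reading such an advertisement could look roughly like the sketch below. It parses an RFC 8288 `Link` header value into a list of dictionaries using only the standard library. The header value, the `example.org` URL, and the choice of `rel="describedby"` are all hypothetical and are not what any DOI resolver currently returns. The parser is deliberately simplified and does not handle commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
# Simplification: parameter values must not contain ',' or ';'.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(value):
    """Parse a Link header value into a list of dicts (simplified RFC 8288)."""
    links = []
    for target, params in _LINK_RE.findall(value):
        link = {"target": target}
        for param in params.split(";"):
            if "=" not in param:
                continue
            name, _, raw = param.partition("=")
            link[name.strip().lower()] = raw.strip().strip('"')
        links.append(link)
    return links


def links_with_rel(links, rel):
    """Return the targets of all links whose rel attribute includes `rel`."""
    return [l["target"] for l in links if rel in l.get("rel", "").split()]


# Hypothetical header a resolver might attach to its redirect response.
EXAMPLE_LINK = (
    '<https://example.org/metadata/10.5066/F7VX0DMQ>; '
    'rel="describedby"; type="application/ld+json"'
)

parsed = parse_link_header(EXAMPLE_LINK)
metadata_targets = links_with_rel(parsed, "describedby")
```

With a header like this, a client would follow the redirect to the resource itself and only fetch `metadata_targets` when it explicitly wants the resolver's metadata.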
Perhaps the problem is interpretation of what the DOI identifies. If the DOI identifies a dataset, then resolving the DOI should get a representation of the dataset, e.g. a CSV, NetCDF, or ESRI shapefile, i.e. some serialization of the dataset. We have accepted the notion that a landing page is a representation of a dataset. The node identifier in JSON-LD identifies the node, i.e. a JSON object. That JSON object might be about a dataset, in which case it is functionally analogous to a landing page, but in this case I'd argue that the node identifier identifies a particular representation (the JSON object) that is about the thing the DOI identifies. From that point of view the simple solution is to use a different identifier for the node.
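A minimal sketch of that idea, assuming the node is a schema.org `Dataset`: the node gets its own identifier and the DOI is recorded as a property of the dataset instead. The node URL under `example.org` is hypothetical, and carrying the DOI in `identifier` is one possible choice for illustration, not something this thread has settled on.

```python
import json

DOI_URL = "https://doi.org/10.5066/F7VX0DMQ"  # DOI used in the issue description
NODE_ID = "https://example.org/datasets/F7VX0DMQ.jsonld"  # hypothetical node IRI


def dataset_node(node_id, doi_url):
    """Build a JSON-LD node whose @id is distinct from the dataset's DOI."""
    return {
        "@context": "https://schema.org/",
        "@id": node_id,          # identifies this JSON-LD representation
        "@type": "Dataset",
        "identifier": doi_url,   # assumption: DOI carried as a property value
    }


node = dataset_node(NODE_ID, DOI_URL)
serialized = json.dumps(node, indent=2)
```

Dereferencing `@id` then goes to a location the resource owner controls, while the DOI stays available as the persistent identifier of the dataset.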
The problem is that when a client encounters a DOI for a node identifier and follows the guidelines for dereferencing the node identifier, the resulting document is an unexpected alternative representation from CrossRef (at best) or an error condition. It is not the resource offered by the resource owner. This breaks the linked data expectations. Furthermore, it seems there is no way around this for JSON-LD resources other than to make the request in a manner inconsistent with the JSON-LD spec. Note that requesting a different RDF serialization (e.g.
To me at least, this behavior is problematic since the resolution service is subverting the resolution request and returning a resource that was not requested. The resolver should present a different rendering of the resource only when specifically requested (such as through a different API or through specific request parameters). The solution is fairly straightforward, but it seems it does need to be implemented by DataCite. The alternative is for JSON-LD clients to implement custom behavior when dereferencing a DOI.
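As a rough illustration of what such custom client behavior might start from, the sketch below detects whether a node identifier is a DOI resolver URL so the client can treat it specially (for example, by not dereferencing it as JSON-LD). The host list and the `/10.` path check are assumptions made for this example, not a complete definition of DOI URLs.

```python
from urllib.parse import urlsplit

# Assumed set of DOI resolver hosts; extend as needed.
DOI_RESOLVER_HOSTS = {"doi.org", "dx.doi.org"}


def is_doi_url(iri):
    """Return True if `iri` looks like an HTTP(S) DOI resolver URL."""
    parts = urlsplit(iri)
    host = (parts.hostname or "").lower()
    return (
        parts.scheme in ("http", "https")
        and host in DOI_RESOLVER_HOSTS
        and parts.path.startswith("/10.")  # DOI names begin with the "10." prefix
    )


def should_dereference_as_jsonld(node_id):
    """Hypothetical client policy: skip JSON-LD dereferencing for DOI URLs."""
    return not is_doi_url(node_id)
```

A client using this policy would need some other route to the JSON-LD document for DOI-identified nodes, which is the cost of working around the resolver on the client side.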
The JSON-LD 1.1 Processing Algorithms and API specification 1 provides guidance on retrieval of JSON-LD over HTTP in the section Remote Document and Context Retrieval:
If a resolution of a DOI is required, for example `"@id":"https://doi.org/10.5066/F7VX0DMQ"`, then the resolved resource may not be the authoritative source. Requesting that resource as a web browser with a content priority of HTML results in the following resolution sequence (`>` indicates a request, `<` the response):

(request made with `Accept: application/ld+json;q=0.7,application/json;q=0.6,text/html;q=0.9`)

If instead the content-type `application/ld+json` is preferred, a different resource is resolved (with a failure in this case):

(request made with `Accept: application/ld+json;q=0.7,application/json;q=0.6,text/html;q=0.2`)

This behavior may lead to unintended consequences. Hence, until this behavior is corrected by DOI resolvers it seems prudent to avoid using DOIs for node identifiers in linked data systems reliant upon reliable resolution of JSON-LD resources.
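To make the difference between the two `Accept` header values above concrete, here is a small sketch that ranks the media types in an `Accept` value by their `q` weights. It only illustrates how the stated preferences differ; how a given DOI resolver actually evaluates the header is not known here, and the function name is made up for this example.

```python
def rank_accept(header_value):
    """Return the media types of an Accept header value, highest q first.

    Ties keep their original order. A missing q parameter defaults to 1.0.
    """
    ranked = []
    for position, item in enumerate(header_value.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


# The two Accept values quoted in the issue (without the "Accept: " prefix).
HTML_PREFERRED = "application/ld+json;q=0.7,application/json;q=0.6,text/html;q=0.9"
JSONLD_PREFERRED = "application/ld+json;q=0.7,application/json;q=0.6,text/html;q=0.2"

html_ranking = rank_accept(HTML_PREFERRED)
jsonld_ranking = rank_accept(JSONLD_PREFERRED)
```

The first value ranks `text/html` highest, the second ranks `application/ld+json` highest, which is the only difference between the two requests.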
[Edit: added Accept request header values]
Footnotes
https://www.w3.org/TR/json-ld11-api/