Provenance of data accessed through an API which may require authentication #377
Unanswered
thomaszwagerman asked this question in Q&A
Taking note of guidance on downloadable datasets.
I'm wondering if there are any best practice examples on how to represent data provenance when data is accessed through an API endpoint (which may require authentication through some sort of key like a Personal Access Token).
My use case is that I would like to represent a pipeline which programmatically accesses data from the Copernicus Climate Data Store (for example ERA5 single levels), performs some calculation and creates a new data output. This "new" data output could be ingested into another environmental forecasting model (or Digital Twin system), so an RO-Crate is potentially a good way to pass on provenance in such an interaction. Ideally, a description of this would capture the endpoint used, the query used (i.e. geographical area, timespan), the license, the date accessed, the DOI, and the fact that authentication is required, without actually attaching credentials.
I'm new to RO-Crate, so my interpretation of how it should work might be wrong, but I'm considering the following approach:
I interpret a script that accesses an API as software used to create files, and describe the script and its configuration file (which includes information on the query) as the `instrument`s of a `CreateAction`, where the `object` is a detached Dataset that contains information on the endpoint and the `result` is the data downloaded.
Here `"data/"` is described as downloaded, and `"https://cds.climate.copernicus.eu/api"` is described as a detached dataset (I have yet to crack this one really, but it should contain the license etc.). After this there are further scripts which process `"data/"`.
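To make the approach described above concrete, here is a minimal Python sketch that assembles the kind of JSON-LD graph I have in mind. It is not a validated crate and not confirmed best practice: the file names `download_era5.py` and `era5_request.yaml`, the `#download-era5` identifier, the access date and the license URL are hypothetical placeholders, and using schema.org's `conditionsOfAccess` to note that a token is needed (without storing it) is my own assumption about where that information could live.

```python
import json


def build_crate(endpoint, accessed, license_id,
                script="download_era5.py", config="era5_request.yaml"):
    """Return an RO-Crate-style JSON-LD dict for one API download step."""
    action_id = "#download-era5"  # hypothetical local identifier
    graph = [
        # Metadata descriptor and root dataset (trimmed: a real crate needs
        # more root properties, e.g. description, datePublished, license).
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "ERA5 download example",
         "hasPart": [{"@id": script}, {"@id": config}, {"@id": "data/"}],
         "mentions": [{"@id": action_id}]},
        # The script and its query configuration act as instruments.
        {"@id": script, "@type": ["File", "SoftwareSourceCode"],
         "name": "Download script"},
        {"@id": config, "@type": "File",
         "name": "Query configuration (area, timespan, variables)"},
        # Detached dataset standing in for the API endpoint. Only the fact
        # that a token is needed is recorded, never the token itself.
        {"@id": endpoint, "@type": "Dataset", "name": "Source API endpoint",
         "license": {"@id": license_id},
         "conditionsOfAccess":
             "Requires a personal access token (not stored here)."},
        {"@id": "data/", "@type": "Dataset", "name": "Downloaded data"},
        # The download itself: endpoint in, data directory out.
        {"@id": action_id, "@type": "CreateAction",
         "name": "Download via API", "endTime": accessed,
         "instrument": [{"@id": script}, {"@id": config}],
         "object": {"@id": endpoint},
         "result": {"@id": "data/"}},
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context",
            "@graph": graph}


if __name__ == "__main__":
    crate = build_crate(
        endpoint="https://cds.climate.copernicus.eu/api",
        accessed="2024-01-01T00:00:00Z",           # placeholder access date
        license_id="https://example.org/licence",  # placeholder, not the real licence
    )
    print(json.dumps(crate, indent=2))
```

In this sketch the endpoint is deliberately left out of `hasPart`, since it is not a payload file in the crate, and the query details stay in the configuration file rather than being duplicated in the metadata.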
I guess my main questions are: