Integration of different data formats #24
Comments
Also has to go into #10 |
We now have the specifications for the template fixed within the technical notes. To make progress on this, we could use these notes and translate them into a JSON Schema. The advantages of JSON Schema over the Markdown format:
Alternatives
I currently do not see any good alternatives. Possible options would be:
xlsx
markdown: not machine readable and hence not able to serve as a technical basis for validations.
uml
direct technical implementations: keeping them compatible requires a common reference / language. This should be digestible across technical implementations. Hence, JSON Schema.
I would propose this as a first step to see where we can go with this. Comments @Jo-Schie, @Maja4Dev or @goergen95 before I start simple first attempts in this direction? |
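To make the proposal above more concrete, a JSON Schema for a template could look roughly like the following sketch. The field names (`project_id`, `latitude`, `longitude`) are hypothetical placeholders and are not taken from the technical notes; only the keywords (`$schema`, `type`, `properties`, `required`, `minimum`, `maximum`) are standard JSON Schema vocabulary.

```python
import json

# Hypothetical schema: the field names are illustrative placeholders only,
# not the fields defined in the technical notes.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Project location (illustrative)",
    "type": "object",
    "properties": {
        "project_id": {"type": "string"},
        "latitude": {"type": "number", "minimum": -90, "maximum": 90},
        "longitude": {"type": "number", "minimum": -180, "maximum": 180},
    },
    "required": ["project_id", "latitude", "longitude"],
}

# Serialise the schema to a JSON document that schema-aware tools can read.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Because the schema is plain JSON, the same file could serve as the common reference for implementations in different languages.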
Hi Fred. I think this is an awesome idea. From what you listed above, I also think that JSON or even GeoJSON could be a good approach. I will also ask the people from the Geonode project for their opinion and will report back as soon as I have an answer. |
Hi @fretchen, totally agree with your summary here. Please note that translating the specification to JSON Schema can only be a first step. To make something useful with it, I think we would also need to provide some tooling for conversion and validation. As a first step, conversion could go e.g. from Excel/CSV -> JSON/GeoJSON, which could then be validated. There is also the recent fiboa project, which provides prior art for designing an extensible and modularized data specification, including geospatial information, based on JSON Schema. |
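The CSV -> GeoJSON conversion step mentioned above could be sketched as follows, using only the Python standard library. The column names (`project_id`, `latitude`, `longitude`) and the sample rows are assumptions for illustration; the real template columns would come from the specification. The function `rows_to_geojson` is a hypothetical helper, not part of any existing tool.

```python
import csv
import io
import json

# Hypothetical CSV export of a spreadsheet; column names are assumptions.
csv_text = """project_id,latitude,longitude
P-001,9.03,38.74
P-002,-1.29,36.82
"""


def rows_to_geojson(rows):
    """Turn rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        lon = float(row["longitude"])
        lat = float(row["latitude"])
        # Every remaining column becomes a feature property.
        properties = {
            key: value
            for key, value in row.items()
            if key not in ("latitude", "longitude")
        }
        features.append(
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


collection = rows_to_geojson(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(collection, indent=2))
```

The resulting dictionary is the kind of object that could then be checked against a JSON Schema in a separate validation step.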
Also, I don't think the title of this issue applies if we are now targeting JSON Schema. JSON Schema specifies what data should look like; it is not data itself. The specification is not the same as an implementation. And, as I said in other comments, I do not think we want to force third parties into concrete implementations. Instead, we should aim to offer a specification, tooling for some conversions, and, most importantly, validation. |
Sounds good to me and I have nothing to add. When I find time to create a JSON Schema, I will open a separate issue / PR so that we can work our way through the todo list. The way I see the todo list right now is:
|
@goergen95 made the following argument for YAML in #76
I personally support the idea in general. However, there are a few technical points to consider, which might actually create quite some work. Automatic conversion from YAML to JSON Schema: I personally have not easily found a tool that does this directly. Introduction of yet another file format: from what I have seen, YAML cannot make JSON Schemas completely obsolete, so we will typically have them in our tool-chain whether or not YAML is used. So we really have to see whether the ease of working with YAML outweighs the need for extra conversion tools. If I understand correctly, we might consider these points after we have decided to merge #76. |
In Python, both file formats are internally represented as dictionaries. So you can easily use JSON Schema vocabulary in YAML and validate data against it. Here is a small example (incoming data does not have to be in YAML, but needs to be converted to a dictionary):

import json
import urllib.request as req

import yaml
from jsonschema.validators import Draft202012Validator

schema_yaml = "schema.yaml"
data_yaml = "data.yaml"

# Download the schema and an example configuration, both written in YAML.
req.urlretrieve("https://raw.githubusercontent.com/mapme-initiative/mapme.pipelines/main/inst/schema.yaml", schema_yaml)
req.urlretrieve("https://raw.githubusercontent.com/mapme-initiative/mapme.pipelines/main/inst/config-example.yaml", data_yaml)

# Load both files into plain Python dictionaries.
with open(schema_yaml, "r") as yaml_file:
    schema = yaml.safe_load(yaml_file)
with open(data_yaml, "r") as yaml_file:
    data = yaml.safe_load(yaml_file)

# Raises jsonschema.exceptions.ValidationError if the data does not conform.
validator = Draft202012Validator(schema)
validator.validate(data)

# Write the same schema out as a regular JSON Schema file.
with open("schema.json", "w") as json_file:
    json.dump(schema, json_file, indent=4)
|
A number of different possible data formats exist, and ideally we should find a way to streamline them. One issue with the xlsx format was raised by @goergen95 in #21
We have started to look into this in #17 and we have first tools for conversion in #18. However, it is not yet clear how we can put all of these ideas together...