data.json Schema changes #114

nightsh · 2020-04-14T08:04:32Z

Currently we are using a validation schema for the data harvested into the portal: https://project-open-data.cio.gov/v1.1/schema/#accessLevel

However, this schema does not support a number of metadata properties we need to have, such as:

collections
sources
level of data
dataset documentation
empty contactPoint.fn values

At the data.json level, we can adjust this behaviour so that it allows the needed properties. Since the files are generated by our own datajson transformers as part of the CivicActions/edscrapers process flow, we can easily change the final transformation steps to reflect our needs.

Analysis

Two options for this:

1. Remove schema validation

This would have the benefit of flexibility: anything we might need to add to the structure of the data.json file would just work without touching other parts of the flow.

The caveat is, of course, that we would lose validation, which increases the risk of introducing bad data and leaves the datajson transformer to make the final calls.

2. Fork the schema to add the missing features

Best of both worlds: continue having validation, but bend the rules so we can accommodate the properties we want, the way we want them.

We will have to copy the source schema and host the modified copy, then reference it from the generated datajson files.
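As a rough illustration of what forking the schema could look like, here is a minimal Python sketch. It assumes a tiny stand-in for the upstream schema (the real file is much larger), and the property names and shapes in `EXTRA_PROPERTIES` are hypothetical placeholders, not the actual specs. `fork_schema` is likewise a made-up helper name.

```python
import copy

# Tiny stand-in for the upstream dataset schema (assumption: the real
# schema is far larger and may be organised differently).
BASE_SCHEMA = {
    "type": "object",
    "required": ["title", "contactPoint"],
    "additionalProperties": False,
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "contactPoint": {
            "type": "object",
            "required": ["fn"],
            "properties": {"fn": {"type": "string", "minLength": 1}},
        },
    },
}

# Hypothetical names and shapes for the properties listed in this issue.
EXTRA_PROPERTIES = {
    "collections": {"type": "array", "items": {"type": "object"}},
    "sources": {"type": "array", "items": {"type": "object"}},
    "levelOfData": {"type": "array", "items": {"type": "string"}},
    "datasetDocumentation": {"type": "string"},
}


def fork_schema(base):
    """Return a modified copy of `base`; the original is left untouched."""
    forked = copy.deepcopy(base)
    # Add the properties the upstream schema does not know about.
    forked["properties"].update(copy.deepcopy(EXTRA_PROPERTIES))
    # Allow empty contactPoint.fn values by dropping the length constraint.
    forked["properties"]["contactPoint"]["properties"]["fn"].pop("minLength", None)
    return forked


FORKED_SCHEMA = fork_schema(BASE_SCHEMA)
```

Working on a deep copy keeps the upstream schema available for comparison, which should make it easier to see exactly where the fork diverges.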

Recommendation:

Use option 2, i.e. an altered version of the schema whose structure matches our data specs.

Based on this recommendation, specs for this are provided here

Tasks:

implement the new specs in a fork of the currently used validation schema
host the new file in a public location
adjust the datajson transformer to use the new schema
test by adding previously unsupported properties
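The transformer-side change in the tasks above might look roughly like the sketch below. It assumes the transformer records the schema location in the catalog's `describedBy` field, as Project Open Data catalogs do; whether the edscrapers transformer works this way is an assumption. `FORKED_SCHEMA_URL` and `build_catalog` are hypothetical names, and the URL is a placeholder because the hosting location is not decided in this issue.

```python
import json

# Hypothetical location of the hosted fork (placeholder, not a real URL).
FORKED_SCHEMA_URL = "https://example.org/schema/catalog.json"


def build_catalog(datasets, schema_url=FORKED_SCHEMA_URL):
    """Wrap dataset records in a catalog that points at the forked schema."""
    return {
        "@type": "dcat:Catalog",
        "describedBy": schema_url,
        "dataset": list(datasets),
    }


# A record using features the upstream schema rejects: an empty
# contactPoint.fn and a (hypothetically shaped) collections property.
catalog = build_catalog([
    {
        "title": "Example dataset",
        "contactPoint": {"fn": ""},
        "collections": [],
    }
])
output = json.dumps(catalog, indent=2)
```

A record like this one could then serve as a test fixture for the last task, since it exercises two of the previously unsupported features.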

Acceptance criteria:

having any of the specs in the final data.json output doesn't break the harvesting process
we have the defined & implemented specs visible in the portal

nightsh mentioned this issue Apr 14, 2020

Scrape and harvest collections / sources - PROTOTYPE [P1(OCR)] #115

Closed

8 tasks

osahon-okungbowa assigned nightsh May 1, 2020

osahon-okungbowa changed the title ~~[stub] data.json Schema changes~~ data.json Schema changes May 1, 2020

This was referenced May 4, 2020

Scrape and harvest collections / sources P10 (FSA) #129

Closed

Scrape and harvest collections / sources P2 (OCTAE) #131

Closed

This was referenced May 13, 2020

Scrape and harvest collections / sources P3 (OPE) #134

Closed

Scrape and harvest collections / sources P4 (OELA) #137

Open
