Add validation logic to FeatureRDD #118

Open
echeipesh opened this issue Oct 12, 2021 · 0 comments
Labels
enhancement New feature or request


Is your feature request related to a problem? Please describe.
Reading features from the TSV file may fail.
These errors should be wrapped in Validated so that they propagate to the job output as an error on the affected location's record instead of failing the whole job.

Examples:

  • WKB can be invalid (bad user input)
  • WKB can be valid but encode an invalid geometry
  • Pre-processing of features (like splitting along the grid tiles) can trigger some kind of JTS TopologyException
  • ???
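A minimal sketch of how each of these failure points could be captured per feature instead of thrown. To stay dependency-free it uses the standard library's `Either` as a stand-in for cats' `Validated`; `FeatureError`, `attempt`, `readFeature`, and the `decode`/`isValid` parameters are hypothetical names for illustration, not part of the existing code base. In the real job, `decode` would presumably wrap the JTS WKB reader.

```scala
import scala.util.{Failure, Success, Try}

object ValidatedFeatureSketch {
  // Hypothetical error record; the real job may want richer context.
  final case class FeatureError(featureId: String, stage: String, message: String)

  // Stand-in for cats' Validated: Left plays Invalid, Right plays Valid.
  type ValidatedFeature[A] = Either[FeatureError, A]

  // Decode a hex string into bytes; throws on malformed input.
  def hexToBytes(hex: String): Array[Byte] = {
    require(hex.nonEmpty && hex.length % 2 == 0, "hex must be non-empty with an even length")
    hex.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  }

  // Run one step and turn any non-fatal exception into a FeatureError.
  def attempt[A](featureId: String, stage: String)(body: => A): ValidatedFeature[A] =
    Try(body) match {
      case Success(value) => Right(value)
      case Failure(e) =>
        Left(FeatureError(featureId, stage, s"${e.getClass.getSimpleName}: ${e.getMessage}"))
    }

  // Chain the steps; the first failure short-circuits and is kept with its stage.
  def readFeature[G](featureId: String, wkbHex: String)(
      decode: Array[Byte] => G, // assumed: a WKB decoder that may throw
      isValid: G => Boolean     // assumed: a geometry validity check
  ): ValidatedFeature[G] =
    for {
      bytes <- attempt(featureId, "hex")(hexToBytes(wkbHex))
      geom  <- attempt(featureId, "wkb")(decode(bytes))
      valid <- Either.cond(
                 isValid(geom),
                 geom,
                 FeatureError(featureId, "geometry", "decoded geometry is not valid"))
    } yield valid
}
```

Recording the stage alongside the message keeps the three cases above distinguishable in the job output.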

Describe the solution you'd like
ErrorSummaryRDD introduced the use of Validated for the polygonal summary operation, along with a mechanism to report the errors. We should have a ValidatedFeatureRDD to cover feature input.

This should be a new class so we can test it in ForestChangeDiagnostic and dashboard jobs without having to change the rest of the code-base. Later refactors can clean that up.

Because part of the logic here is covered by the DataFrame API, the logic from SummaryRDD will not be enough.
I'm not sure what the best way to handle exceptions in the DataFrame is, nor what they will actually look like. (They SHOULD result in null fields without much explanation, but that may not be the case.)
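If the failures do surface as null fields, one option is to convert each nullable value into an explicit error as soon as the row leaves the DataFrame. This is only a sketch under that assumption: `requireField` is a hypothetical helper, and `Either` again stands in for `Validated`.

```scala
object NullableFieldSketch {
  // Treat a null or blank column value as an error that names the feature and field.
  def requireField(featureId: String, name: String, value: String): Either[String, String] =
    Option(value)          // Option(null) is None
      .map(_.trim)
      .filter(_.nonEmpty)
      .toRight(s"feature $featureId: field '$name' is null or empty")
}
```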

Either way I would expect to see some use of Validated here:

runAnalysis { implicit spark =>
  val featureRDD = FeatureRDD(default.featureUris, default.featureType, featureFilter, splitFeatures = true, spark)
  val fireAlertRDD = FireAlertRDD(spark, fireAlert.alertType, fireAlert.alertSource, FeatureFilter.empty)
  ForestChangeDiagnosticAnalysis(featureRDD, default.featureType, intermediateListSource, fireAlertRDD, kwargs)
}

such that they could be passed here:

val summaryRDD: RDD[(FeatureId, ValidatedRow[ForestChangeDiagnosticSummary])] =
  ForestChangeDiagnosticRDD(
    featureRDD,
    ForestChangeDiagnosticGrid.blockTileGrid,
    kwargs)

and used in this join:

val featuresWithSummaries: RDD[(FEATUREID, ValidatedSummary[SUMMARY])] =
  partitionedFeatureRDD.mapPartitions {
    featurePartition: Iterator[
      (Long, (SpatialKey, Feature[Geometry, FEATUREID]))
    ] =>

Describe alternatives you've considered
I've considered doing nothing because geometries should be valid, but that proved not to be the case.

I'm not sure how an invalid geometry is going to interact with the partitioning scheme in the ErrorSummaryRDD. It may be that they will have to be filtered out and then joined back onto the results, since it is neither possible to place them on a map nor to run a polygonal summary on them. Interested to see how that turns out.
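The filter-then-rejoin idea could look roughly like the sketch below. It uses plain `Seq` in place of an RDD and `Either` in place of `Validated` so that it runs on its own; `summarizeValidOnly` and its `summarize` parameter are hypothetical. On an RDD, the same shape would presumably be built from `filter` or `collect` with a partial function, followed by `union`.

```scala
object RejoinInvalidSketch {
  // Stand-in for cats' Validated with a plain String error.
  type Validated[A] = Either[String, A]

  // Summarize only the valid features, then append the invalid ones as error rows.
  def summarizeValidOnly[Id, G, S](features: Seq[(Id, Validated[G])])(
      summarize: Seq[(Id, G)] => Seq[(Id, S)] // assumed: the partitioned polygonal summary
  ): Seq[(Id, Validated[S])] = {
    // Invalid inputs bypass partitioning and the summary entirely.
    val invalid = features.collect { case (id, Left(err)) => (id, Left(err): Validated[S]) }
    // Only valid geometries reach the summary step.
    val valid = features.collect { case (id, Right(geom)) => (id, geom) }
    val summarized = summarize(valid).map { case (id, s) => (id, Right(s): Validated[S]) }
    summarized ++ invalid
  }
}
```

Every input feature id still appears in the output this way, either with a summary or with the reason it was skipped.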

echeipesh added the enhancement label Oct 12, 2021
jpolchlo mentioned this issue Oct 15, 2021