-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTC-2598 Fix gadm null filter #200
Conversation
@@ -48,9 +48,9 @@ object AFiAnalysis extends SummaryAnalysis { | |||
val summaryDF = AFiAnalysis.aggregateResults( | |||
AFiDF | |||
.getFeatureDataFrame(summaryRDD, spark) | |||
.filter(!$"gadm_id".contains("null")) | |||
.filter(!$"gadm_id".substr(1, 3).contains("null")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment above this statement to explain the 'substr(1, 3)'? It would be good to explain the general idea (that you are only looking for null in the ISO, not adm1 or adm2), and also specifically why the substr() call works.
In particular, I don't understand how a substring of length 3 is ever going to contain the string "null". Is that actually correct, and if so, maybe you can explain why in the comment. I.e. what is the format of the "gadm_id" column (something like "IDN.1.5", right?) and does "null" actually show up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually might not want to check if anything is null? "null" is valid for adm1 and adm2 depending on the country, the only thing we'd want to filter out is when iso is an empty string.
Or maybe not filter anything at all? Because we saw there are occasionally results outside of GADM boundaries somehow, do we want to discard those or keep those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jterry64 We currently use the row with location_id = -1
and gadm_id = ''
to hold summed values for the dissolved list. The filter disregards stray pixels (like over oceans) that would add another row with those specific values. Or we assign them location_id = -3
or some other value. Let me know what works
@@ -48,9 +48,9 @@ object AFiAnalysis extends SummaryAnalysis { | |||
val summaryDF = AFiAnalysis.aggregateResults( | |||
AFiDF | |||
.getFeatureDataFrame(summaryRDD, spark) | |||
.filter(!$"gadm_id".contains("null")) | |||
.filter(!$"gadm_id".substr(1, 3).contains("null")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually might not want to check if anything is null? "null" is valid for adm1 and adm2 depending on the country, the only thing we'd want to filter out is when iso is an empty string.
Or maybe not filter anything at all? Because we saw there are occasionally results outside of GADM boundaries somehow, do we want to discard those or keep those?
Codecov ReportAttention: ❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
... and 1 file with indirect coverage changes 📢 Thoughts on this report? Let us know! |
@jterry64 I moved the filer from |
It seems like it might be easier to understand and process for the front-end if you make null values be the empty string "" in the gadmId. So, then it would look like "SGP.1." But I'm sure the front-end folks could deal with either, as long as you warn them about the possibility (missing adm1 or adm2) and how it will show up. What does a missing adm1 or adm2 mean? Is it actually land in the country which somehow is not in any adm1 (or any adm2 with an adm1)? |
if (gadmAdm0 == "") { | ||
return | ||
} | ||
val gadmAdm1: Integer = raster.tile.gadmAdm1.getData(col, row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add a comment here that either gadmAdm1 and gadmAdm1 could possibly be null (which came from a 9999 pixel value in the raster)
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
AFi Analysis currently filters out groups in which gadm_id contains any null. This causes areas without adm2 IDs, such as Singapore, to be excluded from results
Issue Number: GTC-2598
What is the new behavior?
Does this introduce a breaking change?
Other information