Training and test datasets for togo region #120

Open
jlpearso opened this issue Jun 16, 2023 · 3 comments

@jlpearso

jlpearso commented Jun 16, 2023

Hello! I am very new to this package and was hoping you could clarify if I am understanding things correctly.

I am trying to get a training and test dataset for togo.

Training Data for Togo:

Using the code below, and following demo.ipynb, I was able to get training data with the correct dimensions. My understanding is that the dimensions of this data are [number of corresponding rows in labels.geojson x (12 months * 18 bands)], i.e. 1290 Togo rows in the geojson file and 216 values per row, one for each band/month combination. However, when I open the geojson file and search for the togo dataset, the list comes up short with only 1276 entries. Am I misunderstanding the setup?

[Screenshot (2023-06-16, 2:23 PM): training-data code, not included]

Additionally, does the is_crop column mean that the location in each row had crops for the entire year (12 months of regressor data available)?

Test Data for Togo:
Also following the demo notebook, I used this code to get the test data:

[Screenshot (2023-06-16, 2:42 PM): test-data code, not included]

Can you confirm that the values are pulled from 'data/test_features/togo-eval.h5', with dimensions [number of test locations x (12 months * 18 bands)]?

[Screenshot (2023-06-16, 2:57 PM): test-data output, not included]

I really appreciate any help!

@gabrieltseng
Collaborator

gabrieltseng commented Jun 19, 2023

Hi @jlpearso,

  1. When the benchmark datasets are created, training data is pulled from any point within the Togo bounding box. This function is responsible for filtering out the positive and negative labels, and filters by bounding box (specifically a Togo bounding box created by this function). So the togo dataset refers to a specific dataset (all datasets are described here), but other datasets may also contain points within Togo and they will be included too. In this case, the additional points come from the GeoWiki dataset.
  2. is_crop means crops were present during a growing season captured in that year. It doesn't necessarily mean crops were being grown at that point for all 12 months.
  3. Yes, the test data has the same dimensions as the training data. You can recover the band & time dimensions by running the functions with flatten_x=False.
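If you already have flattened arrays, the band and time axes can in principle be recovered with a reshape. The sketch below assumes the flattening was a row-major reshape of an array shaped (examples, 12 months, 18 bands); that ordering is an assumption, not something confirmed in this thread, so prefer `flatten_x=False` where possible and check any manual reshape against it. The arrays here are random placeholders, not cropharvest data.

```python
import numpy as np

NUM_TIMESTEPS = 12  # months
NUM_BANDS = 18

# Placeholder unflattened features: (examples, months, bands)
rng = np.random.default_rng(0)
x_3d = rng.random((5, NUM_TIMESTEPS, NUM_BANDS))

# Row-major flattening gives 12 * 18 = 216 values per example
x_flat = x_3d.reshape(x_3d.shape[0], -1)

# Undo the flattening, ASSUMING the same (months, bands) order was used
x_recovered = x_flat.reshape(-1, NUM_TIMESTEPS, NUM_BANDS)

print(x_flat.shape, x_recovered.shape)  # (5, 216) (5, 12, 18)

# Under this ordering, band b at month t sits at flat column t * NUM_BANDS + b
t, b = 3, 7
assert x_flat[0, t * NUM_BANDS + b] == x_3d[0, t, b]
```

If the real flattening used a different axis order, the reshape would still succeed but silently mix bands and months, which is why comparing against `flatten_x=False` output is the safer check.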

I hope this helps!

@jlpearso
Author

Thanks, this helps a lot! Can you tell me the best way to call the function that gets the lat/lon box? Or must I look at the shapefile myself? I was going to try to filter the labels this way. Thank you!
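Filtering labels by a bounding box, as proposed here, keeps every point inside the box regardless of which source dataset it came from, which is why the count can exceed the number of rows tagged togo. A minimal plain-Python sketch: the `LabelPoint` and `BBox` classes, the `points_in_bbox` helper, the sample points, the dataset name `geowiki_example`, and the bbox values are all hypothetical and illustrative, not the cropharvest API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelPoint:
    # Hypothetical stand-in for one row of labels.geojson
    lat: float
    lon: float
    dataset: str
    is_crop: int


@dataclass
class BBox:
    # Hypothetical bounding box in degrees; not cropharvest's own class
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def points_in_bbox(points: List[LabelPoint], bbox: BBox) -> List[LabelPoint]:
    # Filter by location only: the source dataset is ignored
    return [p for p in points if bbox.contains(p.lat, p.lon)]


# Illustrative values only, roughly covering Togo
togo_like_bbox = BBox(min_lat=6.0, max_lat=11.2, min_lon=-0.2, max_lon=1.9)

points = [
    LabelPoint(8.5, 1.0, "togo", 1),
    LabelPoint(9.2, 0.8, "togo", 0),
    LabelPoint(7.1, 1.2, "geowiki_example", 1),   # inside the box, other dataset
    LabelPoint(48.8, 2.3, "geowiki_example", 0),  # far outside the box
]

selected = points_in_bbox(points, togo_like_bbox)
togo_only = [p for p in points if p.dataset == "togo"]
print(len(togo_only), len(selected))  # 2 3
```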

@gabrieltseng
Collaborator

You can get the lat/lon box using the following code:

```python
from cropharvest.countries import get_country_bbox

togo_bbox = get_country_bbox(country_name="Togo", largest_only=True)
```

togo_bbox is a list of bounding boxes (since some countries have more than one contiguous area). If largest_only is True, then the list has length 1 and only contains the largest bounding box.
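Because the result is a list, with `largest_only=False` a point can be tested against every box. The sketch below uses a hypothetical `Box` type with (min_lat, max_lat, min_lon, max_lon) fields rather than cropharvest's own bounding-box class, whose attributes you should check in the library. It keeps a point if it falls in any box, and picks a "largest" box by area in squared degrees, which may not be how cropharvest chooses it. The coordinates are made up.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    # Hypothetical box type; cropharvest's own class may differ
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon

    def area(self) -> float:
        # Area in squared degrees, only meaningful for comparing boxes
        return (self.max_lat - self.min_lat) * (self.max_lon - self.min_lon)


def in_any_box(lat: float, lon: float, boxes: List[Box]) -> bool:
    # A point belongs to the country if it lies in at least one box
    return any(box.contains(lat, lon) for box in boxes)


def largest_box(boxes: List[Box]) -> Box:
    return max(boxes, key=lambda box: box.area())


# Hypothetical country made of a mainland and a small island
boxes = [Box(0.0, 10.0, 0.0, 10.0), Box(20.0, 21.0, 5.0, 6.0)]

print(in_any_box(20.5, 5.5, boxes))                 # True (island)
print(in_any_box(20.5, 5.5, [largest_box(boxes)]))  # False (mainland only)
```

This shows the trade-off of `largest_only=True`: points in smaller, non-contiguous areas of a country are dropped.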
