nnjai support #129

Open
wants to merge 7 commits into base: main

yuvraajnarula

Pull Request for issue #128

@jacobbieker

Description

This code provides a data pipeline for handling and processing satellite data using the NNJAI library. It focuses on creating a PyTorch Dataset class (AMSUDataset) for efficiently loading and accessing Advanced Microwave Sounding Unit (AMSU) data. Below is a detailed breakdown of its functionality:

  1. Authentication Check: Verifies access to NNJAI's DataCatalog using _check_authentication().
  2. Dynamic Dataset Initialization: Loads and filters AMSU data based on specified primary descriptors and additional variables.
  3. PyTorch Integration: Implements Dataset and DataLoader for seamless compatibility with PyTorch-based pipelines.
  4. Flexible Configuration: Users can customize dataset parameters, timestamp filters, and metadata selection.
  5. Fallback for Missing Dependencies: Prompts users to install the NNJAI library if not already available.

This implementation enhances satellite data processing workflows by providing an efficient and modular solution.

Fixes #128

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration.

  1. Unit Tests:

    • Tested dataset loading with valid and invalid dataset names.
    • Verified filtering based on time, primary_descriptors, and additional_variables.
    • Checked the output format of __getitem__ to ensure data integrity.
  2. Integration Tests:

    • Used DataLoader to iterate over the dataset and verified the randomness of shuffled outputs.
  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

Member

@jacobbieker jacobbieker left a comment

Thank you for this! This is a good start. I think a few changes are needed before this can be merged.

"""

import numpy as np
from nnja.io import _check_authentication
Member

I would suggest only checking for the library when importing the package. Then the AMSUDataset doesn't have to be in a giant if/else.

Suggested change
from nnja.io import _check_authentication
try:
    from nnja.io import _check_authentication
    from nnja import DataCatalog
except ImportError:
    print("NNJA-AI library not installed. Please install with `pip install git+https://github.com/brightbandtech/nnja-ai.git`")
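
One way to read this suggestion, as a minimal sketch: guard the import once at module level, record whether it succeeded, and raise a clear error only when the dataset is actually needed. `NNJA_AVAILABLE`, `INSTALL_HINT`, and `require_nnja` are hypothetical names used for illustration; they are not part of this PR or of the nnja package.

```python
# Optional-dependency guard, evaluated once when the module is imported.
try:
    from nnja import DataCatalog  # same import as in the suggestion above

    NNJA_AVAILABLE = True
except ImportError:
    DataCatalog = None  # placeholder so the name always exists
    NNJA_AVAILABLE = False

# Hypothetical constant holding the install message from the suggestion.
INSTALL_HINT = (
    "NNJA-AI library not installed. Please install with "
    "`pip install git+https://github.com/brightbandtech/nnja-ai.git`"
)


def require_nnja() -> None:
    """Raise a helpful ImportError if the optional dependency is missing."""
    if not NNJA_AVAILABLE:
        raise ImportError(INSTALL_HINT)
```

With this shape, the dataset class can call `require_nnja()` at the top of its `__init__` instead of living inside an if/else.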

graph_weather/data/nnjai_wrapp.py
latitude = row["LAT"]
longitude = row["LON"]
metadata = np.array([row[col] for col in self.metadata_columns], dtype=np.float32)
return time, latitude, longitude, metadata
Member

I would suggest returning this as a dictionary, so it's easier to track.

Suggested change
return time, latitude, longitude, metadata
return {"timestamp": time, "latitude": latitude, "longitude": longitude, "metadata": metadata}
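
To illustrate why a dictionary is easier to track, here is a minimal sketch of batching dictionary samples by key. It uses plain Python lists so it runs without PyTorch; `collate_by_key` is a hypothetical helper and the sample values are made up. In the real pipeline the values would be tensors, combined with something like `torch.stack`.

```python
from typing import Any, Dict, List


def collate_by_key(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group a list of per-sample dicts into one dict of per-key lists."""
    if not batch:
        return {}
    keys = batch[0].keys()
    return {key: [sample[key] for sample in batch] for key in keys}


# Illustrative samples (values are made up, not taken from the dataset).
samples = [
    {"timestamp": 0.0, "latitude": -31.9, "longitude": 60.9, "metadata": [180.2]},
    {"timestamp": 1.0, "latitude": -68.9, "longitude": 57.4, "metadata": [209.8]},
]
batch = collate_by_key(samples)
print(batch["latitude"])  # prints [-31.9, -68.9]
```

Each field is then addressed by name rather than by its position in a tuple, so adding a new field does not break existing callers.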

graph_weather/data/nnjai_wrapp.py (outdated)
@yuvraajnarula
Author

Here is the output for the suggestions you requested.

(py310env) PS D:\code\os_contri\graph_weather> C:\Users\DELL\miniconda3\envs\py310env\python.exe graph_weather/data/nnjai_wrapp.py
Batch size: 4
Timestamps shape: torch.Size([4])
Latitudes shape: torch.Size([4])
Longitudes shape: torch.Size([4])
Metadata shape: torch.Size([4, 1])

All items in batch:

Item 0:
Time: 1609524352.0
Latitude: -31.94420051574707
Longitude: 60.85639953613281
Metadata: tensor([180.2300])

Item 1:
Time: 1609531264.0
Latitude: -68.8927993774414
Longitude: 57.41469955444336
Metadata: tensor([209.7500])

Item 2:
Time: 1609520640.0
Latitude: -79.92410278320312
Longitude: 97.82360076904297
Metadata: tensor([164.9300])

Item 3:
Time: 1609537536.0
Latitude: -14.215499877929688
Longitude: 179.46139526367188
Metadata: tensor([223.6900])
(py310env) PS D:\code\os_contri\graph_weather>

@jacobbieker
Member

Hi, could you push the changes to this branch? I don't see any differences in the code?

@yuvraajnarula
Author

@jacobbieker any updates?

Member

@jacobbieker jacobbieker left a comment

It's getting there! The only thing I would change is to add a unit test in the tests/ directory that checks the output shapes from the dataset to ensure they are as expected. You basically have the test here, under the main block; just change it to check the output shapes and move that part into a unit test, then it can be merged!

return {key: torch.stack([item[key] for item in batch]) for key in batch[0].keys()}


if __name__ == "__main__":
Member

Could you add a unit test for this dataset under tests/? Basically, have it do this, but instead of printing the results, check for the expected shapes and such.
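
A rough sketch of what such a test could look like, under stated assumptions: `FakeAMSUDataset` and `test_batch_shapes` are hypothetical names, and the stand-in dataset returns plain Python values so the example runs without PyTorch or NNJA-AI access. A real test would build the PR's `AMSUDataset`, iterate a `DataLoader`, and compare tensor `.shape` values instead of `len()`.

```python
class FakeAMSUDataset:
    """Hypothetical stand-in that mimics the dict-per-sample output."""

    def __init__(self, n_samples: int, n_metadata: int) -> None:
        self.n_samples = n_samples
        self.n_metadata = n_metadata

    def __len__(self) -> int:
        return self.n_samples

    def __getitem__(self, idx: int) -> dict:
        if not 0 <= idx < self.n_samples:
            raise IndexError(idx)
        return {
            "timestamp": float(idx),
            "latitude": 0.0,
            "longitude": 0.0,
            "metadata": [0.0] * self.n_metadata,
        }


def test_batch_shapes() -> None:
    dataset = FakeAMSUDataset(n_samples=8, n_metadata=1)
    batch = [dataset[i] for i in range(4)]  # stand-in for one DataLoader batch

    # Assert on structure and sizes instead of printing them.
    assert len(batch) == 4
    for item in batch:
        assert set(item) == {"timestamp", "latitude", "longitude", "metadata"}
        assert len(item["metadata"]) == 1


test_batch_shapes()
```

The batch size of 4 and single metadata column mirror the shapes printed earlier in this conversation; a test runner such as pytest would collect `test_batch_shapes` automatically, so the final call is only there to make the sketch runnable as a script.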

Member

@jacobbieker jacobbieker left a comment

This looks good to me! Thanks for making the changes.

@yuvraajnarula
Author

Thank you for guiding me. If there's anything more I could do for this issue then do let me know.

Successfully merging this pull request may close these issues.

Add support for NNJA-AI Data
2 participants