Bnb/training with obs #252

Draft: wants to merge 28 commits into main
Commits (28)
aff87b6  Refactor parallelization of bias calculations (bnb32, Jan 4, 2025)
e726f12  dual sampler, queue, and batch handler with obs. modifying Sup3rDatas… (bnb32, Dec 20, 2024)
4548a18  training with obs test (bnb32, Dec 20, 2024)
d2fe03d  split up interface and abstact model (bnb32, Dec 21, 2024)
9dc0143  made dual batch queue flexible enough to account for additional obs m… (bnb32, Dec 22, 2024)
7e79caa  tensorboard mixin moved to model utilities. dual queue completely abs… (bnb32, Dec 22, 2024)
6191b28  integrated dual sampler with obs into base dual sampler. (bnb32, Dec 23, 2024)
4fdff8c  examples added to DataHandler doc string. Some instructions on sup3rw… (bnb32, Dec 23, 2024)
e836057  removed namedtuple from Sup3rDataset to make Sup3rDataset picklable. (bnb32, Dec 26, 2024)
1e0fb29  parallel batch queue test added. (bnb32, Dec 27, 2024)
8c15444  namedtuple -> DsetTuple missing attr fix (bnb32, Dec 27, 2024)
373cb68  gust added to era download variables. len dunder added to ``Container… (bnb32, Dec 27, 2024)
4c8d77f  computing before reshaping is 2x faster. (bnb32, Dec 28, 2024)
e1d1ac5  obs_index fix - sampler needs to use hr_out_features for the obs member. (bnb32, Dec 28, 2024)
536beb1  split up ``calc_loss`` and ``calc_loss_obs`` (bnb32, Dec 29, 2024)
196becf  Optional run_qa flag in ``DualRasterizer``. Queue shape fix for queue… (bnb32, Dec 29, 2024)
2a99d94  ``run_qa=True`` default for ``DualRasterizer`` (bnb32, Dec 29, 2024)
c0d4d0a  better tracking of batch counting. (this can be tricky for parallel q… (bnb32, Dec 29, 2024)
586130d  missed compute call for slow batching. this was hidden by queueing an… (bnb32, Dec 29, 2024)
1e29b68  Included convert to tensor in ``sample_batch``. Test for training wit… (bnb32, Dec 30, 2024)
fde57b6  cc batch handler test fix (bnb32, Dec 31, 2024)
9b718d7  added test for new disc with "valid" padding (bnb32, Dec 31, 2024)
ed51133  parallel sampling batch sampling test. (bnb32, Jan 1, 2025)
39a28bb  removed workers tests. max_workers > 1 still not consistently faster.… (bnb32, Jan 2, 2025)
0c3eb44  ``Sup3rGanWithObs`` model subclass. Other misc model refactoring. (bnb32, Jan 3, 2025)
7b62d65  bias test fixes (bnb32, Jan 4, 2025)
a8eea08  additional bias refact: ``_run`` base method and ``_get_run_kwargs`` … (bnb32, Jan 4, 2025)
ee8e237  moved ``_run`` method to bias correction interface ``AbstractBiasCorr… (bnb32, Jan 5, 2025)
2 changes: 1 addition & 1 deletion README.rst
@@ -78,4 +78,4 @@ Brandon Benton, Grant Buster, Guilherme Pimenta Castelao, Malik Hassanaly, Pavlo
Acknowledgments
===============

This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. This research was supported by the Grid Modernization Initiative of the U.S. Department of Energy (DOE) as part of its Grid Modernization Laboratory Consortium, a strategic partnership between DOE and the national laboratories to bring together leading experts, technologies, and resources to collaborate on the goal of modernizing the nation’s grid. Funding provided by the DOE Office of Energy Efficiency and Renewable Energy (EERE), the DOE Office of Electricity (OE), DOE Grid Deployment Office (GDO), the DOE Office of Fossil Energy and Carbon Management (FECM), and the DOE Office of Cybersecurity, Energy Security, and Emergency Response (CESER), the DOE Advanced Scientific Computing Research (ASCR) program, the DOE Solar Energy Technologies Office (SETO), the DOE Wind Energy Technologies Office (WETO), the United States Agency for International Development (USAID), and the Laboratory Directed Research and Development (LDRD) program at the National Renewable Energy Laboratory. The research was performed using computational resources sponsored by the Department of Energy's Office of Energy Efficiency and Renewable Energy and located at the National Renewable Energy Laboratory. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.
39 changes: 36 additions & 3 deletions examples/sup3rwind/README.rst
@@ -2,7 +2,7 @@
Sup3rWind Examples
###################

Super-Resolution for Renewable Energy Resource Data with Wind from Reanalysis Data (Sup3rWind) is one application of the sup3r software. In this work, we train generative models to create high-resolution (2km 5-minute) wind data based on coarse (30km hourly) ERA5 data. The generative models and high-resolution output data is publicly available via the `Open Energy Data Initiative (OEDI) <https://data.openei.org/s3_viewer?bucket=nrel-pds-wtk&prefix=sup3rwind%2F>`__ and via HSDS at the bucket ``nrel-pds-hsds`` and path ``/nrel/wtk/sup3rwind``. This data covers recent historical time periods for an expanding selection of countries.
Super-Resolution for Renewable Energy Resource Data with Wind from Reanalysis Data (Sup3rWind) is one application of the sup3r software. In this work, we train generative models to create high-resolution (2km 5-minute) wind data based on coarse (30km hourly) ERA5 data. The generative models, high-resolution output data, and training data are publicly available via the `Open Energy Data Initiative (OEDI) <https://data.openei.org/s3_viewer?bucket=nrel-pds-wtk&prefix=sup3rwind%2F>`__ and via HSDS at the bucket ``nrel-pds-hsds`` and path ``/nrel/wtk/sup3rwind``. This data covers recent historical time periods for an expanding selection of countries.

Sup3rWind Data Access
----------------------
@@ -11,8 +11,8 @@ The Sup3rWind data and models are publicly available in an AWS S3 bucket.

The Sup3rWind data is also loaded into `HSDS <https://www.hdfgroup.org/solutions/highly-scalable-data-service-hsds/>`__ so that you may stream the data via the `NREL developer API <https://developer.nrel.gov/signup/>`__ or your own HSDS server. This is the best option if you don't need a full annual dataset. See these `rex instructions <https://nrel.github.io/rex/misc/examples.hsds.html>`__ for more details on how to access this data with HSDS and rex.

Example Sup3rWind Data Usage
-----------------------------
Sup3rWind Data Usage
---------------------

Sup3rWind data can be used in much the same way as `Sup3rCC <https://nrel.github.io/sup3r/examples/sup3rcc.html>`__ data, with the caveat that Sup3rWind includes only wind data and ancillary variables for modeling wind energy generation. Refer to the Sup3rCC `example notebook <https://github.com/NREL/sup3r/tree/main/examples/sup3rcc/using_the_data.ipynb>`__ for usage patterns, and to the minimal data access sketch below.
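
As a quick start, the snippet below is a minimal sketch of streaming a single Sup3rWind variable through HSDS with ``rex``. The file path and dataset name are placeholders rather than confirmed bucket contents; check the ``/nrel/wtk/sup3rwind`` directory listing for the available country and year files::

    from rex import WindX

    # hypothetical file path; replace with an actual file under
    # /nrel/wtk/sup3rwind (see the HSDS directory listing)
    fp = '/nrel/wtk/sup3rwind/<country>/<sup3rwind_file>.h5'

    with WindX(fp, hsds=True) as res:
        meta = res.meta              # site coordinates and metadata
        time_index = res.time_index  # 5-minute time index
        # full time series of one dataset at site gid 0
        ws = res['windspeed_100m', :, 0]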

@@ -32,6 +32,39 @@ The process for running the Sup3rWind models is much the same as for `Sup3rCC <h
#. If you're running on a slurm cluster, this will kick off a number of jobs that you can see with the ``squeue`` command. If you're running locally, your terminal should now be running the Sup3rWind models. The software will create a ``./logs/`` directory in which you can monitor the progress of your jobs.
#. The ``sup3r-pipeline`` is designed to run several modules in serial, with each module running multiple chunks in parallel. Once the first module (forward-pass) finishes, you'll want to run ``python -m sup3r.cli -c config_pipeline.json pipeline`` again. This will clean up status files and kick off the next step in the pipeline (if the current step was successful).

Training from scratch
---------------------

To train Sup3rWind models from scratch use the public training `data <https://data.openei.org/s3_viewer?bucket=nrel-pds-wtk&prefix=sup3rwind%2Ftraining_data%2F>`__. This data is for training the spatial enhancement models only. The 2024-01 `models <https://data.openei.org/s3_viewer?bucket=nrel-pds-wtk&prefix=sup3rwind%2Fmodels%2Fsup3rwind_models_202401%2F>`__ perform spatial enhancement in two steps, 3x from ERA5 to coarsened WTK and 5x from coarsened WTK to uncoarsened WTK. The currently used approach performs spatial enhancement in a single 15x step.

For a given year and training domain, initialize low-resolution and high-resolution data handlers and wrap these in a dual rasterizer object. Do this for as many years and training regions as desired, and use these containers to initialize a batch handler. To train models for 3x spatial enhancement use ``hr_spatial_coarsen=5`` in the ``hr_dh``. To train models for 15x (the currently used approach) use ``hr_spatial_coarsen=1``. (Refer to tests and docs for information on additional arguments, denoted by the ellipses)::

    from sup3r.preprocessing import DataHandler, DualBatchHandler, DualRasterizer

    containers = []
    for tdir in training_dirs:
        lr_dh = DataHandler(f"{tdir}/lr_*.h5", ...)
        hr_dh = DataHandler(f"{tdir}/hr_*.h5", hr_spatial_coarsen=...)
        container = DualRasterizer({'low_res': lr_dh, 'high_res': hr_dh}, ...)
        containers.append(container)
    bh = DualBatchHandler(train_containers=containers, ...)

To train a 5x model use the ``hr_*.h5`` files for both the ``lr_dh`` and the ``hr_dh``. Use ``hr_spatial_coarsen=3`` in the ``lr_dh`` and ``hr_spatial_coarsen=1`` in the ``hr_dh``::

    containers = []
    for tdir in training_dirs:
        lr_dh = DataHandler(f"{tdir}/hr_*.h5", hr_spatial_coarsen=3, ...)
        hr_dh = DataHandler(f"{tdir}/hr_*.h5", hr_spatial_coarsen=1, ...)
        container = DualRasterizer({'low_res': lr_dh, 'high_res': hr_dh}, ...)
        containers.append(container)
    bh = DualBatchHandler(train_containers=containers, ...)


Initialize a 3x, 5x, or 15x spatial enhancement model with 14 output channels and train for the desired number of epochs. (The 3x and 5x generator configs can be copied from the ``model_params.json`` files in each OEDI model `directory <https://data.openei.org/s3_viewer?bucket=nrel-pds-wtk&prefix=sup3rwind%2Fmodels%2Fsup3rwind_models_202401%2F>`__. The 15x generator config can be created from the OEDI model configs by changing the spatial enhancement factor, or from the configs in the repo by changing the enhancement factor and the number of output channels)::

    from sup3r.models import Sup3rGan

    model = Sup3rGan(gen_layers="./gen_config.json", disc_layers="./disc_config.json", ...)
    model.train(bh, ...)
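
After training, the model can be saved so that the forward-pass steps described above can reference it. A minimal sketch, with the output directory name chosen here purely for illustration::

    # write the model weights and model_params.json to an output directory
    model.save('./sup3rwind_spatial_model')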


Sup3rWind Versions
-------------------

15 changes: 13 additions & 2 deletions pyproject.toml
@@ -42,10 +42,21 @@ dependencies = [
"pytest>=5.2",
"scipy>=1.0.0",
"sphinx>=7.0",
"tensorflow>2.4,<2.16",
"xarray>=2023.0"
]

# If used, cause glibc conflict
# [tool.pixi.target.linux-64.dependencies]
# cuda = ">=11.8"
# cudnn = {version = ">=8.6.0", channel = "conda-forge"}
# # 8.9.7

[tool.pixi.target.linux-64.pypi-dependencies]
tensorflow = {version = "~=2.15.1", extras = ["and-cuda"] }

[tool.pixi.target.osx-arm64.dependencies]
tensorflow = {version = "~=2.15.0", channel = "conda-forge"}

[project.optional-dependencies]
dev = [
"build>=0.5",
@@ -272,7 +283,6 @@ matplotlib = ">=3.1"
numpy = "~=1.7"
pandas = ">=2.0"
scipy = ">=1.0.0"
tensorflow = ">2.4,<2.16"
xarray = ">=2023.0"

[tool.pixi.pypi-dependencies]
@@ -284,6 +294,7 @@ NREL-farms = { version = ">=1.0.4" }

[tool.pixi.environments]
default = { solve-group = "default" }
kestrel = { features = ["kestrel"], solve-group = "default" }
dev = { features = ["dev", "doc", "test"], solve-group = "default" }
doc = { features = ["doc"], solve-group = "default" }
test = { features = ["test"], solve-group = "default" }
167 changes: 167 additions & 0 deletions sup3r/bias/abstract.py
@@ -0,0 +1,167 @@
"""Bias correction class interface."""

import logging
from abc import ABC, abstractmethod
from concurrent.futures import ProcessPoolExecutor, as_completed

import numpy as np

from sup3r.preprocessing import DataHandler

logger = logging.getLogger(__name__)


class AbstractBiasCorrection(ABC):
"""Minimal interface for bias correction classes"""

@abstractmethod
def _get_run_kwargs(self, **kwargs_extras):
"""Get dictionary of kwarg dictionaries to use for calls to
``_run_single``. Each key-value pair is a bias_gid with the associated
``_run_single`` arguments for that gid"""

def _run_in_parallel(self, task_kwargs, max_workers=None):
"""
Execute a list of tasks in parallel using ``ProcessPoolExecutor``.

Parameters
----------
task_kwargs : dictionary
A dictionary of keyword argument dictionaries for a single call to
``task_function``.
max_workers : int, optional
The maximum number of workers to use. If None, it uses all
available.

Returns
-------
results : dictionary
A dictionary of results from the executed tasks with the same keys
as ``task_kwargs``.
"""

results = {}
with ProcessPoolExecutor(max_workers=max_workers) as exe:
futures = {
exe.submit(self._run_single, **kwargs): bias_gid
for bias_gid, kwargs in task_kwargs.items()
}
for future in as_completed(futures):
bias_gid = futures[future]
results[bias_gid] = future.result()
return results

def _run(
self,
out,
max_workers=None,
fill_extend=True,
smooth_extend=0,
smooth_interior=0,
**kwargs_extras,
):
"""Run correction factor calculations for every site in the bias
dataset

Parameters
----------
out : dict
Dictionary of arrays to fill with bias correction factors.
max_workers : int
Number of workers to run in parallel. 1 is serial and None is all
available.
daily_reduction : None | str
Option to do a reduction of the hourly+ source base data to daily
data. Can be None (no reduction, keep source time frequency), "avg"
(daily average), "max" (daily max), "min" (daily min),
"sum" (daily sum/total)
fill_extend : bool
Flag to fill data past distance_upper_bound using spatial nearest
neighbor. If False, the extended domain will be left as NaN.
smooth_extend : float
Option to smooth the scalar/adder data outside of the spatial
domain set by the distance_upper_bound input. This alleviates the
weird seams far from the domain of interest. This value is the
standard deviation for the gaussian_filter kernel
smooth_interior : float
Option to smooth the scalar/adder data within the valid spatial
domain. This can reduce the affect of extreme values within
aggregations over large number of pixels.
kwargs_extras: dict
Additional kwargs that get sent to ``_run_single`` e.g.
daily_reduction='avg', zero_rate_threshold=1.157e-7

Returns
-------
out : dict
Dictionary of values defining the mean/std of the bias + base data
and correction factors to correct the biased data like: bias_data *
scalar + adder. Each value is of shape (lat, lon, time).
"""
self.bad_bias_gids = []

task_kwargs = self._get_run_kwargs(**kwargs_extras)
# sup3r DataHandler opening base files will load all data in parallel
# during the init and should not be passed in parallel to workers
if isinstance(self.base_dh, DataHandler):
max_workers = 1

if max_workers == 1:
logger.debug('Running serial calculation.')
results = {
bias_gid: self._run_single(**kwargs, base_dh_inst=self.base_dh)
for bias_gid, kwargs in task_kwargs.items()
}
else:
logger.info(
'Running parallel calculation with %s workers.', max_workers
)
results = self._run_in_parallel(
task_kwargs, max_workers=max_workers
)
for i, (bias_gid, single_out) in enumerate(results.items()):
raster_loc = np.where(self.bias_gid_raster == bias_gid)
for key, arr in single_out.items():
out[key][raster_loc] = arr
logger.info(
'Completed bias calculations for %s out of %s sites',
i + 1,
len(results),
)

logger.info('Finished calculating bias correction factors.')

return self.fill_and_smooth(
out, fill_extend, smooth_extend, smooth_interior
)

@abstractmethod
def run(
self,
fp_out=None,
max_workers=None,
daily_reduction='avg',
fill_extend=True,
smooth_extend=0,
smooth_interior=0,
):
"""Run correction factor calculations for every site in the bias
dataset"""

@classmethod
@abstractmethod
def _run_single(
cls,
bias_data,
base_fps,
bias_feature,
base_dset,
base_gid,
base_handler,
daily_reduction,
bias_ti,
decimals,
base_dh_inst=None,
match_zero_rate=False,
):
"""Find the bias correction factors at a single site"""
4 changes: 2 additions & 2 deletions sup3r/bias/base.py
Original file line number Diff line number Diff line change
@@ -43,7 +43,7 @@ def __init__(
bias_handler_kwargs=None,
decimals=None,
match_zero_rate=False,
pre_load=True
pre_load=True,
):
"""
Parameters
@@ -178,7 +178,7 @@ class is used, all data will be loaded in this class'

self.nn_dist, self.nn_ind = self.bias_tree.query(
self.base_meta[['latitude', 'longitude']],
distance_upper_bound=self.distance_upper_bound
distance_upper_bound=self.distance_upper_bound,
)

if pre_load: