diff --git a/joss-paper/paper.html b/joss-paper/paper.html
index 59cdbaf..065338c 100644
--- a/joss-paper/paper.html
+++ b/joss-paper/paper.html
@@ -380,17 +380,17 @@

1 August 2024

Summary

-

modisfast is an R package that allows for easy and fast +

modisfast is an R package designed for easy and fast downloads of various Earth Observation (EO) data, including the Moderate Resolution Imaging Spectroradiometer (MODIS) Land products, the Visible Infrared Imaging Radiometer Suite (VIIRS) Land products, and the Global -Precipitation Measurement mission (GPM) products. Based on the -Open-source Project for a Network Data Access Protocol (OPeNDAP) -standard framework, it enables users to apply filters (spatial, -temporal, and dimensional) directly at the downloading phase, supports -parallelized downloads, and streamlines data import into R. Therefore, -modisfast offers R users a cost-effective, time-efficient, -and energy-saving approach to accessing a set of key EO datasets.

+Precipitation Measurement mission (GPM) products. It enables users to +subset the data directly during the download phase using spatial, +temporal, and dimensional filters and supports parallelized downloads. +It also streamlines the process of importing the downloaded data into R. +Overall, modisfast offers R users a cost-effective, +time-efficient, and energy-saving approach to accessing a set of key EO +datasets with R.

Statement of need

@@ -398,57 +398,55 @@

Statement of need

valuable resource for monitoring and understanding our planet, especially in the context of global change. EO data from the U.S. federal government National Aeronautics and Space Administration (NASA) -are among the richest and longest-standing in the field. Iconic and -widely used NASA EO data collections include the MODIS Land products +are among the richest and longest-standing in the field. Iconic, widely +used and free NASA EO data collections include the MODIS Land products (Justice et al. 2002), the VIIRS products - which continue the legacy of MODIS (Román et al. 2024), and the GPM mission products (Skofronick-Jackson et al. 2017). Collectively, these products have provided essential data for over 20 years, enabling -the study of Earth’s dynamics, including (but not limited to) global -land cover, vegetation health, land surface temperature, rainfall -patterns, burned areas. They support research in climate change, -disaster response, biodiversity, ecosystem monitoring, ecology, -epidemiology, and more (Shao, Taff, and Lunetta -2011).

-

Despite the increasing availability and utility of EO data, accessing -and using them presents several challenges. Researchers often encounter -issues such as multiple data sources, data complexity, large file sizes, -and the need for advanced technical skills to process the information -(Agnoli et al. 2023). These data are -typically distributed as multidimensional layers over extensive areas, -making accessibility and processing difficult, especially when large -time series are required. This problem is particularly acute in -developing regions where internet infrastructure can be limited. The -complexity of accessing EO data often leads to siloed data processing -workflows, separating data extraction from pre-processing and analysis, -thereby hindering transparent and reproducible open science practices. -While tools like Google Earth -Engine (Gorelick et al. 2017) offer -some solutions, they also have limitations, such as proprietary -software. Altogether, these barriers hinder the full potential of Earth -observation data to support global research and decision-making. Efforts -to simplify and streamline access to these data, while maintaining an -open-source and open-science framework, are essential to overcoming -these obstacles and maximizing the benefits of satellite data for -all.

+the study of Earth’s dynamics, including global land cover, vegetation +health, burned areas, land surface temperature, and rainfall patterns. They +support research in climate change, disaster response, biodiversity, +ecosystem monitoring, ecology, public health, and more (Shao, Taff, and Lunetta 2011).

+

Despite the increasing availability and utility of these EO data, +accessing and using them presents several challenges. Researchers often +encounter issues such as multiple data sources, data complexity, or +large file sizes (Agnoli et al. 2023). +These data are typically distributed as multidimensional layers over +extensive areas, making accessibility and processing difficult, +especially when large time series are required. This problem is +particularly acute in developing regions where internet infrastructure, +needed to access the data, can be limited. The complexity of accessing +EO data can result in obstacles such as underutilization or the creation +of siloed data processing workflows, which separate data extraction from +pre-processing and analysis, thereby hindering transparent and +reproducible open science practices. While some powerful tools, such as +Google Earth Engine (Gorelick et al. 2017), offer some solutions to +these problems, they also have important limitations, such as +proprietary and energy-intensive software. Altogether, these barriers +hinder the full potential of Earth observation data to support global +research and decision-making. Efforts to simplify and streamline access +to these data, while maintaining an open-source and open-science +framework, are essential to overcoming these obstacles and maximizing +the benefits of satellite data for all.

Here, we introduce modisfast, an open-source R package (R Core Team 2024) designed to simplify, streamline, and accelerate the download and import of MODIS, VIIRS, and GPM time series for R users. This package expands the existing ecosystem of R tools for accessing MODIS data, enhancing it by -introducing new features (see section (comp-other-soft). -modisfast allows users to subset these datasets using -spatial, temporal, and band/layer directly at the downloading phase, -optimizing data download and processing while promoting digital -sobriety. Additionally, downloads can be parallelized for increased -efficiency. modisfast thus facilitates access to EO data -for R users, particularly in regions with limited internet -infrastructure, and enables embedding data extraction within complex and -holistic data workflows in R - fostering transparency and -reproducibility in the context of Open Science. Importantly, the -foundational framework of modisfast (see section -(foundational-fmwrk)) guarantees the package’s long-term sustainability, -open-source nature, and cost-free availability.

+introducing new features and data sources (see section +(comp-other-soft)). modisfast allows users to subset these +datasets using spatial, temporal, and band/layer filters directly at the +downloading phase, optimizing data download and processing. +Additionally, downloads can be parallelized for increased efficiency. +modisfast thus facilitates access to EO data for R users, +particularly in regions with limited internet infrastructure, and +enables embedding data extraction within complex and holistic data +workflows in R - fostering transparency and reproducibility in the +context of Open Science. Importantly, the foundational framework of +modisfast (see section (foundational-fmwrk)) guarantees the +package’s long-term sustainability, open-source nature, and cost-free +availability.

Target audience

@@ -478,19 +476,20 @@

Typical workflow

R with modisfast involves the following steps :

  1. Defining the parameters of interest as natural R objects,
  2. -
  3. Login to NASA EOSDIS EarthData with the function +
  4. Login to NASA EOSDIS EarthData, with the function mf_login(),
  5. -
  6. Building the URL(s) of the dataset(s) of interest with the function +
  7. Building the URL(s) of the dataset(s) of interest, with the function mf_get_url(),
  8. -
  9. Downloading the dataset(s) with the function +
  10. Downloading the dataset(s), with the function mf_download_data() (the preferred and default output format is the widely used NetCDF format, although other formats can be defined),
  11. -
  12. Importing the dataset(s) in R as a SpatRast object of -terra library (Hijmans 2024) -with the function mf_import_data().
  13. +
  14. Importing the dataset(s) in R as a SpatRast object of +the terra library (Hijmans +2024), with the function mf_import_data().
-

This workflow is graphically summarized in figure .

+

This workflow is graphically summarized in figure along with a toy +example.

Workflow for MODIS, VIIRS or GPM data download and import with modisfast.
Workflow for MODIS, VIIRS or GPM data download @@ -518,7 +517,7 @@

Special features

  • Ability to import the downloaded data as a Virtual Raster Dataset (VRT) (useful for high-volume data, e.g. country or continental scale data at high spatial resolution) (see documentation -of function mf_import_data()).
  • +of the function mf_import_data()).
    @@ -548,26 +547,27 @@

    Foundational framework of modisfast

    that develops them. The OPeNDAP is designed to simplify access to structured and high-volume data, such as satellite products, over the Web. It is a collaborative effort involving multiple institutions and -companies, with open-source code, free software, and adherence to OGC -standards. It is widely used by NASA, which partly finances it. A key -feature of OPeNDAP is its capability to apply filters at the data -download process, ensuring that only the necessary data is retrieved. -These filters, specified within a URL, can be spatial, temporal, or -dimensional. Nevertheless, OPeNDAP URLs are not trivial to build. -modisfast constructs these URLs based on the filters that -are specified by the user through standard R objects.

    -

    This robust, sustainable, and cost-free foundational framework, both -for the data provider (NASA) and the software (R, OPeNDAP, the -tidyverse (Wickham et al. -2019) and GDAL (GDAL/OGR -contributors 2024) suite of packages), guarantees the long-term -stability and reliability of the modisfast package.

    +companies, with open-source code, free software, and adherence to the Open Geospatial Consortium (OGC) +standards. It is widely used by NASA, which partly finances it.

    +

    A key feature of OPeNDAP is its capability to apply filters during the +data download process, ensuring that only the necessary data is +retrieved. These filters, specified within a URL, can be spatial, +temporal, or dimensional. Although powerful, OPeNDAP URLs are not +trivial to build. The main objective of modisfast is to +construct these URLs based on the filters that are specified by the user +through standard R objects.
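
To make the idea of URL-embedded filters concrete, here is a minimal base-R sketch of how a DAP2-style constraint expression can be assembled. It is an illustration only and is not the code modisfast uses internally. The server address is a placeholder, the helper names `hyperslab()` and `build_opendap_url()` are hypothetical, and the variable names and index ranges are illustrative values. The sketch assumes that every requested variable shares the same (time, y, x) dimension order and that the index ranges (0-based, inclusive) have already been derived from the user's temporal and spatial filters.

```r
# Build a DAP2 hyperslab "[start:stride:stop]" for one dimension.
# Indices are 0-based and the stop index is inclusive.
hyperslab <- function(start, stop, stride = 1L) {
  stopifnot(start >= 0, stop >= start, stride >= 1)
  sprintf("[%d:%d:%d]", as.integer(start), as.integer(stride), as.integer(stop))
}

# Assemble a constraint URL that requests the same index ranges for each
# variable (hypothetical helper, not part of any package).
build_opendap_url <- function(base_url, variables, time_idx, y_idx, x_idx,
                              ext = "nc4") {
  slab <- paste0(
    hyperslab(time_idx[1], time_idx[2]),  # temporal filter
    hyperslab(y_idx[1], y_idx[2]),        # spatial filter (rows)
    hyperslab(x_idx[1], x_idx[2])         # spatial filter (columns)
  )
  # Dimensional filter: only the listed variables are requested.
  projection <- paste0(variables, slab, collapse = ",")
  paste0(base_url, ".", ext, "?", projection)
}

url <- build_opendap_url(
  base_url  = "https://opendap.example.org/hypothetical/dataset",  # placeholder
  variables = c("LST_Day_1km", "LST_Night_1km"),                   # illustrative
  time_idx  = c(0, 3),
  y_idx     = c(100, 199),
  x_idx     = c(250, 349)
)
print(url)
```

Even in this simplified form, the user's dates and bounding box must first be translated into array indices for each dataset, which is where most of the difficulty of writing such URLs by hand lies.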

    +

    Importantly, the robust, sustainable, and cost-free foundational +framework of modisfast, both for the data provider (NASA) +and the software (R, OPeNDAP, the tidyverse and +GDAL suite of packages (Wickham et +al. 2019; GDAL/OGR contributors 2024)), guarantees the long-term +stability and reliability of the package.

    Comparison with similar packages

    -

    In addition to modisfast, there are several open-source -R packages available for accessing MODIS or VIIRS data :

    +

    Besides modisfast, there are several open-source R +packages available for accessing MODIS or VIIRS data:

    The MODIS package offers access to some MODIS data through global online data archives, but it lacks comprehensive documentation and was removed from @@ -585,9 +585,9 @@

    Comparison with similar packages

    programmatic interface to the ‘MODIS Land Products Subsets’ web service, providing access to 46 MODIS and VIIRS collections. This package, available on CRAN, extracts data at -specific points or buffer zones around coordinates, outputting in R -data.frame or .csv format, which is not a standard -geospatial format. This makes it suitable for point-based data +specific points or buffer zones around coordinates, outputting an R +data.frame or a file in .csv format, which is not a +standard geospatial format. This makes it suitable for point-based data extraction but less effective for area-based queries.

    The appeears package (Hufkens and Campitelli 2023) acts @@ -598,11 +598,11 @@

    Comparison with similar packages

    allows accessing data from various NASA federal archives, including MODIS and VIIRS, and enables users to subset geospatial datasets using spatial, temporal, and band/layer parameters. Indeed, as with -modisfast, the main sources of data are NASA OPeNDAP -servers. While similar to modisfast, appeears -offers a broader range of data sources but has a latency period (ranging -from minutes to hours) for query processing due to server-side -post-processing (mosaicking, reprojection, etc.).

    +modisfast, AppEEARS uses OPeNDAP. While similar to +modisfast, appeears offers a broader range of +data sources but has a latency period (ranging from minutes to hours) +for query processing due to server-side post-processing (mosaicking, +reprojection, etc.).

    Finally, some R packages, such as rgee (Aybar et al. 2020), rely on proprietary software or data access protocols and are not discussed here for that reason.

    @@ -696,28 +696,31 @@

    Future work

    Future development of the package may include access to additional data collections from other OPeNDAP servers, and support for a variety of data formats as they become available from data providers through -their OPeNDAP servers. Furthermore, the creation of an RShiny -application (Chang et al. 2024) on top of -the package is being considered, as a means of further simplifying data -access for users with limited coding skills.

    +their OPeNDAP servers. The addition of a meta-function that facilitates +executing the entire process at once (login, URL building, data +download, and data import into R) is also envisaged. Furthermore, the +creation of an RShiny application (Chang et al. +2024) on top of the package is being considered, as a means of +further simplifying data access for users with limited coding +skills.

    Acknowledgements

    We thank NASA and its partners for making all their Earth science -data freely available, and implementing open data access protocols such -as OPeNDAP. modisfast heavily builds on top of the OPeNDAP, -so we thank the non-profit OPeNDAP, Inc. for developing +data freely available, and financing and implementing open data access +protocols such as OPeNDAP. modisfast builds heavily on +OPeNDAP, so we thank the non-profit OPeNDAP, Inc. for developing the eponymous tool in an open and collaborative way.

    We also thank the contributors who have tested the package, reviewed the documentation, and provided valuable feedback to improve the package -: Florian de Boissieu, Julien Taconet, Annelise Tran.

    +: Florian de Boissieu, Julien Taconet, Annelise Tran, Alexandre +Cebeillac.

    This work has been developed over the course of several research -projects :

    +projects conducted by the French National Research Institute for +Sustainable Development (IRD) and its partners :