Add Divand batch tool (galaxyecology#118)
* add batch tool diva

* add test files

* Set JULIA_DEPOT_PATH to get a writable Julia HOME

* fix some test settings, more needed

* fix dates

* add test files

* fix test & improve tool

Not totally fixed but we're getting there

* fix test

* fix typo

* reduced test files

* Delete tools/ocean/data_from_Eutrophication_Med_profiles_2022_unrestricted.nc

* Delete tools/ocean/DIVAnd_netcdf_output.netcdf

* reduced test files in right folder

* update test

* update test avoid big file

* Delete tools/ocean/test-data/gebco_30sec_8.nc

* Update tools/ocean/divandfull.xml

Co-authored-by: Björn Grüning <[email protected]>

* Update tools/ocean/divandfull.xml

Co-authored-by: Björn Grüning <[email protected]>

* Delete tools/ocean/test-data/DIVAnd_netcdf_output.netcdf

* cleaning up julia script

---------

Co-authored-by: Björn Grüning <[email protected]>
Co-authored-by: Bjoern Gruening <[email protected]>
3 people authored Aug 1, 2024
1 parent 3b5d66e commit e395cfe
Showing 3 changed files with 351 additions and 0 deletions.
211 changes: 211 additions & 0 deletions tools/ocean/divandfull.jl
@@ -0,0 +1,211 @@
#Julia script

###############################
## DIVAnd analysis ##
###############################
using Pkg
Pkg.status()

### Import packages
using DIVAnd
using Dates
using Printf
# Getting the arguments from the command line
args = ARGS

# Import data
if length(args) < 11
    error("This tool needs 11 arguments")
else
    netcdf_data = args[1]
    longmin = parse(Float64, args[2])
    longmax = parse(Float64, args[3])
    latmin = parse(Float64, args[4])
    latmax = parse(Float64, args[5])
    startdate = args[6] # yyyy-mm-dd
    enddate = args[7]   # yyyy-mm-dd
    varname = args[8]
    selmin = parse(Float64, args[9])
    selmax = parse(Float64, args[10])
    bathname = args[11]
end
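# For illustration only (hypothetical file names, mirroring the test in the XML wrapper),
# the script is expected to be invoked with eleven positional arguments, e.g.:
#   julia divandfull.jl observations.nc 19.0 30.0 32.0 38.0 1950-01-01 2017-12-31 phosphate 0 100 gebco_30sec_8.nc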

## This script will create a climatology:
# 1. ODV data reading.
# 2. Extraction of bathymetry and creation of mask
# 3. Data download from other sources and duplicate removal.
# 4. Quality control.
# 5. Parameter optimisation.
# 6. Spatio-temporal interpolation with DIVAnd.


### Configuration
# Define the horizontal, vertical (depth levels) and temporal resolutions.
# Select the variable of interest

dx, dy = 0.125, 0.125
lonr = longmin:dx:longmax
latr = latmin:dy:latmax

# Convert string to Date
startdate = Date(startdate, "yyyy-mm-dd")

# extract year month day
startyear = year(startdate)
startmonth = month(startdate)
startday = day(startdate)

# Convert string to Date
enddate = Date(enddate, "yyyy-mm-dd")

# extract year month day
endyear = year(enddate)
endmonth = month(enddate)
endday = day(enddate)

timerange = [Date(startyear, startmonth, startday),Date(endyear, endmonth, endday)];

depthr = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66,
75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250,
275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750,
800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250,
1300, 1350, 1400, 1450, 1500, 1600, 1750, 1850, 2000];
# Reduced set of depth levels actually used in this analysis (overrides the full list above)
depthr = [0.,10.,20.];

# Build the year range from the parsed start and end dates
yearlist = [startyear:endyear];
monthlist = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]];

# We create here the variable TS (for "time selector"), which allows us to work with the observations corresponding to each period of interest.

TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
@show TS;
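# Illustration (assuming the standard behaviour of TimeSelectorYearListMonthList): with a
# single year range and the four seasonal month groups defined above, TS describes four
# time instances, one per season, each aggregating all years of the range.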

figdir = "outputs/"
if !isdir(figdir)
    mkdir(figdir)
else
    @info("Figure directory already exists")
end
### 1. Read your ODV file
# In this batch tool the data file is taken from the command-line arguments.
# The example is based on a sub-setting of the Mediterranean Sea aggregated dataset.
# The dataset has been extracted around the Adriatic Sea and exported to netCDF using Ocean Data View (ODV).

datafile = netcdf_data

# Then you can read the full file:
@time obsval,obslon,obslat,obsdepth,obstime,obsid = NCODV.load(Float64, datafile,
"Water body $(varname)");

# Check the extremal values of the observations
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid)

### 2. Extract the bathymetry

# It is used to delimit the domain where the interpolation is performed.
## 2.1 Choice of bathymetry

# Modify bathname according to the resolution required.

@time bx,by,b = load_bath(bathname,true,lonr,latr);

## 2.2 Create mask
# True for sea
# False for land

mask = falses(size(b,1),size(b,2),length(depthr))
for k = 1:length(depthr)
    for j = 1:size(b,2)
        for i = 1:size(b,1)
            mask[i,j,k] = b[i,j] >= depthr[k]
        end
    end
end
@show size(mask)
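# Equivalent, more compact construction of the same mask (a sketch, same logic as the loops above):
#   mask = [b[i,j] >= depthr[k] for i in 1:size(b,1), j in 1:size(b,2), k in 1:length(depthr)]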

### 3. Quality control
# We keep only the observations whose value lies within the selected range (selmin, selmax).
# Adapt the criteria to your region and variable.

sel = (obsval .<= selmax) .& (obsval .>= selmin);

obsval = obsval[sel]
obslon = obslon[sel]
obslat = obslat[sel]
obsdepth = obsdepth[sel]
obstime = obstime[sel]
obsid = obsid[sel];

### 4. Analysis parameters
# Correlation lengths and noise-to-signal ratio

# We will use the function diva3d for the calculations.
# With this function, the correlation length has to be defined in meters, not in degrees.

sz = (length(lonr),length(latr),length(depthr));
lenx = fill(100_000.,sz) # 100 km
leny = fill(100_000.,sz) # 100 km
lenz = fill(25.,sz); # 25 m
len = (lenx, leny, lenz);
epsilon2 = 0.1;
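# Rough orientation (approximation, not part of the original analysis): one degree of
# latitude is about 111 km, so the 100_000 m used here corresponds to roughly 0.9 degrees;
# in longitude the equivalent distance shrinks by a factor cos(latitude).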

### Output file name
outputdir = "outputs_netcdf/"
if !isdir(outputdir)
    mkpath(outputdir)
end
filename = joinpath(outputdir, "Water_body_$(replace(varname," "=>"_")).nc")

### 5. Analysis
# Remove any previous result file before running the analysis,
# otherwise the tool may fail because the output file already exists.
if isfile(filename)
    @info "Removing file $filename"
    rm(filename) # delete the previous analysis
end

## 5.1 Plotting function
# Define a plotting function that will be applied for each time index and depth level.
# All the figures will be saved in a selected directory.

# Note: this optional plotting function needs PyPlot and Statistics ("using PyPlot",
# "using Statistics") and is only used if it is passed to diva3d (e.g. via its plotres keyword argument).
function plotres(timeindex,sel,fit,erri)
    tmp = copy(fit)
    nx,ny,nz = size(tmp)
    for i in 1:nz
        figure("Additional-Data")
        ax = subplot(1,1,1)
        ax.tick_params("both",labelsize=6)
        # Axis limits and colour range are tuned for the Adriatic salinity example;
        # adapt them to your domain and variable.
        ylim(39.0, 46.0);
        xlim(11.5, 20.0);
        title("Depth: $(depthr[i]) - time index: $(timeindex)", fontsize=6)
        pcolor(lonr.-dx/2.,latr.-dy/2., permutedims(tmp[:,:,i], [2,1]);
            vmin = 33, vmax = 40)
        colorbar(extend="both", orientation="vertical", shrink=0.8).ax.tick_params(labelsize=8)

        contourf(bx,by,permutedims(b,[2,1]), levels = [-1e5,0], colors = [[.5,.5,.5]])
        aspectratio = 1/cos(mean(latr) * pi/180)
        gca().set_aspect(aspectratio)

        figname = varname * @sprintf("_%02d",i) * @sprintf("_%03d.png",timeindex)
        plt.savefig(joinpath(figdir, figname), dpi=600, bbox_inches="tight");
        plt.close("all")
    end
end

## 5.2 Create the gridded fields using diva3d
# Here only the noise-to-signal ratio is estimated.
# Set fitcorrlen to true to also optimise the correlation length.
@time dbinfo = DIVAnd.diva3d((lonr,latr,depthr,TS),
    (obslon,obslat,obsdepth,obstime), obsval,
    len, epsilon2,
    filename, varname,
    bathname = bathname,
    fitcorrlen = false,
    niter_e = 2,
    surfextend = true
    );
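# Sketch of the variant with correlation-length optimisation enabled (same call, only
# fitcorrlen changed; this can increase the run time considerably):
#   dbinfo = DIVAnd.diva3d((lonr,latr,depthr,TS),
#       (obslon,obslat,obsdepth,obstime), obsval,
#       len, epsilon2,
#       filename, varname,
#       bathname = bathname,
#       fitcorrlen = true,
#       niter_e = 2,
#       surfextend = true);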

# Save the observation metadata in the NetCDF file.
DIVAnd.saveobs(filename,(obslon,obslat,obsdepth,obstime),obsid);
140 changes: 140 additions & 0 deletions tools/ocean/divandfull.xml
@@ -0,0 +1,140 @@
<tool id="divand_full_analysis" name="DIVAnd" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.01" license="MIT">
<description>Data-Interpolating Variational Analysis in n dimensions</description>
<macros>
<token name="@TOOL_VERSION@">0.1.0</token>
<token name="@VERSION_SUFFIX@">0</token>
</macros>
<requirements>
<requirement type="package" version="1.8.5">julia</requirement>
<requirement type="package" version="2.7.9">julia-divand</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
## The HOME .julia folder is not writable inside the Docker container, so we need to set one that is writable.
export JULIA_DEPOT_PATH="\$PWD:\$JULIA_DEPOT_PATH" &&
julia
'$__tool_directory__/divandfull.jl'
'$input_netcdf_identifier'
'$longmin'
'$longmax'
'$latmin'
'$latmax'
'$startdate'
'$enddate'
'$varname'
'$selmin'
'$selmax'
'$bathname'
]]></command>
<inputs>
<param name="input_netcdf_identifier" type="data" format="netcdf" label="Input your netcdf data"/>
<param name="bathname" type="data" format="netcdf" label="Input your bathymetry netcdf file" help="for more info see below."/>
<param name="longmin" type="float" min="-180" max="180" value="0" label="Longitude minimal"/>
<param name="longmax" type="float" min="-180" max="180" value="0" label="Longitude maximal"/>
<param name="latmin" type="float" min="-180" max="180" value="0" label="Latitude minimal"/>
<param name="latmax" type="float" min="-180" max="180" value="0" label="Latitude maximal"/>
<param name="startdate" type="text" value="yyyy-mm-dd" label="Input the starting date">
<sanitizer invalid_char="">
<valid initial="string.digits">
<add value="-"/>
</valid>
</sanitizer>
</param>
<param name="enddate" type="text" value="yyyy-mm-dd" label="Input the ending date">
<sanitizer invalid_char="">
<valid initial="string.digits">
<add value="-"/>
</valid>
</sanitizer>
</param>
<param name="varname" type="text" value="variable" label="Write the name of the variable of the analysis" help="Example: phosphate">
<sanitizer invalid_char="">
<valid initial="string.letters">
<add value="_"/>
</valid>
</sanitizer>
<validator type="regex">[0-9a-zA-Z_]+</validator>
</param>
<param name="selmin" type="integer" min="0" max="100" optional="true" value="0" label="Minimum of the salinity"/>
<param name="selmax" type="integer" min="0" max="100" optional="true" value="0" label="Maximum of the salinity"/>
</inputs>
<outputs>
<data name="output_netcdf" label="DIVAnd netcdf output" from_work_dir="outputs_netcdf/*.nc" format="netcdf"/>
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="input_netcdf_identifier" value="data_from_Eutrophication_Med_profiles_2022_unrestricted.nc"/>
<param name="bathname" location="https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download"/>
<param name="longmin" value="19.0"/>
<param name="longmax" value="30.0"/>
<param name="latmin" value="32.0"/>
<param name="latmax" value="38.0"/>
<param name="varname" value="phosphate"/>
<param name="startdate" value="1950-01-01"/>
<param name="enddate" value="2017-12-31"/>
<param name="selmin" value="0"/>
<param name="selmax" value="100"/>
<output name="output_netcdf">
<assert_contents>
<has_size value="68291" delta="0"/>
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[
.. class:: infomark

**What it does**

This tool takes an ocean observation netCDF file and creates a climatology.

**Input**

- An ocean observation netCDF file
- A bathymetry netCDF file; you can download one like this: download("https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download", "gebco_30sec_8.nc")
- Some complementary information so the tool can interpret your data and build the climatology on the right area: latitudes, longitudes, dates, and the accepted value range of the variable.

**Output**

One netCDF file containing the climatology created by DIVAnd.
**A bit of context**

DIVAnd (Data-Interpolating Variational Analysis in n dimensions) performs an n-dimensional variational analysis/gridding of
arbitrarily located observations. Observations are interpolated/analyzed on a curvilinear grid in 1, 2, 3 or more dimensions.
In this sense it is a generalization of the original two-dimensional DIVA version (still available `here <https://github.com/gher-uliege/DIVA>`_ but
no longer actively developed).

The method bears some similarities and equivalences with optimal interpolation or kriging in that it allows one to create a smooth
and continuous field from a collection of observations, observations which can be affected by errors. The analysis method is however
different in practice, allowing topological features, physical constraints, etc. to be taken into account in a natural way.
The method was initially developed with ocean data in mind, but it can be applied to any field where localized observations have
to be used to produce gridded fields which are "smooth".

DIVAndrun is the core analysis function in n dimensions. It does not know anything about the physical parameters or units you work with.
Coordinates can also be very general. The only constraint is that the metrics (pm,pn,po,...), when multiplied by the corresponding length
scales len, lead to non-dimensional parameters. Furthermore, the coordinates of the output grid (xi,yi,zi,...) need to have the same units
as the observation coordinates (x,y,z,...).

DIVAndfun is a version with a minimal set of parameters (the coordinates and values of observations, i.e. (x,f), the remaining parameters
being optional) and provides an interpolation function rather than an already gridded field.

diva3d is a higher-level function specifically designed for climatological analysis of data on Earth, using longitude/latitude/depth/time
coordinates and correlation lengths in meters. It makes the necessary preparation of metrics, parameter optimizations, etc. that you would normally
program yourself before calling the analysis function DIVAndrun.

DIVAnd_heatmap can be used for additive data and produces kernel density estimations.

DIVAndgo is only needed for very large problems, when a call to DIVAndrun leads to memory or CPU time problems. This function tries to decide
which solver (direct or iterative) to use and how to make an automatic domain decomposition. Not all options from DIVAndrun are available.

If you want to try out multivariate approaches, you can look at DIVAnd_multivarEOF and DIVAnd_multivarJAC.

If you want more information about the functions and parameters, see also the `documentation <https://gher-uliege.github.io/DIVAnd.jl/latest/index.html>`_.
]]></help>
<citations>
<citation type="doi">doi:10.5194/gmd-7-225-2014</citation>
</citations>
</tool>
Binary file not shown.
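The tool help above describes DIVAndrun as the core gridding function. As a rough, self-contained illustration (not part of this commit, adapted from memory of the DIVAnd.jl documentation; the data are synthetic and the parameter values arbitrary), a minimal two-dimensional DIVAndrun call looks like this:

    using DIVAnd

    # synthetic observations scattered in the unit square
    x = rand(75);
    y = rand(75);
    f = sin.(6 .* x) .* cos.(6 .* y);

    # regular 30 x 30 output grid covering the same square
    xi, yi = ndgrid(range(0, stop=1, length=30), range(0, stop=1, length=30));

    # all grid points are valid (no land mask in this toy example)
    mask = trues(size(xi));

    # metrics: inverse of the grid resolution along each dimension
    pm = ones(size(xi)) / (xi[2,1] - xi[1,1]);
    pn = ones(size(xi)) / (yi[1,2] - yi[1,1]);

    # correlation length and noise-to-signal ratio
    len = 0.1;
    epsilon2 = 1.0;

    # fi is the gridded (analysed) field
    fi, s = DIVAndrun(mask, (pm, pn), (xi, yi), (x, y), f, len, epsilon2);

The diva3d call used in divandfull.jl wraps this kind of analysis and adds the handling of longitude/latitude/depth/time coordinates, the bathymetry mask and the NetCDF output.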
