forked from galaxyecology/tools-ecology
-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add Divand batch tool (galaxyecology#118)
* add batch tool diva * add test files * Set JULIA_DEPOT_PATH to get a writable Julia HOME * fix some test settings more needed * fix dates * add test files * fix test & improve tool Not totally fixed but we're getting there * fix test * fix typo * reduced test files * Delete tools/ocean/data_from_Eutrophication_Med_profiles_2022_unrestricted.nc * Delete tools/ocean/DIVAnd_netcdf_output.netcdf * reduced test files in right folder * update test * update test avoid big file * Delete tools/ocean/test-data/gebco_30sec_8.nc * Update tools/ocean/divandfull.xml Co-authored-by: Björn Grüning <[email protected]> * Update tools/ocean/divandfull.xml Co-authored-by: Björn Grüning <[email protected]> * Delete tools/ocean/test-data/DIVAnd_netcdf_output.netcdf * cleaning up julia script --------- Co-authored-by: Björn Grüning <[email protected]> Co-authored-by: Bjoern Gruening <[email protected]>
- Loading branch information
1 parent
3b5d66e
commit e395cfe
Showing
3 changed files
with
351 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
#Julia script | ||
|
||
############################### | ||
## DIVAndrun analsysis ## | ||
############################### | ||
import Pkg; | ||
using Pkg | ||
Pkg.status() | ||
|
||
### Import packages | ||
using DIVAnd | ||
using Dates | ||
using Printf | ||
# Getting the arguments from the command line | ||
args = ARGS | ||
|
||
# Import data | ||
if length(args) < 4 | ||
error("This tool needs at least 4 arguments") | ||
else | ||
netcdf_data = args[1] | ||
longmin = parse(Float64, args[2]) | ||
longmax = parse(Float64, args[3]) | ||
latmin = parse(Float64, args[4]) | ||
latmax = parse(Float64, args[5]) | ||
startdate = args[6] # yyyy,mm,dd | ||
enddate = args[7] | ||
varname = args[8] | ||
selmin = parse(Float64, args[9]) | ||
selmax = parse(Float64, args[10]) | ||
bathname = args[11] | ||
end | ||
|
||
## This script will create a climatology: | ||
# 1. ODV data reading. | ||
# 2. Extraction of bathymetry and creation of mask | ||
# 3. Data download from other sources and duplicate removal. | ||
# 4. Quality control. | ||
# 5. Parameter optimisation. | ||
# 6. Spatio-temporal interpolation with DIVAnd. | ||
|
||
|
||
### Configuration | ||
# Define the horizontal, vertical (depth levels) and temporal resolutions. | ||
# Select the variable of interest | ||
|
||
dx, dy = 0.125, 0.125 | ||
lonr = longmin:dx:longmax | ||
latr = latmin:dy:latmax | ||
|
||
# Convert string in date | ||
startdate = Date(startdate, "yyyy-mm-dd") | ||
|
||
# extract year month day | ||
startyear = year(startdate) | ||
startmonth = month(startdate) | ||
startday = day(startdate) | ||
|
||
# Convert string in date | ||
enddate = Date(enddate, "yyyy-mm-dd") | ||
|
||
# extract year month day | ||
endyear = year(enddate) | ||
endmonth = month(enddate) | ||
endday = day(enddate) | ||
|
||
timerange = [Date(startyear, startmonth, startday),Date(endyear, endmonth, endday)]; | ||
|
||
depthr = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66, | ||
75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250, | ||
275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, | ||
800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, | ||
1300, 1350, 1400, 1450, 1500, 1600, 1750, 1850, 2000]; | ||
depthr = [0.,10.,20.]; | ||
|
||
varname = varname | ||
yearlist = [1900:2023]; | ||
monthlist = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]; | ||
|
||
# We create here the variable TS (for "tDataset(netcdf_data,"r")ime selector"), which allows us to work with the observations corresponding to each period of interest. | ||
|
||
TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist); | ||
@show TS; | ||
|
||
figdir = "outputs/" | ||
if ~(isdir(figdir)) | ||
mkdir(figdir) | ||
else | ||
@info("Figure directory already exists") | ||
end | ||
### 1. Read your ODV file | ||
# Adapt the datadir and datafile values. | ||
# The example is based on a sub-setting of the Mediterranean Sea aggregated dataset. | ||
# The dataset has been extracted around the Adriatic Sea and exported to a netCDF using Ocean Data | ||
datadir = "../data" | ||
|
||
datafile = netcdf_data | ||
|
||
# Then you can read the full file: | ||
@time obsval,obslon,obslat,obsdepth,obstime,obsid = NCODV.load(Float64, datafile, | ||
"Water body $(varname)"); | ||
|
||
# Check the extremal values of the observations | ||
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid) | ||
|
||
### 2. Extract the bathymetry | ||
|
||
# It is used to delimit the domain where the interpolation is performed. | ||
## 2.1 Choice of bathymetry | ||
|
||
# Modify bathname according to the resolution required. | ||
|
||
@time bx,by,b = load_bath(bathname,true,lonr,latr); | ||
|
||
## 2.2 Create mask | ||
# False for sea | ||
# True for land | ||
|
||
mask = falses(size(b,1),size(b,2),length(depthr)) | ||
for k = 1:length(depthr) | ||
for j = 1:size(b,2) | ||
for i = 1:size(b,1) | ||
mask[i,j,k] = b[i,j] >= depthr[k] | ||
end | ||
end | ||
end | ||
@show size(mask) | ||
|
||
### 3. Quality control | ||
# We check the salinity value. | ||
# Adapt the criteria to your region and variable. | ||
|
||
sel = (obsval .<= selmax) .& (obsval .>= selmin); | ||
|
||
obsval = obsval[sel] | ||
obslon = obslon[sel] | ||
obslat = obslat[sel] | ||
obsdepth = obsdepth[sel] | ||
obstime = obstime[sel] | ||
obsid = obsid[sel]; | ||
|
||
### 4. Analysis parameters | ||
# Correlation lengths and noise-to-signal ratio | ||
|
||
# We will use the function diva3D for the calculations. | ||
# With this function, the correlation length has to be defined in meters, not in degrees. | ||
|
||
sz = (length(lonr),length(latr),length(depthr)); | ||
lenx = fill(100_000.,sz) # 100 km | ||
leny = fill(100_000.,sz) # 100 km | ||
lenz = fill(25.,sz); # 25 m | ||
len = (lenx, leny, lenz); | ||
epsilon2 = 0.1; | ||
|
||
### Output file name | ||
outputdir = "outputs_netcdf/" | ||
if !isdir(outputdir) | ||
mkpath(outputdir) | ||
end | ||
filename = joinpath(outputdir, "Water_body_$(replace(varname," "=>"_")).nc") | ||
|
||
### 7. Analysis | ||
# Remove the result file before running the analysis, otherwise you'll get the message | ||
if isfile(filename) | ||
rm(filename) # delete the previous analysis | ||
@info "Removing file $filename" | ||
end | ||
|
||
## 7.1 Plotting function | ||
# Define a plotting function that will be applied for each time index and depth level. | ||
# All the figures will be saved in a selected directory. | ||
|
||
function plotres(timeindex,sel,fit,erri) | ||
tmp = copy(fit) | ||
nx,ny,nz = size(tmp) | ||
for i in 1:nz | ||
figure("Additional-Data") | ||
ax = subplot(1,1,1) | ||
ax.tick_params("both",labelsize=6) | ||
ylim(39.0, 46.0); | ||
xlim(11.5, 20.0); | ||
title("Depth: (timeindex)", fontsize=6) | ||
pcolor(lonr.-dx/2.,latr.-dy/2, permutedims(tmp[:,:,i], [2,1]); | ||
vmin = 33, vmax = 40) | ||
colorbar(extend="both", orientation="vertical", shrink=0.8).ax.tick_params(labelsize=8) | ||
|
||
contourf(bx,by,permutedims(b,[2,1]), levels = [-1e5,0],colors = [[.5,.5,.5]]) | ||
aspectratio = 1/cos(mean(latr) * pi/180) | ||
gca().set_aspect(aspectratio) | ||
|
||
figname = varname * @sprintf("_%02d",i) * @sprintf("_%03d.png",timeindex) | ||
plt.savefig(joinpath(figdir, figname), dpi=600, bbox_inches="tight"); | ||
plt.close_figs() | ||
end | ||
end | ||
|
||
## 7.2 Create the gridded fields using diva3d | ||
# Here only the noise-to-signal ratio is estimated. | ||
# Set fitcorrlen to true to also optimise the correlation length. | ||
@time dbinfo = DIVAnd.diva3d((lonr,latr,depthr,TS), | ||
(obslon,obslat,obsdepth,obstime), obsval, | ||
len, epsilon2, | ||
filename,varname, | ||
bathname=bathname, | ||
fitcorrlen = false, | ||
niter_e = 2, | ||
surfextend = true | ||
); | ||
|
||
# Save the observation metadata in the NetCDF file. | ||
DIVAnd.saveobs(filename,(obslon,obslat,obsdepth,obstime),obsid); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
<tool id="divand_full_analysis" name="DIVAnd" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.01" license="MIT"> | ||
<description>Data-Interpolating Variational Analysis in n dimensions</description> | ||
<macros> | ||
<token name="@TOOL_VERSION@">0.1.0</token> | ||
<token name="@VERSION_SUFFIX@">0</token> | ||
</macros> | ||
<requirements> | ||
<requirement type="package" version="1.8.5">julia</requirement> | ||
<requirement type="package" version="2.7.9">julia-divand</requirement> | ||
</requirements> | ||
<command detect_errors="exit_code"><![CDATA[ | ||
## The HOME .julia folder is not writable inside the Docker container, so we need to set one that is writable. | ||
export JULIA_DEPOT_PATH="\$PWD:\$JULIA_DEPOT_PATH" && | ||
julia | ||
'$__tool_directory__/divandfull.jl' | ||
'$input_netcdf_identifier' | ||
'$longmin' | ||
'$longmax' | ||
'$latmin' | ||
'$latmax' | ||
'$startdate' | ||
'$enddate' | ||
'$varname' | ||
'$selmin' | ||
'$selmax' | ||
'$bathname' | ||
]]></command> | ||
<inputs> | ||
<param name="input_netcdf_identifier" type="data" format="netcdf" label="Input your netcdf data"/> | ||
<param name="bathname" type="data" format="netcdf" label="Input your bathymetry netcdf file" help="for more info see below."/> | ||
<param name="longmin" type="float" min="-180" max="180" value="0" label="Longitude minimal"/> | ||
<param name="longmax" type="float" min="-180" max="180" value="0" label="Longitude maximal"/> | ||
<param name="latmin" type="float" min="-180" max="180" value="0" label="Latitude minimal"/> | ||
<param name="latmax" type="float" min="-180" max="180" value="0" label="Latitude maximal"/> | ||
<param name="startdate" type="text" value="yyyy-mm-dd" label="Input the starting date"> | ||
<sanitizer invalid_char=""> | ||
<valid initial="string.digits"> | ||
<add value="-"/> | ||
</valid> | ||
</sanitizer> | ||
</param> | ||
<param name="enddate" type="text" value="yyyy-mm-dd" label="Input the ending date"> | ||
<sanitizer invalid_char=""> | ||
<valid initial="string.digits"> | ||
<add value="-"/> | ||
</valid> | ||
</sanitizer> | ||
</param> | ||
<param name="varname" type="text" value="variable" label="Write the name of the variable of the analysis" help="Example: phosphate"> | ||
<sanitizer invalid_char=""> | ||
<valid initial="string.letters"> | ||
<add value="_"/> | ||
</valid> | ||
</sanitizer> | ||
<validator type="regex">[0-9a-zA-Z_]+</validator> | ||
</param> | ||
<param name="selmin" type="integer" min="0" max="100" optional="true" value="0" label="Minimum of the salinity"/> | ||
<param name="selmax" type="integer" min="0" max="100" optional="true" value="0" label="Maximum of the salinity"/> | ||
</inputs> | ||
<outputs> | ||
<data name="output_netcdf" label="DIVAnd netcdf output" from_work_dir="outputs_netcdf/*.nc" format="netcdf"/> | ||
</outputs> | ||
<tests> | ||
<test expect_num_outputs="1"> | ||
<param name="input_netcdf_identifier" value="data_from_Eutrophication_Med_profiles_2022_unrestricted.nc"/> | ||
<param name="bathname" location="https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download"/> | ||
<param name="longmin" value="19.0"/> | ||
<param name="longmax" value="30.0"/> | ||
<param name="latmin" value="32.0"/> | ||
<param name="latmax" value="38.0"/> | ||
<param name="varname" value="phosphate"/> | ||
<param name="startdate" value="1950-01-01"/> | ||
<param name="enddate" value="2017-12-31"/> | ||
<param name="selmin" value="0"/> | ||
<param name="selmax" value="100"/> | ||
<output name="output_netcdf"> | ||
<assert_contents> | ||
<has_size value="68291" delta="0"/> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
</tests> | ||
<help><![CDATA[ | ||
.. class:: infomark | ||
**What it does** | ||
This tool takes a observation netcdf file and create climatology | ||
**Input** | ||
- An ocean observation netcdf file | ||
- A bathymetry netcdf file, you can download it like this: download("https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download", "gebco_30sec_8.nc") | ||
- Some complementary information for the tool to better understand your data and create your climatology on the right area: latitudes, longitudes, dates, and salinity.$ | ||
**Output** | ||
One netcdf file containing the climatology created by DIVAnd. | ||
**A bit of context** | ||
DIVAnd (Data-Interpolating Variational Analysis in n dimensions) performs an n-dimensional variational analysis/gridding of | ||
arbitrarily located observations. Observations will be interpolated/analyzed on a curvilinear grid in 1, 2, 3 or more dimensions. | ||
In this sense it is a generalization of the original two-dimensional DIVA version (still available `here <https://github.com/gher-uliege/DIVA>`_ but | ||
not further developed anymore). | ||
The method bears some similarities and equivalences with Optimal Interpolation or Krigging in that it allows to create a smooth | ||
and continous field from a collection of observations, observations which can be affected by errors. The analysis method is however | ||
different in practise, allowing to take into account topological features, physical constraints etc in a natural way. | ||
The method was initially developped with ocean data in mind, but it can be applied to any field where localized observations have | ||
to be used to produce gridded fields which are "smooth". | ||
DIVAndrun is the core analysis function in n dimensions. It does not know anything about the physical parameters or units you work with. | ||
Coordinates can also be very general. The only constraint is that the metrics (pm,pn,po,...) when multiplied by the corresponding length | ||
scales len lead to non-dimensional parameters. Furthermore the coordinates of the output grid (xi,yi,zi,...) need to have the same units | ||
as the observation coordinates (x,y,z,...). | ||
DIVAndfun is a version with a minimal set of parameters (the coordinates and values of observations, i.e. (x,f), the remaining parameters | ||
being optional) and provides an interpolation function rather than an already gridded field. | ||
diva3D is a higher-level function specifically designed for climatological analysis of data on Earth, using longitude/latitude/depth/time | ||
coordinates and correlations length in meters. It makes the necessary preparation of metrics, parameter optimizations etc you normally would | ||
program yourself before calling the analysis function DIVAndrun. | ||
DIVAnd_heatmap can be used for additive data and produces Kernel Density Estimations. | ||
DIVAndgo is only needed for very large problems when a call to DIVAndrun leads to memory or CPU time problems. This function tries to decide | ||
which solver (direct or iterative) to use and how to make an automatic domain decomposition. Not all options from DIVAndrun are available. | ||
If you want to try out multivariate approaches, you can look at DIVAnd_multivarEOF and DIVAnd_multivarJAC | ||
If you want more informations about the functions and parameters see also the `documentations here <https://gher-uliege.github.io/DIVAnd.jl/latest/index.html>`_. | ||
]]></help> | ||
<citations> | ||
<citation type="doi">doi:10.5194/gmd-7-225-2014</citation> | ||
</citations> | ||
</tool> |
Binary file added
BIN
+452 KB
tools/ocean/test-data/data_from_Eutrophication_Med_profiles_2022_unrestricted.nc
Binary file not shown.