Skip to content

Commit

Permalink
diag files
Browse files Browse the repository at this point in the history
  • Loading branch information
paocorrales committed Jan 26, 2024
1 parent dc48487 commit 622cbb8
Showing 1 changed file with 181 additions and 3 deletions.
184 changes: 181 additions & 3 deletions content/gsi/04-diagfiles.qmd
Original file line number Diff line number Diff line change
@@ -1,7 +1,185 @@
---
title: "Undestanding diag files"
title: "Undestanding diag(nostic) files"
---

Scrips para decodificar
The `diag_<obs>` files save the key information of how the observations where assimilated (with the variational method) or are going to be assimilated (using Kalman Filter), including the innovation (observation minus background), observation values, observation error and adjusted observation error, and quality control information.

It is important to check that the `write_diag` option in the GSI namelist is setup to `.true.`.

By default the diag files are saved in binary format, and while GSI should be able to saved them as netcdfs, I have never made it work. So, this section will concentrate on how to decode the binary files and how to interpret the information.

Here is a list of the files you get if you run GSI as observation operator, in this case we are assimilating AMSU-A observation from the NOAA-18 and METOP-A satellites, ABI observations from GOES-16 and conventional observations. To be able to use this output with the Kalman Filter methods we need the diag files for the ensemble mean and every member.

```bash
diag_abi_g16_ges.ensmean
diag_abi_g16_ges.mem001
diag_abi_g16_ges.mem002
diag_abi_g16_ges.mem003
diag_abi_g16_ges.mem004
diag_abi_g16_ges.mem005
diag_abi_g16_ges.mem006
diag_abi_g16_ges.mem007
diag_abi_g16_ges.mem008
diag_abi_g16_ges.mem009
diag_abi_g16_ges.mem010
diag_amsua_metop-a_ges.ensmean
diag_amsua_metop-a_ges.mem001
diag_amsua_metop-a_ges.mem002
diag_amsua_metop-a_ges.mem003
diag_amsua_metop-a_ges.mem004
diag_amsua_metop-a_ges.mem005
diag_amsua_metop-a_ges.mem006
diag_amsua_metop-a_ges.mem007
diag_amsua_metop-a_ges.mem008
diag_amsua_metop-a_ges.mem009
diag_amsua_metop-a_ges.mem010
diag_amsua_n18_ges.ensmean
diag_amsua_n18_ges.mem001
diag_amsua_n18_ges.mem002
diag_amsua_n18_ges.mem003
diag_amsua_n18_ges.mem004
diag_amsua_n18_ges.mem005
diag_amsua_n18_ges.mem006
diag_amsua_n18_ges.mem007
diag_amsua_n18_ges.mem008
diag_amsua_n18_ges.mem009
diag_amsua_n18_ges.mem010
diag_conv_ges.ensmean
diag_conv_ges.mem001
diag_conv_ges.mem002
diag_conv_ges.mem003
diag_conv_ges.mem004
diag_conv_ges.mem005
diag_conv_ges.mem006
diag_conv_ges.mem007
diag_conv_ges.mem008
diag_conv_ges.mem009
diag_conv_ges.mem010
```

GSI includes some fortran routines you can use to decode the binary files. In may case I decided to modify those routines to get the information as a tidy table (1 observation per row, variables in columns) and to include more details present in the diagfiles.

The code is publish in this repository, that includes a version of GSI with some modidications:

```bash
read_diag/
├── convinfo
├── namelist.conv
├── namelist.rad
├── read_diag_conv_mean.sh
├── read_diag_conv.sh
├── read_diag_conv.x
├── read_diag_rad_mean.sh
├── read_diag_rad.sh
├── read_diag_rad.x
└── src
├── compile_gcc
├── compile_ifort
├── read_diag_conv.f90
├── read_diag_conv.f90_original
├── read_diag_rad.f90
└── read_diag_rad.f90_original
```

There are 2 fortran routines, `read_diag_conv.f90` for conventional diag files and `read_diag_rad.f90` for radiances. To compile the routines it is necessary to link them with the libraries that GSI uses. An example of how to compile the code can be found in `compile_gcc` and `compile_ifort`.

The resulting executables are `read_diag_conv.x` and `read_diag_rad.x`. Each one is asociated to a namelist that you need to modify each time in order to run the code and decode an especific diagfile. See for example the content of `namelist.conv`:

```bash
&iosetup
infilename='/home/paola.corrales/datosmunin3/EXP/E6_long/ANA/20181112220000/diagfiles/diag_conv_ges.ensmean',
outfilename='/home/paola.corrales/datosmunin3/EXP/E6_long/ANA/20181112220000/diagfiles/asim_conv_20181112220000.ensmean',
/
```

The namelist is very simple, it only need the path to the diag file and the path to the output: a plain text file. But if you need to do this for every diag file, it is very time consuming. For that reason I wrote in bash some loops to go through all the diagfiles and decode them automatically. There are 4 bash files, to decode conventional diagfiles (ensemble mean or the members of the ensemble) and 2 for the radiance diagfiles. I've also kept the original fortran routines just in case.

### Conventional obs

This is the information you get when you decode a conventional diagfile using the `read_diag_conv.x`:

* variable
* stationID
* type (acording to the prepbufr)
* dhr (difference between the obserbation time and the analysis time)
* latitude
* longitude
* pressure
* usage flag (defined by gsi)
* usage flag preprepbufr
* observation
* observation minus guess
* observation (only of uv)
* observation minus guess (only of uv)
* observation error

Each row is an observation, except for wind that has u and v components in the same row.

```bash
ps @ SCVD : 187 -0.50 -39.61 286.94 0.101E+04 1 0 0.101E+04 -0.142E+01 0.100E+11 0.181E+03 0.161E+01
ps @ 85782 : 181 -0.50 -40.60 286.95 0.999E+03 1 0 0.999E+03 -0.176E+01 0.100E+11 0.181E+03 0.227E+01
ps @ 85766 : 181 -0.50 -39.65 287.92 0.101E+04 1 0 0.101E+04 -0.139E+00 0.100E+11 0.187E+03 0.295E+01
t @ 85782 : 181 -0.50 -40.60 286.95 0.999E+03 1 0 0.291E+03 0.242E+01 0.100E+11 0.181E+03 0.192E+01
t @ 85766 : 181 -0.50 -39.65 287.92 0.101E+04 1 0 0.291E+03 0.496E+01 0.100E+11 0.187E+03 0.100E+11
t @ SCJO : 187 -0.50 -40.60 286.95 0.997E+03 1 0 0.291E+03 0.212E+01 0.100E+11 0.000E+00 0.150E+01
q @ SCVD : 187 -0.50 -39.61 286.94 0.100E+04 1 0 0.113E-01 0.102E-02 0.114E-01 0.100E+11 0.228E-02
q @ 85782 : 181 -0.50 -40.60 286.95 0.999E+03 1 0 0.105E-01 0.946E-03 0.112E-01 0.100E+11 0.229E-02
q @ 85766 : 181 -0.50 -39.65 287.92 0.101E+04 1 0 0.110E-01 0.122E-02 0.995E-02 0.100E+11 0.988E-02
q @ SCJO : 187 -0.50 -40.60 286.95 0.999E+03 1 0 0.107E-01 0.110E-02 0.112E-01 0.100E+11 0.225E-02
uv @ IR270 : 245 0.00 -39.36 285.57 0.270E+03 -1 100 0.434E+02 -0.363E+01 -0.193E+02 0.614E+00 0.684E+01
uv @ IR270 : 245 0.00 -39.31 286.11 0.269E+03 -1 100 0.434E+02 -0.225E+01 -0.193E+02 -0.108E+01 0.685E+01
```

The diag file includes more details. It may be useful to read the subroutine that write the diag files. The subroutine is called `contents_binary_diag_` and is present in each `setup*.f90` file. But here is a tip, it is much easier to read the `contents_netcdf_diag_` subroutine because it mention the name of each variable to create the metadata of the netcdf file.

### Radiance obs

For radiances we'll get a diag file for each sensor and satellite. But the structure of the binary files is always the same. Here is the list of variable I save from the diagfiles:

* sensor
* channel
* frequency
* latitude
* longitude
* elevation at observation location according guess (mb)
* pressure at max of weighting function (mb)
* dhr (difference between the observation time and the analysis time)
* observation (BT)
* observation minus guess with bias correction
* observation minus guess without bias correction
* inverse observation error
* quality control flag
* emissivity from surface
* stability index
* satellite zenith angle (degrees)
* satellite azimuth angle (degrees)
* fractional coverage by land
* fractional coverage by ice
* fractional coverage by ice
* cloud fraction (%)
* cloud top pressure (hPa)
* predictor 1
* predictor 2
* predictor 3
* predictor 4
* predictor 5
* predictor 6
* predictor 7
* predictor 8
* predictor 9
* predictor 10
* predictor 11
* predictor 12

Again, it is worth cheeking the subroutine that write the diagfiles for radiances, in case there are other details you need to include in the decodification.

`read_diag_rad.x` will write a plain text file with all the variables listed above for each observation (from each satellite/sensor/channel).

### Important information in the diagfiles

While all variables included in the diagfiles are necessary for the assimilation, there are a few that I found particularly important to monitor the assimilation process:

* **observation minus guess:** this variable should have a normal distribution centered in zero. If a bias correction was perform, it is important to compare the distribution with and without bias correction.
* **quality control flag:** if this variable is not zero, the observation will be rejected during the assimilation. To understand why this happened you need to check the GSI code and find the corresponding qc value. It will be different for each type of observation.
* **error:** this variable is also used to decide if the observation will be assimilated or not. If the value is to big (and remember, GSI changes the value of the error depending on the quality of the observation), it means that the observation is not good.

rutina que los genera

0 comments on commit 622cbb8

Please sign in to comment.