Example. Analysis and prediction of circulation weather types

juanferngran edited this page Feb 12, 2020 · 3 revisions

Analysis and Prediction of Circulation Weather Types (CWTs)

This is a worked example of the analysis and prediction of circulation types (CTs) and weather types (WTs). First, a subset of the CMIP5 GCM dataset is clustered using sea level pressure (psl), near-surface air temperature (tas) and specific humidity at the 850 hPa level (hus850) over the Iberian Peninsula in winter for the period 1983-2002 (argument grid). This constitutes the analysis of circulation types from the training data. Next, the same GCM subset for the future period 2081-2100 (argument newdata) is assigned to the clusters obtained in the training analysis. This yields the predicted CTs over the Iberian Peninsula for the future period and allows the change in the frequency of each CT between the two periods to be analyzed.

library(transformeR)  # provides makeMultiGrid, clusterGrid and subsetGrid
# Data for training
grid <- makeMultiGrid(CMIP5_Iberia_psl, CMIP5_Iberia_tas, CMIP5_Iberia_hus850)
# Data for prediction
newdata <- makeMultiGrid(CMIP5_Iberia_psl.rcp85, CMIP5_Iberia_tas.rcp85, CMIP5_Iberia_hus850.rcp85)

All the datasets used are included in the transformeR package. With the grid and newdata inputs ready, the clustering analysis of the training data can be performed. This example uses the k-means algorithm with centers = 10:

clusters.training <- clusterGrid(grid = grid, type = "kmeans", centers = 10, iter.max = 10000, nstart = 10)
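
To make the roles of centers, iter.max and nstart more concrete, here is a minimal sketch of k-means clustering with the base R function stats::kmeans on synthetic data. It is only an illustration of the general technique: the matrix toy and its dimensions are hypothetical, and nothing here is meant to reproduce what clusterGrid does internally with the grid.

```r
# Toy illustration with base R only; NOT the clusterGrid internals.
set.seed(1)  # k-means starts from random centers, so fix the seed

# Hypothetical data: 200 "days" (rows) x 12 "grid points" (columns),
# each day drawn around one of three different mean states.
n.days <- 200
state <- sample(1:3, n.days, replace = TRUE)
toy <- matrix(rnorm(n.days * 12, mean = c(-3, 0, 3)[state]), nrow = n.days)

# centers:  number of clusters to look for
# iter.max: maximum number of iterations per run
# nstart:   number of random initializations; the best run is kept
km <- kmeans(toy, centers = 3, iter.max = 100, nstart = 10)

table(km$cluster)  # days per cluster, comparable in spirit to wt.index
dim(km$centers)    # one mean pattern (row) per cluster
```

Using several random starts (nstart) reduces the chance of keeping a poor local optimum, which is presumably why the example above sets nstart = 10.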

The training clustering just performed yields the circulation types. The resulting CTs are stored in the attributes wt.index and centroids; other attributes hold further information about the k-means algorithm used. The absolute frequency of each CT can be obtained with the following code:

wt.index <- attr(clusters.training, "wt.index")
table(wt.index)
# wt.index
#  1   2   3   4   5   6   7   8   9  10 
# 215 221 135 241 190  89 269 117  82 246 

The most frequent CT is number 7, which occurs on 269 of the 1805 days in the dataset; the least frequent is number 9 (82 of 1805 days).
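
The same counts can be expressed as relative frequencies. The sketch below retypes the training counts reported above into a plain named vector (the name counts is introduced here for illustration only) so that it runs without the package; with the real objects, table(wt.index) could be used in its place.

```r
# Training counts copied from the table(wt.index) output above
counts <- setNames(c(215, 221, 135, 241, 190, 89, 269, 117, 82, 246), 1:10)

total <- sum(counts)                        # total number of days
rel.freq <- round(100 * counts / total, 1)  # percentage of days per CT

rel.freq
names(which.max(counts))  # most frequent CT
names(which.min(counts))  # least frequent CT
```

In relative terms, CT 7 accounts for about 14.9 % of the days and CT 9 for about 4.5 %.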

Once the training CTs have been computed, the CTs of the future data can be predicted. This second step is also performed with clusterGrid. The grid argument must be a grid that has already gone through a clustering analysis, and newdata must be consistent with grid in latitude, longitude, season and variables; otherwise the function returns an error message.

clusters.prediction <- clusterGrid(grid = clusters.training, newdata = newdata, centers = attr(clusters.training, "centers"))
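
One common way to predict clusters for new data is to assign each new observation to the closest training centroid. The base R sketch below illustrates that idea with Euclidean distance on synthetic data. Whether clusterGrid uses exactly this rule is an assumption, not something stated on this page, and the objects centroids and new.obs are hypothetical.

```r
# Nearest-centroid assignment on toy data; an assumed illustration of
# cluster prediction, NOT the documented clusterGrid implementation.
set.seed(1)

# Hypothetical training centroids: 3 clusters x 4 features
centroids <- rbind(c(0, 0, 0, 0), c(5, 5, 5, 5), c(-5, -5, -5, -5))

# Hypothetical new observations: 6 "days" x 4 features,
# two days near each of the centroids 2, 1 and 3 (in that order)
new.obs <- rbind(
  matrix(rnorm(8, mean = 5), nrow = 2),
  matrix(rnorm(8, mean = 0), nrow = 2),
  matrix(rnorm(8, mean = -5), nrow = 2)
)

# Squared Euclidean distance from every day (row) to every centroid (column)
d2 <- sapply(seq_len(nrow(centroids)), function(k) {
  rowSums(sweep(new.obs, 2, centroids[k, ])^2)
})

# For each day, pick the centroid with the smallest distance
assigned <- max.col(-d2, ties.method = "first")
table(factor(assigned, levels = seq_len(nrow(centroids))))
```

Because the centroids stay fixed, the cluster IDs of the new data remain directly comparable with those of the training period, which is what makes the frequency comparison further down meaningful.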

The absolute frequencies of the predicted CTs are now analyzed:

wt.index2 <- attr(clusters.prediction, "wt.index")
table(wt.index2)
# wt.index2
#   1   2   3   4   5   6   7   8   9  10 
# 228 237 179 206 160 109 286  97  59 243
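
The change in frequency between the two periods can be tabulated directly from the two sets of counts. The sketch below retypes the numbers printed above into plain vectors (training, prediction and changes are names introduced here for illustration) so that it runs without the package.

```r
# Counts copied from the table() outputs shown above
training   <- c(215, 221, 135, 241, 190, 89, 269, 117, 82, 246)
prediction <- c(228, 237, 179, 206, 160, 109, 286, 97, 59, 243)

changes <- data.frame(
  CT         = 1:10,
  training   = training,
  prediction = prediction,
  change     = prediction - training,  # difference in days
  change.pct = round(100 * (prediction - training) / training, 1)
)
changes

# CTs with the largest gain and the largest loss in number of days
changes$CT[which.max(changes$change)]
changes$CT[which.min(changes$change)]
```

With these numbers, CT 3 gains the most days (+44) and CT 4 loses the most (-35). The two totals differ by one day (1805 versus 1804), so relative frequencies may be preferable for a strict comparison.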

The following code plots the absolute frequencies of the CTs for the training and prediction data to visualize the outcome of this experiment:

# Descriptive names avoid masking base R's transpose function t()
freq.training <- table(wt.index)
freq.prediction <- table(wt.index2)
plot(freq.training, ylim = c(0, 300), type = "h", col = 155, xlab = "Circulation Type ID", ylab = "Freq.")
lines(freq.prediction, type = "b", col = 60)
legend("topleft", legend = c("Training", "Prediction"), col = c(155, 60), bty = "n", pch = 20, pt.cex = 2, cex = 0.8, horiz = FALSE, inset = c(0.05, 0.05))
title(main = "Absolute Frequencies of Circulation Types")

Note that CT number 7 is also the most frequent in the predicted period, and its frequency increases relative to the training period (from 269 to 286 days). This CT can be extracted from the output grid clusters.prediction for further analysis using the function subsetGrid with the cluster argument:

CT7 <- subsetGrid(grid = clusters.prediction, cluster = 7)