Exercise_09_KalmanFilter.Rmd

Kalman Filter
========================================================

```{r, echo=FALSE}
#devtools::install_github("EcoForecast/ecoforecastR")
library(ecoforecastR)
```


In this exercise we will apply the classic Kalman Filter (KF) algorithm to the Google Flu Trends data we previously used to explore the state-space model. Unlike the previous exercise that fit a single time-series, we'll utilize the matrix version of the Kalman filter to look at the flu across New England. 

In the multivariate version of the KF, the connection between state variables in the Analysis step is provided in two ways: (1) through interactions in the process model itself, $MPM^T$, and (2) through the covariance in the process error, $Q$. In this assignment we'll assimilate all 4 combinations of with/without interactions in the process model versus with/without correlation in the process error to evaluate how each impacts the inferences made. Since the KF will always be following the data, where we'll see the largest impact of these choices will be in the differences in the state uncertainties, especially in the periods of missing data.

## The data and model: Flu in New England

To begin, let's load and plot the flu data for New England.

```{r}
## load the Google flu data & select states
gflu = read.csv("data/gflu_data.txt",skip=11)
time = as.Date(gflu$Date)
states = c("Massachusetts","Connecticut","Rhode.Island","New.Hampshire","Vermont","Maine")
nstates = length(states)
y = t(gflu[,states])

## plot time-series from states
plot(time,1:length(time),type='n',ylab="Flu Index",lwd=2,log='y',ylim=range(y,na.rm=TRUE))
for(i in 1:nstates){
  lines(time,y[i,],col=i,lwd=2)
}
legend("topleft",legend=states,lwd=2,col=1:nstates)
```

Kalman does not estimate parameters, so we will used parameters that were previously estimated by fitting a state space model to the data. In a real-world situation you wouldn't fit two models to the same data (double dipping!), but rather you could fit a state-space model to the previous data and then use an operational forecast moving forward. Alternatively, you might augment the state matrix in the KF to include both the model states and the model parameters. However, for the classic KF, this approach is limited to only being able to estimate parameters that can be written as linear models of the augmented state + variable matrix M. Therfore, you are limited to estimating variables in the process model, f(X), not the parameters in the Observation Error or Process Error matrices. For the Kalman Filter exercise today we will be using estimates of these variance parameters, not the states, to inform the KF. Keep in mind that the KF is now treating these as KNOWN and thus ignoring parameter uncertainty.

In our previous model we assumed a Random Walk which we just fit Massachussetts. For this version we'll keep working with a Random Walk but we'll need to add a spatial contagious process to the random-walk process model. In other words, we're positing that part of the reason that we see such strong correlations across-states is that infected individuals are able to move across state boundaries and infect individuals in adjacent states. To run such a model we'll need to define a matrix that defines the adjacency between states, where 1 = adjacent, 0 = not adjacent, and the states are in the order: `r states`.

```{r}
## define adjacency between states slected
adj = matrix(c(0,1,1,1,1,0,    ### state-to-state spatial adjacency (self=0)
               1,0,1,0,0,0,
               1,1,0,0,0,0,
               1,0,0,0,1,1,
               1,0,0,1,0,0,
               0,0,0,1,0,0),nstates,nstates,byrow=TRUE)

```

To be more specific, lets assume a simple flux process just based on adjacency, and ignore differences in how population size, border length, transporation corridors, etc. affect the movement of individuals among the New England states.

$X_{i,t+1} = X_{i,t} + \alpha*\sum(adj_{i,j}*(X_{j,t}-X_{i,t}))+\epsilon_{i,t}$

Thus, if state j has more cases than state i, this will tend to increase infection in state i. For your reference, below is the JAGS model fit to the log-transformed flu data

```{r}
SpatialRandomWalk = "
model{

  #### Data Model
  for(t in 1:n){
    for(i in 1:nstate){
      y[i,t] ~ dnorm(x[i,t],tau_obs)
    }
  }

  #### Process Model
  for(t in 2:n){
    for(i in 1:nstate){
      mu[i,t] <- x[i,t-1] + alpha * sum(adj[i,1:nstate]*x[1:nstate,t-1])
    }
    x[1:nstate,t] ~ dmnorm(mu[1:nstate,t],Omega_proc)
  }
  
  #### Priors
  for(i in 1:nstate){
    x[i,1] ~ dnorm(x_ic,tau_ic)
  }
  tau_obs ~ dgamma(a_obs,r_obs)
  Omega_proc ~ dwish(R,k)
  alpha ~ dbeta(1,20)
}
"
```

And the parameters estimated from the model

```{r}
## load parameters (assume known)
load("data/KFalpha.params.Rdata")
## observation error
tau_obs
## process error covariance
knitr::kable(tau_proc,col.names = states)
## process error correlation
knitr::kable(cov2cor(tau_proc),col.names = states)
## process error SD
sqrt(diag(tau_proc))
```

## Kalman Filter equations and functions

Now that we have estimates for our parameters, let's write functions that evaluates the classic Kalman Filter. Note, if you were running the KF in 'operational' mode, where new data is arriving in real time, you wouldn't run the function all at once but rather just call the KalmanAnalysis every time new data is observed, followed by KalmanForecast to make a new forecast.

```{r}
##'  Kalman Filter
##' @param  M   = model matrix
##' @param  mu0 = initial condition mean vector
##' @param  P0  = initial condition covariance matrix
##' @param  Q   = process error covariance matrix
##' @param  R   = observation error covariance matrix
##' @param  Y   = observation matrix (with missing values as NAs), time as col's
##'
##' @return list
##'  mu.f, mu.a  = state mean vector for (a)nalysis and (f)orecast steps
##'  P.f, P.a    = state covariance matrix for a and f
KalmanFilter <- function(M,mu0,P0,Q,R,Y){
  
  ## storage
  nstates = nrow(Y)  
  nt = ncol(Y)
  mu.f  = matrix(NA,nstates,nt+1)  ## forecast mean for time t
  mu.a  = matrix(NA,nstates,nt)  ## analysis mean for time t
  P.f  = array(NA,c(nstates,nstates,nt+1))  ## forecast variance for time t
  P.a  = array(NA,c(nstates,nstates,nt))  ## analysis variance for time t

  ## initialization
  mu.f[,1] = mu0
  P.f[,,1] = P0
  I = diag(1,nstates)

  ## run updates sequentially for each observation.
  for(t in 1:nt){

    ## Analysis step: combine previous forecast with observed data
    KA <- KalmanAnalysis(mu.f[,t],P.f[,,t],Y[,t],R,H=I,I)
    mu.a[,t] <- KA$mu.a
    P.a[,,t] <- KA$P.a
    
    ## Forecast step: predict to next step from current
    KF <- KalmanForecast(mu.a[,t],P.a[,,t],M,Q)
    mu.f[,t+1] <- KF$mu.f
    P.f[,,t+1] <- KF$P.f
  }
  
  return(list(mu.f=mu.f,mu.a=mu.a,P.f=P.f,P.a=P.a))
}

##' Kalman Filter: Analysis step
##' @param  mu.f = Forecast mean (vector)
##' @param  P.f  = Forecast covariance (matrix)
##' @param  Y    = observations, with missing values as NAs) (vector)
##' @param  R    = observation error covariance (matrix)
##' @param  H    = observation matrix (maps observations to states)
KalmanAnalysis <- function(mu.f,P.f,Y,R,H,I){
  obs = !is.na(Y) ## which Y's were observed?
  if(any(obs)){
    H <- H[obs,]                                              ## observation matrix
    K <- P.f %*% t(H) %*% solve(H%*%P.f%*%t(H) + R[obs,obs])  ## Kalman gain
    mu.a <- mu.f + K%*%(Y[obs] - H %*% mu.f)                  ## update mean
    P.a <- (I - K %*% H)%*%P.f                                ## update covariance
    ## Note: Here's an alternative form that doesn't use the Kalman gain
    ## it is less efficient due to the larger number of matrix inversions (i.e. solve)
    ## P.a <- solve(t(H)%*%solve(R[obs,obs])%*%(H) + solve(P.f))                             
    ## mu.a <- P.a %*% (t(H)%*%solve(R[obs,obs])%*%Y[obs] + solve(P.f)%*%mu.f)
  } else {
    ##if there's no data, the posterior is the prior
    mu.a = mu.f
    P.a = P.f
  }
  return(list(mu.a=mu.a,P.a=P.a))
}

##' Kalman Filter: Forecast Step
##' @param mu.a = analysis posterior mean (vector)
##' @param P.a  = analysis posterior covariance (matrix)
##' @param M    = model (matrix)
##' @param  Q   = process error covariance (matrix)
KalmanForecast <- function(mu.a,P.a,M,Q){
  mu.f = M%*%mu.a
  P.f  = Q + M%*%P.a%*%t(M)
  return(list(mu.f=mu.f,P.f=P.f))
}
```

## Applying the Kalman Filter to the Flu data

With the Kalman Filter function defined, we need to define the inputs to the function. Note below that I'm using the variable KF00 to store the outputs, where I'm using 00 to indicate that this run was done with the defaults for both the process model and process error covariance. In the assignment below you will rerun this analysis under a number of alternatives varying the process error and the magnitude of spatial flux in the process model.

```{r}
## log transform data
Y   = log10(y)

## options for process model 
alpha = 0        ## assume no spatial flux
#alpha = 0.05    ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum))  ## random walk with flux

## options for process error covariance
#Q = tau_proc            ## full process error covariance matrix
Q = diag(diag(tau_proc))        ## diagonal process error matrix

## observation error covariance (assumed independent)  
R = diag(tau_obs,nstates) 

## prior on first step, initialize with long-term mean and covariance
mu0 = apply(Y,1,mean,na.rm=TRUE)
P0 = cov(t(Y),use="pairwise.complete.obs")
#w <- P0*0+0.25 + diag(0.75,dim(P0)) ## iptional: downweight covariances in IC
#P0 = P0*w 

## Run Kalman Filter
KF00 = KalmanFilter(M,mu0,P0,Q,R,Y)
```

## Visualizing Outputs

After running the Kalman Filter, we can visualize the outputs. The first set of figures below shows the posterior analysis for each state through time. The second set shows the forecast and analysis standard deviations change through time, indicating when there is missing data in green on the bottom of the plot. As you can see the missing data is not synchronous across states, but the mean of the Analysis is influenced by the across-state covariances.

```{r, fig.asp=1.0}
attach(KF00)
nt = length(time)

### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
  ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
  plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
  ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
  lines(time,mu.a[i,],col=4)
  lines(time,Y[i,])
}

## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
  plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
       ylab="Std Error",type='l')
  lines(time,sqrt(P.f[i,i,1:nt]),col=2)
  points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag's the zero's
  legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,1),cex=1.4)
}
```

Finally, to get a better idea about the dynamics of how the Kalman Filter works we can zoom in to a subset of time for one state and show the Forecast, Analysis, and observed data altogether.

```{r}
## subset time
time2 <- time[time>as.Date("2015-01-01")]
tsel <- which(time %in% time2)
n = length(time2)*2

## interleave Forecast and Analysis
mu = p = rep(NA,n)
mu[seq(1,n,by=2)] = mu.f[1,tsel]
mu[seq(2,n,by=2)] = mu.a[1,tsel]
p[seq(1,n,by=2)]  = 1.96*sqrt(P.f[1,1,tsel])
p[seq(2,n,by=2)]  = 1.96*sqrt(P.a[1,1,tsel])
ci = cbind(mu-p,mu+p)
time3 = sort(c(time2,time2+1))

## plot Forecast, Analysis, and data
plot(time3,mu,ylim=range(ci),type='l')
ecoforecastR::ciEnvelope(time3,ci[,1],ci[,2],col="lightBlue")
lines(time3,mu,lwd=2)
points(time,Y[1,])
```

## Assignment

Run the KF under all four combinations of covariance in the process model versus process error and compare the results. In particular you'll want to pay attention to the missing data at the beginning of the timeseries for some states. You'll also want to comment on how spatial adjacency affects the confidence in the inferences (some states are more isolated than others) in the four different scenarios. Finally, you'll want to note that the alpha estimated from the data itself (0.000209), is close to zero and thus our real forecast would be much more like our no-flux run than our high flux run.

Task 1, KF00: Run the default analysis, KF00, with no spatial flux in the process model, and no spatial covariance in the process error

Task 2, KF01: Rerun with process error set to the full covariance matrix of Q, compare the results with the original -- what impact does including covariance in the process error have on the inference?

Task 3, KF10: Rerun with alpha = 0.05 but switch back to the *diagonal* Q matrix (no spatial covariance). Comparing KF10 to KF00, what impact does including a spatial flux in the process model have on the inference?

Task 4, KF11: Rerun with alpha = 0.05 and the full process error covariance Q matrix. Compare KF11 to the previous runs -- what impact does including both a spatial process and a process error covariance have over their impacts individually.

Task 5: In a true forecasting situation you don't have all the data in hand at once. You also often want to make a forecast that is farther than one time-step into the future. **Write an R function you could run daily** that would:

* Take the previous forecast and the new data (for just that day) as inputs (plus any other parameters you may need)
* Assimilates the new data
* Makes a forecast 16 time steps into the future
* Returns a list that starts from (& includes) the current best estimate (i.e. nowcast) and the forecast for the next 16 time steps. This list should include both means (mu) and covariances (P), but shouldn't need to include separate mu.a/mu.f and P.a/P.f objects.

You should leverage the existing `KalmanAnalysis` and `KalmanForecast` functions in your new function, and you shouldn't need to change anything in or about those functions.

Note: in a real world situation, where your forecast model has driver/covariate data, you would actually want to first re-run the forecast from yesterday to today with the actual (now observed) driver/covariate data, rather than using your archived forecast (which was done based on your forecasted covariate/driver data), before assimilating today's new observations of your response data.

Extra Credit:
Using the run with alpha=0.05 and full process error covariance Q, apply your forecast function to make, and visualize, 5 iterative forecasts. Hint: when visualizing, remember that each forecast starts from a different day.