040-RegressionFixedCoeff.Rmd

# Regression Model {#regressionModel}
This model adds an explanatory variable.
The variable has a coefficient, $\lambda$,
which can be either *fixed* or *time-varying*.

## Introduction
### Fixed coefficient
In this version of the regression model,
the coefficient is "fixed" in the sense that it does not vary over time.

\begin{eqnarray*}
  y_{t} & = & \mu_{t} + \lambda x_{t} + \epsilon_{t}, \qquad \epsilon_{t} \sim N(0, \sigma_{\epsilon}^{2}) \\
  \mu_{t} & = & \mu_{t-1} + \xi_{t}, \qquad \xi_{t} \sim N(0,\sigma_{\xi}^{2})
\end{eqnarray*}

The state vector is $\alpha_{t} = (\mu_{t}, \lambda)^{\top}$.

This is a four-parameter model.

-----------------------  ---------------------------------------------
$\sigma_{\epsilon}^2$    Variance of observation errors, $\epsilon$
$\sigma_{\xi}^2$         Variance of transition errors, $\xi$
$\mu_{0}$                Initial level of $\mu$
$\lambda$                Coefficient of $x$
-----------------------  ---------------------------------------------
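
For reference, the same model can be written in the matrix form used by state-space software.
This is only a sketch in conventional notation:
the symbols $F_{t}$, $G$, $w_{t}$, and $W$ do not appear elsewhere in this chapter
and are introduced here as assumptions for illustration.

\begin{eqnarray*}
  y_{t} & = & F_{t}\,\alpha_{t} + \epsilon_{t}, \qquad F_{t} = (1, \; x_{t}) \\
  \alpha_{t} & = & G\,\alpha_{t-1} + w_{t}, \qquad G = I_{2}, \quad w_{t} \sim N(0, W), \quad W = \mathrm{diag}(\sigma_{\xi}^{2}, \; 0)
\end{eqnarray*}

The zero in the second diagonal entry of $W$ means the second state component, $\lambda$,
receives no transition error and so stays constant.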

### Time-varying coefficient

This formulation is similar to the previous one,
but now the coefficient of the explanatory variable *does* vary over time.
The slope, $\lambda$, becomes $\lambda_{t}$, subscripted by time,
and it gets a transition equation of its own.

\begin{eqnarray*}
  y_{t} & = & \mu_{t} + \lambda_{t}x_{t} + \epsilon_{t}, \qquad \epsilon_{t} \sim N(0, \sigma_{\epsilon}^{2}) \\
  \mu_{t} & = & \mu_{t-1} + \xi_{t}, \qquad \xi_{t} \sim N(0,\sigma_{\xi}^{2}) \\
  \lambda_{t} & = & \lambda_{t-1} + \zeta_{t}, \qquad \zeta_{t} \sim N(0,\sigma_{\zeta}^{2})
\end{eqnarray*}

The state vector is $\alpha_{t} = (\mu_{t}, \lambda_{t})^{\top}$,
where both components vary over time.

The $\lambda_{t}$ follow a random walk with error terms $\zeta_{t}$,
and that introduces a new parameter, $\sigma_{\zeta}^{2}$,
the variance of the errors.
The full set of five parameters is:

-----------------------  --------------------------------------------------
$\sigma_{\epsilon}^2$    Variance of observation errors, $\epsilon$
$\sigma_{\xi}^2$         Variance of transition errors, $\xi$
$\sigma_{\zeta}^2$       Variance of transition errors, $\zeta$
$\mu_{0}$                Initial level of $\mu$
$\lambda_{0}$            Initial level of $\lambda$
-----------------------  --------------------------------------------------
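
To make the equations concrete, here is a small simulation of the time-varying model in base R.
It is a sketch, not part of the recipes:
the function name `simulateRegression`, its arguments, and all numeric values are assumptions chosen for illustration.
Note that the arguments are standard deviations, not variances.

```{r, eval=FALSE}
# Simulate one path from the time-varying regression model (base R only).
# Arguments sigmaEps, sigmaXi, sigmaZeta are standard deviations.
simulateRegression <- function(x, sigmaEps, sigmaXi, sigmaZeta, mu0, lambda0) {
  n <- length(x)
  # Random walks: cumulative sums of the transition errors
  mu     <- mu0     + cumsum(rnorm(n, mean=0, sd=sigmaXi))
  lambda <- lambda0 + cumsum(rnorm(n, mean=0, sd=sigmaZeta))
  # Observation equation
  y <- mu + lambda * x + rnorm(n, mean=0, sd=sigmaEps)
  data.frame(x=x, mu=mu, lambda=lambda, y=y)
}

set.seed(42)
x <- rep(c(0, 1), times=c(30, 70))     # Step regressor: 0 for 30 periods, then 1
sim <- simulateRegression(x, sigmaEps=1, sigmaXi=0.2, sigmaZeta=0.05,
                          mu0=10, lambda0=-2)
head(sim)
```

Setting `sigmaZeta=0` holds `lambda` at `lambda0` for every period,
which reproduces the fixed-coefficient model.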

## Fitting a Regression Model, Fixed Coefficients {#fitRegressionFixed}

### Problem {-}
You have time series data and a regressor variable
(that is also a time series).
You want to build a regression model of the time series,
assuming the regression coefficient is fixed.

### Solution {-}
The solution will follow these steps.

1. Define a function that constructs a dlm model object
   from the four model parameters
   ($\sigma_{\epsilon}^2$, $\sigma_{\xi}^2$, $\mu_{0}$, and $\lambda$).
1. Choose some reasonable, initial guesses for the parameters.
1. Use the `dlmMLE` function
   to find the maximum likelihood estimates (MLE) for the model parameters.
1. Using those MLE parameters, construct the final dlm model object
   (using the function you wrote in step 1).

This code embodies the solution.
The Discussion, below, unpacks and explains the logic.

```{r, eval=FALSE}
#
# Assume here:
#   y = time series data
#   x = regressor (also a time series)
#

# Argument v is a 4-element vector of model parameters
buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- c(exp(v[2]), 0)     # Note zero variance for lambda
  m0 <- v[3:4]
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambdaGuess <- mean(diff(y), na.rm=TRUE)

guesses <- c(log(varGuess), log(varGuess/5), mu0Guess, lambdaGuess)
mle <- dlmMLE(y, parm=guesses, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)

model <- buildModReg(mle$par)
```

### Discussion {-}
The first step is to
define a function that constructs a `dlm` model object from the four parameters.
A key fact here is that we set the second component of $W$ to be zero.
That forces `dlm` to keep the second state variable, $\lambda$, constant.

```{r, eval=FALSE}
buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- c(exp(v[2]), 0)     # Note zero variance for lambda
  m0 <- v[3:4]
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}
```

The argument to the function is a 4-element vector
containing the model parameters.

* `v[1]` = Log of $\sigma_{\epsilon}^2$
* `v[2]` = Log of $\sigma_{\xi}^2$
* `v[3]` = Initial level for $\mu$
* `v[4]` = Value of $\lambda$
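
One way to check the builder is to call it with arbitrary parameter values
and inspect the pieces of the resulting model.
This is a minimal sketch:
the toy regressor `x` and the values in `v` are made up for illustration and are not estimates.
`V`, `W`, and `m0` are accessor functions from the dlm package.

```{r, eval=FALSE}
library(dlm)

# Toy regressor (an assumption for illustration): 0 for 5 periods, then 1
x <- cbind(c(rep(0, 5), rep(1, 5)))

buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- c(exp(v[2]), 0)     # Zero variance for lambda
  m0 <- v[3:4]
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

# Arbitrary parameter values, not estimates
v <- c(log(4), log(0.5), 10, -2)
mod <- buildModReg(v)

V(mod)     # 1x1 matrix holding exp(v[1])
W(mod)     # 2x2 diagonal matrix; the entry for lambda is zero
m0(mod)    # Initial state: v[3] and v[4]
```

Because the variances enter as `exp(v[1])` and `exp(v[2])`,
the optimizer can search over all real numbers while the variances stay positive.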

We need reasonable guesses for the parameters,
in order to kick-start the parameter optimization.
Here are my guesses,
but really any reasonable ones will work OK.

Parameter                Guess
-----------------------  ---------------------------------------------
$\sigma_{\epsilon}^2$    Variance of differenced $y$ data
$\sigma_{\xi}^2$         Some fraction of $\sigma_{\epsilon}^2$
$\mu_{0}$                First observed value of $y$: `y[1]`
$\lambda$                Average of the differenced $y$ data
-----------------------  ---------------------------------------------

This code calculates those guesses.

```{r, eval=FALSE}
varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambdaGuess <- mean(diff(y), na.rm=TRUE)
```

The `dlmMLE` function uses numerical optimization
to find the maximum likelihood estimates (MLE) for the model parameters.
Starting with our reasonable guesses for parameters,
it will repeatedly call our `buildModReg` function,
calculate the model's likelihood, and find the MLE values.
Always check for convergence.

```{r, eval=FALSE}
guesses <- c(log(varGuess), log(varGuess/5), mu0Guess, lambdaGuess)
mle <- dlmMLE(y, parm=guesses, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)
```

The function returns the MLE *parameter values*, not the MLE *model*,
so we construct the model from those parameters.

```{r, eval=FALSE}
model <- buildModReg(mle$par)
```

### Example {-}
This example uses an explanatory variable to account for a change in the level of the Nile River.
The example is taken from the excellent paper by Petris and Petrone [@PetrisPetrone2011].

The explanatory variable is quite simple.
It has value 0.0 *before* the Aswan Dam was built
and value 1.0 *after* the dam was built.
The dam had a significant effect on the river's level,
so it makes sense as an explanatory variable.

Here, the explanatory variable is called $x$.
We can construct it "manually" from our knowledge of the data:
the dam was built after the 27th observation.

```{r, eval=TRUE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- c(exp(v[2]), 0)
  m0 <- v[3:4]
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambdaGuess <- mean(diff(y), na.rm=TRUE)

parm <- c(log(varGuess), log(varGuess/5), mu0Guess, lambdaGuess)
mle <- dlmMLE(y, parm=parm, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)

model <- buildModReg(mle$par)
```
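
The example stops once `model` is built.
As a sketch of how the estimates might be read back,
the block below repeats the fit in compact form so that it runs on its own,
converts the variances to their natural scale,
and reads the coefficient from the smoothed states,
which give the model's estimate of $\mu_{t}$ and $\lambda$ given all of the data.
The variable names `obsVar`, `levelVar`, and `lambdaHat` are assumptions for illustration.

```{r, eval=FALSE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dlmModReg(x, dV=exp(v[1]), dW=c(exp(v[2]), 0), m0=v[3:4])
}

varGuess <- var(diff(y), na.rm=TRUE)
parm <- c(log(varGuess), log(varGuess/5),
          as.numeric(y[1]), mean(diff(y), na.rm=TRUE))
mle <- dlmMLE(y, parm=parm, build=buildModReg)
if (mle$convergence != 0) stop(mle$message)
model <- buildModReg(mle$par)

# Variances back on their natural scale
obsVar   <- exp(mle$par[1])     # sigma_epsilon^2
levelVar <- exp(mle$par[2])     # sigma_xi^2

# Smoothed states: column 1 is mu, column 2 is lambda
smooth <- dlmSmooth(y, model)
lambdaHat <- smooth$s[nrow(smooth$s), 2]

c(obsVar=obsVar, levelVar=levelVar, lambdaHat=lambdaHat)
```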

### See Also {-}

## Fitting a Regression Model, Time-Varying Coefficients {#fitRegressionVarying}

### Problem {-}
You have time series data, `y`, and a regressor variable, `x`,
that is also a time series.
You want to estimate the parameters of a regression model
with a time-varying coefficient for `x`.

### Solution {-}

```{r, eval=FALSE}
buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- exp(v[2:3])                 # Variances for mu, lambda
  m0 <- v[4:5]                      # Initial levels for mu, lambda
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambda0Guess <- mean(diff(y), na.rm=TRUE)

parm <- c(log(varGuess), log(varGuess/5), log(varGuess/5),
          mu0Guess, lambda0Guess)
mle <- dlmMLE(y, parm=parm, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)

model <- buildModReg(mle$par)
```

Note that as of this writing,
the dlm package supports only univariate regression.

### Discussion {-}
The model-building function is similar to
the recipe for [Fitting a Regression Model with Fixed Coefficients](#fitRegressionFixed),
but does *not* force the variance of the coefficient, $\lambda$, to zero.

```{r, eval=FALSE}
buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- exp(v[2:3])                 # Variances for mu, lambda
  m0 <- v[4:5]                      # Initial levels for mu, lambda
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}
```

We need reasonable guesses for the parameters:
variances and initial levels of the state variables.

```{r, eval=FALSE}
varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambda0Guess <- mean(diff(y), na.rm=TRUE)
```

We call `dlmMLE` to estimate the MLE parameters through numerical optimization,
checking for convergence.

```{r, eval=FALSE}
parm <- c(log(varGuess), log(varGuess/5), log(varGuess/5),
          mu0Guess, lambda0Guess)
mle <- dlmMLE(y, parm=parm, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)
```

From the MLE parameters, we construct the final model object.

```{r, eval=FALSE}
model <- buildModReg(mle$par)
```


### Example {-}
This is yet another model of the Nile River data,
using the same explanatory variable, `x`, as the
[previous recipe](#fitRegressionFixed)
but letting its regression coefficient vary over time.

```{r, eval=TRUE}
y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))
```

We can run the solution code.

```{r, eval=TRUE}
library(dlm)

buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- exp(v[2:3])
  m0 <- v[4:5]
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

varGuess <- var(diff(y), na.rm=TRUE)
mu0Guess <- as.numeric(y[1])
lambda0Guess <- mean(diff(y), na.rm=TRUE)

parm <- c(log(varGuess), log(varGuess/5), log(varGuess/5),
          mu0Guess, lambda0Guess)
mle <- dlmMLE(y, parm=parm, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)

model <- buildModReg(mle$par)
```

The result is in `model`.

```{r, eval=TRUE}
print(model)
```
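
With a time-varying coefficient,
the quantity of interest is usually the path of $\lambda_{t}$ rather than a single number.
The following is a sketch of how that path might be extracted from the smoothed states.
It repeats the fit in compact form so that it runs on its own;
`dropFirst` is a dlm helper that removes the time-zero row,
and the names `states`, `muPath`, and `lambdaPath` are assumptions for illustration.

```{r, eval=FALSE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dlmModReg(x, dV=exp(v[1]), dW=exp(v[2:3]), m0=v[4:5])
}

varGuess <- var(diff(y), na.rm=TRUE)
parm <- c(log(varGuess), log(varGuess/5), log(varGuess/5),
          as.numeric(y[1]), mean(diff(y), na.rm=TRUE))
mle <- dlmMLE(y, parm=parm, build=buildModReg)
if (mle$convergence != 0) stop(mle$message)
model <- buildModReg(mle$par)

# Smoothed states, without the time-zero row
smooth <- dlmSmooth(y, model)
states <- dropFirst(smooth$s)
muPath     <- states[, 1]       # Smoothed level, mu_t
lambdaPath <- states[, 2]       # Smoothed coefficient, lambda_t

plot(lambdaPath,
     main="Smoothed Coefficient",
     ylab=expression(lambda[t]))
```

Keep in mind that `x` is zero for the first 27 observations,
so the data carry no information about $\lambda_{t}$ during that stretch.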

### See Also {-}


## Diagnosing a Regression Model, Fixed Coefficients {#diagnoseRegressionFixed}

### Problem {-}
You estimated the parameters of a regression model
with a fixed coefficient.
Now you want diagnostic plots for the model.

### Solution {-}

### Discussion {-}

### Example {-}
This code assumes that

* `model` was fit by the [recipe](#fitRegressionFixed), above,
  for estimating a regression with fixed coefficients
* `y` is your time series data
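* `x` is the regressor used when fitting `model`;
  it is not passed again here because the model object built by `dlmModReg` already stores it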

It produces the diagnostic plots for the model.

```{r, eval=TRUE, echo=FALSE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dV <- exp(v[1])
  dW <- c(exp(v[2]), 0)
  m0 <- c(v[3], v[4])
  dlmModReg(x, dV=dV, dW=dW, m0=m0)
}

diffVar <- var(diff(y), na.rm=TRUE)
INIT_OBS_LOG_VAR <- log(diffVar)
INIT_TRANS_LOG_VAR <- log(diffVar / 5)
parm <- c(INIT_OBS_LOG_VAR, INIT_TRANS_LOG_VAR, y[1], 0.0)
mle <- dlmMLE(y, parm=parm, build=buildModReg)

if (mle$convergence != 0) stop(mle$message)

model <- buildModReg(mle$par)
```

```{r, eval=TRUE, echo=TRUE, fig.pos="h", fig.height=10.5}
filt <- dlmFilter(y, model)
tsdiag(filt,
       main="Diagnostics for Regression Model" )
```
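
The plots can be supplemented with numeric checks on the standardized one-step-ahead forecast errors.
This is a sketch under stated assumptions:
it repeats the Nile fit so that it runs on its own,
the lag of 20 is an arbitrary choice,
and the choice of tests is illustrative rather than prescribed by dlm.
`residuals` has a method for filtered dlm objects;
`Box.test` and `shapiro.test` come from the stats package.

```{r, eval=FALSE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dlmModReg(x, dV=exp(v[1]), dW=c(exp(v[2]), 0), m0=v[3:4])
}

diffVar <- var(diff(y), na.rm=TRUE)
parm <- c(log(diffVar), log(diffVar/5), as.numeric(y[1]), 0.0)
mle <- dlmMLE(y, parm=parm, build=buildModReg)
if (mle$convergence != 0) stop(mle$message)
model <- buildModReg(mle$par)

filt <- dlmFilter(y, model)

# Standardized innovations; sd=FALSE returns the residuals only
res <- residuals(filt, sd=FALSE)

# Test for remaining autocorrelation in the residuals
Box.test(res, lag=20, type="Ljung-Box")

# Test for departure from normality
shapiro.test(res)
```

Small p-values from either test would suggest the model is missing some structure in the data.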

### See Also {-}

## Smoothing With a Regression Model, Fixed Coefficients {#smoothRegressionFixed}

### Problem {-}
You built a regression model of your time series data, using a fixed coefficient.
Now you want to [smooth](#smoothingVersusFiltering) the data;
that is, remove the noise from the entire dataset.

### Solution {-}

### Discussion {-}

### Example {-}
This example assumes that `model` was created by the example, above,
for estimating a regression with fixed coefficients.^[The example code also assumes that `x` and `y`
are the predictor and the time series data, respectively, from that recipe.]

It smooths the original data based on that model.

```{r, eval=TRUE, echo=TRUE}
smooth <- dlmSmooth(y, model)

# The final, smoothed time series is this linear combination
smoothed <- smooth$s[-1,1] + x*smooth$s[-1,2]

both <- cbind(y=y, smoothed=smoothed)
```

We can plot the original data and the smoothed values.

```{r, eval=TRUE, echo=TRUE, fig.pos="h"}
plot(both, plot.type="single",
     lty=c("solid", ALT_STYLE), col=c("black", ALT_COLOR),
     main="Smoothing a Regression Model",
     ylab="Annual Flow" )
```

### See Also {-}

## Filtering With a Regression Model, Fixed Coefficients

### Problem {-}
Using your regression model,
you want to [filter](#smoothingVersusFiltering) your time series data.

### Solution {-}

### Discussion {-}

### Example {-}
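A minimal sketch of what this example might look like,
mirroring the smoothing recipe but using `dlmFilter` in place of `dlmSmooth`.
It refits the fixed-coefficient Nile model so that it runs on its own.
The names `states` and `filtered`, and the plot styling, are assumptions for illustration.

```{r, eval=FALSE}
library(dlm)

y <- datasets::Nile
x <- cbind(c(rep(0,27), rep(1,length(y)-27)))

buildModReg <- function(v) {
  dlmModReg(x, dV=exp(v[1]), dW=c(exp(v[2]), 0), m0=v[3:4])
}

varGuess <- var(diff(y), na.rm=TRUE)
parm <- c(log(varGuess), log(varGuess/5),
          as.numeric(y[1]), mean(diff(y), na.rm=TRUE))
mle <- dlmMLE(y, parm=parm, build=buildModReg)
if (mle$convergence != 0) stop(mle$message)
model <- buildModReg(mle$par)

filt <- dlmFilter(y, model)

# Filtered states, without the time-zero row
states <- dropFirst(filt$m)

# The filtered series is the same linear combination used for smoothing
filtered <- states[, 1] + x[, 1] * states[, 2]

both <- cbind(y=y, filtered=filtered)
plot(both, plot.type="single",
     lty=c("solid", "dashed"), col=c("black", "red"),
     main="Filtering With a Regression Model",
     ylab="Annual Flow" )
```

Unlike the smoothed values, each filtered value uses only the observations up to that point in time.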

### See Also {-}