Exercise 3 on page 356 of Ruppert/Matteson, with an added part (e).
Consider the AR(1) model \[ Y_{t}=5-0.55 Y_{t-1}+\epsilon_{t} \] and assume that \(\sigma_{\epsilon}^{2}=1.2\).
For this problem, we will use formulas derived in lecture 8 pertaining to the AR(1) process with drift, \[ Y_{t}=c+\alpha Y_{t-1}+\epsilon_{t} \]
(a) \(\mathbf{Solution.}\qquad\) We know that this AR process is stationary because the AR coefficient satisfies \(\left|-0.55\right|<1\).
(b) \(\mathbf{Solution.}\qquad\) The mean of the process is given by, \[\begin{align*} \mathbb{E}\left[Y_{t}\right] & =\frac{c}{1-\alpha}\\ & =\frac{5}{1-\left(-0.55\right)}=\frac{5}{1.55}\approx3.226 \end{align*}\]
(c) \(\mathbf{Solution.}\qquad\) The variance of the process is given by, \[\begin{align*} \textrm{Var}\left(Y_{t}\right) & =\frac{\sigma_{\epsilon}^{2}}{1-\alpha^{2}}\\ & =\frac{1.2}{1-\left(-0.55\right)^{2}}=\frac{1.2}{0.6975}\approx1.720 \end{align*}\]
(d) \(\mathbf{Solution.}\qquad\) The autocovariance function of the process is given by, \[\begin{align*} \gamma(h) & =\textrm{Cov}\left(Y_{t},Y_{t+h}\right)\\ & =\sigma_{\epsilon}^{2}\frac{\alpha^{\left|h\right|}}{1-\alpha^{2}}\\ & =1.2\frac{\left(-0.55\right)^{\left|h\right|}}{1-\left(-0.55\right)^{2}} \end{align*}\]
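As a sanity check (not part of the original solution), the formulas in parts (b)-(d) can be verified by simulation. A minimal R sketch using arima.sim(); the seed and series length are arbitrary choices:

# QUESTION 1 (simulation check) -----------------------------------------------
set.seed(1)
alpha = -0.55
# arima.sim() generates a zero-mean AR(1); shift by c / (1 - alpha) to add the drift
y = arima.sim(model = list(ar = alpha), n = 1e5, sd = sqrt(1.2)) + 5 / (1 - alpha)
mean(y)   # should be close to 5 / 1.55 ~ 3.226
var(y)    # should be close to 1.2 / 0.6975 ~ 1.720
acf(y, lag.max = 3, type = "covariance", plot = FALSE)   # compare with gamma(h)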
(e) \(\mathbf{Solution.}\qquad\) For a general AR(1) process with drift, by recursively substituting \(Y_{t-k}\), we can write it as, \[\begin{align*} Y_{t} & =c+\alpha Y_{t-1}+\epsilon_{t}\\ & =c+\alpha\left(c+\alpha Y_{t-2}+\epsilon_{t-1}\right)+\epsilon_{t}\\ & \vdots\\ & =c\sum_{k=0}^{\infty}\alpha^{k}+\sum_{k=0}^{\infty}\alpha^{k}\epsilon_{t-k} \end{align*}\] where the infinite sums converge only when \(\left|\alpha\right|<1\).
If we calculate the variance of this process, using that the \(\epsilon_{t}\) are independent with variance \(\sigma_{\epsilon}^{2}\), we have, \[\begin{align*} \textrm{Var}\left(Y_{t}\right) & =\textrm{Var}\left(c\sum_{k=0}^{\infty}\alpha^{k}+\sum_{k=0}^{\infty}\alpha^{k}\epsilon_{t-k}\right)\\ & =\sum_{k=0}^{\infty}\textrm{Var}\left(\alpha^{k}\epsilon_{t-k}\right)\\ & =\sigma_{\epsilon}^{2}\sum_{k=0}^{\infty}\alpha^{2k}=\frac{\sigma_{\epsilon}^{2}}{1-\alpha^{2}}\quad\text{if }\left|\alpha\right|<1 \end{align*}\]
If \(\left|\alpha\right|\geq1\) then the geometric sum above diverges and the process does not have finite variance, which violates the stationarity assumption. Thus, if \(\alpha=-1.2\), the process will not be stationary.
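To see this numerically, a short illustrative simulation with \(\alpha=-1.2\) (the horizon of 50 steps and \(Y_{0}=0\) are arbitrary choices) shows the series exploding rather than fluctuating around a mean:

# QUESTION 1E (illustration) ---------------------------------------------------
set.seed(1)
alpha = -1.2
y = numeric(50)   # Y_0 = 0
for (t in 2:50) y[t] = 5 + alpha * y[t - 1] + rnorm(1, sd = sqrt(1.2))
tail(y, 3)        # magnitudes grow like |alpha|^t, so the variance diverges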
(a) \(\mathbf{Solution.}\qquad\) In the code below we plot the ACFs using a custom function called plot_acf(), which can be found in the Appendix. From figure 1 it looks like there is little autocorrelation at any lag for either series. Upon closer inspection, we see that NASDAQ returns have slight autocorrelation at lags smaller than 10, while SNP returns have some autocorrelation at lags larger than 10.
# QUESTION 2A ----------------------------------------------------------------
library(dplyr)   # for the %>% pipe
library(purrr)   # for map_df()

# Read the weekly price files with rows reversed (oldest observation first)
ndaq  = read.csv("Nasdaq_wkly_92-12.csv", header = TRUE) %>% purrr::map_df(rev)
sp500 = read.csv("SP400Mid_wkly_92-12.csv", header = TRUE) %>% purrr::map_df(rev)

# Net weekly returns from adjusted closing prices; drop the leading NA
rets = data.frame("NAS_ret" = c(NA, exp(diff(log(ndaq$Adj.Close))) - 1),
                  "SNP_ret" = c(NA, exp(diff(log(sp500$Adj.Close))) - 1),
                  row.names = sp500$Date)[-1, ]
labels = c("NASDAQ Weekly Returns", "SNP Weekly Returns")
plot_acf(rets, labels)
(b) \(\mathbf{Solution.}\qquad\) Below we perform a Ljung-Box test with lag = 10. In this case, for Nasdaq we reject the null hypothesis that \(\gamma(h)=0\) for all \(h\leq10\), while for SP400 we fail to reject it.
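The output below is consistent with applying Box.test() column-wise via apply() (a presumed reconstruction, since the original call is not shown; the data label newX[, i] in the output is how apply() names its argument):

# QUESTION 2B ----------------------------------------------------------------
apply(rets, 2, Box.test, lag = 10, type = "Ljung-Box")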
## $NAS_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 23.372, df = 10, p-value = 0.009455
##
##
## $SNP_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 16.375, df = 10, p-value = 0.08938
However, if we perform the Ljung-Box test with lag = 20, the conclusions flip: we fail to reject the null for Nasdaq returns, but reject it for SP400 returns at the 5% significance level.
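The lag-20 results below come from the same presumed call, with lag = 20:

apply(rets, 2, Box.test, lag = 20, type = "Ljung-Box")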
## $NAS_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 27.309, df = 20, p-value = 0.1268
##
##
## $SNP_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 31.713, df = 20, p-value = 0.04645
(c) \(\mathbf{Solution.}\qquad\) From figure 2 we see that prices exhibit strong serial correlation that persists across all lags, in stark contrast to the returns in figure 1. Thus the return series are much closer to stationary than the price series.
# QUESTION 2C ----------------------------------------------------------------
# Raw adjusted closing prices (no differencing)
prices = data.frame("NAS" = ndaq$Adj.Close,
                    "SNP" = sp500$Adj.Close,
                    row.names = sp500$Date)
labels = c("NASDAQ Weekly Prices", "SNP Weekly Prices")
plot_acf(prices, labels)
(a) \(\mathbf{Solution.}\qquad\) First we see that, \[ X=\begin{cases} 0 & \text{with prob. }\frac{1}{4}\\ 1 & \text{with prob. }\frac{1}{2}\\ 2 & \text{with prob. }\frac{1}{4} \end{cases} \]
Next we can compute \(E(Y\mid X=x)\) for each value of \(x\). \[\begin{align*} \mathbb{E}[Y\mid X=0] & =0\cdot\mathbb{P}(Y=0\mid X=0)=0\\ \\ \mathbb{E}[Y\mid X=1] & =1\cdot\mathbb{P}(Y=1\mid X=1)+(-1)\cdot\mathbb{P}(Y=-1\mid X=1)\\ & =\frac{1/4}{1/2}(1)+\frac{1/4}{1/2}(-1)\\ & =0\\ \\ \mathbb{E}[Y\mid X=2] & =-2\cdot P(Y=-2\mid X=2)=-2 \end{align*}\]
Thus, \[ \hat{Y}=\mathbb{E}[Y\mid X=x]=\begin{cases} 0 & \text{ if }x=0\\ 0 & \text{ if }x=1\\ -2 & \text{ if }x=2 \end{cases} \]
Finally the MSPE is calculated by, \[\begin{align*} \text{ MSPE } & =\mathbb{E}\left[(\hat{Y}-Y)^{2}\right]\\ & =\frac{1}{4}\left[(-2-(-2))^{2}+(0-1)^{2}+(0-(-1))^{2}+(0-0)^{2}\right]\\ & =\frac{1}{4}(2)=\frac{1}{2} \end{align*}\]
(b) \(\mathbf{Solution.}\qquad\) The formula for the best linear predictor of \(Y\) is given by, \[ \hat{Y}=\mu_{Y}+\Sigma_{YX}\Sigma_{X}^{-1}\left(X-\mu_{X}\right) \]
Thus we compute each parameter, \[\begin{align*} \mu_{Y} & =\frac{1}{4}(-2-1+0+1)=-\frac{1}{2}\\ \\ \mu_{X} & =0\cdot\frac{1}{4}+1\cdot\frac{1}{2}+2\cdot\frac{1}{4}=1\\ \\ \sigma_{Y}^{2} & =\mathbb{E}\left[Y^{2}\right]-\mathbb{E}[Y]^{2}\\ & =\frac{1}{4}\left((-2)^{2}+(-1)^{2}+0^{2}+1^{2}\right)-\left(-\frac{1}{2}\right)^{2}=\frac{5}{4}\\ \\ \sigma_{X}^{2} & =\mathbb{E}\left[X^{2}\right]-\mathbb{E}[X]^{2}\\ & =\left(\frac{1}{4}\left(0^{2}\right)+\frac{1}{2}\left(1^{2}\right)+\frac{1}{4}\left(2^{2}\right)\right)-1^{2}=\frac{1}{2}\\ \\ \Sigma_{XY} & =\textrm{Cov}(X,Y)=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y]\\ & =\frac{1}{4}((-2)(2)+(-1)(1)+0+(1)(1))-(1)\left(-\frac{1}{2}\right)=-\frac{1}{2} \end{align*}\]
Therefore the best linear predictor is, \[\begin{align*} \hat{Y} & =-\frac{1}{2}+\left(-\frac{1}{2}\right)\left(\frac{1}{2}\right)^{-1}(X-1)\\ & =\frac{1}{2}-X \end{align*}\]
Finally the MSPE is given by, \[\begin{align*} \textrm{MSPE} & =\sigma_{Y}^{2}-\Sigma_{YX}\Sigma_{X}^{-1}\Sigma_{XY}\\ & =\frac{5}{4}-\left(-\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{-1}=\frac{5}{4}-\frac{1}{2}=\frac{3}{4} \end{align*}\]
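Both MSPEs can be double-checked by enumerating the four equally likely \((X,Y)\) pairs (a small R verification sketch; the vectors below list the outcomes in the order \((2,-2),(1,-1),(0,0),(1,1)\)):

# QUESTION 3 (check) ----------------------------------------------------------
x = c(2, 1, 0, 1)                  # the four equally likely (X, Y) outcomes
y = c(-2, -1, 0, 1)
yhat_cond = ifelse(x == 2, -2, 0)  # conditional-mean predictor from part (a)
mean((yhat_cond - y)^2)            # MSPE = 0.5
yhat_lin = 1/2 - x                 # best linear predictor from part (b)
mean((yhat_lin - y)^2)             # MSPE = 0.75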
(c) \(\mathbf{Solution.}\qquad\) We notice that the MSPE from part (a) is lower than the MSPE from part (b). This is because the predictor from part (a) predicts the values \(-2\) and \(0\) perfectly, and its error arises only from failing to distinguish between \(1\) and \(-1\). The predictor from part (b), in contrast, incurs some error for every value of \(Y\).
Appendix

# APPENDIX: plot_acf ----------------------------------------------------------
# Compute the autocorrelation function for each column of `data` and plot the
# results with ggplot2, one panel per column.
library(ggplot2)
library(gridExtra)   # for grid.arrange()

plot_acf = function(data, titles) {
  plot_list = list()
  iter = 1
  # Create an ACF plot for each column
  for (col in colnames(data)) {
    plot_list[[iter]] = local({
      bacf   = acf(data[, col], lag.max = 20, plot = FALSE)
      bacfdf = with(bacf, data.frame(lag, acf))
      # Dashed lines mark the approximate 95% bands at +/- 1.96 / sqrt(n)
      ggplot(data = bacfdf, mapping = aes(x = lag, y = acf)) +
        geom_hline(aes(yintercept = 0), colour = "steelblue") +
        geom_segment(mapping = aes(xend = lag, yend = 0), colour = "steelblue") +
        geom_abline(slope = 0, intercept = 1.96 / sqrt(length(data[, col])),
                    linetype = "dashed", colour = "darkred") +
        geom_abline(slope = 0, intercept = -1.96 / sqrt(length(data[, col])),
                    linetype = "dashed", colour = "darkred") +
        ggtitle(paste0("ACF plot of ", titles[iter])) +
        theme(plot.title = element_text(hjust = 0.5))
    })
    iter = iter + 1
  }
  # Arrange all subplots in a single column
  grid.arrange(grobs = plot_list, ncol = 1)
}