Exercise 3 on page 356 of Ruppert/Matteson, with an added part (e).
Consider the AR(1) model \[ Y_{t}=5-0.55 Y_{t-1}+\epsilon_{t} \] and assume that \(\sigma_{\epsilon}^{2}=1.2\).
For this problem, we will use formulas derived in lecture 8 pertaining to the AR(1) process with drift, \[ Y_{t}=c+\alpha Y_{t-1}+\epsilon_{t} \]
(a) \(\mathbf{Solution.}\qquad\) We know that this AR process is stationary because the AR coefficient satisfies \(\left|-0.55\right|<1\).
(b) \(\mathbf{Solution.}\qquad\) The mean of the process is given by, \[\begin{align*} \mathbb{E}\left[Y_{t}\right] & =\frac{c}{1-\alpha}\\ & =\frac{5}{1-\left(-0.55\right)}=\frac{5}{1.55}\approx3.226 \end{align*}\]
(c) \(\mathbf{Solution.}\qquad\) The variance of the process is given by, \[\begin{align*} \textrm{Var}\left(Y_{t}\right) & =\frac{\sigma_{\epsilon}^{2}}{1-\alpha^{2}}\\ & =\frac{1.2}{1-\left(-0.55\right)^{2}}=\frac{1.2}{0.6975}\approx1.720 \end{align*}\]
(d) \(\mathbf{Solution.}\qquad\) The autocovariance function of the process is given by, \[\begin{align*} \gamma(h) & =\textrm{Cov}\left(Y_{t},Y_{t+h}\right)\\ & =\sigma_{\epsilon}^{2}\frac{\alpha^{\left|h\right|}}{1-\alpha^{2}}\\ & =1.2\frac{\left(-0.55\right)^{\left|h\right|}}{1-\left(-0.55\right)^{2}} \end{align*}\]
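As a sanity check (not part of the original solution), the formulas in parts (b)-(d) can be verified by simulation. A minimal R sketch using arima.sim(); the seed and series length are arbitrary choices:

# QUESTION 1 (simulation check) -----------------------------------------------
set.seed(1)
alpha = -0.55
# arima.sim() generates a zero-mean AR(1); shift by c / (1 - alpha) to add the drift
y = arima.sim(model = list(ar = alpha), n = 1e5, sd = sqrt(1.2)) + 5 / (1 - alpha)
mean(y)   # should be close to 5 / 1.55 ~ 3.226
var(y)    # should be close to 1.2 / 0.6975 ~ 1.720
acf(y, lag.max = 3, type = "covariance", plot = FALSE)   # compare with gamma(h)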
(e) \(\mathbf{Solution.}\qquad\) For a general AR(1) process with drift, by recursively substituting \(Y_{t-k}\), we can write it as, \[\begin{align*} Y_{t} & =c+\alpha Y_{t-1}+\epsilon_{t}\\ & =c+\alpha\left(c+\alpha Y_{t-2}+\epsilon_{t-1}\right)+\epsilon_{t}\\ & \vdots\\ & =c\sum_{k=0}^{\infty}\alpha^{k}+\sum_{k=0}^{\infty}\alpha^{k}\epsilon_{t-k} \end{align*}\] where the infinite sums converge only when \(\left|\alpha\right|<1\).
If we calculate the variance of this process, using that the \(\epsilon_{t}\) are independent with variance \(\sigma_{\epsilon}^{2}\), we have, \[\begin{align*} \textrm{Var}\left(Y_{t}\right) & =\textrm{Var}\left(c\sum_{k=0}^{\infty}\alpha^{k}+\sum_{k=0}^{\infty}\alpha^{k}\epsilon_{t-k}\right)\\ & =\sum_{k=0}^{\infty}\textrm{Var}\left(\alpha^{k}\epsilon_{t-k}\right)\\ & =\sigma_{\epsilon}^{2}\sum_{k=0}^{\infty}\alpha^{2k}=\frac{\sigma_{\epsilon}^{2}}{1-\alpha^{2}}\quad\text{if }\left|\alpha\right|<1 \end{align*}\]
If \(\left|\alpha\right|\geq1\) then the geometric sum above diverges and the process does not have finite variance, which violates the stationarity assumption. Thus, if \(\alpha=-1.2\), the process will not be stationary.
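To see this numerically, a short illustrative simulation with \(\alpha=-1.2\) (the horizon of 50 steps and \(Y_{0}=0\) are arbitrary choices) shows the series exploding rather than fluctuating around a mean:

# QUESTION 1E (illustration) ---------------------------------------------------
set.seed(1)
alpha = -1.2
y = numeric(50)   # Y_0 = 0
for (t in 2:50) y[t] = 5 + alpha * y[t - 1] + rnorm(1, sd = sqrt(1.2))
tail(y, 3)        # magnitudes grow like |alpha|^t, so the variance diverges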
(a) \(\mathbf{Solution.}\qquad\) In the code below we plot the ACFs using a custom function called plot_acf(), which can be found in the Appendix. From figure 1 it looks like there is little autocorrelation at any lag for either series. Upon closer inspection, we see that NASDAQ returns have slight autocorrelation at lags smaller than 10, while SNP returns have some autocorrelation at lags larger than 10.
# QUESTION 2A ----------------------------------------------------------------
library(dplyr)   # for the %>% pipe
library(purrr)   # for map_df()

# Read the weekly price files with rows reversed (oldest observation first)
ndaq  = read.csv("Nasdaq_wkly_92-12.csv", header = TRUE) %>% purrr::map_df(rev)
sp500 = read.csv("SP400Mid_wkly_92-12.csv", header = TRUE) %>% purrr::map_df(rev)

# Net weekly returns from adjusted closing prices; drop the leading NA
rets = data.frame("NAS_ret" = c(NA, exp(diff(log(ndaq$Adj.Close))) - 1),
                  "SNP_ret" = c(NA, exp(diff(log(sp500$Adj.Close))) - 1),
                  row.names = sp500$Date)[-1, ]
labels = c("NASDAQ Weekly Returns", "SNP Weekly Returns")
plot_acf(rets, labels)
(b) \(\mathbf{Solution.}\qquad\) Below we perform a Ljung-Box test with lag = 10. In this case, for Nasdaq we reject the null hypothesis that \(\gamma(h)=0\) for all \(h\leq10\), while for SP400 we fail to reject it.
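The output below is consistent with applying Box.test() column-wise via apply() (a presumed reconstruction, since the original call is not shown; the data label newX[, i] in the output is how apply() names its argument):

# QUESTION 2B ----------------------------------------------------------------
apply(rets, 2, Box.test, lag = 10, type = "Ljung-Box")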
## $NAS_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 23.372, df = 10, p-value = 0.009455
##
##
## $SNP_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 16.375, df = 10, p-value = 0.08938
However, if we perform the Ljung-Box test with lag = 20, the conclusions flip: we fail to reject the null for Nasdaq returns, but reject it for SP400 returns at the 5% significance level.
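The lag-20 results below come from the same presumed call, with lag = 20:

apply(rets, 2, Box.test, lag = 20, type = "Ljung-Box")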
## $NAS_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 27.309, df = 20, p-value = 0.1268
##
##
## $SNP_ret
##
## Box-Ljung test
##
## data: newX[, i]
## X-squared = 31.713, df = 20, p-value = 0.04645
(c) \(\mathbf{Solution.}\qquad\) From figure 2 we see that prices exhibit strong serial correlation that persists across all lags, in stark contrast to the returns in figure 1. Thus the return series are much closer to stationary than the price series.
# QUESTION 2C ----------------------------------------------------------------
# Raw adjusted closing prices (no differencing)
prices = data.frame("NAS" = ndaq$Adj.Close,
                    "SNP" = sp500$Adj.Close,
                    row.names = sp500$Date)
labels = c("NASDAQ Weekly Prices", "SNP Weekly Prices")
plot_acf(prices, labels)
(a) \(\mathbf{Solution.}\qquad\) First we see that, \[ X=\begin{cases} 0 & \text{with prob. }\frac{1}{4}\\ 1 & \text{with prob. }\frac{1}{2}\\ 2 & \text{with prob. }\frac{1}{4} \end{cases} \]
Next we can compute \(E(Y\mid X=x)\) for each value of \(x\). \[\begin{align*} \mathbb{E}[Y\mid X=0] & =0\cdot\mathbb{P}(Y=0\mid X=0)=0\\ \\ \mathbb{E}[Y\mid X=1] & =1\cdot\mathbb{P}(Y=1\mid X=1)+(-1)\cdot\mathbb{P}(Y=-1\mid X=1)\\ & =\frac{1/4}{1/2}(1)+\frac{1/4}{1/2}(-1)\\ & =0\\ \\ \mathbb{E}[Y\mid X=2] & =-2\cdot P(Y=-2\mid X=2)=-2 \end{align*}\]
Thus, \[ \hat{Y}=\mathbb{E}[Y\mid X=x]=\begin{cases} 0 & \text{ if }x=0\\ 0 & \text{ if }x=1\\ -2 & \text{ if }x=2 \end{cases} \]
Finally the MSPE is calculated by, \[\begin{align*} \text{ MSPE } & =\mathbb{E}\left[(\hat{Y}-Y)^{2}\right]\\ & =\frac{1}{4}\left[(-2-(-2))^{2}+(0-1)^{2}+(0-(-1))^{2}+(0-0)^{2}\right]\\ & =\frac{1}{4}(2)=\frac{1}{2} \end{align*}\]
(b) \(\mathbf{Solution.}\qquad\) The formula for the best linear predictor of \(Y\) is given by, \[ \hat{Y}=\mu_{Y}+\Sigma_{YX}\Sigma_{X}^{-1}\left(X-\mu_{X}\right) \]
Thus we compute each parameter, \[\begin{align*} \mu_{Y} & =\frac{1}{4}(-2-1+0+1)=-\frac{1}{2}\\ \\ \mu_{X} & =0\cdot\frac{1}{4}+1\cdot\frac{1}{2}+2\cdot\frac{1}{4}=1\\ \\ \sigma_{Y}^{2} & =\mathbb{E}\left[Y^{2}\right]-\mathbb{E}[Y]^{2}\\ & =\frac{1}{4}\left((-2)^{2}+(-1)^{2}+0^{2}+1^{2}\right)-\left(-\frac{1}{2}\right)^{2}=\frac{5}{4}\\ \\ \sigma_{X}^{2} & =\mathbb{E}\left[X^{2}\right]-\mathbb{E}[X]^{2}\\ & =\left(\frac{1}{4}\left(0^{2}\right)+\frac{1}{2}\left(1^{2}\right)+\frac{1}{4}\left(2^{2}\right)\right)-1^{2}=\frac{1}{2}\\ \\ \Sigma_{XY} & =\textrm{Cov}(X,Y)=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y]\\ & =\frac{1}{4}((-2)(2)+(-1)(1)+0+(1)(1))-(1)\left(-\frac{1}{2}\right)=-\frac{1}{2} \end{align*}\]
Therefore the best linear predictor is, \[\begin{align*} \hat{Y} & =-\frac{1}{2}+\left(-\frac{1}{2}\right)\left(\frac{1}{2}\right)^{-1}(X-1)\\ & =\frac{1}{2}-X \end{align*}\]
Finally the MSPE is given by, \[\begin{align*} \textrm{MSPE} & =\sigma_{Y}^{2}-\Sigma_{YX}\Sigma_{X}^{-1}\Sigma_{XY}\\ & =\frac{5}{4}-\left(-\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{-1}=\frac{5}{4}-\frac{1}{2}=\frac{3}{4} \end{align*}\]
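Both MSPEs can be double-checked by enumerating the four equally likely \((X,Y)\) pairs (a small R verification sketch; the vectors below list the outcomes in the order \((2,-2),(1,-1),(0,0),(1,1)\)):

# QUESTION 3 (check) ----------------------------------------------------------
x = c(2, 1, 0, 1)                  # the four equally likely (X, Y) outcomes
y = c(-2, -1, 0, 1)
yhat_cond = ifelse(x == 2, -2, 0)  # conditional-mean predictor from part (a)
mean((yhat_cond - y)^2)            # MSPE = 0.5
yhat_lin = 1/2 - x                 # best linear predictor from part (b)
mean((yhat_lin - y)^2)             # MSPE = 0.75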
(c) \(\mathbf{Solution.}\qquad\) We notice that the MSPE from part (a) is lower than the MSPE from part (b). This is because the predictor from part (a) predicts the values \(-2\) and \(0\) perfectly, and its error arises only from failing to distinguish between \(1\) and \(-1\). The predictor from part (b), in contrast, incurs some error for every value of \(Y\).
Appendix

# APPENDIX: plot_acf ----------------------------------------------------------
# Compute the autocorrelation function for each column of `data` and plot the
# results with ggplot2, one panel per column.
library(ggplot2)
library(gridExtra)   # for grid.arrange()

plot_acf = function(data, titles) {
  plot_list = list()
  iter = 1
  # Create an ACF plot for each column
  for (col in colnames(data)) {
    plot_list[[iter]] = local({
      bacf   = acf(data[, col], lag.max = 20, plot = FALSE)
      bacfdf = with(bacf, data.frame(lag, acf))
      # Dashed lines mark the approximate 95% bands at +/- 1.96 / sqrt(n)
      ggplot(data = bacfdf, mapping = aes(x = lag, y = acf)) +
        geom_hline(aes(yintercept = 0), colour = "steelblue") +
        geom_segment(mapping = aes(xend = lag, yend = 0), colour = "steelblue") +
        geom_abline(slope = 0, intercept = 1.96 / sqrt(length(data[, col])),
                    linetype = "dashed", colour = "darkred") +
        geom_abline(slope = 0, intercept = -1.96 / sqrt(length(data[, col])),
                    linetype = "dashed", colour = "darkred") +
        ggtitle(paste0("ACF plot of ", titles[iter])) +
        theme(plot.title = element_text(hjust = 0.5))
    })
    iter = iter + 1
  }
  # Arrange all subplots in a single column
  grid.arrange(grobs = plot_list, ncol = 1)
}