(a)(i) \(\mathbf{Solution.}\qquad\) First we compute a few covariances, \[\begin{align} \textrm{Cov}\left(Y_{n},Y_{n}\right) & =\sigma^{2}+\theta_{1}^{2}\sigma^{2}+\theta_{2}^{2}\sigma^{2}\\ \nonumber \\ \textrm{Cov}\left(Y_{n},Y_{n+1}\right) & =\textrm{Cov}\left(\varepsilon_{n}+\theta_{1}\varepsilon_{n-1}+\theta_{2}\varepsilon_{n-2},\varepsilon_{n+1}+\theta_{1}\varepsilon_{n}+\theta_{2}\varepsilon_{n-1}\right)\\ & =\theta_{1}\sigma^{2}+\theta_{1}\theta_{2}\sigma^{2}\nonumber \\ \nonumber \\ \textrm{Cov}\left(Y_{n},Y_{n+2}\right) & =\textrm{Cov}\left(\varepsilon_{n}+\theta_{1}\varepsilon_{n-1}+\theta_{2}\varepsilon_{n-2},\varepsilon_{n+2}+\theta_{1}\varepsilon_{n+1}+\theta_{2}\varepsilon_{n}\right)\\ & =\theta_{2}\sigma^{2}\nonumber \end{align}\]
Let \(\theta_{0}=1\). Then the autocovariance function is, \[ \gamma(h)=\Sigma_{Y}\left(n,n+h\right)=\begin{cases} \sigma^{2}\sum_{i=0}^{2-\left|h\right|}\theta_{i}\theta_{i+\left|h\right|} & \left|h\right|\leq2\\ 0 & \left|h\right|>2 \end{cases} \]
Thus the autocorrelation function becomes, \[ \rho_{Y}(h)=\frac{\gamma(h)}{\gamma(0)}=\begin{cases} \frac{\sum_{i=0}^{2-\left|h\right|}\theta_{i}\theta_{i+\left|h\right|}}{\left(1+\theta_{1}^{2}+\theta_{2}^{2}\right)} & \text{ if }\left|h\right|\leq2\\ 0 & \text{ if }\left|h\right|>2 \end{cases} \]
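As a quick sanity check, the closed-form ACF can be compared against R's built-in ARMAacf(); the coefficient values \(\theta_{1}=0.5\), \(\theta_{2}=0.3\) below are purely illustrative:

```r
# Hypothetical MA(2) coefficients (illustrative values, not from the problem)
theta = c(0.5, 0.3)
th = c(1, theta)  # prepend theta_0 = 1

# ACF from the closed-form expression derived above, for h = 0, 1, 2
rho_formula = sapply(0:2, function(h) sum(th[1:(3 - h)] * th[(1 + h):3]) / sum(th^2))

# Theoretical ACF from R's ARMAacf()
rho_builtin = ARMAacf(ma = theta, lag.max = 2)

all.equal(unname(rho_builtin), rho_formula)  # TRUE
```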
(a)(ii) \(\mathbf{Solution.}\qquad\) We can expand the process \(Z_{t}\), \[\begin{align*} Z_{t} & =Y_{t}+Y_{t-1}\\ & =\left(\mu+\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2}\varepsilon_{t-2}\right)+\left(\mu+\varepsilon_{t-1}+\theta_{1}\varepsilon_{t-2}+\theta_{2}\varepsilon_{t-3}\right)\\ & =2\mu+\varepsilon_{t}+\left(1+\theta_{1}\right)\varepsilon_{t-1}+\left(\theta_{1}+\theta_{2}\right)\varepsilon_{t-2}+\theta_{2}\varepsilon_{t-3} \end{align*}\]
The process above is an \(\textrm{MA}(3)\) process with the following coefficients, \[ \phi_{0}=1,\ \phi_{1}=\left(1+\theta_{1}\right),\ \phi_{2}=\left(\theta_{1}+\theta_{2}\right)\text{ and }\phi_{3}=\theta_{2} \]
Then, applying the results from part (i), the autocovariance function is, \[ \gamma\left(h\right)=\Sigma_{Z}\left(n,n+h\right)=\begin{cases} \sigma^{2}\sum_{i=0}^{3-\left|h\right|}\phi_{i}\phi_{i+\left|h\right|} & \text{ if }\left|h\right|\leq3\\ 0 & \text{ if }\left|h\right|>3 \end{cases} \]
The autocorrelation function is, \[ \rho_{Z}(h)=\frac{\gamma(h)}{\gamma(0)}=\begin{cases} \frac{\sum_{i=0}^{3-\left|h\right|}\phi_{i}\phi_{i+\left|h\right|}}{\sum_{i=0}^{3}\phi_{i}^{2}} & \text{ if }\left|h\right|\leq3\\ 0 & \text{ if }\left|h\right|>3 \end{cases} \]
(b)(a) \(\mathbf{Solution.}\qquad\) For \(k\geq1\) we can write the autocovariance function as, \[\begin{align*} \gamma(k) & =\Sigma_{Y}(t,t+k)\\ & =\textrm{Cov}\left(Y_{t},Y_{t+k}\right)\\ & =\textrm{Cov}\left(Y_{t},\phi_{1}\left(Y_{t+k-1}-\mu\right)+\phi_{2}\left(Y_{t+k-2}-\mu\right)+\varepsilon_{t+k}\right)\\ & =\phi_{1}\textrm{Cov}\left(Y_{t},Y_{t+k-1}\right)+\phi_{2}\textrm{Cov}\left(Y_{t},Y_{t+k-2}\right)+\underbrace{\textrm{Cov}\left(Y_{t},\varepsilon_{t+k}\right)}_{=0\text{ for }k\geq1}\\ & =\phi_{1}\gamma(k-1)+\phi_{2}\gamma(k-2) \end{align*}\]
So the autocorrelation function is, \[\begin{align*} \rho(k) & =\frac{\gamma(k)}{\gamma(0)}\\ & =\frac{\phi_{1}\gamma(k-1)+\phi_{2}\gamma(k-2)}{\gamma(0)}\\ & =\phi_{1}\rho(k-1)+\phi_{2}\rho(k-2) \end{align*}\]
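This recursion can be verified numerically with ARMAacf(); the AR(2) coefficients below are illustrative values chosen inside the stationarity region:

```r
# Illustrative stationary AR(2) coefficients (hypothetical values)
phi = c(0.5, 0.2)
rho = ARMAacf(ar = phi, lag.max = 3)  # theoretical ACF at lags 0..3

# Check rho(k) = phi1*rho(k-1) + phi2*rho(k-2) at k = 2 and k = 3
all.equal(unname(rho["2"]), phi[1] * unname(rho["1"]) + phi[2] * unname(rho["0"]))  # TRUE
all.equal(unname(rho["3"]), phi[1] * unname(rho["2"]) + phi[2] * unname(rho["1"]))  # TRUE
```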
(b)(b) \(\mathbf{Solution.}\qquad\) We know from part a) that, \[\begin{align} \rho(1) & =\phi_{1}\rho(1-1)+\phi_{2}\rho(1-2)\\ & =\phi_{1}+\phi_{2}\rho(1)\nonumber \\ \nonumber \\ \rho(2) & =\phi_{1}\rho(2-1)+\phi_{2}\rho(2-2)\\ & =\phi_{1}\rho(1)+\phi_{2}\nonumber \end{align}\]
The two equations above are exactly the Yule-Walker equations for an AR(2) process; in matrix form, \[ \begin{pmatrix}1 & \rho(1)\\ \rho(1) & 1 \end{pmatrix}\begin{pmatrix}\phi_{1}\\ \phi_{2} \end{pmatrix}=\begin{pmatrix}\rho(1)\\ \rho(2) \end{pmatrix} \]
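Solving the system explicitly (assuming \(\rho(1)^{2}\neq1\)) gives closed forms that can be checked against the numerical solution below,

\[
\phi_{1}=\frac{\rho(1)\left(1-\rho(2)\right)}{1-\rho(1)^{2}},\qquad\phi_{2}=\frac{\rho(2)-\rho(1)^{2}}{1-\rho(1)^{2}}
\]

For example, with \(\rho(1)=0.4\) and \(\rho(2)=0.2\): \(\phi_{1}=0.32/0.84\approx0.381\) and \(\phi_{2}=0.04/0.84\approx0.0476\).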
(b)(c) \(\mathbf{Solution.}\qquad\) Since \(\rho(k)=\phi_{1}\rho(k-1)+\phi_{2}\rho(k-2)\), \(\rho(3)\) is, \[\begin{align*} \rho(3) & =\phi_{1}\rho(3-1)+\phi_{2}\rho(3-2)\\ & =\phi_{1}\rho(2)+\phi_{2}\rho(1) \end{align*}\]
With the R code below, we solve for \(\phi_1, \phi_2\) and \(\rho(3)\), given \(\rho(1)=0.4\) and \(\rho(2)=0.2\),
rho = matrix(c(0.4, 0.2), ncol = 1)      # (rho(1), rho(2))
A = matrix(c(1, 0.4, 0.4, 1), ncol = 2)  # correlation matrix in the Yule-Walker system
phi = solve(A) %*% rho
phi
## [,1]
## [1,] 0.38095238
## [2,] 0.04761905
rho3 = t(phi) %*% rho[2:1, , drop = FALSE]  # rho(3) = phi1*rho(2) + phi2*rho(1)
rho3
## [,1]
## [1,] 0.0952381
In CTools there is a data set of daily price data for Apple stock from Jan \(1\), \(2000\) through Dec \(31\), \(2014\); it is in the file “apple_stock_day_2000_2014.csv”.
Hint: You may find the following R commands helpful.
X = read.csv("apple_stock_day_2000_2014.csv",header=TRUE)
Appl_day <-X$Adj.Close
ApplLR <- diff(log(Appl_day)) # generating log returns (daily)
ApplLR.ts <- ts(ApplLR,start=c(2000,1),frequency=252,names=c("Apple_Return"))
ApplLR_2002_2003_sub.ts = window(ApplLR.ts, start=c(2002, 150), end=c(2003, 252))
(a) \(\mathbf{Solution.}\qquad\) First, in figure 1 we plot the daily log-returns for Apple stock from day 150 of 2002 through the end of 2003. Overall, the time series looks like it may be modeled well by a stationary model. We will generate an ACF plot to check this assumption.
# QUESTION 2A ----------------------------------------------------------------
lrets = window(ApplLR.ts, start=c(2002, 150), end=c(2003, 252))
# Plot of log-returns
library(ggplot2)   # ggtitle(), xlab(), ylab()
library(forecast)  # autoplot() method for ts objects
autoplot(lrets) +
ggtitle("AAPL Stock Log-Returns") +
xlab("Date") +
ylab("log-returns")
In figure 2 we see that there is very little serial correlation between observations.
Finally, performing the Ljung-Box test of the null hypothesis that the autocorrelations are 0, we see that the p-value (0.5017) is far from significant, so we cannot reject the null; this agrees with the previous two figures.
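The printout below is consistent with a call of this form (the choice of lag = 10 is inferred from df = 10 in the output):

```r
# Ljung-Box test on the first 10 autocorrelations of the log-returns
Box.test(lrets, lag = 10, type = "Ljung-Box")
```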
##
## Box-Ljung test
##
## data: lrets
## X-squared = 9.3237, df = 10, p-value = 0.5017
(b) \(\mathbf{Solution.}\qquad\) Using the auto.arima() function, we find that it chooses ARIMA(0,0,0) with zero mean as the best model according to AIC. Checking the diagnostic plots in figure 3, the ACF of the residuals shows that they are close to white noise, and the plot of Ljung-Box p-values confirms this since none are significant.
# QUESTION 2B ----------
# Best fit is a white noise process ARIMA(0,0,0)
library(forecast)
fit = auto.arima(lrets, max.p=5, max.q=5, ic="aic")
fit
## Series: lrets
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 0.0005436: log likelihood=830.59
## AIC=-1659.18 AICc=-1659.17 BIC=-1655.31
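The code behind the figure-3 diagnostics is not shown; assuming standard diagnostic helpers were used, either of the following calls produces a comparable panel (residual ACF plus Ljung-Box p-values by lag):

```r
# forecast's one-stop diagnostics: residual series, ACF, and a Ljung-Box test
checkresiduals(fit)

# Classic stats panel including Ljung-Box p-values at each lag; refit with
# stats::arima() so that tsdiag() dispatch is guaranteed
tsdiag(arima(lrets, order = c(0, 0, 0)))
```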
In figure 4 we check the normality assumption of the residuals. The residuals appear to be heavier-tailed than normal, so we conclude that they are not normally distributed.
# Also show qq-plot of residuals. Residuals do not look normal
res = data.frame(residuals = fit$residuals)
ggplot(res, aes(sample = residuals)) +
stat_qq(colour = "darkred", shape = 1) +
stat_qq_line(colour = "steelblue") +
ggtitle("Normal Plot of ARIMA(0,0,0) Residuals") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab('Theoretical Quantiles') +
ylab('Sample Quantiles')
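To supplement the visual check with a formal test (an addition, not part of the original write-up), a Shapiro-Wilk test on the residuals quantifies the departure from normality:

```r
# Shapiro-Wilk test of normality on the ARIMA(0,0,0) residuals
shapiro.test(res$residuals)
```

A small p-value here would agree with the heavy tails seen in the qq-plot.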
(c) \(\mathbf{Solution.}\qquad\) The code below generates a 10-day forecast using the model fitted in part b), refit as ARIMA(0,1,0) on the log prices, which is equivalent to ARIMA(0,0,0) on the log-returns. As figure 5 shows, the fitted model implies that the predicted Adj. Close price is the same for each day of the 10-day forecast. We also show 95% prediction intervals for the close price.
# QUESTION 2C ----------
days_ahead = 10
# Most of the code below involves setting up a dataframe for plotting the
# predictions
X$Date = as.Date(X$Date, format = "%m/%d/%Y")
# Filter data from day 150 of 2002 to the end of 2003
library(lubridate)  # year()
time_ind = year(X$Date) >= 2002 & year(X$Date) <= 2003
df1 = X[time_ind,]
df1 = df1[150:nrow(df1),]
# Get predictions: ARIMA(0,1,0) on the log prices is equivalent to the
# ARIMA(0,0,0) model fitted to the log-returns in part b)
fit = arima(log(df1$Adj.Close), order = c(0, 1, 0))
forecasts = predict(fit, days_ahead)
forecasts
## $pred
## Time Series:
## Start = 356
## End = 365
## Frequency = 1
## [1] 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431
## [8] 0.3646431 0.3646431 0.3646431
##
## $se
## Time Series:
## Start = 356
## End = 365
## Frequency = 1
## [1] 0.02322630 0.03284695 0.04022913 0.04645260 0.05193559 0.05689259
## [7] 0.06145102 0.06569390 0.06967891 0.07344802
# Build temporary dataframe to append to the previous dataframe
library(dplyr)  # %>%, top_n(), select()
first_n_days = X[year(X$Date) == 2004,] %>%
top_n(-days_ahead, wt = Date) %>%
select(Date)
df2 = data.frame(matrix(NA, nrow = days_ahead, ncol = length(X)))
colnames(df2) = colnames(X)
df2["Date"] = first_n_days
df = rbind(df1, df2)
# Calculate the n-day forecasts for Adj Close Price
last_price = df1[nrow(df1), "Adj.Close"]
lower = exp(forecasts$pred - 1.96*forecasts$se)
upper = exp(forecasts$pred + 1.96*forecasts$se)
predicted = exp(forecasts$pred)
# Add forecasts to our df
df$predicted = c(rep(NA, nrow(df1)), predicted)
df$lower = c(rep(NA, nrow(df1)), lower)
df$upper = c(rep(NA, nrow(df1)), upper)
# Plot n-step prediction
ggplot(df, aes(x=Date)) +
geom_line(aes(y=Adj.Close)) +
geom_line(aes(y=predicted), colour = "darkred") +
geom_line(aes(y=upper), colour = "steelblue") +
geom_line(aes(y=lower), colour = "steelblue") +
ggtitle("10-step ahead prediction AAPL Adj Close Price")
(d) \(\mathbf{Solution.}\qquad\) Below we compute the 1-day-ahead \(\textrm{VaR}_{0.005}\). This estimate is likely an underestimate because, as the qq-plot in figure 4 showed, the residuals are heavier-tailed than normal in both tails.
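In symbols, for a position of size \(S=\$1{,}000{,}000\) the code below computes

\[
\textrm{VaR}_{\alpha}=-S\left(\frac{\exp\left(\hat{m}_{1}+z_{\alpha}\hat{s}_{1}\right)}{P_{T}}-1\right)
\]

where \(\hat{m}_{1}\) and \(\hat{s}_{1}\) are the 1-day-ahead forecast mean and standard error of the log price, \(P_{T}\) is the last observed price, and \(z_{\alpha}=\Phi^{-1}(0.005)\approx-2.576\). Since the fitted model is a driftless random walk in the log price, \(\hat{m}_{1}=\log P_{T}\) and the expression simplifies to \(\textrm{VaR}_{\alpha}=-S\left(e^{z_{\alpha}\hat{s}_{1}}-1\right)\).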
# QUESTION 2D ----------
# We can use our lower estimate of the price from part c) to compute return
alpha = 0.005
pred_low_price = exp(forecasts$pred[1] + qnorm(alpha)*forecasts$se[1])
VaR = -1e6 * (pred_low_price / last_price - 1)
VaR
## [1] 58072.52
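Given the heavy tails noted in part (b), one rough robustness check (an addition, not part of the original solution) is to swap the normal quantile for a fitted Student-t quantile. This sketch assumes the MASS package is available and reuses the lrets, last_price, and alpha objects defined above:

```r
library(MASS)  # fitdistr()

# Fit a location-scale Student-t to the daily log-returns
tfit = fitdistr(as.numeric(lrets), "t")
m  = tfit$estimate["m"]   # location
s  = tfit$estimate["s"]   # scale
nu = tfit$estimate["df"]  # degrees of freedom

# 1-day-ahead alpha-quantile of the price, then the implied VaR
pred_low_price_t = last_price * exp(m + s * qt(alpha, df = nu))
VaR_t = -1e6 * (pred_low_price_t / last_price - 1)
VaR_t
```

If the tails really are heavier than normal, VaR_t will typically exceed the normal-quantile figure above.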