(a)(i) \(\mathbf{Solution.}\qquad\) First we compute a few covariances, \[\begin{align} \textrm{Cov}\left(Y_{n},Y_{n}\right) & =\sigma^{2}+\theta_{1}^{2}\sigma^{2}+\theta_{2}^{2}\sigma^{2}\\ \nonumber \\ \textrm{Cov}\left(Y_{n},Y_{n+1}\right) & =\textrm{Cov}\left(\varepsilon_{n}+\theta_{1}\varepsilon_{n-1}+\theta_{2}\varepsilon_{n-2},\varepsilon_{n+1}+\theta_{1}\varepsilon_{n}+\theta_{2}\varepsilon_{n-1}\right)\\ & =\theta_{1}\sigma^{2}+\theta_{1}\theta_{2}\sigma^{2}\nonumber \\ \nonumber \\ \textrm{Cov}\left(Y_{n},Y_{n+2}\right) & =\textrm{Cov}\left(\varepsilon_{n}+\theta_{1}\varepsilon_{n-1}+\theta_{2}\varepsilon_{n-2},\varepsilon_{n+2}+\theta_{1}\varepsilon_{n+1}+\theta_{2}\varepsilon_{n}\right)\\ & =\theta_{2}\sigma^{2}\nonumber \end{align}\]
Let \(\theta_{0}=1\). Then the autocovariance function is, \[ \gamma(h)=\Sigma_{Y}\left(n,n+h\right)=\begin{cases} \sigma^{2}\sum_{i=0}^{2-\left|h\right|}\theta_{i}\theta_{i+\left|h\right|} & \left|h\right|\leq2\\ 0 & \left|h\right|>2 \end{cases} \]
Thus the autocorrelation function becomes, \[ \rho_{Y}(h)=\frac{\gamma(h)}{\gamma(0)}=\begin{cases} \frac{\sum_{i=0}^{2-\left|h\right|}\theta_{i}\theta_{i+\left|h\right|}}{\left(1+\theta_{1}^{2}+\theta_{2}^{2}\right)} & \text{ if }\left|h\right|\leq2\\ 0 & \text{ if }\left|h\right|>2 \end{cases} \]
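As a quick sanity check, the closed-form ACF can be compared against R's built-in ARMAacf(); the coefficient values \(\theta_{1}=0.5\), \(\theta_{2}=0.3\) below are purely illustrative:

```r
# Hypothetical MA(2) coefficients (illustrative values, not from the problem)
theta = c(0.5, 0.3)
th = c(1, theta)  # prepend theta_0 = 1

# ACF from the closed-form expression derived above, for h = 0, 1, 2
rho_formula = sapply(0:2, function(h) sum(th[1:(3 - h)] * th[(1 + h):3]) / sum(th^2))

# Theoretical ACF from R's ARMAacf()
rho_builtin = ARMAacf(ma = theta, lag.max = 2)

all.equal(unname(rho_builtin), rho_formula)  # TRUE
```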
(a)(ii) \(\mathbf{Solution.}\qquad\) We can expand the process \(Z_{t}\), \[\begin{align*} Z_{t} & =Y_{t}+Y_{t-1}\\ & =\left(\mu+\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2}\varepsilon_{t-2}\right)+\left(\mu+\varepsilon_{t-1}+\theta_{1}\varepsilon_{t-2}+\theta_{2}\varepsilon_{t-3}\right)\\ & =2\mu+\varepsilon_{t}+\left(1+\theta_{1}\right)\varepsilon_{t-1}+\left(\theta_{1}+\theta_{2}\right)\varepsilon_{t-2}+\theta_{2}\varepsilon_{t-3} \end{align*}\]
The process above is an \(\textrm{MA}(3)\) process with the following coefficients, \[ \phi_{0}=1,\ \phi_{1}=\left(1+\theta_{1}\right),\ \phi_{2}=\left(\theta_{1}+\theta_{2}\right)\text{ and }\phi_{3}=\theta_{2} \]
Then, applying the results from part (i), the autocovariance function is, \[ \gamma\left(h\right)=\Sigma_{Z}\left(n,n+h\right)=\begin{cases} \sigma^{2}\sum_{i=0}^{3-\left|h\right|}\phi_{i}\phi_{i+\left|h\right|} & \text{ if }\left|h\right|\leq3\\ 0 & \text{ if }\left|h\right|>3 \end{cases} \]
The autocorrelation function is, \[ \rho_{Z}(h)=\frac{\gamma(h)}{\gamma(0)}=\begin{cases} \frac{\sum_{i=0}^{3-\left|h\right|}\phi_{i}\phi_{i+\left|h\right|}}{\sum_{i=0}^{3}\phi_{i}^{2}} & \text{ if }\left|h\right|\leq3\\ 0 & \text{ if }\left|h\right|>3 \end{cases} \]
(b)(a) \(\mathbf{Solution.}\qquad\) For \(k\geq1\) we can write the autocovariance function as, \[\begin{align*} \gamma(k) & =\Sigma_{Y}(t,t+k)\\ & =\textrm{Cov}\left(Y_{t},Y_{t+k}\right)\\ & =\textrm{Cov}\left(Y_{t},\phi_{1}\left(Y_{t+k-1}-\mu\right)+\phi_{2}\left(Y_{t+k-2}-\mu\right)+\varepsilon_{t+k}\right)\\ & =\phi_{1}\textrm{Cov}\left(Y_{t},Y_{t+k-1}\right)+\phi_{2}\textrm{Cov}\left(Y_{t},Y_{t+k-2}\right)+\underbrace{\textrm{Cov}\left(Y_{t},\varepsilon_{t+k}\right)}_{=0\text{ for }k\geq1}\\ & =\phi_{1}\gamma(k-1)+\phi_{2}\gamma(k-2) \end{align*}\]
So the autocorrelation function is, \[\begin{align*} \rho(k) & =\frac{\gamma(k)}{\gamma(0)}\\ & =\frac{\phi_{1}\gamma(k-1)+\phi_{2}\gamma(k-2)}{\gamma(0)}\\ & =\phi_{1}\rho(k-1)+\phi_{2}\rho(k-2) \end{align*}\]
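This recursion can be verified numerically with ARMAacf(); the AR(2) coefficients below are illustrative values chosen inside the stationarity region:

```r
# Illustrative stationary AR(2) coefficients (hypothetical values)
phi = c(0.5, 0.2)
rho = ARMAacf(ar = phi, lag.max = 3)  # theoretical ACF at lags 0..3

# Check rho(k) = phi1*rho(k-1) + phi2*rho(k-2) at k = 2 and k = 3
all.equal(unname(rho["2"]), phi[1] * unname(rho["1"]) + phi[2] * unname(rho["0"]))  # TRUE
all.equal(unname(rho["3"]), phi[1] * unname(rho["2"]) + phi[2] * unname(rho["1"]))  # TRUE
```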
(b)(b) \(\mathbf{Solution.}\qquad\) We know from part a) that, \[\begin{align} \rho(1) & =\phi_{1}\rho(1-1)+\phi_{2}\rho(1-2)\\ & =\phi_{1}+\phi_{2}\rho(1)\nonumber \\ \nonumber \\ \rho(2) & =\phi_{1}\rho(2-1)+\phi_{2}\rho(2-2)\\ & =\phi_{1}\rho(1)+\phi_{2}\nonumber \end{align}\]
The two equations above are exactly the Yule-Walker equations for an AR(2) process; in matrix form, \[ \begin{pmatrix}1 & \rho(1)\\ \rho(1) & 1 \end{pmatrix}\begin{pmatrix}\phi_{1}\\ \phi_{2} \end{pmatrix}=\begin{pmatrix}\rho(1)\\ \rho(2) \end{pmatrix} \]
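Solving the system explicitly (assuming \(\rho(1)^{2}\neq1\)) gives closed forms that can be checked against the numerical solution below,

\[
\phi_{1}=\frac{\rho(1)\left(1-\rho(2)\right)}{1-\rho(1)^{2}},\qquad\phi_{2}=\frac{\rho(2)-\rho(1)^{2}}{1-\rho(1)^{2}}
\]

For example, with \(\rho(1)=0.4\) and \(\rho(2)=0.2\): \(\phi_{1}=0.32/0.84\approx0.381\) and \(\phi_{2}=0.04/0.84\approx0.0476\).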
(b)(c) \(\mathbf{Solution.}\qquad\) Since \(\rho(k)=\phi_{1}\rho(k-1)+\phi_{2}\rho(k-2)\), \(\rho(3)\) is, \[\begin{align*} \rho(3) & =\phi_{1}\rho(3-1)+\phi_{2}\rho(3-2)\\ & =\phi_{1}\rho(2)+\phi_{2}\rho(1) \end{align*}\]
With the R code below, we solve for \(\phi_1, \phi_2\) and \(\rho(3)\), given \(\rho(1)=0.4\) and \(\rho(2)=0.2\),
rho = matrix(c(0.4, 0.2), ncol = 1)      # (rho(1), rho(2))
A = matrix(c(1, 0.4, 0.4, 1), ncol = 2)  # correlation matrix in the Yule-Walker system
phi = solve(A) %*% rho
phi
## [,1]
## [1,] 0.38095238
## [2,] 0.04761905
rho3 = t(phi) %*% rho[2:1, , drop = FALSE]  # rho(3) = phi1*rho(2) + phi2*rho(1)
rho3
## [,1]
## [1,] 0.0952381
In CTools there is a data set of daily price data for Apple stock from Jan \(1\), \(2000\) through Dec \(31\), \(2014\); it is in the file “apple_stock_day_2000_2014.csv”.
Hint: You may find the following R commands helpful.
X = read.csv("apple_stock_day_2000_2014.csv",header=TRUE)
Appl_day <-X$Adj.Close
ApplLR <- diff(log(Appl_day)) # generating log returns (daily)
ApplLR.ts <- ts(ApplLR,start=c(2000,1),frequency=252,names=c("Apple_Return"))
ApplLR_2002_2003_sub.ts = window(ApplLR.ts, start=c(2002, 150), end=c(2003, 252))
(a) \(\mathbf{Solution.}\qquad\) First, in figure 1 we plot the daily log-returns for Apple stock from day 150 of 2002 through the end of 2003. Overall, the time series looks like it may be modeled well by a stationary model. We will generate an ACF plot to check this assumption.
# QUESTION 2A ----------------------------------------------------------------
lrets = window(ApplLR.ts, start=c(2002, 150), end=c(2003, 252))
# Plot of log-returns
library(ggplot2)   # ggtitle(), xlab(), ylab()
library(forecast)  # autoplot() method for ts objects
autoplot(lrets) +
ggtitle("AAPL Stock Log-Returns") +
xlab("Date") +
ylab("log-returns")
In figure 2 we see that there is very little serial correlation between observations.
Finally, performing the Ljung-Box test of the null hypothesis that the autocorrelations are 0, we see that the p-value (0.5017) is far from significant, so we cannot reject the null; this agrees with the previous two figures.
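The printout below is consistent with a call of this form (the choice of lag = 10 is inferred from df = 10 in the output):

```r
# Ljung-Box test on the first 10 autocorrelations of the log-returns
Box.test(lrets, lag = 10, type = "Ljung-Box")
```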
##
## Box-Ljung test
##
## data: lrets
## X-squared = 9.3237, df = 10, p-value = 0.5017
(b) \(\mathbf{Solution.}\qquad\) Using the auto.arima() function, we find that it chooses ARIMA(0,0,0) with zero mean as the best model according to AIC. Checking the diagnostic plots in figure 3, the ACF of the residuals shows that they are close to white noise, and the plot of Ljung-Box p-values confirms this since none are significant.
# QUESTION 2B ----------
# Best fit is a white noise process ARIMA(0,0,0)
library(forecast)
fit = auto.arima(lrets, max.p=5, max.q=5, ic="aic")
fit
## Series: lrets
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 0.0005436: log likelihood=830.59
## AIC=-1659.18 AICc=-1659.17 BIC=-1655.31
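The code behind the figure-3 diagnostics is not shown; assuming standard diagnostic helpers were used, either of the following calls produces a comparable panel (residual ACF plus Ljung-Box p-values by lag):

```r
# forecast's one-stop diagnostics: residual series, ACF, and a Ljung-Box test
checkresiduals(fit)

# Classic stats panel including Ljung-Box p-values at each lag; refit with
# stats::arima() so that tsdiag() dispatch is guaranteed
tsdiag(arima(lrets, order = c(0, 0, 0)))
```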
In figure 4 we check the normality assumption of the residuals. The residuals appear to be heavier-tailed than normal, so we conclude that they are not normally distributed.
# Also show qq-plot of residuals. Residuals do not look normal
res = data.frame(residuals = fit$residuals)
ggplot(res, aes(sample = residuals)) +
stat_qq(colour = "darkred", shape = 1) +
stat_qq_line(colour = "steelblue") +
ggtitle("Normal Plot of ARIMA(0,0,0) Residuals") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab('Theoretical Quantiles') +
ylab('Sample Quantiles')
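To supplement the visual check with a formal test (an addition, not part of the original write-up), a Shapiro-Wilk test on the residuals quantifies the departure from normality:

```r
# Shapiro-Wilk test of normality on the ARIMA(0,0,0) residuals
shapiro.test(res$residuals)
```

A small p-value here would agree with the heavy tails seen in the qq-plot.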
(c) \(\mathbf{Solution.}\qquad\) The code below generates a 10-day forecast using the model fitted in part b), refit as ARIMA(0,1,0) on the log prices, which is equivalent to ARIMA(0,0,0) on the log-returns. As figure 5 shows, the fitted model implies that the predicted Adj. Close price is the same for each day of the 10-day forecast. We also show 95% prediction intervals for the close price.
# QUESTION 2C ----------
days_ahead = 10
# Most of the code below involves setting up a dataframe for plotting the
# predictions
X$Date = as.Date(X$Date, format = "%m/%d/%Y")
# Filter data from day 150 of 2002 to the end of 2003
library(lubridate)  # year()
time_ind = year(X$Date) >= 2002 & year(X$Date) <= 2003
df1 = X[time_ind,]
df1 = df1[150:nrow(df1),]
# Get predictions: ARIMA(0,1,0) on the log prices is equivalent to the
# ARIMA(0,0,0) model fitted to the log-returns in part b)
fit = arima(log(df1$Adj.Close), order = c(0, 1, 0))
forecasts = predict(fit, days_ahead)
forecasts
## $pred
## Time Series:
## Start = 356
## End = 365
## Frequency = 1
## [1] 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431 0.3646431
## [8] 0.3646431 0.3646431 0.3646431
##
## $se
## Time Series:
## Start = 356
## End = 365
## Frequency = 1
## [1] 0.02322630 0.03284695 0.04022913 0.04645260 0.05193559 0.05689259
## [7] 0.06145102 0.06569390 0.06967891 0.07344802
# Build temporary dataframe to append to the previous dataframe
library(dplyr)  # %>%, top_n(), select()
first_n_days = X[year(X$Date) == 2004,] %>%
top_n(-days_ahead, wt = Date) %>%
select(Date)
df2 = data.frame(matrix(NA, nrow = days_ahead, ncol = length(X)))
colnames(df2) = colnames(X)
df2["Date"] = first_n_days
df = rbind(df1, df2)
# Calculate the n-day forecasts for Adj Close Price
last_price = df1[nrow(df1), "Adj.Close"]
lower = exp(forecasts$pred - 1.96*forecasts$se)
upper = exp(forecasts$pred + 1.96*forecasts$se)
predicted = exp(forecasts$pred)
# Add forecasts to our df
df$predicted = c(rep(NA, nrow(df1)), predicted)
df$lower = c(rep(NA, nrow(df1)), lower)
df$upper = c(rep(NA, nrow(df1)), upper)
# Plot n-step prediction
ggplot(df, aes(x=Date)) +
geom_line(aes(y=Adj.Close)) +
geom_line(aes(y=predicted), colour = "darkred") +
geom_line(aes(y=upper), colour = "steelblue") +
geom_line(aes(y=lower), colour = "steelblue") +
ggtitle("10-step ahead prediction AAPL Adj Close Price")
(d) \(\mathbf{Solution.}\qquad\) Below we compute the 1-day-ahead \(\textrm{VaR}_{0.005}\). This estimate is likely an underestimate because, as the qq-plot in figure 4 showed, the residuals are heavier-tailed than normal in both tails.
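In symbols, for a position of size \(S=\$1{,}000{,}000\) the code below computes

\[
\textrm{VaR}_{\alpha}=-S\left(\frac{\exp\left(\hat{m}_{1}+z_{\alpha}\hat{s}_{1}\right)}{P_{T}}-1\right)
\]

where \(\hat{m}_{1}\) and \(\hat{s}_{1}\) are the 1-day-ahead forecast mean and standard error of the log price, \(P_{T}\) is the last observed price, and \(z_{\alpha}=\Phi^{-1}(0.005)\approx-2.576\). Since the fitted model is a driftless random walk in the log price, \(\hat{m}_{1}=\log P_{T}\) and the expression simplifies to \(\textrm{VaR}_{\alpha}=-S\left(e^{z_{\alpha}\hat{s}_{1}}-1\right)\).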
# QUESTION 2D ----------
# We can use our lower estimate of the price from part c) to compute return
alpha = 0.005
pred_low_price = exp(forecasts$pred[1] + qnorm(alpha)*forecasts$se[1])
VaR = -1e6 * (pred_low_price / last_price - 1)
VaR
## [1] 58072.52
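Given the heavy tails noted in part (b), one rough robustness check (an addition, not part of the original solution) is to swap the normal quantile for a fitted Student-t quantile. This sketch assumes the MASS package is available and reuses the lrets, last_price, and alpha objects defined above:

```r
library(MASS)  # fitdistr()

# Fit a location-scale Student-t to the daily log-returns
tfit = fitdistr(as.numeric(lrets), "t")
m  = tfit$estimate["m"]   # location
s  = tfit$estimate["s"]   # scale
nu = tfit$estimate["df"]  # degrees of freedom

# 1-day-ahead alpha-quantile of the price, then the implied VaR
pred_low_price_t = last_price * exp(m + s * qt(alpha, df = nu))
VaR_t = -1e6 * (pred_low_price_t / last_price - 1)
VaR_t
```

If the tails really are heavier than normal, VaR_t will typically exceed the normal-quantile figure above.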