Exercise 2 on page 81 Column seven of the data set RecentFord.csv
on the book’s web site contains Ford daily closing prices, adjusted for splits and dividends, for the years \(2009-2013 .\) Repeat Problem 1 using these more recent returns. One of returns is approximately \(-0.175\). For part \((\mathrm{d}),\) use that return in place of Black Monday. (Black Monday, of course, is not in this data set.) On what date did this return occur? Search the Internet for news about Ford that day. Why did the Ford price drop so precipitously that day?
Recall Problem 1:
This problem uses the data set ford.csv
on the book’s web site. The data were taken from the ford.s
data set in R
’s fEcofin
package. This package is no longer on CRAN. This data set contains 2000 daily Ford returns from January 2,1984, to December 31,1991.
The data set RecentFord.csv is in Data directory under Files in Canvas. Note: The normal plot and t-plots correspond to QQ plots relative to those distributions.
(a) \(\mathbf{Solution.}\qquad\) Below we compute the relevant statistics from the data,
# Read Data
data = read.csv("RecentFord.csv")
# Compute Returns
log_ret = diff(log(data$Adj.Close))
ret = exp(log_ret) - 1
# Add new columns to dataframe for plotting
data["Ret"] = c(NA, ret)
data["Log_Ret"] = c(NA, log_ret)
df = data.frame("Mean" = mean(ret), "Median" = median(ret),
"Std. Dev" = sd(ret))
kable(df, digits = 3, format.args = list(big.mark = ","),
caption = "Summary Statistics for Ford Returns",
col.names = c("Mean", "Median", "Std. Dev"), escape = FALSE) %>%
kable_styling(latex_options = c("striped", "hold_position"))
Mean | Median | Std. Dev |
---|---|---|
0.002 | 0.001 | 0.026 |
(b) \(\mathbf{Solution.}\qquad\) Below is the R code to construct a normal plot of the Ford returns.
ggplot(data = subset(data, !is.na(data$Ret)), aes(sample = Ret)) +
stat_qq(colour = "darkred", shape = 1) +
stat_qq_line(colour = "steelblue") +
ggtitle("Normal Plot of Returns") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab('Theoretical Quantiles') +
ylab('Sample Quantiles')
The Ford returns do not look normally distributed, because the left end and right end of the QQ plot deviate away from the theoretical quantiles. In other words, returns are much more heavy tailed in comparison to the normal distribution.
(c) \(\mathbf{Solution.}\qquad\) According to Shapiro-Wilk test, its \(p\)-value is 1.504037210^{-23}, so we can certainly reject the null-hypothesis at the 1% level that the distribution is normally distributed.
(d) \(\mathbf{Solution.}\qquad\) To create \(t\)-plots we will use the following function qqt
. Note that we construct the reference line by interpolating between the 25th and 75th quantiles. We could have also done linear regression on the points between these two quantiles.
# Function for generating t-plot
qqt = function(x, nu){
xsort = sort(x)
n = length(xsort)
q = (1 / (n + 1)) * c(1:n)
qq = qt(q, nu)
# Construct dataframe to plot in ggplot
df = data.frame("quantiles" = qq, "data" = xsort)
# Compute theoretical quantiles
xx = quantile(qq, c(0.25, 0.75))
yy = quantile(xsort, c(0.25, 0.75))
slope = (yy[2] - yy[1]) / (xx[2] - xx[1])
inter = yy[1] - slope*xx[1]
g = ggplot(df, aes(x = quantiles, y = data)) +
geom_point(colour = "darkred", shape = 1) +
ggtitle(TeX(paste0("t-Plot $\\nu$= ", as.character(nu)))) +
theme(text=element_text(size=7),
plot.title = element_text(hjust = 0.5)) +
geom_abline(intercept = inter, slope = slope, colour = "steelblue") +
xlab('Theoretical Quantiles') +
ylab('Sample Quantiles')
return(g)
}
We show several subplots with degrees of freedom ranging from \(\{2,\ldots ,10\}\). The R-code for constructing the \(t\)-plots is shown below.
# Construct t-distribution qq plots
plot_list = list()
nu_vals = c(2:10)
iter = 1
for (nu in nu_vals) {
plot_list[[iter]] = local({
nu = nu
p1 = qqt(ret, nu)
})
iter = iter + 1
}
# plot all subplots
n <- length(plot_list)
nCol <- ceiling(sqrt(n))
grid.arrange(grobs = plot_list, ncol = nCol,
top = textGrob(TeX("t-Plots for $t(\\nu)$ of Ford Returns"),
gp = gpar(fontsize = 13, font = 8)))
By inspection, it seems that the \(t\)-plot with \(3\) degrees of freedom produces the straightest line. When sorting the data, we see that the lowest return occurred on May 12, 2009.
## Date Ret
## 90 5/12/2009 -0.1748252
Below we test this observation using \(-1.5\times IQR\) as a proxy to determine whether it is an outlier.
max_loss = ordered_ret$Ret[1]
is_outlier = max_loss < -1.5*IQR(ordered_ret$Ret, na.rm = TRUE)
is_outlier
## [1] TRUE
Indeed Ford shares dropped markedly on this day after “disclosing a public offering of 300 million shares of common stock that will help it fund its health care trust for retired autoworkers and their families” (https://www.mercurynews.com/2009/05/12/ford-shares-sink-on-share-offering/). Since this type of event occurs rarely, we can ignore this return.
(e) \(\mathbf{Solution.}\qquad\) First we construct a KDE of the returns using the following R code. We will use the default setting for the KDE since lowering the bandwidth will fit too much of the noice in the histogram. Increasing the bandwidth will smooth the curve, but won’t be able to capture the peak well.
# Create dataframe in order to plot multiple densities
dens = density(ret)
df = data.frame("x" = dens$x, "y" = dens$y)
# Plot each density along with histogram
ggplot(df, aes(x = x, y = y, color = "darkred")) +
geom_line() +
geom_histogram(data = data, aes(x = Ret, y = ..density..), inherit.aes = FALSE,
binwidth = 1 / (0.2 * nrow(data)), alpha = .5) +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "none") +
ggtitle("Histogram of Ford Returns with KDEs") +
xlab('Returns') +
ylab('Density')
Formula (4.3) gives us the variance of the \(q\)th sample quantile when \(n\) is large. \[\frac{q(1-q)}{n\left[f\left\{F^{-1}(q)\right\}\right]^{2}}\] Using (4.3), the standard error formula of the sample median becomes, \[{SE}_{0.5} = \sqrt{\frac{{0.5}^2}{n\left[\hat{f}\left\{F^{-1}(q)\right\}\right]^{2}}}\] From our KDE, we can estimate \(\hat{f}\left\{F^{-1}(q)\right\}\). In the R code below, we compute the standard errors of the sample median and sample mean.
# Get estimated f(F^-1(q))
f = df[ceiling(nrow(df) / 2), ]$y
n = length(ret)
# Standard Error for Sample Median
std_med = sqrt((.5^2) / (n * f^2))
# Standard Error for Sample Mean
std_mean = sd(ret) / sqrt(n)
We find that, \[\hat{f}\left\{F^{-1}(0.5)\right\}=18.112\]
Then the SE of sample median is, \[{SE}_{0.5}=0.000779\]
Likewise, the SE of sample mean is, \[{SE}(\hat{\mu})=0.000745\]
We can see that both standard error estimates are quite close.
Exercise 5 on page 82 Let diffbp
be the changes (that is, differences) in the variable bp
, the
U.S. dollar to British pound exchange rate, which is in the Garch
data set of R
’s Ecdat
package.
diffbp
and in each plot add a reference line that goes through the \(p-\) and \((1-p)\) -quantiles, where \(p=0.25,0.1,0.05,0.025,0.01,\) and \(0.0025,\) respectively, for the six plots. Create a second set of six normal plots using \(n\) simulated \(N(0,1)\) random variables, where \(n\) is the number of changes in bp
plotted in the first figure. Discuss how the reference lines change with the value of \(p\) and how the set of six different reference lines can help detect nonnormality.bp
. Do the changes in log(bp)
look closer to being normally distributed than the changes in bp
?(a) \(\mathbf{Solution.}\qquad\) For the first set of \(3 \times 2\) plots, we display normal plots of diffbp
with different reference lines interpolated between \(p-\) and \((1-p)\) -quantiles.
library(Ecdat)
# i. Create 6 normal plots diffbp
data = Ecdat::Garch
diffbp = diff(data$bp)
data["diffbp"] = c(NA, diffbp)
# Construct subplots
plot_list = list()
p_vals = c(0.25, 0.1, 0.05, 0.025, 0.01, 0.0025)
# Iterate through each value of p
iter = 1
for (p in p_vals) {
plot_list[[iter]] = local({
p = p
# Get slope and intercept of reference line
xx = qnorm(c(p, 1 - p))
yy = quantile(diffbp, c(p, 1 - p))
slope = (yy[2] - yy[1]) / (xx[2] - xx[1])
inter = yy[1] - slope*xx[1]
# Create plot
p1 = ggplot(data = subset(data, !is.na(data$diffbp)),
aes(sample = diffbp)) +
stat_qq(colour = "darkred", shape = 1) +
geom_abline(intercept = inter, slope = slope, colour = "steelblue") +
ggtitle(paste0("Normal Plot with p = ", as.character(p))) +
theme(plot.title = element_text(size = 10, hjust = 0.5))
})
iter = iter + 1
}
# plot all subplots
grid.arrange(grobs = plot_list, ncol = 2,
top = textGrob("3 × 2 Matrix of Normal Plots of diffbp",
gp = gpar(fontsize = 13, font = 8)))
Next, we display the second set of six normal plots, but using \(n\) simulated \(N(0,1)\) random variables.
# ii. Create 6 normal plots for normally distributed data
set.seed(1234)
randn = rnorm(length(diffbp))
data$randn = c(NA, randn)
plot_list = list()
# Iterate through each value of p
iter = 1
for (p in p_vals) {
plot_list[[iter]] = local({
p = p
# Get slope and intercept of reference line
xx = qnorm(c(p, 1 - p))
yy = quantile(randn, c(p, 1 - p))
slope = (yy[2] - yy[1]) / (xx[2] - xx[1])
inter = yy[1] - slope*xx[1]
# Create plot
p1 = ggplot(data = subset(data, !is.na(data$randn)),
aes(sample = randn)) +
stat_qq(colour = "darkred", shape = 1) +
geom_abline(intercept = inter, slope = slope, colour = "steelblue") +
ggtitle(paste0("Normal Plot with p = ", as.character(p))) +
theme(plot.title = element_text(size = 11, hjust = 0.5))
})
iter = iter + 1
}
# plot all subplots
grid.arrange(grobs = plot_list, ncol = 2,
top = textGrob(TeX("3 × 2 Matrix of Normal Plots of $N(0,1)$"),
gp = gpar(fontsize = 13, font = 8)))
On the first set of plots, we notice that when we change \(p\), the slope of the reference line increases. The plot that looks the straightest is the last one with \(p=0.0025\). The lines on the second set of plots all look identical. Thus we can detect nonnormality based on how much we have to decrease the value of \(p\) in order to get a straight-looking plot. In this case, we required a very low value of \(p\) to get normal plot of diffbp
to look straight, indicating probable nonnormality.
(b) \(\mathbf{Solution.}\qquad\) Below we construct a third set of normal plots using changes in log(bp)
.
diff_lbp = diff(log(data$bp))
data$diff_lbp = c(NA, diff_lbp)
# Iterate through each value of p
iter = 1
for (p in p_vals) {
plot_list[[iter]] = local({
p = p
# Get slope and intercept of reference line
xx = qnorm(c(p, 1 - p))
yy = quantile(diff_lbp, c(p, 1 - p))
slope = (yy[2] - yy[1]) / (xx[2] - xx[1])
inter = yy[1] - slope*xx[1]
# Create plot
p1 = ggplot(data = subset(data, !is.na(data$diff_lbp)),
aes(sample = diffbp)) +
stat_qq(colour = "darkred", shape = 1) +
geom_abline(intercept = inter, slope = slope, colour = "steelblue") +
ggtitle(paste0("Normal Plot with p = ", as.character(p))) +
theme(plot.title = element_text(size = 10, hjust = 0.5))
})
iter = iter + 1
}
# plot all subplots
grid.arrange(grobs = plot_list, ncol = 2,
top = textGrob(TeX("3 × 2 Matrix of Normal Plots of $\\Delta \\log(bp)$"),
gp = gpar(fontsize = 13, font = 8)))
We notice that none of the individual plots look straight, even though the last plot with \(p=0.0025\) would be considered the stragightest out of all other plots. Here, changes in log(bp)
look less close to being normally distributed than changes in bp
.
(a) Derive the limiting distribution of \(\mathrm{GPD}(\mu, \xi, \sigma)\) for case where \(\mu=0, \sigma>0\) and \(\xi \rightarrow 0 .\) Hint: With \(F\) being the cdf, focus on the limit of the tail probability \([1-F(x)]\) Also, can use the standard result in mathematics that \[ \lim _{M \rightarrow \infty}\left(1+\frac{\beta}{M}\right)^{M}=e^{\beta} \]
(b) Based on result in (a), what can you say about using the Pareto distribution to model the tails of a normal distribution?
(a) \(\mathbf{Solution.}\qquad\) First we start by computing \(\lim_{\xi\rightarrow0}(1-F(x))\), \[ \lim_{\xi\rightarrow0}(1-F(x))=\lim_{\xi\rightarrow0}\left(1+\frac{\xi x}{\sigma}\right)^{-1/\xi} \]
Let \(\frac{1}{\xi}=M,\) then, \[\begin{align*} \lim_{\xi\rightarrow0}(1-F(x)) & =\lim_{M\rightarrow\infty}\left[\left(1+\frac{x}{\sigma M}\right)^{M}\right]^{-1}\\ & =\left[\lim_{M\rightarrow\infty}\left(1+\frac{x}{\sigma M}\right)^{M}\right]^{-1}\\ & =e^{-\frac{x}{\sigma}} \end{align*}\]
We were able to pass the limit inside of the reciprocal function, because the reciprocal function is monotonic decreasing on the interval \((0,\infty)\).
(b) \(\mathbf{Solution.}\qquad\) From part a) we know that, \[ \lim_{\xi\rightarrow0}(1-F(x))=e^{-x/\sigma}\sim e^{-x} \]
This means that by letting the shape parameter \(\xi\rightarrow0\), the tail probabilities will, at most, decrease exponentially fast. However, it is not well suited to model tails of a normal distribution since tails of normal decrease at a faster rate of \(e^{-x^{2}}\).