STATS 509: Homework 3

Question 1

The Nasdaq index data set from Jan 1, 2008 to Dec 31, 2012 is in the data set Nasdaq_daily_Jan1_2008_Dec31 2012.csv in Data folder on Canvas.

Fit a Generalized Pareto Distribution (GPD for short) to the upper tail of the negative returns and give detailed plots of the fit (along the lines of what is generated in class) and discuss your results.
Compute the relative VaR (expressed in units of the current price) at the level $=.0025$ utilizing for the case of fitting a normal distribution to the returns (not using any POT).
Compute VaR for 1 million dollars invested in Ford stock at the level $=.0025$ utilizing the GPD distributional fits generated in part (a). Discuss the comparison of this result from part (b).
Utilizing “quant” command in evir package, give a discussion on the stability of the VaR as function of threshold.
Compute expected shortfall for part (c) using Monte Carlo methods and the estimated GPD distribution and relative VaR in (c).

(a) $\mathbf{Solution.}\qquad$ For this homework we attempt to recreate most plots from evir library using ggplot2. This will also be useful for creating plots in Question 2. The code is adapted directly from the functions in evir. The code for the custom ggplot functions can be found on the last page. First we build the empirical CDF of the negative returns for the Nasdaq index.

# QUESTION 1A -----------------------------------------------------------------
FACTOR = 1

data = read.csv("Nasdaq_daily_Jan1_2008_Dec31_2012.csv", header = TRUE)
ret = (exp(diff(log(data$Adj.Close))) - 1) * FACTOR
data["Returns"] = c(NA, ret)

# Plot empirical CDF
eecdf = ecdf(-ret)
uv = seq(from = .01, to = .1 * FACTOR, by = .001 ) # Domain of CDF to show
df = data.frame("x" = uv, "y" = eecdf(uv))

# Generate Empirical CDF of negative returns
ggplot(df, aes(x = x, y = y)) + 
  geom_line(colour = "steelblue") + 
  xlab('x') + 
  ylab("Fhat(x)") + 
  ggtitle("Empirical CDF of negative Nasdaq Returns") + 
  theme(plot.title = element_text(hjust = 0.5, size = 10))

Next we create the shape plot from which we will determine a suitable shape paramter $\xi$ for our GPD.

# Generate Shape Plot
gg_shape(-ret, models = 30, start = 40, end = 600, 
         reverse = TRUE, auto.scale = 'FALSE')

# Select the threshold and estimate negative returns by GPD
u = 1.21 / 100 * FACTOR
gpd_est = gpd(-ret, threshold = u, method = c("ml"), 
              information = c("observed"))

From the shape plot above, we determine that a proper threshold is about 0.0121. After having chosen this threshold, we can now generate the tailplot to see how well the points in the tail are fitted to our estimated GPD.

# Generate Tail Plot
gg_tailplot(gpd_est)

beta = gpd_est$par.ests[2]
xi = gpd_est$par.ests[1]

We can see that that our fitted GPD does a good job of fitting the points, with scale parameter $\hat{\beta}=$ 0.013 and shape paramter $\hat{\xi}=$ 0.035.

(b) $\mathbf{Solution.}\qquad$ Below we fit a normal distribution to the returns and estimate the relative VaR.

# QUESTION 1B -----------------------------------------------------------------
# Estimate the (1-alpha) quantile of negative returns using Normal Distribution
alpha = 0.0025
r_VaR = -qnorm(alpha, mean = mean(ret), sd = sd(ret))

The 0.0025-quantile of the returns is -0.0482 so the relative VaR is 0.0482.

(c) $\mathbf{Solution.}\qquad$ Below we get our VaR estimate using our fitted GPD from problem (a).

# QUESTION 1C -----------------------------------------------------------------
# Get estimate from empirical CDF
eecdf = ecdf(-ret)
alphat = 1 - alpha / (1 - eecdf(u))

# Calculate VaR using GPD estimate for tail
capital = 1e6
VaRt = qgpd(alphat, xi, u, beta)

VaR = capital * VaRt

VaRt

##       beta 
## 0.06946913

For the GPD case, our relative VaR is 0.0695 which is higher than the relative VaR estimated by the normal distribution in part (b). Viewing the qq-plot below for Nasdaq returns, we see that the left tail is more heavy-tailed than predicted by a normal distribution. Thus, we can see that the normal distribution underestimates VaR and our fitted GPD provides a better estimate.

ggplot(data = subset(data, !is.na(data$Returns)), aes(sample = Returns)) +
  stat_qq(colour = "darkred", shape = 1) + 
  stat_qq_line(colour = "steelblue") +
  ggtitle("Normal Plot of Nasdaq Returns") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  xlab('Theoretical Quantiles') + 
  ylab('Sample Quantiles')

(d) $\mathbf{Solution.}\qquad$ Using the quant command from evir library, in the graph below we see that our VaR estimate is stable as we vary the threshold. We find that VaR ranges from [0.0687], [0.0713] which is only a difference of 0.0026.

# QUESTION 1D -----------------------------------------------------------------
# Plot VaR estimates as a function of threshold
quant_rslt = quant(-ret, p = 1 - alpha, models = 20, start = 600, end = 40, 
          reverse = TRUE, ci = FALSE, auto.scale = TRUE, labels = TRUE)

quantile_range = c(min(quant_rslt["qest",]), max(quant_rslt["qest",]))

(e) $\mathbf{Solution.}\qquad$ Below we compute the ES according to our fitted GPD.

# QUESTION 1E -----------------------------------------------------------------
# Generate random samples from our GPD
draws = 2*1e8
gpd_rets = rgpd(draws, xi, u, beta)

# Take Average of negative returns exceeding VaR
ES = mean(gpd_rets[gpd_rets > VaRt])

# Get shortfall from ES
shortfall = capital * ES

We find that $ES=$ 0.0846 so the shortfall is $84640.74.

Question 2

Redo problems 1 (a), (c), (d), and (e) utilizing an exponential distribution. In particular for part (a) you do not need to consider multiple thresholds, but rather fit an exponential distribution to the tail, based on the one threshold you utilized in problem 1, and provide tail-fit plot and discussion about goodness of fit of this model. Parts (c),(d), and (e) should be done in a totally similar fashion as you did for problem 1. Provide a discussion on the differences/similarities in the results for Problem 1 and 2, and whether they match what would have been expected based on the estimated GPD shape parameter in problem 1.

(a) $\mathbf{Solution.}\qquad$ Below we construct the tail plot for the negative Nasdaq returns using an exponential distribution.

# QUESTION 2A -----------------------------------------------------------------
# Filter the returns greater than threshold
ret2 = data %>% filter(Returns < -u) %>% select(Returns) %>% select(1)
ret2 = -ret2[,1] - u

# Choose value of lambea which we know from MLE of exponential
lambda = 1 / mean(ret2)

# We implement code sourced from tailplot function, but using exponential 
# distribution instead
extend = 1.2
temp = ret2 + u
threshold = u
plotmin = threshold
plotmax = max(temp) * extend
xx <- seq(from = 0, to = 1, length = 1000)
z <- qexp(xx, rate = lambda)
z <- pmax(pmin(z, plotmax), plotmin)
ypoints = ppoints(sort(temp))
y = pexp(z - threshold, rate = lambda)
prob <- gpd_est$p.less.thresh
ypoints <- (1 - prob) * (1 - ypoints)
y <- (1 - prob) * (1 - y)
yylab <- "1-F(x)"

df = data.frame("sortdata" = sort(temp), "ypts" = ypoints)
df2 = data.frame("x" = z[y >= 0] , "y" = y[y >= 0])
ggplot(df, aes(x = sortdata, y = ypts)) +
  geom_point(shape = 1, size = 2, colour = "darkred") + 
  scale_y_log10() + 
  scale_x_continuous(breaks = scales::pretty_breaks(n = 7),
                     trans = 'log10') + 
  geom_line(data = df2, aes(x = x, y = y), colour = "steelblue", 
            inherit.aes = FALSE) + 
  ggtitle(paste0("Tail Plot of Exponential with u = ", u)) + 
  xlab('x (on log scale)') + 
  ylab('1-F(x) (on log scale)') + 
  theme(plot.title = element_text(hjust = 0.5))

From the tail plot above it seems that exponential distribution with $\lambda$ = 76.62 slightly underestimates the upper tail of negative returns. In contrast to our GPD distribution, we belive that GPD provides a better fit.

(c) $\mathbf{Solution.}\qquad$ Below we get our VaR estimate using our fitted exponential distribution from problem (a).

# QUESTION 2C -----------------------------------------------------------------
# Get estimate from empirical CDF
eecdf = ecdf(-ret)
alphat = 1 - alpha / (1 - eecdf(u))

# Calculate VaR using exponential distribution estimate for tail
capital = 1e6
VaRt_exp = qexp(alphat, rate = lambda) + u

VaR_exp = capital * VaRt_exp

VaRt_exp

## [1] 0.06725146

For the exponential case, our relative VaR is 0.0673 and VaR is $67251.46 which is lower than the relative VaR estimated by the GPD in part 1 (c) by 0.0022 units.

(d) $\mathbf{Solution.}\qquad$ Below we compute the VaR as a function of threshold.

# QUESTION 2D -----------------------------------------------------------------
thresh_vals = seq(-0.0001, 0.02, length = 100)

# Return corresponding VaR from fitting exponential distribution depending on 
# threshold
quant_exp = function(u) {
  ret2 = data %>% filter(Returns < -u) %>% select(Returns) %>% select(1)
  ret2 = -ret2[,1] - u
  
  # Choose value of lambda which we know from MLE of exponential
  lambda = 1 / mean(ret2)
  
  # Estimate Empirical CDF
  eecdf = ecdf(-ret)
  alphat = 1 - alpha / (1 - eecdf(u))
  
  
  VaRt_exp = qexp(alphat, rate = lambda) + u
  return(VaRt_exp)
}

# Get quantiles for each threshold
quant_vars = sapply(thresh_vals, quant_exp)

# Plot Quantiles as function of threshold
df = data.frame("thresh" = thresh_vals, "quantiles" = quant_vars)
ggplot(df, aes(x = thresh, y = quantiles)) +
  geom_line(colour = "steelblue") + 
  ylab(paste0(1 - alpha, " Quantile")) + 
  xlab("Threshold") + 
  ggtitle("Quantile Plot of Exponential vs Threshold") + 
  theme(plot.title = element_text(hjust = 0.5))

# Get range of our estimated quantiles
quantile_range2 = c(min(quant_vars), max(quant_vars))

In the graph above we see that our VaR estimate is stable as we vary the threshold. We find that VaR ranges from 0.0642, 0.0696 which is a difference of 0.0053. This range falls below the range we estimated in our GPD.

(e) $\mathbf{Solution.}\qquad$ Below we compute the expected shortfall using Monte Carlo methods and our exponential distribution.

# QUESTION 2E -----------------------------------------------------------------
# Generate random samples from our exponential distribution
draws = 2*1e8
exp_rets = rexp(draws, rate = lambda)

# Take Average of negative returns exceeding VaR
ES_exp = mean(exp_rets[exp_rets > VaRt_exp - u] + u)

# Get shortfall from ES
shortfall_exp = capital * ES_exp

We find that $ES=$ 0.0803 so the shortfall is $80296.14. This $ES$ is lower than the $ES$ predicted by GPD by 0.0043 units

Question 3

Problem 4 on page 132 in Ruppert/Matteson Let $X$ be a random variable with mean $\mu$ and standard deviation $\sigma$.

Show that the kurtosis of $X$ is equal to 1 plus the variance of $\{(X-\mu) / \sigma\}^{2}$
Show that the kurtosis of any random variable is at least 1.
Show that a random variable $X$ has a kurtosis equal to 1 if and only if $P(X=a)=P(X=b)=1 / 2$ for some $a \neq b$.

(a) $\mathbf{Solution.}\qquad$ Let $Z=\frac{X-\mu}{\sigma},$ then we know that, \[\begin{align*} \mathbb{E}[Z] & =\frac{1}{8}(\mathbb{E}[X]-\mu)=0\\ \operatorname{Var}(Z) & =\frac{1}{\sigma^{2}}(\operatorname{Var}(X))=1 \end{align*}\]

Since $\operatorname{Var}\left(Z^{2}\right)=\mathbb{E}\left[Z^{4}\right]-\mathbb{E}\left[Z^{2}\right]^{2}$, then, \[\begin{align*} \operatorname{Kur}(X) & =\mathbb{E}\left[Z^{4}\right]\\ & =\operatorname{Var}\left(Z^{2}\right)+\mathbb{E}\left[Z^{2}\right]^{2}\\ & =\operatorname{Var}\left(Z^{2}\right)+(1)^{2}\\ & =\operatorname{Var}\left(Z^{2}\right)+1 \end{align*}\]

(b) $\mathbf{Solution.}\qquad$ From question 3 (a) we showed that the $\mathrm{Kur}\left(X\right)=\operatorname{Var}\left(Z^{2}\right)+1$. Since $\operatorname{Var}\left(Z^{2}\right)\in[0,\infty)$, then the smallest $\mathrm{Kur}\left(X\right)$ can be is 1.

(c) $\mathbf{Solution.}\qquad$ ($\implies$) First, if $X$ is a random variable where $\mathrm{Kur}\left(X\right)=1$, then question 3 (a) implies, \[\begin{align} 1 & =\operatorname{Var}\left\{ Z^{2}\right\} +1\nonumber \\ 0 & =\operatorname{Var}\left\{ \left(\frac{X-\mu}{\sigma}\right)^{2}\right\} \nonumber \\ 0 & =\frac{1}{\sigma^{4}}\operatorname{Var}\left(X^{2}-2\mu X+\mu^{2}\right)\nonumber \\ 0 & =\operatorname{Var}\left(X^{2}-2\mu X\right)\nonumber \\ 0 & =\mathbb{E}\left[\left(\left(X^{2}-2\mu X\right)-\mathbb{E}\left[\left(X^{2}-2\mu X\right)\right]\right)^{2}\right]\label{eq:1} \end{align}\]

Equation (\ref{eq:1}) implies that $X^{2}-2\mu X=\mathbb{E}\left[X^{2}-2\mu X\right]$. Then we have, \[\begin{align*} X^{2}-2\mu X & =\mathbb{E}\left[X^{2}-2\mu X\right]\\ & =\mathbb{E}\left[X^{2}\right]-2\mu\mathbb{E}\left[X\right]\\ & =\sigma^{2}+\mu^{2}-2\mu^{2}\\ & =\sigma^{2}-\mu^{2} \end{align*}\]

Moving all terms to the left, \[\begin{align} X^{2}-2\mu X-\left(\sigma^{2}-\mu^{2}\right) & =0\nonumber \\ \left(X-(\sigma+\mu)\right)\left(X+(\sigma-\mu)\right) & =0\label{eq:2} \end{align}\]

To satisfy equation (\ref{eq:2}), we need $X=\left(\sigma+\mu\right)\text{ or }\left(\mu-\sigma\right)$. We have shown that $X$ can only take on two values. Next we need to show that the two outcomes of $X$ occur with equal probability. In this case we define the probabilities as, $p:=\mathbb{P}\left(X=\sigma+\mu\right)$ and $\left(1-p\right):=\mathbb{P}\left(X=\mu-\sigma\right)$. By computing $\mathbb{E}\left[X\right]$, \[\begin{align*} \mathbb{E}\left[X\right] & =p(\sigma+\mu)+(1-p)(\mu-\sigma)\\ \mu & =p\sigma+p\mu+\mu-\sigma-p\mu+p\sigma\\ \sigma & =2p\sigma\\ \frac{1}{2} & =p \end{align*}\]

Thus we’ve proven the claim. Next we show the opposite direction.

($\Longleftarrow$) Given $X$ is a random variable where, \[ X=\begin{cases} a & \text{with }p=\frac{1}{2}\\ b & \text{with }\left(1-p\right)=\frac{1}{2} \end{cases} \]

Then the moments of $X$ are, \[ \mathbb{E}[X]=\frac{1}{2}(a+b) \]

and, \[\begin{align*} \operatorname{Var}(X) & =\frac{1}{2}\left(a^{2}+b^{2}\right)-\left(\frac{1}{2}(a+b)\right)^{2}\\ & =\frac{1}{2}a^{2}+\frac{1}{2}b^{2}-\frac{1}{4}\left(a^{2}+2ab+b^{2}\right)\\ & =\frac{1}{4}\left(a^{2}-2ab+b^{2}\right)\\ & =\frac{1}{4}(a-b)^{2} \end{align*}\]

From question 3 (a) we know, $\mathrm{Kur}(X)=1+\operatorname{Var}\left(Z^{2}\right)$. We will prove the claim by showing that $\operatorname{Var}\left(Z^{2}\right)=0$ using direct computation. \[\begin{align} \operatorname{Var}\left(Z^{2}\right) & =\operatorname{Var}\left[\left(\frac{X-\frac{1}{2}\left(a+b\right)}{\frac{1}{2}\left(a-b\right)}\right)^{2}\right] \\ & =\frac{16}{(a-b)^{2}}\operatorname{Var}\left(X^{2}-(a+b)X\right)\nonumber \\ & =\frac{16}{(a-b)^{2}}\left[\operatorname{Var}\left(X^{2}\right)+(a+b)^{2}\operatorname{Var}\left(X\right)-2\left(a+b\right)\operatorname{Cov}\left(X^{2},X\right)\right]\label{eq:3} \end{align}\]

We solve for the terms in equation (\ref{eq:3}) in two parts. First the variance terms, \[\begin{aligned}\operatorname{Var}\left(X^{2}\right)+(a+b)^{2}\operatorname{Var}(X) & =\mathbb{E}\left[X^{4}\right]-\mathbb{E}\left[X^{2}\right]^{2}+(a+b)^{2}\left(\frac{1}{4}(a-b)^{2}\right)\\ & =\frac{1}{2}\left(a^{4}+b^{4}\right)-\frac{1}{4}\left(a^{2}+b^{2}\right)^{2}+\frac{1}{4}(a+b)^{2}(a-b)^{2}\\ & =\frac{1}{4}\left(a^{4}-2a^{2}b^{2}+b^{4}\right)+\frac{1}{4}(a+b)^{2}(a-b)^{2}\\ & =\frac{1}{4}(a+b)^{2}(a-b)^{2}+\frac{1}{4}(a+b)^{2}(a-b)^{2}\\ & =\frac{1}{2}(a+b)^{2}(a-b)^{2} \end{aligned} \]

the second part involves computing the covariance term, \[\begin{align*} -2(a+b)\mathrm{Cov}\left(X^{2},X\right) & =-2(a+b)\mathbb{E}\left[\left(X^{2}-\mathbb{E}\left[X^{2}\right]\right)\left(X-\mathbb{E}\left[X\right]\right)\right]\\ & =-2(a+b)\mathbb{E}\left[X^{3}-X^{2}\mathbb{E}[X]-X\mathbb{E}\left[X^{2}\right]+\mathbb{E}\left[X^{2}\right]\mathbb{E}\left[X\right]\right]\\ & =-2(a+b)\bigg[\frac{1}{2}\left(a^{3}+b^{3}\right)-\frac{1}{2}\left(a^{2}+b^{2}\right)\frac{1}{2}(a+b)\\ & \quad+\underbrace{-\frac{1}{2}(a+b)\frac{1}{2}\left(a^{2}+b^{2}\right)+\frac{1}{2}\left(a^{2}+b^{2}\right)\frac{1}{2}(a+b)}_{=0}\bigg]\\ & =-\left(a^{3}+b^{3}\right)(a+b)+\frac{1}{2}\left(a^{2}+b^{2}\right)(a+b)^{2}\\ & =-(a+b)^{2}\left[a^{2}-ab+b^{2}-\frac{1}{2}a^{2}-\frac{1}{2}b^{2}\right]\\ & =-\frac{1}{2}(a+b)^{2}\left[a^{2}-2ab+b^{2}\right]\\ & =-\frac{1}{2}(a+b)^{2}(a-b)^{2} \end{align*}\]

Thus equation (\ref{eq:3}) becomes, \[\begin{align*} \frac{16}{(a-b)^{2}}\left[\frac{1}{2}(a+b)^{2}(a-b)^{2}-\frac{1}{2}(a+b)^{2}(a-b)^{2}\right] & =0 \end{align*}\]

Therefore $\operatorname{Kur}(X)=1$.

Additional Functions

gg_shape = function(data, models = 30, start = 15, end = 500, reverse = TRUE, 
                    ci = 0.95, auto.scale = TRUE, labels = TRUE) {
  data <- as.numeric(data)
  qq <- 0
  if (ci) 
    qq <- qnorm(1 - (1 - ci)/2)
  x <- trunc(seq(from = min(end, length(data)), to = start, 
                 length = models))
  
  gpd.dummy <- function(nex, data) {
    out <- gpd(data = data, nextremes = nex, information = "expected")
    c(out$threshold, out$par.ests[1], out$par.ses[1])
  }
  mat <- apply(as.matrix(x), 1, gpd.dummy, data = data)
  mat <- rbind(mat, x)
  dimnames(mat) <- list(c("threshold", "shape", "se", "exceedances"), 
                        NULL)
  thresh <- mat[1, ]
  y <- mat[2, ]
  index <- x
  if (reverse) 
    index <- -x
  df = data.frame("x" = index, "y" = y)
  labels = formatC(thresh[c(TRUE, FALSE, FALSE, FALSE, FALSE)], 
                   format = "e", digits = 2)
  breaks = index[c(TRUE, FALSE, FALSE, FALSE, FALSE)]
  ggplot(df, aes(x = x, y = y)) + 
    geom_line() + 
    scale_y_continuous(breaks = scales::pretty_breaks(n = 8)) + 
    scale_x_continuous(breaks = breaks,
                       labels = as.character(-breaks),
                       sec.axis = sec_axis(~ . * 1,
                       breaks = breaks, 
                       labels =  labels)) + 
    ylab("Shape (xi)") + 
    xlab("Exceedances") + 
    ggtitle("Threshold") + 
    theme(plot.title = element_text(hjust = 0.5, size = 10))
}


gg_tailplot = function(x, extend = 1.5) {
  data <- as.numeric(x$data)
  threshold <- x$threshold
  xi <- x$par.ests["xi"]
  beta <- x$par.ests["beta"]
  plotmin <- threshold
  if (extend <= 1) 
    stop("extend must be > 1")
  plotmax <- max(data) * extend
  xx <- seq(from = 0, to = 1, length = 1000)
  z <- qgpd(xx, xi, threshold, beta)
  z <- pmax(pmin(z, plotmax), plotmin)
  ypoints <- ppoints(sort(data))
  y <- pgpd(z, xi, threshold, beta)
  prob <- x$p.less.thresh
  ypoints <- (1 - prob) * (1 - ypoints)
  y <- (1 - prob) * (1 - y)
  df = data.frame("sortdata" = sort(data), "ypts" = ypoints)
  df2 = data.frame("x" = z[y >= 0] , "y" = y[y >= 0])
  ggplot(df, aes(x = sortdata, y = ypts)) +
    geom_point(shape = 1, size = 2) + 
    scale_x_log10() + 
    scale_y_log10() +  
    geom_line(data = df2, aes(x = x, y = y), inherit.aes = FALSE) + 
    ggtitle(paste0("Tail Plot of GPD for u = ", u)) + 
    xlab('x (on log scale)') + 
    ylab('1-F(x) (on log scale)') + 
    theme(plot.title = element_text(hjust = 0.5, size = 10))
}

STATS 509: Homework 3

Israel Diego

2/05/2020

Question 1

Question 2

Question 3

Additional Functions