ESS for importance sampling.

In class, we talked about the effective sample size (ESS) of an importance sampler with \(m\) samples, defined as \[ \mathrm{ESS}(m)=\frac{m}{1+\mathrm{cv}[w]^{2}} \]

where \(\mathrm{cv}[w]\) is the sample standard deviation of the weights \(\left(w_{1}, \ldots, w_{m}\right)\) divided by their sample mean.

  1. Suppose \(\pi(x) \propto\) the standard Gaussian density, \(g(x) \propto\) the Student-t density with 2 degrees of freedom, and \(h(x)=x\). Implement an importance sampler and calculate \(\mathrm{ESS}(m)\) for \(m=50,100,200,500,1000\).
  2. Suppose \(g(x) \propto\) the standard Gaussian density, \(\pi(x) \propto\) the Student-t density with 2 degrees of freedom, and \(h(x)=x\). Implement an importance sampler and calculate \(\mathrm{ESS}(m)\) for \(m=50,100,200,500,1000\). Is \(g(x)\) a sensible choice?
  3. What do you find by comparing the results above?
  4. Suppose the target density is \[ \pi\left(\mu, \sigma^{2}\right) \propto \sigma^{-5} \exp \left[-\frac{(\mu-1)^{4}+4}{2 \sigma^{2}}\right] \] where \(\left(\mu, \sigma^{2}\right) \in[-3,5] \times[0.01,5]\)
    1. Make a contour plot of the target density in the specified range.
    2. Given the target density and the contour plot, how can you choose a good importance function? Design an importance sampling procedure and estimate the ESS for several different sample sizes. Compare your samples with that obtained from grid sampling.
    3. Optional (bonus points \(\leq 5\) ). If \(\mu\) is the only quantity of interest, does marginalization help with obtaining better importance samples? If so, can you verify it?

1 \(\mathbf{Solution.}\qquad\) Below we implement an importance sampler with target \(\pi(x)\) the standard Gaussian density and proposal \(g(x)\) the \(t_2\) density, and calculate \(\mathrm{ESS}(m)\) for the requested sample sizes.
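A minimal sketch of one way to do this; the helper function is_ess and its interface are our own names, not taken from the original code.

# Self-normalized importance sampler: returns the estimate of E_pi[h(X)]
# and the ESS based on the coefficient of variation of the weights.
is_ess <- function(m, rproposal, dtarget, dproposal, h = identity) {
  x <- rproposal(m)                    # draw m samples from the proposal g
  w <- dtarget(x) / dproposal(x)       # unnormalized importance weights
  est <- sum(w * h(x)) / sum(w)        # self-normalized estimate of E_pi[h(X)]
  cv <- sd(w) / mean(w)                # coefficient of variation of the weights
  c(estimate = est, ESS = m / (1 + cv^2))
}

set.seed(1)
# Question 1: target pi = N(0, 1), proposal g = t with 2 degrees of freedom
sapply(c(50, 100, 200, 500, 1000), function(m)
  is_ess(m,
         rproposal = function(k) rt(k, df = 2),
         dtarget   = dnorm,
         dproposal = function(x) dt(x, df = 2))["ESS"])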

|          | m = 50 | m = 100 | m = 200 | m = 500 | m = 1000 |
|----------|--------|---------|---------|---------|----------|
| ESS(m)   | 34.13  | 77.27   | 138.88  | 351.97  | 711.67   |

   

2 \(\mathbf{Solution.}\qquad\) Below we implement an importance sampler with target \(\pi(x)\) the \(t_2\) density and proposal \(g(x)\) the standard Gaussian density, and calculate \(\mathrm{ESS}(m)\).
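Reusing the hypothetical is_ess helper sketched in question 1, with the roles of \(\pi\) and \(g\) swapped:

# Question 2: target pi = t with 2 degrees of freedom, proposal g = N(0, 1)
sapply(c(50, 100, 200, 500, 1000), function(m)
  is_ess(m,
         rproposal = rnorm,
         dtarget   = function(x) dt(x, df = 2),
         dproposal = dnorm)["ESS"])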

|          | m = 50 | m = 100 | m = 200 | m = 500 | m = 1000 |
|----------|--------|---------|---------|---------|----------|
| ESS(m)   | 8.61   | 49.36   | 16.63   | 60.09   | 110.28   |

In question 2, the effective sample sizes do not grow steadily with \(m\) and are quite small compared to those in question 1. Thus \(g(x) \propto\) the standard Gaussian density is not a sensible choice: the Gaussian proposal has lighter tails than the \(t_2\) target, so occasional draws in the tails receive very large weights and dominate the estimate.

   

3 \(\mathbf{Solution.}\qquad\) We see that the effective sample sizes are smaller when \(g(x) \propto\) the standard Gaussian density than when \(g(x) \propto\) the Student-t density with 2 degrees of freedom. A heavy-tailed proposal covers a light-tailed target well, but not the other way around, so the standard Gaussian is not a sensible proposal for the \(t_2\) target.

   

4a \(\mathbf{Solution.}\qquad\) Below we create a contour plot of the target density over \([-3,5] \times [0.01,5]\).
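A minimal sketch of how such a plot can be produced; variable names are ours, and \(\sigma^{-5}\) is written as \((\sigma^2)^{-5/2}\) since the density is a function of \((\mu, \sigma^2)\).

# Unnormalized target density, with sigma^(-5) written as (sigma^2)^(-5/2)
target <- function(mu, sigma2)
  sigma2^(-5/2) * exp(-((mu - 1)^4 + 4) / (2 * sigma2))

mu_grid     <- seq(-3, 5, length.out = 200)
sigma2_grid <- seq(0.01, 5, length.out = 200)
z <- outer(mu_grid, sigma2_grid, target)
contour(mu_grid, sigma2_grid, z,
        xlab = expression(mu), ylab = expression(sigma^2),
        main = "Unnormalized target density")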

   

4b \(\mathbf{Solution.}\qquad\) From the contour plot in part (a), the target is roughly unimodal, so we approximate it with a bivariate normal importance function. First we estimate the mean and scale of each coordinate; the estimates are printed below.

##     samp1     samp2 
## 0.9943434 1.1929828
##     samp1     samp2 
## 0.4923520 0.2941133

Next we implement the importance sampling procedure with the estimated bivariate normal as the importance function, and compare its effective sample sizes with those of grid sampling, which draws uniformly from the grid of values.
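A sketch of this comparison, reusing the hypothetical target function from the 4a sketch. It assumes a diagonal-covariance bivariate normal proposal; the rounded means and standard deviations below are our reading of the printed estimates, and grid sampling is approximated here by uniform draws over the rectangle.

mu_hat <- c(0.994, 1.193)   # proposal means for (mu, sigma^2), read from above
sd_hat <- c(0.492, 0.294)   # proposal standard deviations, read from above

ess_2d <- function(m, sampler, dproposal) {
  x  <- sampler(m)                        # m x 2 matrix of (mu, sigma^2) draws
  ok <- x[, 2] > 0                        # target is zero for sigma^2 <= 0
  w  <- numeric(m)
  w[ok] <- target(x[ok, 1], x[ok, 2]) / dproposal(x[ok, , drop = FALSE])
  m / (1 + (sd(w) / mean(w))^2)
}

bvn_sampler <- function(m) cbind(rnorm(m, mu_hat[1], sd_hat[1]),
                                 rnorm(m, mu_hat[2], sd_hat[2]))
bvn_density <- function(x) dnorm(x[, 1], mu_hat[1], sd_hat[1]) *
                           dnorm(x[, 2], mu_hat[2], sd_hat[2])

grid_sampler <- function(m) cbind(runif(m, -3, 5), runif(m, 0.01, 5))
grid_density <- function(x) rep(1 / (8 * 4.99), nrow(x))   # uniform on the box

sapply(c(50, 100, 200, 500, 1000), function(m)
  c(BVN  = ess_2d(m, bvn_sampler,  bvn_density),
    grid = ess_2d(m, grid_sampler, grid_density)))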

|          | m = 50 | m = 100 | m = 200 | m = 500 | m = 1000 |
|----------|--------|---------|---------|---------|----------|
| ESS BVN  | 17.466 | 0.602   | 123.257 | 363.632 | 596.39   |
| ESS grid | 10.25  | 16.912  | 32.491  | 76.368  | 158.379  |

Overall, the effective sample sizes are larger for the bivariate normal importance sampler than for grid sampling (the \(m=100\) run is an exception). Choosing an importance function that resembles the target distribution therefore improves sampling efficiency.

MCMC sampling for a hierarchical normal model.

Table 1 contains quality control measurements from 6 machines in a factory. Quality control measurements are expensive and time-consuming, so only 5 measurements were taken for each machine. In addition to the existing machines, we are interested in the quality of another machine (the seventh machine). Let \(y_{ij}\) denote the \(i^{\text{th}}\) quality control measurement of machine \(j\). Assume the following hierarchical Gaussian model with common variance for \(y=\left\{y_{ij}\right\}\): \[ \begin{aligned} P(\mu, \log \sigma, \log \tau) & \propto \tau \\ \theta_{j} | \mu, \tau^{2} & \sim N\left(\mu, \tau^{2}\right) \\ y_{ij} | \theta_{j}, \sigma^{2} & \sim N\left(\theta_{j}, \sigma^{2}\right), \quad i=1, \ldots, 5 ; j=1, \ldots, 6 \end{aligned} \]

  1. Write down the joint distribution \(P\left(\left\{y_{i j}\right\},\left\{\theta_{j}\right\}, \sigma^{2}, \mu, \tau^{2}\right)\)
  2. Gibbs Sampling
    1. Write down the posterior conditionals required for Gibbs sampling. i.e. \[ P\left(\theta_{j} | \mu, \sigma, \tau, y\right), P\left(\mu |\left\{\theta_{j}\right\}, \sigma, \tau, y\right), P\left(\sigma^{2} |\left\{\theta_{j}\right\}, \mu, \tau, y\right), P\left(\tau^{2} |\left\{\theta_{j}\right\}, \mu, \sigma, y\right) \]
    2. Implement a Gibbs sampler to sample from the posterior distribution \(P\left(\left\{\theta_{j}\right\}, \sigma^{2}, \mu, \tau^{2} | y\right) .\) Make sure you discard burn-in samples and your Gibbs sampler has converged by examining convergence diagnostics.
    3. Compute the posterior expectation of the mean of the quality measurements of the sixth machine.
    4. Suppose a new measurement is taken for the sixth machine. Give a \(95 \%\) posterior predictive interval for this new measurement.
    5. Give a \(95 \%\) posterior interval for the mean of the quality control measurements of the seventh machine.
    6. Is it reasonable to assume a model with common variance for this data? Why or why not?
  3. Metropolis-Hastings algorithm
    1. Compute the (unnormalized) posterior distribution of \(\mu, \log \sigma,\) and \(\log \tau\) by marginalizing out the \(\theta_{j}\) ’s in the joint distribution of the data and parameters.
    2. Derive and implement a Metropolis algorithm to sample from the posterior in (a) with a normal proposal distribution for each parameter centered at the previous draw with variances \(\eta^{2}=0.01,0.1,1 .\) Comment on how the variance of the proposal distribution affects sampling efficiency.
    3. Use your samples from (b) to obtain samples from the posterior distribution of each \(\theta_{j}, j=1, \ldots, 6 .\) Compare these samples with the posterior samples obtained using the Gibbs sampler in 2 .
  4. Optional (5 bonus points). Extend the model in 1. by adding a hierarchical model for the variances of the machine quality measurements. That is, assume \[ \begin{aligned} P\left(\mu, \tau, \sigma_{0}\right) & \propto 1 \\ \sigma_{j}^{2} | \sigma_{0}^{2} & \sim \operatorname{Inv.} \chi^{2}\left(\nu, \sigma_{0}^{2}\right) \\ \theta_{j} | \mu, \tau^{2} & \sim N\left(\mu, \tau^{2}\right) \\ y_{ij} | \theta_{j}, \sigma_{j}^{2} & \sim N\left(\theta_{j}, \sigma_{j}^{2}\right), \quad i=1, \ldots, 5 ; j=1, \ldots, 6 \end{aligned} \] where Inv. \(\chi^{2}\left(\nu, \sigma_{0}^{2}\right)\) is the scaled inverse chi-squared distribution with known degrees of freedom \(\nu\) and unknown scale \(\sigma_{0}^{2}\), and has density \[ p\left(x ; \nu, \sigma_{0}^{2}\right)=\frac{\left(\sigma_{0}^{2} \nu / 2\right)^{\nu / 2}}{\Gamma(\nu / 2)} \frac{\exp \left[\frac{-\nu \sigma_{0}^{2}}{2 x}\right]}{x^{1+\nu / 2}}, \quad x>0 \]
    1. Assume \(\nu=5 .\) Repeat (a) - (e) in 1. Which posterior intervals are wider and why do you think this is the case?

1a \(\mathbf{Solution.}\qquad\) From page 288 of BDA, the joint distribution of the data and parameters is \[ P\left(\left\{ y_{ij}\right\} ,\left\{ \theta_{j}\right\} ,\sigma^{2},\mu,\tau^{2}\right)\propto\tau\prod_{j=1}^{J}\mathrm{N}\left(\theta_{j}\mid\mu,\tau^{2}\right)\prod_{j=1}^{J}\prod_{i=1}^{n_{j}}\mathrm{N}\left(y_{ij}\mid\theta_{j},\sigma^{2}\right) \] where \(J=6\) machines and \(n_j=5\) measurements per machine.

   

2a \(\mathbf{Solution.}\qquad\) From pages 289-290 of BDA, the posterior conditionals are, i). Conditional posterior distribution of each \(\theta_{j}\): \[\begin{align*} \theta_{j}\mid\mu,\sigma,\tau,y & \sim\mathrm{N}\left(\hat{\theta}_{j},V_{\theta_{j}}\right) \end{align*}\]

where, \[\begin{align*} \hat{\theta}_{j} & =\frac{\frac{1}{\tau^{2}}\mu+\frac{n_{j}}{\sigma^{2}}\bar{y}_{.j}}{\frac{1}{\tau^{2}}+\frac{n_{j}}{\sigma^{2}}}\\ V_{\theta_{j}} & =\frac{1}{\frac{1}{\tau^{2}}+\frac{n_{j}}{\sigma^{2}}} \end{align*}\]

ii). Conditional posterior distribution of \(\mu\): \[\begin{align*} \mu\mid\theta,\sigma,\tau,y & \sim \mathrm{N}\left(\hat{\mu},\tau^{2}/J\right) \end{align*}\]

where, \[ \hat{\mu}=\frac{1}{J}\sum_{j=1}^{J}\theta_{j} \]

iii). Conditional posterior distribution of \(\sigma^{2}\): \[\begin{align*} \sigma^{2}\mid\theta,\mu,\tau,y & \sim\mathrm{Inv-}\chi^{2}\left(n,\hat{\sigma}^{2}\right) \end{align*}\]

where, \[ \hat{\sigma}^{2}=\frac{1}{n}\sum_{j=1}^{J}\sum_{i=1}^{n_{j}}\left(y_{ij}-\theta_{j}\right)^{2} \]

iv). Conditional posterior distribution of \(\tau^{2}:\) \[\begin{align*} \tau^{2}|\theta,\mu,\sigma,y & \sim\textrm{Inv}-\chi^{2}\left(J-1,\hat{\tau}^{2}\right) \end{align*}\]

where, \[ \hat{\tau}^{2}=\frac{1}{J-1}\sum_{j=1}^{J}\left(\theta_{j}-\mu\right)^{2} \]

   

2b \(\mathbf{Solution.}\qquad\) Below we implement the Gibbs sampler. We run 10,000 iterations and discard the first 1,000 samples as burn-in.

# 2. Gibbs Sampling -----------
# Part B ----------
J = 6
y = t(matrix(c(83, 92, 92, 46, 67,
             117, 109, 114, 104, 87,
             101, 93, 92, 86, 67,
             105, 119, 116, 102, 116,
             79, 97, 103, 79, 92,
             57, 92, 104, 77, 100), ncol= 6))
n = rep(ncol(y), nrow(y))
ybar = rowMeans(y)
s = apply(y, 1, sd)


# Gibbs update functions
theta_update <- function(mu, sigma, tau, J, n, ybar) {
  V_theta = 1 / (1 / tau^2 + n / sigma^2)
  theta_hat = V_theta * (mu / tau^2 + (n*ybar) / sigma^2)
  rnorm(J, mean = theta_hat, sd = sqrt(V_theta))
  }

mu_update <- function(theta, tau, J) {
  mu_hat = mean(theta)
  rnorm(1, mean = mu_hat, sd = tau / sqrt(J))
  }

sigma_update <- function(theta, ybar, s, n) {
  sigma2_hat = sum((n-1)*s^2 + n*(ybar - theta)^2) / sum(n)
  sigma2 = sum(n) * sigma2_hat / rchisq(1, df=sum(n))
  sqrt(sigma2)
  }

tau_update <- function(J, theta, mu) {
  tau2_hat = sum((theta - mu)^2) / (J - 1)
  tau2 = (J - 1) * tau2_hat / rchisq(1, df = J - 1)
  sqrt(tau2)
  }

# Single chain
iter = 10000

# Initialize values
theta_chain = matrix(0, iter, J)
mu_chain = rep(0, iter)
sigma_chain = rep(0, iter)
tau_chain = rep(0, iter)
theta = ybar
mu = mean(theta)
sigma = sqrt(mean(s^2))
tau = sd(ybar)

for (t in 1:iter) {
  # Update Parameters
  theta = theta_update(mu, sigma, tau, J, n, ybar)
  mu = mu_update(theta, tau, J)
  sigma = sigma_update(theta, ybar, s, n)
  tau = tau_update(J, theta, mu)
  
  # Store Estimates
  theta_chain[t,] = theta
  mu_chain[t] = mu
  sigma_chain[t] = sigma
  tau_chain[t] = tau
  }

burnIn = 1000
sims = list(theta_chain=theta_chain[(burnIn + 1):iter,], 
            mu_chain=mu_chain[(burnIn + 1):iter], 
            sigma_chain=sigma_chain[(burnIn + 1):iter], 
            tau_chain=tau_chain[(burnIn + 1):iter])

After sampling, we create trace plots of the Gibbs draws of each \(\theta_j\) after discarding the burn-in, as sketched below.
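A minimal sketch of the trace plots; the layout choices are ours.

# Trace plots of the post-burn-in Gibbs draws of each theta_j
par(mfrow = c(2, 3))
for (j in 1:J) {
  plot(sims$theta_chain[, j], type = "l",
       xlab = "iteration", ylab = bquote(theta[.(j)]),
       main = paste("Machine", j))
}
par(mfrow = c(1, 1))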

From the trace plots of each \(\theta_j\) above, it looks like the Gibbs sampler has converged.

   

2c \(\mathbf{Solution.}\qquad\) The posterior expectation of the mean of the quality measurements of the sixth machine is estimated by the average of the sampled \(\theta_6\) values, printed below.
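A one-line sketch of this computation, assuming the sims list defined above:

# Posterior mean of theta_6 (mean quality of the sixth machine)
mean(sims$theta_chain[, 6])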

## [1] 87.5313

   

2d \(\mathbf{Solution.}\qquad\) Below we find the 95% posterior predictive interval of a new measurement for machine 6.
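A sketch of the computation, drawing one predictive value \(\tilde{y} \sim N(\theta_6, \sigma^2)\) per retained posterior draw:

# Posterior predictive draws for a new measurement on machine 6
y_new <- rnorm(length(sims$sigma_chain),
               mean = sims$theta_chain[, 6],
               sd   = sims$sigma_chain)
quantile(y_new, c(0.025, 0.975))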

##      2.5%     97.5% 
##  55.84678 119.87343

   

2e \(\mathbf{Solution.}\qquad\) Below we compute a 95% posterior interval for the mean of the quality control measurements of the seventh machine.
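A sketch of the computation: since the seventh machine is exchangeable with the others, its mean is drawn as \(\theta_7 \sim N(\mu, \tau^2)\) for each retained \((\mu, \tau)\) draw.

# Mean of a new (seventh) machine: theta_7 ~ N(mu, tau^2) for each draw
theta7 <- rnorm(length(sims$mu_chain),
                mean = sims$mu_chain,
                sd   = sims$tau_chain)
quantile(theta7, c(0.025, 0.975))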

##      2.5%     97.5% 
##  42.01938 141.17807

   

3a \(\mathbf{Solution.}\qquad\) From page 328 of BDA, the unnormalized posterior distribution is,

\[p(\mu, \log \sigma, \log \tau | y) \propto \tau \prod_{j=1}^{J} \mathrm{N}\left(\hat{\theta}_{j} | \mu, \tau^{2}\right) \prod_{j=1}^{J} \prod_{i=1}^{n_{j}} \mathrm{N}\left(y_{i j} | \hat{\theta}_{j}, \sigma^{2}\right) \prod_{j=1}^{J} V_{\theta_{j}}^{1 / 2}\] where \(\hat{\theta}_{j}\) and \(V_{\theta_{j}}\) are as defined in 2(a).

   

3b \(\mathbf{Solution.}\qquad\) Below we define the functions used in the Metropolis algorithm; the code adapts the sample code on pages 600-601 of BDA.
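A minimal sketch of the pieces involved, working on the \((\mu, \log\sigma, \log\tau)\) scale. The function names are ours, and the update below proposes all three components jointly with independent normal increments of variance \(\eta^2\), which is one reading of the problem statement rather than the exact BDA code.

# Unnormalized log posterior of (mu, log sigma, log tau) from part (a)
log_post <- function(par, y, n, ybar) {
  mu <- par[1]; sigma <- exp(par[2]); tau <- exp(par[3])
  V_theta   <- 1 / (1 / tau^2 + n / sigma^2)
  theta_hat <- V_theta * (mu / tau^2 + n * ybar / sigma^2)
  log(tau) +
    sum(dnorm(theta_hat, mu, tau, log = TRUE)) +
    # y is 6 x 5 (machines in rows), so theta_hat recycles correctly over columns
    sum(dnorm(y, theta_hat, sigma, log = TRUE)) +
    0.5 * sum(log(V_theta))
}

# Random-walk Metropolis: all three components proposed jointly with
# independent N(0, eta2) increments centered at the previous draw
run_metropolis <- function(iter, eta2, start, y, n, ybar) {
  draws  <- matrix(NA, iter, 3,
                   dimnames = list(NULL, c("mu", "log_sigma", "log_tau")))
  cur    <- start
  cur_lp <- log_post(cur, y, n, ybar)
  accept <- 0
  for (t in 1:iter) {
    prop    <- cur + rnorm(3, sd = sqrt(eta2))
    prop_lp <- log_post(prop, y, n, ybar)
    if (log(runif(1)) < prop_lp - cur_lp) {          # Metropolis acceptance
      cur <- prop; cur_lp <- prop_lp; accept <- accept + 1
    }
    draws[t, ] <- cur
  }
  list(draws = draws, accept_rate = accept / iter)
}

# e.g. fit <- run_metropolis(10000, eta2 = 0.1,
#                            start = c(mean(ybar), log(mean(s)), log(sd(ybar))),
#                            y = y, n = n, ybar = ybar)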

Now we run the Metropolis algorithm for each value of \(\eta^2\). From the table below, \({\eta}^2 = 0.01\) produces the highest acceptance rate, while \({\eta}^2 = 0.1\) and \({\eta}^2 = 1\) produce somewhat lower acceptance rates. A high acceptance rate obtained with a very small proposal variance means the chain moves in tiny steps and explores the posterior slowly, so either \({\eta}^2 = 0.1\) or \({\eta}^2 = 1\) should be more efficient.

|                 | eta^2 = 0.01 | eta^2 = 0.1 | eta^2 = 1 |
|-----------------|--------------|-------------|-----------|
| Acceptance rate | 0.5484       | 0.5059      | 0.5064    |

   

3c \(\mathbf{Solution.}\qquad\) To obtain posterior samples of each \(\theta_j\) from the Metropolis draws, we sample \(\theta_j\) from its conditional posterior given each retained \((\mu, \sigma, \tau)\) draw, as sketched below. In the output that follows, the first two rows are the posterior means of each \(\theta_j\) from the Metropolis algorithm and from the Gibbs sampler; they are very similar. The next two rows are the corresponding posterior standard deviations of each \(\theta_j\), which are also very similar for the two samplers. The last two rows compare the effective sample sizes of the \(\theta_j\) draws; the effective sizes from the Metropolis algorithm are larger overall than those from the Gibbs sampler.
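A sketch of one way to recover the \(\theta_j\) draws: for each retained \((\mu, \sigma, \tau)\) draw, sample \(\theta_j\) from its conditional \(\mathrm{N}(\hat{\theta}_j, V_{\theta_j})\) from 2(a). The object fit below is the hypothetical output of the run_metropolis sketch above.

# Draw theta_j | mu, sigma, tau, y for every retained Metropolis draw
draws <- fit$draws                 # hypothetical: columns mu, log_sigma, log_tau
theta_mh <- t(apply(draws, 1, function(par) {
  mu <- par[1]; sigma <- exp(par[2]); tau <- exp(par[3])
  V_theta   <- 1 / (1 / tau^2 + n / sigma^2)
  theta_hat <- V_theta * (mu / tau^2 + n * ybar / sigma^2)
  rnorm(J, theta_hat, sqrt(V_theta))
}))
colnames(theta_mh) <- paste0("theta", 1:J)
colMeans(theta_mh)                 # compare with colMeans(sims$theta_chain)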

##    theta1    theta2    theta3    theta4    theta5    theta6 
##  79.74332 103.24473  88.93060 107.29052  90.71297  87.54264
##    theta1    theta2    theta3    theta4    theta5    theta6 
##  79.81077 103.24471  88.89138 107.32645  90.55743  87.53130
##   theta1   theta2   theta3   theta4   theta5   theta6 
## 6.724189 6.470693 6.053743 6.829975 6.100632 6.105462
##   theta1   theta2   theta3   theta4   theta5   theta6 
## 6.603069 6.558095 6.100432 6.874321 6.074387 6.161556
##   theta1   theta2   theta3   theta4   theta5   theta6 
## 4889.657 6286.884 9442.918 4950.360 9594.773 8250.623
##   theta1   theta2   theta3   theta4   theta5   theta6 
## 3920.079 5110.971 7549.588 3760.843 8454.377 6786.224