In this exercise, we are going to re-implement HW3 Question 3 & 4 using STAN. Given the following data from HW3:
Part I: As previously, Table 1 contains quality control measurements from \(6\) machines in a factory. Let \(y_{i j}\) denote the \(i^{\text {th}}\) quality control measurement of machine \(j\). Assume the following hierarchical Gaussian model with common variance for \(y=\left\{y_{i j}\right\}\):
\[
\begin{array}{l}
p(\mu, \log \sigma, \log \tau) \propto \tau \\
\theta_{j} \mid \mu, \tau^{2} \sim N\left(\mu, \tau^{2}\right) \\
y_{i j} \mid \theta_{j}, \sigma^{2} \sim N\left(\theta_{j}, \sigma^{2}\right), \quad i=1, \ldots, 5 ;\ j=1, \ldots, 6
\end{array}
\]
- Write the Stan model and the R script that fits it, naming them “q1measurep1.stan” and “q1runp1.R” respectively.

(a) \(\mathbf{Solution.}\qquad\) See q1measurep1.stan.
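For reference, the prior \(p(\mu, \log\sigma, \log\tau) \propto \tau\) corresponds, after changing variables from \((\log\sigma, \log\tau)\) to \((\sigma, \tau)\), to \(p(\mu, \sigma, \tau) \propto \tau \cdot \tfrac{1}{\sigma} \cdot \tfrac{1}{\tau} = \tfrac{1}{\sigma}\), i.e. flat in \(\mu\) and \(\tau\). A minimal Stan sketch of this model is shown below; it is a hypothetical outline only (the submitted q1measurep1.stan may be organized differently, and the data names N, J, machine, y are illustrative).

```stan
// Hypothetical sketch of q1measurep1.stan (the submitted file may differ).
data {
  int<lower=1> N;                    // total number of measurements (6 machines x 5 each)
  int<lower=1> J;                    // number of machines
  int<lower=1, upper=J> machine[N];  // machine index for each measurement
  vector[N] y;                       // quality control measurements
}
parameters {
  real mu;                           // population mean
  real<lower=0> tau;                 // between-machine standard deviation
  real<lower=0> sigma;               // common within-machine standard deviation
  vector[J] theta;                   // machine-level means
}
model {
  // p(mu, sigma, tau) proportional to 1/sigma (flat in mu and tau)
  target += -log(sigma);
  theta ~ normal(mu, tau);
  y ~ normal(theta[machine], sigma);
}
```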
(b) \(\mathbf{Solution.}\qquad\) Below we print the results of our model fit.
## Inference for Stan model: q1measurep1.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 97.5% n_eff Rhat
## theta[1] 78.91 0.14 6.65 66.02 92.35 2425 1
## theta[2] 104.18 0.13 6.64 91.32 117.52 2589 1
## theta[3] 88.49 0.11 6.18 76.44 100.36 3067 1
## theta[4] 108.53 0.14 6.85 94.45 121.42 2401 1
## theta[5] 90.38 0.10 6.22 78.11 102.61 3615 1
## theta[6] 86.97 0.11 6.43 74.44 99.46 3481 1
## sigma 15.14 0.05 2.39 11.28 20.48 2827 1
## tau 21.26 0.38 13.82 6.27 58.61 1337 1
## mu 93.09 0.25 10.16 71.09 114.23 1696 1
## lp__ -106.54 0.07 2.44 -112.27 -102.96 1394 1
##
## Samples were drawn using NUTS(diag_e) at Sat Mar 28 18:53:57 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
We also show the trace plots of each of the fitted parameters.
From the table and trace plots, we see that \(\tau\) has the smallest effective sample size and its chains show the most volatility; all other parameters appear to mix and converge well.
(c) \(\mathbf{Solution.}\qquad\) Below we provide posterior histogram plots for all model parameters.
(d) \(\mathbf{Solution.}\qquad\) Below we show the lower and upper endpoints of the 95% posterior intervals for each \(\theta_j\) and compare them to the results from HW3. From the tables below, we see that the intervals are close to each other.
## # A tibble: 3 x 7
## Algorithm theta1_lo theta2_lo theta3_lo theta4_lo theta5_lo theta6_lo
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Gibbs 66.8 91.2 76.8 93.8 78.5 75.5
## 2 MH 66.7 90.3 76.8 93.3 78.7 76.0
## 3 Stan Part I 66.0 91.3 76.4 94.5 78.1 74.4
## # A tibble: 3 x 7
## Algorithm theta1_up theta2_up theta3_up theta4_up theta5_up theta6_up
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Gibbs 93.1 116. 101. 121. 103. 99.7
## 2 MH 93.7 116. 101. 121. 102. 99.2
## 3 Stan Part I 92.3 118. 100. 121. 103. 99.5
Comparing our posterior samples graphically with those from HW3 confirms that they are similar.
Part II: In this part, the model is as follows:
\[
\begin{array}{l}
p\left(\mu, \tau, \sigma_{0}\right) \propto 1 \\
\sigma_{j}^{2} \mid \sigma_{0}^{2} \sim \text{Inv-}\chi^{2}\left(5, \sigma_{0}^{2}\right) \\
\theta_{j} \mid \mu, \tau^{2} \sim N\left(\mu, \tau^{2}\right) \\
y_{i j} \mid \theta_{j}, \sigma_{j}^{2} \sim N\left(\theta_{j}, \sigma_{j}^{2}\right), \quad i=1, \ldots, 5 ;\ j=1, \ldots, 6
\end{array}
\]
- Repeat Part I (a)-(d), naming the files “q1measurep2.stan” and “q1runp2.R” respectively. Compare with the HW3 Question 4 results.
(a) \(\mathbf{Solution.}\qquad\) See q1measurep2.stan.
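As a hedged illustration, a Stan program for this model could be structured as below; this is a sketch only, and the submitted q1measurep2.stan may differ. Stan's scaled_inv_chi_square(nu, sigma0) distribution is \(\text{Inv-}\chi^{2}(\nu, \sigma_{0}^{2})\), and the flat prior \(p(\mu,\tau,\sigma_{0}) \propto 1\) is left implicit.

```stan
// Hypothetical sketch of q1measurep2.stan (the submitted file may differ).
data {
  int<lower=1> N;                    // total number of measurements
  int<lower=1> J;                    // number of machines
  int<lower=1, upper=J> machine[N];  // machine index for each measurement
  vector[N] y;                       // quality control measurements
}
parameters {
  real mu;                           // population mean
  real<lower=0> tau;                 // between-machine standard deviation
  real<lower=0> sigma0;              // scale hyperparameter for the variances
  vector[J] theta;                   // machine-level means
  vector<lower=0>[J] sigma2;         // per-machine variances
}
model {
  // p(mu, tau, sigma0) proportional to 1: flat (improper) prior, nothing to add.
  sigma2 ~ scaled_inv_chi_square(5, sigma0);   // sigma_j^2 | sigma0^2 ~ Inv-chi^2(5, sigma0^2)
  theta ~ normal(mu, tau);
  y ~ normal(theta[machine], sqrt(sigma2[machine]));
}
generated quantities {
  vector[J] sigma = sqrt(sigma2);    // per-machine standard deviations, as reported below
}
```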
(b) \(\mathbf{Solution.}\qquad\) Below we print the results of our model fit.
## Inference for Stan model: q1measurep2.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 25% 75% n_eff Rhat
## mu 93.75 0.16 7.75 89.11 98.22 2227 1
## tau 16.04 0.22 8.85 10.34 19.63 1649 1
## theta[1] 81.21 0.18 8.06 75.89 86.43 2083 1
## theta[2] 103.60 0.15 6.38 99.63 107.79 1827 1
## theta[3] 89.23 0.12 6.05 85.42 92.98 2501 1
## theta[4] 108.11 0.17 6.44 104.47 112.33 1523 1
## theta[5] 91.02 0.11 5.68 87.62 94.63 2544 1
## theta[6] 88.02 0.14 7.16 83.36 92.76 2638 1
## sigma2[1] 379.91 5.55 244.10 221.85 460.49 1933 1
## sigma2[2] 228.60 3.88 167.71 130.55 272.11 1870 1
## sigma2[3] 227.64 3.70 148.50 135.14 274.29 1609 1
## sigma2[4] 193.23 3.79 158.79 100.75 229.35 1757 1
## sigma2[5] 207.88 3.40 144.55 117.49 249.16 1802 1
## sigma2[6] 345.68 4.47 217.30 211.20 418.97 2361 1
## sigma0 14.03 0.08 2.91 11.92 15.77 1421 1
## sigma[1] 18.74 0.12 5.35 14.89 21.46 1959 1
## sigma[2] 14.44 0.10 4.49 11.43 16.50 1892 1
## sigma[3] 14.51 0.10 4.14 11.63 16.56 1760 1
## sigma[4] 13.14 0.11 4.55 10.04 15.14 1676 1
## sigma[5] 13.80 0.10 4.17 10.84 15.78 1872 1
## sigma[6] 17.93 0.10 4.92 14.53 20.47 2606 1
## lp__ -98.52 0.09 3.10 -100.32 -96.30 1097 1
##
## Samples were drawn using NUTS(diag_e) at Sat Mar 28 16:18:12 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
We also show the trace plots of each of the fitted parameters.
From the table and trace plots, we see that all chains appear to have converged (all Rhat values are 1).
(c) \(\mathbf{Solution.}\qquad\) Below we provide posterior histogram plots for all model parameters.
(d) \(\mathbf{Solution.}\qquad\) Below we show the lower and upper endpoints of the 95% posterior intervals for each \(\theta_j\) and compare them to the results from HW3. From the tables below, we see that the intervals are close to each other.
## # A tibble: 3 x 7
## Algorithm theta1_lo theta2_lo theta3_lo theta4_lo theta5_lo theta6_lo
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Gibbs 66.8 91.2 76.8 93.8 78.5 75.5
## 2 MH 66.7 90.3 76.8 93.3 78.7 76.0
## 3 Stan Part II 66.0 90.3 77.5 93.9 79.6 73.5
## # A tibble: 3 x 7
## Algorithm theta1_up theta2_up theta3_up theta4_up theta5_up theta6_up
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Gibbs 93.1 116. 101. 121. 103. 99.7
## 2 MH 93.7 116. 101. 121. 102. 99.2
## 3 Stan Part II 97.4 116. 101. 120. 103. 102.
Comparing our posterior samples graphically with those from HW3 confirms that they are similar.
Recall from the lecture that STAN doesn’t allow any discrete model parameters. In some cases, we can work around this restriction by marginalizing them out. In this question, we are going to explore one such example. One useful model for clustering is the latent Gaussian mixture model: for \(x_{i}, \mu_{k} \in \mathbb{R}^{2}\), \(\Sigma_{k} \in \mathbb{R}^{2 \times 2}\), \(\pi \in[0,1]^{3}\), \(i=1, \ldots, 1000\) and \(k=1,2,3\),
\[
\begin{array}{l}
z_{i} \mid \pi \sim \text{Categorical}(z \mid \pi) \\
x_{i} \mid z_{i}=k, \mu, \Sigma \sim N\left(x \mid \mu_{k}, \Sigma_{k}\right) \\
\pi \sim \text{Dirichlet}\left(\pi \mid \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) \\
\mu_{k} \mid \Sigma_{k} \sim N\left(\mu \mid 0, \Sigma_{k}\right) \\
\Sigma_{k} \sim \text{Inv-Wishart}(\Sigma \mid 3, I)
\end{array}
\]
where \(I\) is the \(2 \times 2\) identity matrix.
- Write the Stan model in “gmm.stan”. Hint: use STAN’s log_sum_exp function.
- The data file “hw4q2.csv” in Canvas is the collection of \((x, y)\) coordinates for each data point.
- Report the effective sample sizes, trace plots and R hat.
- Plot the data “hw4q2.csv” in 2D. Are the results from part (d) and (e) in line with the data?
- Write the R script “q2run.R”.

(a) \(\mathbf{Solution.}\qquad\) We marginalize the discrete labels \(z_i\) out of the joint model and define each \(p\left(x_{i} \mid \mu, \Sigma, \pi\right)\) as
\[p\left(x_{i} \mid \mu, \Sigma, \pi\right)=\sum_{k=1}^{3} \pi_{k}\, N\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right),\]
which Stan evaluates on the log scale as \(\log p\left(x_{i} \mid \mu, \Sigma, \pi\right)=\log \sum_{k=1}^{3} \exp \left\{\log \pi_{k}+\log N\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right)\right\}\), computed stably with the log_sum_exp function.
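A hypothetical Stan sketch of this marginalization is given below; the submitted gmm.stan may differ, for instance in how the label-switching constraint is imposed. The names Pi, mu and sigma follow the fit output further down, while mu1 and mu2 are illustrative helper parameters.

```stan
// Hypothetical sketch of gmm.stan: 3-component bivariate GMM with z_i marginalized out.
data {
  int<lower=1> N;          // number of data points (1000)
  vector[2] x[N];          // (x, y) coordinates
}
parameters {
  simplex[3] Pi;           // mixture weights
  ordered[3] mu1;          // first coordinates of the means, ordered to prevent label switching
  vector[3] mu2;           // second coordinates of the means
  cov_matrix[2] sigma[3];  // component covariance matrices
}
transformed parameters {
  vector[2] mu[3];         // assemble the component means (identity relabeling, no Jacobian needed)
  for (k in 1:3) {
    mu[k][1] = mu1[k];
    mu[k][2] = mu2[k];
  }
}
model {
  Pi ~ dirichlet(rep_vector(1.0 / 3, 3));
  for (k in 1:3) {
    sigma[k] ~ inv_wishart(3, diag_matrix(rep_vector(1.0, 2)));
    mu[k] ~ multi_normal(rep_vector(0.0, 2), sigma[k]);
  }
  // log p(x_i) = log_sum_exp over k of { log pi_k + log N(x_i | mu_k, Sigma_k) }
  for (n in 1:N) {
    vector[3] lps;
    for (k in 1:3)
      lps[k] = log(Pi[k]) + multi_normal_lpdf(x[n] | mu[k], sigma[k]);
    target += log_sum_exp(lps);
  }
}
```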
(b)-(e) \(\mathbf{Solution.}\qquad\) Out of the box, the Markov chains do not converge because of label switching between the mixture components. We fix this by imposing an ordering on the parameters we estimate, as can be seen in the gmm.stan file. Below we print the effective sample size and Rhat of each parameter.
## n_eff Rhat
## mu[1,1] 752.9944 0.9992802
## mu[1,2] 1126.9094 1.0006393
## mu[2,1] 725.5882 0.9997860
## mu[2,2] 746.5764 0.9997889
## mu[3,1] 1147.5294 0.9999796
## mu[3,2] 2645.3171 0.9996144
## sigma[1,1,1] 1584.1790 1.0006166
## sigma[1,1,2] 1096.2835 1.0004774
## sigma[1,2,1] 1096.2835 1.0004774
## sigma[1,2,2] 1562.2531 1.0012742
## sigma[2,1,1] 901.2852 0.9991153
## sigma[2,1,2] 1359.8538 1.0008705
## sigma[2,2,1] 1359.8538 1.0008705
## sigma[2,2,2] 878.3457 1.0018428
## sigma[3,1,1] 1589.2433 0.9993557
## sigma[3,1,2] 1737.7172 0.9995815
## sigma[3,2,1] 1737.7172 0.9995815
## sigma[3,2,2] 1671.1224 1.0002212
## Pi[1] 661.3025 0.9998311
## Pi[2] 608.9885 0.9995605
## Pi[3] 651.3841 0.9992514
## P[1] 1018.8506 1.0039840
## P[2] 1050.5729 1.0034844
## P[3] 959.2461 1.0039818
## lp__ 765.6541 0.9994990
We also show the trace plots.
(f) \(\mathbf{Solution.}\qquad\) Below we provide posterior histogram plots for all model parameters.
(g) \(\mathbf{Solution.}\qquad\) Below we plot the posterior 90% intervals of each parameter.
(h) \(\mathbf{Solution.}\qquad\) The results from parts (f) and (g) are largely in line with the data. From the data, there appear to be three separate clusters. However, the estimated mean and covariance for the third class seem to fit between two smaller clusters in the data.
During an election, voter turnout is an important political issue to investigate in detail. For this question, download “hw4q3.csv” from Canvas. This file is the turnout data from the R Zelig package. The survey data has 6 columns:
- age: participant’s age
- educate: participant’s education level
- income: participant’s income
- racewhite: 1 if raceothers \(==0\) and 0 otherwise
- raceothers: 1 if racewhite \(==0\) and 0 otherwise
- vote: 1 if voted and 0 otherwise
We will apply the Bayesian Logit and Probit regression models, where “vote” is the response \(y \in\{0,1\}^{N}\) and the remaining columns are the covariates. Here \(N\) is the number of data points and \(X \in \mathbb{R}^{N \times 5}\) is the design matrix.
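As an illustration of this setup, a Stan sketch of the logit model is given below, with the probit variant obtained by swapping the likelihood line. This is a hedged sketch: the actual model files for this question are not shown here, and the normal(0, 10) priors are an assumption rather than part of the problem statement.

```stan
// Hypothetical sketch of a Bayesian logistic regression for the turnout data.
data {
  int<lower=1> N;                  // number of survey respondents
  int<lower=1> K;                  // number of covariates (5 here)
  matrix[N, K] X;                  // design matrix: age, educate, income, racewhite, raceothers
  int<lower=0, upper=1> y[N];      // vote indicator
}
parameters {
  vector[K] beta;                  // regression coefficients
}
model {
  beta ~ normal(0, 10);            // weakly informative prior (an assumption)
  y ~ bernoulli_logit(X * beta);   // logit link
  // Probit variant: y ~ bernoulli(Phi(X * beta));
}
```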