Dirichlet-Multinomial model (4 \(\times\) 10 points).

  1. The Multinomial model with a Dirichlet prior is a generalization of the Bernoulli model and Beta prior. The Dirichlet distribution for \(K\) outcomes is the exponential family distribution on the \(K-1\) dimensional probability simplex given by, \[ \pi_{\boldsymbol{\alpha}}(\boldsymbol{\theta})=\frac{\Gamma\left(\sum_{j=1}^{K} \alpha_{j}\right)}{\prod_{j=1}^{K} \Gamma\left(\alpha_{j}\right)} \prod_{j=1}^{K} \theta_{j}^{\alpha_{j}-1} \] where \(\boldsymbol{\alpha}=\left(\alpha_{1}, \ldots, \alpha_{K}\right)^{T} \in \mathbb{R}_{+}^{K}\) is a non-negative vector of scaling coefficients, which are known parameters. The multinomial distribution, denoted as Multinomial\((n, \boldsymbol{\theta})\), is a discrete distribution over \(K\)-dimensional non-negative integer vectors \(\mathbf{X} \in \mathbb{Z}_{+}^{K}\) with \(\sum_{j=1}^{K} X_{j}=n\). Assuming this Dirichlet-Multinomial model, what is the posterior distribution of \(\boldsymbol{\theta}\)? What is the posterior mean?

\(\mathbf{Solution.}\qquad\) Given the prior \(\boldsymbol{\theta}\sim\mathrm{Dirichlet}\left(\boldsymbol{\alpha}\right)\) and the data \(\mathbf{X}\sim\mathrm{Multinomial}\left(n,\boldsymbol{\theta}\right)\), the posterior distribution of \(\boldsymbol{\theta}\) is given by, \[\begin{align*} p\left(\boldsymbol{\theta}\mid x_{1:K}\right) & \propto\pi_{\boldsymbol{\alpha}}\left(\boldsymbol{\theta}\right)p\left(x_{1:K}\mid\boldsymbol{\theta}\right)\\ & \propto\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1}\prod_{j=1}^{K}\theta_{j}^{x_{j}}\\ & =\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1+x_{j}} \end{align*}\]

Thus \(\boldsymbol{\theta}\mid x_{1:K}\sim\mathrm{Dirichlet}\left(\alpha_{1}+x_{1},\dots,\alpha_{K}+x_{K}\right)\). The posterior mean of each \(\theta_{j}\) is, \[ \mathbb{E}\left[\theta_{j}\mid x_{1:K}\right]=\frac{\alpha_{j}+x_{j}}{\sum_{k=1}^{K}\left(\alpha_{k}+x_{k}\right)}=\frac{\alpha_{j}+x_{j}}{n+\sum_{k=1}^{K}\alpha_{k}} \]
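
As a quick numerical check of these formulas, the sketch below (with hypothetical prior parameters and counts; all variable names are ours) samples from the posterior Dirichlet by normalizing independent Gamma draws and compares the empirical means with the analytic posterior mean.

# Monte Carlo check of the conjugate update with hypothetical alpha and x
alpha <- c(1, 2, 3)
x     <- c(10, 5, 20)
post  <- alpha + x
nsim  <- 1e5
# Dirichlet(post) draws via normalized independent Gamma variates
g     <- matrix(rgamma(nsim * length(post), shape = rep(post, each = nsim)), nrow = nsim)
theta <- g / rowSums(g)
colMeans(theta)      # empirical posterior means
post / sum(post)     # analytic posterior means from the formula above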

   

  2. Derive the posterior predictive density function of a future observation of \(\mathbf{X}\) conditioned on the current observations.

\(\mathbf{Solution.}\qquad\) Assume the future observation \(\tilde{\mathbf{X}}=\left(\tilde{x}_{1},\dots,\tilde{x}_{K}\right)\) also consists of \(n\) trials. Then,

\[\begin{align*} p\left(\tilde{x}_{1:K}\mid x_{1:K}\right) & =\int p\left(\tilde{x}_{1:K}\mid\theta\right)p\left(\theta\mid x_{1:K}\right)d\theta\\ & =\int\frac{n!}{\prod_{j=1}^{K}\tilde{x}_{j}!}\prod_{j=1}^{K}\theta_{j}^{\tilde{x}_{j}}\frac{\Gamma\left(\sum_{j=1}^{K}\left(\alpha_{j}+x_{j}\right)\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}+x_{j}-1}d\theta\\ & =\frac{\Gamma(n+1)}{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+1\right)}\frac{\Gamma\left(\sum_{j=1}^{K}\left(\alpha_{j}+x_{j}\right)\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\int\prod_{j=1}^{K}\theta_{j}^{\tilde{x}_{j}+\alpha_{j}-1+x_{j}}d\theta \end{align*}\]

Focusing on the last integral, we can write it as, \[\begin{align*} \int\prod_{j=1}^{K}\theta_{j}^{\tilde{\alpha}_{j}-1}d\theta & = \int\theta_{1}^{\tilde{\alpha}_{1}-1}\theta_{2}^{\tilde{\alpha}_{2}-1}\cdot\cdot\cdot\theta_{K}^{\tilde{\alpha}_{K}-1}d\theta \end{align*}\]

where we let \(\tilde{\alpha}_{j}=\tilde{x}_{j}+\alpha_{j}+x_{j}\). Next we substitute \(\theta_{K}=1-\sum_{j=1}^{K-1}\theta_{j}\) and integrate the variables one at a time, \[\begin{align*} & =\int_{0}^{1}\theta_{1}^{\tilde{\alpha}_{1}-1}d\theta_{1}\int_{0}^{1-\theta_{1}}\theta_{2}^{\tilde{\alpha}_{2}-1}d\theta_{2}\cdots\int_{0}^{1-\sum_{j=1}^{K-2}\theta_{j}}\theta_{K-1}^{\tilde{\alpha}_{K-1}-1}\left(1-\sum_{j=1}^{K-1}\theta_{j}\right)^{\tilde{\alpha}_{K}-1}d\theta_{K-1}\\ & =\mathrm{B}\left(\tilde{\alpha}_{K},\tilde{\alpha}_{K-1}\right)\prod_{j=1}^{K-2}\mathrm{B}\left(\tilde{\alpha}_{j},\tilde{\alpha}_{K}+\sum_{i=j+1}^{K-1}\tilde{\alpha}_{i}\right)\\ & =\frac{\prod_{j=1}^{K}\Gamma\left(\tilde{\alpha}_{j}\right)}{\Gamma\left(\sum_{j=1}^{K}\tilde{\alpha}_{j}\right)} \end{align*}\]
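
To see the telescoping concretely, take \(K=3\): substituting \(\theta_{2}=\left(1-\theta_{1}\right)u\) in the inner integral gives, \[\begin{align*} \int_{0}^{1}\theta_{1}^{\tilde{\alpha}_{1}-1}\int_{0}^{1-\theta_{1}}\theta_{2}^{\tilde{\alpha}_{2}-1}\left(1-\theta_{1}-\theta_{2}\right)^{\tilde{\alpha}_{3}-1}d\theta_{2}\,d\theta_{1} & =\mathrm{B}\left(\tilde{\alpha}_{2},\tilde{\alpha}_{3}\right)\int_{0}^{1}\theta_{1}^{\tilde{\alpha}_{1}-1}\left(1-\theta_{1}\right)^{\tilde{\alpha}_{2}+\tilde{\alpha}_{3}-1}d\theta_{1}\\ & =\mathrm{B}\left(\tilde{\alpha}_{2},\tilde{\alpha}_{3}\right)\mathrm{B}\left(\tilde{\alpha}_{1},\tilde{\alpha}_{2}+\tilde{\alpha}_{3}\right)=\frac{\Gamma\left(\tilde{\alpha}_{1}\right)\Gamma\left(\tilde{\alpha}_{2}\right)\Gamma\left(\tilde{\alpha}_{3}\right)}{\Gamma\left(\tilde{\alpha}_{1}+\tilde{\alpha}_{2}+\tilde{\alpha}_{3}\right)} \end{align*}\]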

Substituting this integral back, we have, \[ p\left(\tilde{x}_{1:K}\mid x_{1:K}\right)=\frac{\Gamma(n+1)}{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+1\right)}\frac{\Gamma\left(\sum_{j=1}^{K}\left(\alpha_{j}+x_{j}\right)\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\frac{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+\alpha_{j}+x_{j}\right)}{\Gamma\left(2n+\sum_{j=1}^{K}\alpha_{j}\right)} \]
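
For numerical work this density is best evaluated on the log scale with lgamma. Below is a minimal sketch (the function name log_post_pred and the toy inputs are ours; it is written for a future observation with an arbitrary total count \(\tilde{n}=\sum_{j}\tilde{x}_{j}\)).

# Log posterior predictive density of a future count vector xtilde,
# given prior parameters alpha and observed counts x
log_post_pred <- function(xtilde, x, alpha) {
  ntilde <- sum(xtilde)
  a_post <- alpha + x                      # posterior Dirichlet parameters
  lgamma(ntilde + 1) - sum(lgamma(xtilde + 1)) +
    lgamma(sum(a_post)) - sum(lgamma(a_post)) +
    sum(lgamma(xtilde + a_post)) - lgamma(ntilde + sum(a_post))
}

# Example with toy counts: probability both future trials fall in category 1
exp(log_post_pred(c(2, 0), x = c(3, 4), alpha = c(1, 1)))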

   

  3. Define \(Y=\frac{\theta_{1}}{\theta_{1}+\theta_{2}} .\) Write down the marginal posterior distribution for \(Y .\) Show that this distribution is identical to the posterior distribution for \(Y\) obtained by treating \(X_{1}\) as an observation from the binomial distribution with probability \(Y\) and sample size \(X_{1}+X_{2},\) ignoring the data \(X_{3}, \ldots, X_{K}\).

\(\mathbf{Solution.}\qquad\) First, we find the marginal posterior distribution. From part 1 we know the posterior distribution \(p\left(\boldsymbol{\theta}\mid x_{1:K}\right)\propto\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1+x_{j}}\). Integrating out \(\theta_{3},\dots,\theta_{K}\) (equivalently, using the aggregation property of the Dirichlet on \(\left(\theta_{1},\theta_{2},1-\theta_{1}-\theta_{2}\right)\)), the marginal posterior of \(\left(\theta_{1},\theta_{2}\right)\) is, \[ p\left(\theta_{1},\theta_{2}\mid x_{1:K}\right)\propto\theta_{1}^{\alpha_{1}-1+x_{1}}\theta_{2}^{\alpha_{2}-1+x_{2}}\left(1-\theta_{1}-\theta_{2}\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)} \]

Define \(Y:=\frac{\theta_{1}}{\theta_{1}+\theta_{2}}\text{ and }Z:=\theta_{1}+\theta_{2}\). Then we can find, \[ p\left(Y,Z\mid x_{1:K}\right)=p\left(\theta_{1}\left(Y,Z\right),\theta_{2}\left(Y,Z\right)\mid x_{1:K}\right)\left|\frac{\partial\left(\theta_{1},\theta_{2}\right)}{\partial(Y,Z)}\right| \]

We rewrite \(\theta_{1},\theta_{2}\) in terms of \(Y,Z\). \[ \begin{cases} Y=\frac{\theta_{1}}{\theta_{1}+\theta_{2}} & \Longrightarrow\theta_{2}=Z-ZY\\ Z:=\theta_{1}+\theta_{2} & \Longrightarrow\theta_{1}=ZY \end{cases} \]

Then the determinant of the Jacobian matrix is, \[ \left|\frac{\partial\left(\theta_{1},\theta_{2}\right)}{\partial(Y,Z)}\right|=\mathrm{det}\left[\begin{array}{cc} Z & Y\\ -Z & 1-Y \end{array}\right]=Z \]

We can write down the joint posterior distribution of \(Y,Z\), \[\begin{align*} & p\left(Y,Z\mid x_{1:K}\right)\\ & \propto Z(ZY)^{\alpha_{1}-1+x_{1}}\left(Z\left(1-Y\right)\right)^{\alpha_{2}-1+x_{2}}\left(1-Z\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)}\\ & =Z^{\alpha_{1}+x_{1}+\alpha_{2}+x_{2}-1}\left(1-Z\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)}Y^{\alpha_{1}-1+x_{1}}\left(1-Y\right)^{\alpha_{2}-1+x_{2}}\\ & \propto\mathrm{Beta}\left(Y;\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right)\mathrm{Beta}\left(Z;\alpha_{1}+\alpha_{2}+x_{1}+x_{2},\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)\right) \end{align*}\]

We see that the joint posterior distribution of \(Y,Z\) factorizes into the product of marginal posterior distributions of \(Y,Z\), which are both Beta. Thus the marginal posterior distribution of \(Y\) is, \[ Y\mid x_{1:K}\sim\mathrm{Beta}\left(\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right) \]

Next we show this is identical to the posterior in the following scenario. Suppose that \(Y\sim\mathrm{Beta}\left(\alpha_{1},\alpha_{2}\right)\) and \(X_{1}\sim\mathrm{Bin}\left(x_{1}+x_{2},Y\right)\). Then the posterior distribution of \(Y\) is, \[\begin{align*} p\left(Y\mid x_{1}\right) & \propto p\left(Y\right)p\left(x_{1}\mid Y\right)\\ & \propto Y^{\alpha_{1}-1}\left(1-Y\right)^{\alpha_{2}-1}\left(Y\right)^{x_{1}}\left(1-Y\right)^{x_{2}}\\ & =Y^{\alpha_{1}+x_{1}-1}\left(1-Y\right)^{\alpha_{2}+x_{2}-1} \end{align*}\]

which is \(\mathrm{Beta}\left(\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right)\), identical to the marginal posterior derived above.
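
As a quick Monte Carlo check of this equivalence (a sketch with hypothetical prior parameters and counts; all variable names are ours), we can draw from the full Dirichlet posterior by normalizing Gamma variates, transform to \(Y\), and compare with direct draws from \(\mathrm{Beta}\left(\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right)\).

# Hypothetical example: K = 4, prior alpha and observed counts x
alpha <- c(1, 2, 1, 3)
x     <- c(12, 7, 30, 9)
nsim  <- 1e5

# Draws from the full Dirichlet posterior via normalized Gamma variates
g     <- matrix(rgamma(nsim * 4, shape = rep(alpha + x, each = nsim)), nrow = nsim)
theta <- g / rowSums(g)
Y     <- theta[, 1] / (theta[, 1] + theta[, 2])

# Direct draws from Beta(alpha_1 + x_1, alpha_2 + x_2)
Y2 <- rbeta(nsim, alpha[1] + x[1], alpha[2] + x[2])

# The two samples should agree; compare a few quantiles
rbind(dirichlet = quantile(Y,  c(0.1, 0.5, 0.9)),
      beta      = quantile(Y2, c(0.1, 0.5, 0.9)))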

   

  4. On September 25, 1988, the evening of a presidential campaign debate, ABC News conducted a survey of registered voters in the United States; 639 persons were polled before the debate, and 639 different persons were polled after. The results are displayed in Figure 1. Assume the surveys are independent simple random samples from the population of registered voters. Model the data with two different multinomial distributions. For \(j=1,2,\) let \(\theta_{j}\) be the proportion of voters who preferred Bush, out of those who had a preference for either Bush or Dukakis at the time of survey \(j\). Plot a histogram of the posterior density for \(\theta_{2}-\theta_{1}\). What is the posterior probability that there was a shift toward Bush? \[\begin{array}{l|ccc|c}{\text { Survey }} & {\text { Bush }} & {\text { Dukakis }} & {\text { No opinion /other }} & {\text { Total }} \\ \hline \text { pre-debate } & {294} & {307} & {38} & {639} \\ {\text { post-debate }} & {288} & {332} & {19} & {639}\end{array}\]

Figure 1: Number of respondents in each preference category from ABC News pre-debate and post-debate surveys in 1988.

\(\mathbf{Solution.}\qquad\) Suppose we model the pre-debate and post-debate counts (Bush, Dukakis, no opinion/other) with two multinomial distributions with parameter vectors \(\boldsymbol{\alpha}\) and \(\boldsymbol{\phi}\), each given a \(\mathrm{Dirichlet}\left(1,1,1\right)\) prior. Then the posterior distributions for the two multinomial models are, \[ \boldsymbol{\alpha}\mid x\sim\mathrm{Dirichlet}\left(295,308,39\right),\qquad\boldsymbol{\phi}\mid y\sim\mathrm{Dirichlet}\left(289,333,20\right) \]

where \(x,y\) are the pre-debate and post-debate counts, respectively. From question 3, we can focus on the proportion preferring Bush among those preferring Bush or Dukakis and define, \[ \theta_{1}=\frac{\alpha_{1}}{\alpha_{1}+\alpha_{2}},\qquad\theta_{2}=\frac{\phi_{1}}{\phi_{1}+\phi_{2}} \]

and we know the posterior distributions of each proportion are, \[ \theta_{1}\mid x\sim\mathrm{Beta}\left(295,308\right),\qquad\theta_{2}\mid y\sim\mathrm{Beta}\left(289,333\right) \]

Finally we can sample draws of \(\theta_{1}\) and \(\theta_{2}\). We draw one million times from each beta distribution and compute their difference.
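
A minimal sketch of this computation in base R (the variable names are ours):

set.seed(1)
nsim   <- 1e6
theta1 <- rbeta(nsim, 295, 308)   # pre-debate Bush proportion
theta2 <- rbeta(nsim, 289, 333)   # post-debate Bush proportion
shift  <- theta2 - theta1

# Histogram of the posterior density of theta_2 - theta_1
hist(shift, breaks = 100, freq = FALSE,
     main = 'Posterior of theta_2 - theta_1', xlab = 'theta_2 - theta_1')

# Posterior probability of a shift toward Bush
mean(shift > 0)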

Thus, we estimate that the probability of a shift toward Bush is around 0.195.

Analysis of proportions (6 \(\times\) 10 points).

A survey was done of bicycle and other vehicular traffic in the neighborhood of the campus of the University of California, Berkeley, in the spring of 1993. Sixty city blocks were selected at random; each block was observed for one hour, and the numbers of bicycles and other vehicles traveling along that block were recorded. The sampling was stratified into six types of city blocks: busy, fairly busy, and residential streets, with and without bike routes, with ten blocks measured in each stratum. Figure 2 displays the number of bicycles and other vehicles recorded in the study. For this problem, restrict your attention to the first two rows of the table: residential streets labeled as ‘bike routes’, which we will use to illustrate this computational exercise.

Figure 2: Counts of bicycles and other vehicles in one hour in each of 10 city blocks in each of six categories. (The data for two of the residential blocks were lost.) For example, the first block had 16 bicycles and 58 other vehicles, the second had 9 bicycles and 90 other vehicles, and so on. Streets were classified as ‘residential’, ‘fairly busy’, or ‘busy’ before the data were gathered.

  1. Set up a model for the data in the table so that, for \(j=1, \ldots, 10,\) the observed number of bicycles at location \(j\) is binomial with unknown probability \(\theta_{j}\) and sample size equal to the total number of vehicles (bicycles included) in that block. The parameter \(\theta_{j}\) can be interpreted as the underlying or “true” proportion of traffic at location \(j\) that is bicycles. Assign a beta population distribution for the parameters \(\theta_{j}\) and a noninformative hyperprior distribution as in the rat tumor example of Section 5.3 of BDA. Write down the joint posterior distribution.

\(\mathbf{Solution.}\qquad\) For \(j=1,\ldots,10\), let \(y_{j}\) be the number of bicycles observed at city block \(j\) and let \(n_{j}\) be the total number of vehicles (bicycles included) at that block. We model, \[ y_{j}\mid\theta_{j}\sim\mathrm{Bin}\left(n_{j},\theta_{j}\right) \]

where \(\theta_{j}\) represents the underlying or "true" proportion of traffic at location \(j\) that is bicycles. The \(\theta_{j}\) are given a beta population distribution, \[ \theta_{j}\mid\alpha,\beta\sim\operatorname{Beta}(\alpha,\beta)\qquad j=1,\ldots,10 \]

As in the rat tumor example, we use the noninformative hyperprior that is uniform on \(\left(\frac{\alpha}{\alpha+\beta},\left(\alpha+\beta\right)^{-1/2}\right)\), which corresponds to, \[ p\left(\alpha,\beta\right)\propto(\alpha+\beta)^{-5/2} \]

The joint posterior distribution is, \[\begin{align*} p(\boldsymbol{\theta},\alpha,\beta\mid y) & \propto p\left(y\mid\boldsymbol{\theta},\alpha,\beta\right)p\left(\boldsymbol{\theta},\alpha,\beta\right)\\ & =p\left(y\mid\boldsymbol{\theta},\alpha,\beta\right)p\left(\boldsymbol{\theta}\mid\alpha,\beta\right)p\left(\alpha,\beta\right)\\ & \propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}p\left(\theta_{j}\mid\alpha,\beta\right)\prod_{j=1}^{10}p\left(y_{j}\mid\theta_{j}\right)\\ & \propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}(\alpha,\beta)}\theta_{j}^{\alpha-1}\left(1-\theta_{j}\right)^{\beta-1}\prod_{j=1}^{10}\theta_{j}^{y_{j}}(1-\theta_{j})^{n_{j}-y_{j}}\\ & =\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}(\alpha,\beta)}\theta_{j}^{\alpha+y_{j}-1}\left(1-\theta_{j}\right)^{\beta+n_{j}-y_{j}-1} \end{align*}\]

where \(\mathrm{B}(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\) is the beta function.
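
For later computation it can help to have this joint posterior on the log scale; below is a minimal sketch (the function name log_joint_post is ours; theta is the vector of the ten \(\theta_{j}\), and y, n are the bicycle and total vehicle counts).

# Unnormalized log joint posterior log p(theta, alpha, beta | y)
log_joint_post <- function(theta, alpha, beta, y, n) {
  (-5/2) * log(alpha + beta) +
    sum(dbeta(theta, alpha, beta, log = TRUE)) +   # p(theta_j | alpha, beta)
    sum(dbinom(y, n, theta, log = TRUE))           # p(y_j | theta_j)
}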

   

  2. Compute the marginal posterior density of the hyperparameters and draw simulations from the joint posterior distribution of the parameters and hyperparameters.

\(\mathbf{Solution.}\qquad\) We use the identity, \[\begin{align*} p\left(\alpha,\beta\mid y\right)=\int p\left(\theta,\alpha,\beta\mid y\right)d\theta=\frac{p\left(\theta,\alpha,\beta\mid y\right)}{p\left(\theta\mid\alpha,\beta,y\right)} \end{align*}\] where the ratio form holds pointwise in \(\theta\).

From question 1 we know \(p\left(\theta,\alpha,\beta\mid y\right)\), so we need \(p\left(\theta\mid\alpha,\beta,y\right)\). Conditional on \(\left(\alpha,\beta\right)\), the \(\theta_{j}\) are independent, \[\begin{align*} p\left(\theta\mid\alpha,\beta,y\right) & =\prod_{j=1}^{10}\frac{1}{\mathrm{B}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right)}\theta_{j}^{\alpha+y_{j}-1}\left(1-\theta_{j}\right)^{\beta+n_{j}-y_{j}-1} \end{align*}\] that is, \(\theta_{j}\mid\alpha,\beta,y\sim\mathrm{Beta}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right)\).

Then, \[ p\left(\alpha,\beta\mid y\right)=\frac{p\left(\theta,\alpha,\beta\mid y\right)}{p\left(\theta\mid\alpha,\beta,y\right)}\propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}\left(\alpha,\beta\right)}\mathrm{B}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right) \]

Adopting some code from HierarchicalModels.html, we first draw simulations from the joint posterior distribution of the hyperparameters,

# QUESTION 2 ---------
library(ggplot2)    # plotting
library(dplyr)      # %>% pipe
library(tidyr)      # gather()
library(gridExtra)  # grid.arrange()

bikes = c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
other = c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)

y = bikes
n = bikes + other

# Grid for alpha, beta
a = seq(0.05, 10, length.out = 200)
b = seq(0.05, 33, length.out = 200)

cA = rep(a, each = length(b))
cB = rep(b, length(a))

# Function to compute log p(a,b | y)
lpfun = function(a, b, y, n) {
  log(a + b) * (-5/2) + sum(-lbeta(a,b) + lbeta(a + y, b + n - y))
}

# For each combination of alpha, beta, compute log p(a,b | y)
lp = mapply(lpfun, cA, cB, MoreArgs = list(y, n))

# marginals 
df_marg = data.frame(x = cA, y = cB, p = exp(lp - max(lp)))

# sample (alpha, beta) pairs from their marginal posterior
nsamp = 1e4
samp_indices = sample(length(df_marg$p), size = nsamp,
                       replace = TRUE, prob = df_marg$p / sum(df_marg$p))

samp_A = cA[samp_indices[1:nsamp]]
samp_B = cB[samp_indices[1:nsamp]]

samplestheta = matrix(0, nsamp, length(y))

# p(theta_j | a,b,y) ~ Beta(a + y_j, b + n_j - y_j): one theta_j per (a,b) draw
for (j in 1:length(y)) {
  samplestheta[, j] = rbeta(nsamp, samp_A + y[j], samp_B + n[j] - y[j])
}

# Create plot of marginal posterior
title1 <- 'The marginal posterior of alpha and beta in hierarchical model'
# create a plot of the marginal posterior density
postdensityalphabeta = ggplot(data = df_marg, aes(x = x, y = y)) +
  geom_raster(aes(fill = p, alpha = p), interpolate = T) +
  geom_contour(aes(z = p), colour = 'black', size = 0.2) +
  coord_cartesian(xlim = c(0,10), ylim = c(0, 33)) +
  labs(x = 'alpha', y = 'beta', title = title1) +
  scale_fill_gradient(low = 'yellow', high = 'red', guide = F) +
  scale_alpha(range = c(0, 1), guide = F)
postdensityalphabeta

Next we draw simulations of the \(\theta_{j}\). We highlight \(\theta_7\) in red for reference.

breakpoints <- seq(0, 1, length.out = 200)
xx <- hist(samplestheta[, 1], breaks = breakpoints, plot = FALSE)$mids
densitytheta <- matrix(0, length(breakpoints) - 1, length(y))
for(j in 1:length(y)){
  densitytheta[, j] <- hist(samplestheta[, j], breaks = breakpoints, plot = FALSE)$density
}

indtonum <- function(x) strtoi(substring(x,2))

df_hier_samp <-densitytheta %>%
  as.data.frame() %>% cbind(xx) %>% gather(ind, p, -xx)
plot_hier7_samp <- ggplot(data = subset(df_hier_samp, indtonum(ind)%%1==0)) +
  geom_line(aes(x = xx, y = p, color = (ind=='V7'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Hierarchical model', color = '') +
  scale_color_manual(values = c('blue', 'red'), guide = F) +
  scale_y_continuous(breaks = NULL) +
  theme(legend.background = element_blank(), legend.position = c(0.8,0.9))

# average density over samples (of a and b) for each (n,y)-pair
# at each point x
bdens2 <- function(n, y, a, b, x)
  rowMeans(mapply(dbeta, a + y, n - y + b, MoreArgs = list(x = x)))
x = seq(0.0001, 0.9999, length.out = 1000)

df_hier <- mapply(bdens2, n, y, MoreArgs = list(samp_A, samp_B, x)) %>%
  as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)

plot_hier7 <- ggplot(data = subset(df_hier, indtonum(ind)%%1==0)) +
  geom_line(aes(x = x, y = p, color = (ind=='V7'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Hierarchical model', color = '') +
  scale_color_manual(values = c('blue', 'red'), guide = F) +
  scale_y_continuous(breaks = NULL) +
  theme(legend.background = element_blank(), legend.position = c(0.8,0.9))

grid.arrange(plot_hier7_samp, plot_hier7)

   

  3. Compare the posterior distributions of the parameters \(\theta_{j}\) to the raw proportions, (number of bicycles / total number of vehicles) in location \(j\). How do the inferences from the posterior distribution differ from the raw proportions?

\(\mathbf{Solution.}\qquad\) The red points in the plot below represent the raw proportions \(y_{j}/n_{j}\). Each raw proportion is close to the center of the corresponding posterior interval, and the points lie close to the blue 45 degree line, so the posterior inferences stay near the raw proportions, with only mild shrinkage toward the population mean.
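
One way to produce such a comparison, reusing the draws samplestheta and the data y, n from question 2 (a sketch; the exact plot in the original write-up may differ):

# Raw proportions and posterior summaries of theta_j
raw       <- y / n
post_med  <- apply(samplestheta, 2, median)
post_low  <- apply(samplestheta, 2, quantile, probs = 0.025)
post_high <- apply(samplestheta, 2, quantile, probs = 0.975)

df_comp <- data.frame(raw, post_med, post_low, post_high)

ggplot(df_comp, aes(x = raw)) +
  geom_pointrange(aes(y = post_med, ymin = post_low, ymax = post_high)) +
  geom_point(aes(y = raw), color = 'red') +
  geom_abline(intercept = 0, slope = 1, color = 'blue') +
  labs(x = 'raw proportion y_j / n_j', y = 'posterior of theta_j (median and 95% interval)',
       title = 'Posterior intervals vs. raw proportions')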

   

  4. Give a \(95 \%\) posterior interval for the average underlying proportion of traffic that is bicycles.

\(\mathbf{Solution.}\qquad\) Our 95% posterior interval for the average underlying proportion of traffic that is bicycles comes from computing \(\frac{\alpha}{\alpha+\beta}\) over all posterior draws of \(\left(\alpha,\beta\right)\) and then taking quantiles, as in the sketch below.
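
A minimal sketch of this computation, reusing the hyperparameter draws samp_A and samp_B from question 2:

# Posterior draws of the population mean proportion alpha / (alpha + beta)
avg_prop <- samp_A / (samp_A + samp_B)
quantile(avg_prop, c(0.025, 0.975))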

Thus our 95% posterior interval is [0.146, 0.297].

   

  5. A new city block is sampled at random and is a residential street with a bike route. In an hour of observation, 100 vehicles of all kinds go by. Give a \(95 \%\) posterior interval for the number of those vehicles that are bicycles. Discuss how much you trust this interval in application.

\(\mathbf{Solution.}\qquad\) Below we find a 95% posterior interval for the number of those vehicles that are bicycles.
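
A sketch of this computation, reusing samp_A and samp_B from question 2: for each posterior draw of \(\left(\alpha,\beta\right)\), draw a new block's \(\theta\) from the population distribution and then the bicycle count out of 100 vehicles (theta_new and y_new are our names).

# Posterior predictive for a new residential bike-route block with 100 vehicles
theta_new <- rbeta(nsamp, samp_A, samp_B)            # new block's underlying proportion
y_new     <- rbinom(nsamp, size = 100, prob = theta_new)
quantile(y_new, c(0.025, 0.975))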

This posterior interval seems reasonable: it covers the observed bicycle traffic in 9 of the 10 blocks in our data. In application, we should trust it only to the extent that a newly sampled block is exchangeable with the ten residential bike-route blocks observed here.

   

  6. Was the beta distribution for the \(\theta_{j}\) ’s reasonable?

\(\mathbf{Solution.}\qquad\) From question 3, we saw that the raw proportions lie close to the centers of the posterior intervals for each \(\theta_j\). If we also compare our draws of \(\theta\) for a new block from question 5 (i.e., the fitted population distribution) to the raw proportions, the beta distribution describes the between-block variation well; a quick visual check is sketched below. Thus the beta population distribution seems reasonable.
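
A quick visual check (a sketch, reusing the draws theta_new from the question 5 sketch and the raw proportions y / n):

# Compare the fitted population distribution for theta with the raw proportions
hist(theta_new, breaks = 50, freq = FALSE, xlim = c(0, 1),
     xlab = 'theta', main = 'Population distribution vs. raw proportions')
rug(y / n, col = 'red', lwd = 2)   # observed proportions y_j / n_j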