\(\mathbf{Solution.}\qquad\) Given that \(\pi_{\boldsymbol{\alpha}}\left(\boldsymbol{\theta}\right)\sim Dirichlet\left(\boldsymbol{\alpha}\right)\) and the data \(\mathbf{X}\sim Multinomial\left(n,\boldsymbol{\theta}\right)\), the posterior distribution of \(\boldsymbol{\theta}\) is given by, \[\begin{align*} p\left(\boldsymbol{\theta}\mid x_{1:K}\right) & \propto\pi_{\boldsymbol{\alpha}}\left(\boldsymbol{\theta}\right)p\left(x_{1:K}\mid\boldsymbol{\theta}\right)\\ & \propto\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1}\prod_{j=1}^{K}\theta_{j}^{x_{j}}\\ & =\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1+x_{j}} \end{align*}\]
Thus \(\boldsymbol{\theta}\mid x_{1:K}\sim Dirichlet\left(\alpha_{1}+x_{1},\dots,\alpha_{K}+x_{K}\right)\). The posterior mean of each \(\theta_{j}\) is, \[ \mathbb{E}\left[\theta_{j}\mid x_{1:K}\right]=\frac{\alpha_{j}+x_{j}}{\sum_{i=1}^{K}\left(\alpha_{i}+x_{i}\right)} \]
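As a quick numerical illustration (the prior and counts here are hypothetical, not from the problem), the posterior parameters and means can be computed directly:

# Hypothetical example: prior alpha = (1,1,1), observed counts x = (5,3,2), n = 10
alpha = c(1, 1, 1)
x = c(5, 3, 2)
post_alpha = alpha + x                    # posterior Dirichlet parameters: (6, 4, 3)
post_mean = post_alpha / sum(post_alpha)  # posterior means: 6/13, 4/13, 3/13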
\(\mathbf{Solution.}\qquad\)
\[\begin{align*} p\left(\tilde{x}_{1:K}\mid x_{1:K}\right) & =\int p\left(\tilde{x}_{1:K}\mid\theta\right)p\left(\theta\mid x_{1:K}\right)d\theta\\ & =\int\frac{n!}{\prod_{j=1}^{K}\tilde{x}_{j}!}\prod_{j=1}^{K}\theta_{j}^{\tilde{x}_{j}}\frac{\Gamma\left(\sum_{j=1}^{K}\alpha_{j}+x_{j}\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}+x_{j}-1}d\theta\\ & =\frac{\Gamma(n+1)}{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+1\right)}\frac{\Gamma\left(\sum_{j=1}^{K}\alpha_{j}+x_{j}\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\int\prod_{j=1}^{K}\theta_{j}^{\tilde{x}_{j}+\alpha_{j}-1+x_{j}}d\theta \end{align*}\]
Focusing on the last integral, we can write it as, \[\begin{align*} \int\prod_{j=1}^{K}\theta_{j}^{\tilde{\alpha}_{j}-1}d\theta & = \int\theta_{1}^{\tilde{\alpha}_{1}-1}\theta_{2}^{\tilde{\alpha}_{2}-1}\cdot\cdot\cdot\theta_{K}^{\tilde{\alpha}_{K}-1}d\theta \end{align*}\]
where we let \(\tilde{\alpha}_{j}=\tilde{x}_{j}+\alpha_{j}+x_{j}\). Next we substitute \(\theta_{K}=1-\sum_{j=1}^{K-1}\theta_{j}\) and integrate out one variable at a time, innermost first, \[\begin{align*} & =\int_{0}^{1}\theta_{1}^{\tilde{\alpha}_{1}-1}d\theta_{1}\int_{0}^{1-\theta_{1}}\theta_{2}^{\tilde{\alpha}_{2}-1}d\theta_{2}\cdots\int_{0}^{1-\sum_{j=1}^{K-2}\theta_{j}}\theta_{K-1}^{\tilde{\alpha}_{K-1}-1}\left(1-\sum_{j=1}^{K-1}\theta_{j}\right)^{\tilde{\alpha}_{K}-1}d\theta_{K-1}\\ & =\mathrm{B}\left(\tilde{\alpha}_{K-1},\tilde{\alpha}_{K}\right)\prod_{j=1}^{K-2}\mathrm{B}\left(\tilde{\alpha}_{j},\sum_{i=j+1}^{K}\tilde{\alpha}_{i}\right)\\ & =\frac{\prod_{j=1}^{K}\Gamma\left(\tilde{\alpha}_{j}\right)}{\Gamma\left(\sum_{j=1}^{K}\tilde{\alpha}_{j}\right)} \end{align*}\]
Substituting this integral back, and using \(\sum_{j=1}^{K}\tilde{x}_{j}=\sum_{j=1}^{K}x_{j}=n\), we have, \[ p\left(\tilde{x}_{1:K}\mid x_{1:K}\right)=\frac{\Gamma(n+1)}{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+1\right)}\frac{\Gamma\left(\sum_{j=1}^{K}\alpha_{j}+x_{j}\right)}{\prod_{j=1}^{K}\Gamma\left(\alpha_{j}+x_{j}\right)}\frac{\prod_{j=1}^{K}\Gamma\left(\tilde{x}_{j}+\alpha_{j}+x_{j}\right)}{\Gamma\left(2n+\sum_{j=1}^{K}\alpha_{j}\right)} \]
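As a sketch of how one might evaluate this expression numerically (the function name and test values are ours), we work on the log scale with lgamma to avoid overflow:

# Hypothetical helper: log posterior predictive of new counts x_tilde given
# prior alpha and observed counts x; assumes sum(x_tilde) = sum(x) = n
log_post_pred = function(x_tilde, x, alpha) {
  lgamma(sum(x_tilde) + 1) - sum(lgamma(x_tilde + 1)) +    # Gamma(n+1) / prod Gamma(x_tilde_j + 1)
    lgamma(sum(alpha + x)) - sum(lgamma(alpha + x)) +      # posterior Dirichlet normalizer
    sum(lgamma(x_tilde + alpha + x)) - lgamma(sum(x_tilde + alpha + x))
}
exp(log_post_pred(x_tilde = c(4, 3, 3), x = c(5, 3, 2), alpha = c(1, 1, 1)))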
\(\mathbf{Solution.}\qquad\) First, we solve for the marginal posterior distribution. From question 1 we know the posterior distribution \(p\left(\boldsymbol{\theta}\mid x_{1:K}\right)\propto\prod_{j=1}^{K}\theta_{j}^{\alpha_{j}-1+x_{j}}\). By the aggregation property of the Dirichlet, the marginal posterior of \(\left(\theta_{1},\theta_{2},1-\theta_{1}-\theta_{2}\right)\) can be written as, \[ p\left(\theta_{1},\theta_{2}\mid x_{1:K}\right)\propto\theta_{1}^{\alpha_{1}-1+x_{1}}\theta_{2}^{\alpha_{2}-1+x_{2}}\left(1-\theta_{1}-\theta_{2}\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)} \]
Define \(Y:=\frac{\theta_{1}}{\theta_{1}+\theta_{2}}\text{ and }Z:=\theta_{1}+\theta_{2}\). Then, by the change-of-variables formula, \[ p\left(Y,Z\mid x_{1:K}\right)=p\left(\theta_{1}\left(Y,Z\right),\theta_{2}\left(Y,Z\right)\mid x_{1:K}\right)\left|\frac{\partial\left(\theta_{1},\theta_{2}\right)}{\partial(Y,Z)}\right| \]
We rewrite \(\theta_{1},\theta_{2}\) in terms of \(Y,Z\): \[ \begin{cases} \theta_{1}=ZY\\ \theta_{2}=Z-\theta_{1}=Z\left(1-Y\right) \end{cases} \]
Then the determinant of the Jacobian matrix is, \[ \left|\frac{\partial\left(\theta_{1},\theta_{2}\right)}{\partial(Y,Z)}\right|=\mathrm{det}\left[\begin{array}{cc} Z & Y\\ -Z & 1-Y \end{array}\right]=Z \]
We can write down the joint posterior distribution of \(Y,Z\), \[\begin{align*} & p\left(Y,Z\mid x_{1:K}\right)\\ & \propto Z(ZY)^{\alpha_{1}-1+x_{1}}\left(Z\left(1-Y\right)\right)^{\alpha_{2}-1+x_{2}}\left(1-Z\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)}\\ & =Z^{\alpha_{1}+x_{1}+\alpha_{2}+x_{2}-1}\left(1-Z\right)^{-1+\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)}Y^{\alpha_{1}-1+x_{1}}\left(1-Y\right)^{\alpha_{2}-1+x_{2}}\\ & \propto\mathrm{Beta}\left(Y;\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right)\mathrm{Beta}\left(Z;\alpha_{1}+\alpha_{2}+x_{1}+x_{2},\sum_{j=3}^{K}\left(\alpha_{j}+x_{j}\right)\right) \end{align*}\]
We see that the joint posterior distribution of \(Y,Z\) factorizes into the product of marginal posterior distributions of \(Y,Z\), which are both Beta. Thus the marginal posterior distribution of \(Y\) is, \[ Y\mid x_{1:K}\sim\mathrm{Beta}\left(\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right) \]
Next, we show that this is identical to the following scenario. Suppose that \(Y\sim\mathrm{Beta}\left(\alpha_{1},\alpha_{2}\right)\) and \(x_{1}\sim Bin\left(x_{1}+x_{2},Y\right)\). Then the posterior distribution of \(Y\) is, \[\begin{align*} p\left(Y\mid x_{1}\right) & \propto p\left(Y\right)p\left(x_{1}\mid Y\right)\\ & \propto Y^{\alpha_{1}-1}\left(1-Y\right)^{\alpha_{2}-1}\left(Y\right)^{x_{1}}\left(1-Y\right)^{x_{2}}\\ & =Y^{\alpha_{1}+x_{1}-1}\left(1-Y\right)^{\alpha_{2}+x_{2}-1} \end{align*}\]
which corresponds to \(\mathrm{Beta}\left(\alpha_{1}+x_{1},\alpha_{2}+x_{2}\right)\).
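A quick Monte Carlo sanity check of this equivalence (the parameter values here are hypothetical), using the standard gamma representation of the Dirichlet:

# Hypothetical check: alpha = (2,3,4), x = (5,6,7)
a_post = c(2, 3, 4) + c(5, 6, 7)                # posterior Dirichlet parameters
g = sapply(a_post, function(a) rgamma(1e4, a))  # Dirichlet draws via gammas
theta = g / rowSums(g)
Y = theta[, 1] / (theta[, 1] + theta[, 2])
# Points should lie on the identity line if Y | x ~ Beta(a_post[1], a_post[2])
qqplot(Y, rbeta(1e4, a_post[1], a_post[2])); abline(0, 1)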
Figure 1: Number of respondents in each preference category from ABC News pre-debate and post-debate surveys in 1988.
\(\mathbf{Solution.}\qquad\) Suppose we model the pre-debate and post-debate numbers of respondents per candidate with multinomial distributions whose probability vectors \(\boldsymbol{\alpha}\) and \(\boldsymbol{\phi}\) are given uniform priors \(\boldsymbol{\alpha}\sim Dirichlet\left(1,1,1\right)\) and \(\boldsymbol{\phi}\sim Dirichlet\left(1,1,1\right)\), respectively. Then the posterior distributions of each multinomial model are, \[ \boldsymbol{\alpha}\mid x\sim Dirichlet\left(295,308,39\right),\qquad\boldsymbol{\phi}\mid y\sim Dirichlet\left(289,333,20\right) \]
where \(x,y\) are the data from the pre-debate and post-debate surveys. From question 3, we can focus on the proportion of Bush supporters among those preferring Bush or Dukakis and define, \[ \theta_{1}=\frac{\alpha_{1}}{\alpha_{1}+\alpha_{2}},\qquad\theta_{2}=\frac{\phi_{1}}{\phi_{1}+\phi_{2}} \]
and we know the posterior distributions of each proportion are, \[ \theta_{1}\mid x\sim\mathrm{Beta}\left(295,308\right),\qquad\theta_{2}\mid y\sim\mathrm{Beta}\left(289,333\right) \]
Finally we can sample draws of \(\theta_{1}\) and \(\theta_{2}\). We draw one million times from each beta distribution and compute their difference.
library(ggplot2)
library(latex2exp)

# QUESTION 4 ----------
draws = 1e6
beta_1 = c(295, 308)  # posterior Beta parameters, pre-debate
beta_2 = c(289, 333)  # posterior Beta parameters, post-debate
post_1 = rbeta(draws, beta_1[1], beta_1[2])
post_2 = rbeta(draws, beta_2[1], beta_2[2])
# Posterior probability that support for Bush increased after the debate
prob_shift = mean(post_2 - post_1 > 0)
width = 0.001
df = data.frame(post_prob = post_2 - post_1)
ggplot(df, aes(x = post_prob)) +
  geom_histogram(aes(y = stat(count) / sum(count)), binwidth = width,
                 fill = "steelblue") +
  ylab("Density") +
  xlab(TeX("$\\theta_{2}-\\theta_{1}$")) +
  ggtitle(TeX("Histogram of the posterior density for $\\theta_{2}-\\theta_{1}$"))
Thus, we estimate that the probability of a shift toward Bush is around 0.195.
A survey was done of bicycle and other vehicular traffic in the neighborhood of the campus of the University of California, Berkeley, in the spring of \(1993 .\) Sixty city blocks were selected at random; each block was observed for one hour, and the numbers of bicycles and other vehicles traveling along that block were recorded. The sampling was stratified into six types of city blocks: busy, fairly busy, and residential streets, with and without bike routes, with ten blocks measured in each stratum. Figure 2 displays the number of bicycles and other vehicles recorded in the study. For this problem, restrict your attention to the first two rows of the table: residential streets labeled as ‘bike routes’, which we will use to illustrate this computational exercise.
Figure 2: Counts of bicycles and other vehicles in one hour in each of 10 city blocks in each of six categories. (The data for two of the residential blocks were lost.) For example, the first block had 16 bicycles and 58 other vehicles, the second had 9 bicycles and 90 other vehicles, and so on. Streets were classified as 'residential', 'fairly busy', or 'busy' before the data were gathered.
\(\mathbf{Solution.}\qquad\) First, we have \(\theta_{j}\) which represents the proportion of bikes at location \(j\) where, \[ \theta_{j}\sim\operatorname{Beta}(\alpha,\beta)\quad\forall\quad j=1,\ldots,10 \]
Also, \(y_{j}\) corresponds to the number of bicycles observed at city block \(j\), where \[ y_{j}\sim Bin\left(n_{j},\theta_{j}\right) \]
and \(n_{j}\) is the total number of vehicles at location \(j\).
We place a hyperprior on \(\left(\alpha,\beta\right)\) that is uniform in the transformed coordinates \(\left(\frac{\alpha}{\alpha+\beta},\left(\alpha+\beta\right)^{-1/2}\right)\), which corresponds to the density \[ p\left(\alpha,\beta\right)\propto(\alpha+\beta)^{-5/2} \]
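To make the change of variables explicit (the symbols \(u,v\) are our notation): writing \(u=\frac{\alpha}{\alpha+\beta}\) and \(v=\left(\alpha+\beta\right)^{-1/2}\) gives \(\alpha=uv^{-2}\) and \(\beta=\left(1-u\right)v^{-2}\), and a flat density on \(\left(u,v\right)\) transforms as, \[ p\left(\alpha,\beta\right)\propto\left|\mathrm{det}\left[\begin{array}{cc} \frac{\partial\alpha}{\partial u} & \frac{\partial\alpha}{\partial v}\\ \frac{\partial\beta}{\partial u} & \frac{\partial\beta}{\partial v} \end{array}\right]\right|^{-1}=\left|\mathrm{det}\left[\begin{array}{cc} v^{-2} & -2uv^{-3}\\ -v^{-2} & -2\left(1-u\right)v^{-3} \end{array}\right]\right|^{-1}=\frac{v^{5}}{2}\propto\left(\alpha+\beta\right)^{-5/2} \]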
The joint posterior distribution is, \[\begin{align*} p(\boldsymbol{\theta},\alpha,\beta\mid y) & \propto p\left(y\mid\boldsymbol{\theta},\alpha,\beta\right)p\left(\boldsymbol{\theta},\alpha,\beta\right)\\ & =p\left(y\mid\boldsymbol{\theta},\alpha,\beta\right)p\left(\boldsymbol{\theta}\mid\alpha,\beta\right)p\left(\alpha,\beta\right)\\ & \propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}p\left(\theta_{j}\mid\alpha,\beta\right)\prod_{j=1}^{10}p\left(y_{j}\mid\theta_{j}\right)\\ & \propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}(\alpha,\beta)}\theta_{j}^{\alpha-1}\left(1-\theta_{j}\right)^{\beta-1}\prod_{j=1}^{10}\theta_{j}^{y_{j}}(1-\theta_{j})^{n_{j}-y_{j}}\\ & =\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}(\alpha,\beta)}\theta_{j}^{\alpha+y_{j}-1}\left(1-\theta_{j}\right)^{\beta+n_{j}-y_{j}-1} \end{align*}\]
where \(\mathrm{B}(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\).
\(\mathbf{Solution.}\qquad\) We use the fact that, \[\begin{align*} p\left(\alpha,\beta\mid y\right)=\int p\left(\theta,\alpha,\beta\mid y\right)d\theta\Longleftrightarrow p\left(\alpha,\beta\mid y\right)=\frac{p\left(\theta,\alpha,\beta\mid y\right)}{p\left(\theta\mid\alpha,\beta,y\right)} \end{align*}\]
From question 1 we know \(p\left(\theta,\alpha,\beta\mid y\right)\), so we need \(p\left(\theta\mid\alpha,\beta,y\right)\), \[\begin{align*} p\left(\theta\mid\alpha,\beta,y\right) & =\prod_{j=1}^{10}\frac{1}{\mathrm{B}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right)}\theta_{j}^{\alpha+y_{j}-1}\left(1-\theta_{j}\right)^{\beta+n_{j}-y_{j}-1} \end{align*}\] that is, the \(\theta_{j}\) are conditionally independent with \(\theta_{j}\mid\alpha,\beta,y\sim\mathrm{Beta}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right)\).
Then, \[ p\left(\alpha,\beta\mid y\right)=\frac{p\left(\theta,\alpha,\beta\mid y\right)}{p\left(\theta\mid\alpha,\beta,y\right)}\propto\left(\alpha+\beta\right)^{-5/2}\prod_{j=1}^{10}\frac{1}{\mathrm{B}\left(\alpha,\beta\right)}\mathrm{B}\left(\alpha+y_{j},\beta+n_{j}-y_{j}\right) \]
Adapting some code from HierarchicalModels.html, we first draw simulations from the joint posterior distribution of the hyperparameters,
# QUESTION 2 ---------
bikes = c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
other = c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)
y = bikes
n = bikes + other
# Grid for alpha, beta
a = seq(0.05, 10, length.out = 200)
b = seq(0.05, 33, length.out = 200)
cA = rep(a, each = length(b))
cB = rep(b, length(a))
# Function to compute log p(a, b | y) up to an additive constant
lpfun = function(a, b, y, n) {
  log(a + b) * (-5/2) + sum(-lbeta(a, b) + lbeta(a + y, b + n - y))
}
# For each combination of alpha, beta on the grid, compute log p(a,b | y)
lp = mapply(lpfun, cA, cB, MoreArgs = list(y, n))
# Unnormalized posterior density on the grid (rescaled for numerical stability)
df_marg = data.frame(x = cA, y = cB, p = exp(lp - max(lp)))
# Sample (alpha, beta) pairs from the grid approximation
nsamp = 1e4
samp_indices = sample(length(df_marg$p), size = nsamp,
                      replace = TRUE, prob = df_marg$p / sum(df_marg$p))
samp_A = cA[samp_indices]
samp_B = cB[samp_indices]
# Draw theta_j | a, b, y ~ Beta(a + y_j, b + n_j - y_j) for each sampled pair
samplestheta = matrix(0, nsamp, length(y))
for (j in 1:length(y)) {
  samplestheta[, j] = rbeta(nsamp, samp_A + y[j], samp_B + n[j] - y[j])
}
# Create a plot of the marginal posterior density of (alpha, beta)
title1 <- 'The marginal posterior of alpha and beta in hierarchical model'
postdensityalphabeta = ggplot(data = df_marg, aes(x = x, y = y)) +
  geom_raster(aes(fill = p, alpha = p), interpolate = TRUE) +
  geom_contour(aes(z = p), colour = 'black', size = 0.2) +
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 33)) +
  labs(x = 'alpha', y = 'beta', title = title1) +
  scale_fill_gradient(low = 'yellow', high = 'red', guide = FALSE) +
  scale_alpha(range = c(0, 1), guide = FALSE)
postdensityalphabeta
Next we draw simulations of the \(\theta_{j}\). We highlight \(\theta_7\) in red for reference.
library(dplyr)     # %>% pipe
library(tidyr)     # gather()
library(gridExtra) # grid.arrange()
# Histogram-based density estimates of the theta_j samples
breakpoints <- seq(0, 1, length.out = 200)
xx <- hist(samplestheta[, 1], breaks = breakpoints, plot = FALSE)$mids
densitytheta <- matrix(0, length(breakpoints) - 1, length(y))
for (j in 1:length(y)) {
  densitytheta[, j] <- hist(samplestheta[, j], breaks = breakpoints, plot = FALSE)$density
}
df_hier_samp <- densitytheta %>%
  as.data.frame() %>% cbind(xx) %>% gather(ind, p, -xx)
plot_hier7_samp <- ggplot(data = df_hier_samp) +
  geom_line(aes(x = xx, y = p, color = (ind == 'V7'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Hierarchical model', color = '') +
  scale_color_manual(values = c('blue', 'red'), guide = FALSE) +
  scale_y_continuous(breaks = NULL) +
  theme(legend.background = element_blank(), legend.position = c(0.8, 0.9))
# Average the Beta(a + y_j, n_j - y_j + b) density over the (a, b) samples
# for each (n, y) pair, evaluated at each point x
bdens2 <- function(n, y, a, b, x)
  rowMeans(mapply(dbeta, a + y, n - y + b, MoreArgs = list(x = x)))
x = seq(0.0001, 0.9999, length.out = 1000)
df_hier <- mapply(bdens2, n, y, MoreArgs = list(samp_A, samp_B, x)) %>%
  as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)
plot_hier7 <- ggplot(data = df_hier) +
  geom_line(aes(x = x, y = p, color = (ind == 'V7'), group = ind)) +
  labs(x = expression(theta), y = '', title = 'Hierarchical model', color = '') +
  scale_color_manual(values = c('blue', 'red'), guide = FALSE) +
  scale_y_continuous(breaks = NULL) +
  theme(legend.background = element_blank(), legend.position = c(0.8, 0.9))
grid.arrange(plot_hier7_samp, plot_hier7)
\(\mathbf{Solution.}\qquad\) The red points in the plot below represent the raw proportions of bicycles among all vehicles, \(y_{j}/n_{j}\). We notice that each raw proportion is close to the center of its posterior interval. Also, the raw proportions lie close to the blue 45 degree line.
# QUESTION 3 ---------
# Compare theta_j posterior intervals to raw proportions (bicycles / total vehicles)
postintervals_sample = apply(samplestheta, 2,
                             function(x) c(median(x), quantile(x, c(0.025, 0.975))))
rawest <- jitter(y / n)  # jitter to avoid overplotting
plot(rawest, postintervals_sample[1, ], pch = 19, col = 'red', cex = 0.5,
     ylim = c(min(postintervals_sample) * 0.8, max(postintervals_sample) * 1.2),
     ylab = 'posterior intervals', xlab = 'raw estimator')
# Vertical 95% interval segment for each estimate of theta_j
for (k in 1:length(y)) {
  lines(cbind(rep(rawest[k], 2), postintervals_sample[2:3, k]))
}
# 45 degree line
lines(seq(0, 0.5, by = 0.01), seq(0, 0.5, by = 0.01), col = 'blue')
\(\mathbf{Solution.}\qquad\) Our 95% posterior interval for the average underlying proportion of traffic that is bicycles comes from computing \(\frac{\alpha}{\alpha+\beta}\) for each posterior draw of \(\left(\alpha,\beta\right)\) and then taking the 2.5% and 97.5% quantiles.
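A minimal sketch of that computation, reusing the samp_A and samp_B draws from the question 2 code:

# QUESTION 4 ---------
# Posterior draws of the population average proportion alpha / (alpha + beta)
mean_prop = samp_A / (samp_A + samp_B)
quantile(mean_prop, c(0.025, 0.975))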
Thus our 95% posterior interval is [0.146, 0.297].
\(\mathbf{Solution.}\qquad\) Below we find a 95% posterior interval for the number of bicycles among 100 vehicles passing a new city block of this type.
# QUESTION 5 ---------
# Posterior predictive draws for a new block: theta ~ Beta(a, b), y ~ Bin(100, theta)
theta = rbeta(nsamp, samp_A, samp_B)
y_j = rbinom(nsamp, 100, theta)
CI = quantile(y_j, c(0.025, 0.975))
CI
This posterior interval makes sense, as it covers 9 out of the 10 blocks in our data.
\(\mathbf{Solution.}\qquad\) From question 3, we saw that the raw proportions fell near the centers of the posterior intervals for each \(\theta_j\). If we compare our sampled draws of \(\theta\) from question 5 to the raw proportions, we see that the beta distribution models the data well. Thus the beta distribution is a reasonable choice for the population of the \(\theta_j\).
# QUESTION 6 ---------
# Compare raw proportions (histogram, on the density scale so the two layers
# are comparable) to the posterior draws of theta for a new block (density curve)
df1 = data.frame(raw_prop = y / n)
df2 = data.frame(theta = theta)
ggplot(df1, aes(x = raw_prop)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.05, fill = "steelblue") +
  geom_density(data = df2, aes(theta), inherit.aes = FALSE, colour = "darkred") +
  xlim(0, 0.6) +
  xlab("raw proportion")