Practice of Bayes Formula

Suppose that if \(\theta=1, \text { then } \mathrm{y}\) has a normal distribution with mean 1 and standard deviation \(\sigma,\) and if \(\theta=2,\) then \(y\) has a normal distribution with mean 2 and standard deviation \(\sigma .\) Also, suppose \(\operatorname{Pr}(\theta=1)=0.5\) and \(\operatorname{Pr}(\theta=2)=0.5\)

  1. For \(\sigma=2,\) derive the formula for the marginal probability density for \(\mathrm{y}, \mathrm{p}(\mathrm{y})\) and sketch/visualize it in R.
  2. What is \(\operatorname{Pr}(\theta=1 | y=1),\) again supposing \(\sigma=2 ?\)
  3. Describe how the posterior density of \(\theta\) changes in shape as \(\sigma\) is increased and as it is decreased.

1. \(\mathbf{Solution.}\qquad\) We can compute the marginal probability density for \(y\) as follows, \[\begin{align*} p(y) & =p(y\mid\theta=1)\mathbb{P}(\theta=1)+p(y\mid\theta=2)\mathbb{P}(\theta=2)\\ & =\frac{1}{2}\left[\frac{1}{2\sqrt{2\pi}}\exp\left\{ -\frac{1}{8}(y-1)^{2}\right\} +\frac{1}{2\sqrt{2\pi}}\exp\left\{ -\frac{1}{8}(y-2)^{2}\right\} \right]\\ & =\frac{1}{4\sqrt{2\pi}}\left(\exp\left\{ -\frac{1}{8}(y-1)^{2}\right\} +\exp\left\{ -\frac{1}{8}(y-2)^{2}\right\} \right) \end{align*}\]

Below we show the R code to plot this density.

   

2. \(\mathbf{Solution.}\qquad\) Using Bayes formula we can compute the posterior density as follows, \[\begin{align*} \mathbb{P}(\theta=1\mid y=1) & =\frac{\mathbb{P}(y=1\mid\theta=1)\mathbb{P}(\theta=1)}{\mathbb{P}(y=1\mid\theta=1)\mathbb{P}(\theta=1)+\mathbb{P}(y=1|\theta=2)\mathbb{P}(\theta=2)}\\ & =\frac{\frac{1}{4\sqrt{2\pi}}}{\frac{1}{4\sqrt{2\pi}}\left(1+e^{-1/8}\right)}\\ & =\left(1+e^{-1/8}\right)^{-1} \end{align*}\]

   

3. \(\mathbf{Solution.}\qquad\) Using our result from part 2, we wish to plot the prior density as a function of \(\sigma\), \[ \mathbb{P}(\theta=1\mid y=1)=\left(1+e^{-1/\left(2\sigma^{2}\right)}\right)^{-1} \]

The following R code shows this plot for \(\sigma \in [0,10]\).

We see that as \(\sigma\) decreases and approaches \(0\), the posterior density goes to \(1\). As \(\sigma\) increases, the posterior density gets close to \(\frac{1}{2}\), which is equivalent to flipping a fair coin.

Normal distribution with unknown mean

A random sample of \(n\) students is drawn from a large population, and their weights are measured. The average weight of the \(n\) sampled students is \(\bar{y}=150\) pounds. Assume the weights in the population are normally distributed with unknown mean \(\theta\) and known standard deviation 20 pounds. Suppose your prior distribution for \(\theta\) is normal with mean 180 and standard deviation \(40 .\)

  1. Give your posterior distribution for \(\theta .\) (Your answer will be a function of \(n .\) )
  2. A new student is sampled at random from the same population and has a weight of \(\tilde{y}\) pounds. Give a posterior predictive distribution for \(\tilde{y} .\) (Your answer will still be a function of \(n .\) )
  3. For \(n=10,\) give a \(95 \%\) posterior interval for \(\theta\) and a \(95 \%\) posterior predictive interval for \(\tilde{y} .\) Do the same for \(n=100 .\)

1. \(\mathbf{Solution.}\qquad\) First we can construct the product of the likelihood function with the prior density, \[\begin{align*} \mathbb{P}\left(\theta\mid y_{1},\ldots,y_{n}\right) & \propto\mathbb{P}\left(y_{1},\ldots,y_{n}\mid\theta\right)p(\theta)\\ & \propto\prod_{i=1}^{n}\exp\left\{ \frac{-1}{2(20)^{2}}\left(y_{i}-\theta\right)^{2}\right\} \exp\left\{ \frac{-1}{2\left(40\right)^{2}}(\theta-180)^{2}\right\} \end{align*}\]

Using formula and notation from lecture, we have \[ y_{i}\sim N\left(\theta,\sigma^{2}\right),\quad\theta\sim N\left(u_{0},\tau_{0}^{2}\right), \quad\mathbb{P}\left(\theta\mid y_{1},\ldots,y_{n}\right)\sim N\left(\mu_{n},\tau_{n}^{2}\right) \]

Then, \[ \begin{aligned}\mu_{n} & =\frac{\frac{180}{40^{2}}+\frac{150n}{20^{2}}}{\frac{1}{40^{2}}+\frac{n}{20^{2}}} & \qquad\tau_{n}^{2} & =\frac{1}{\frac{1}{40^{2}}+\frac{n}{20^{2}}}\\ & =\frac{180+600n}{1+4n} & & =\frac{40^{2}}{1+4n} \end{aligned} \]

For simplicity we will write \(y_{1:n}:=y_{1},\ldots,y_{n}\). Therefore, \[\begin{equation} \mathbb{P}\left(\theta\mid y_{1:n}\right)\sim N\left(\frac{180+600n}{1+4n},\frac{40^{2}}{1+4n}\right)\label{eq:1} \end{equation}\]

   

2. \(\mathbf{Solution.}\qquad\) We can compute \(p\left(\tilde{y}\mid y_{1:n}\right)\) as follows, \[ p\left(\tilde{y}\mid y_{1:n}\right)=\int p(\tilde{y}\mid\theta)p\left(\theta\mid y_{1:n}\right)d\theta \]

where, \[ p(\tilde{y}\mid\theta)\sim N\left(\theta,\sigma^{2}\right) \]

and \(p\left(\theta\mid y_{1:n}\right)\) is same as (). Next, we can compute mean and variance of \(p\left(\tilde{y}\mid y_{1:n}\right)\). \[\begin{align*} \mathbb{E}\left[\tilde{y}\mid y_{1:n}\right] & =\mathbb{E}\left[\mathbb{E}\left[\tilde{y}\mid\theta,y_{1:n}\right]\mid y_{1:n}\right]\\ & =\mathbb{E}\left[\theta\mid y_{1:n}\right]\\ & =\mu_{n} \end{align*}\]

Also, \[\begin{align*} \mathrm{Var}\left[\tilde{y}|y_{1in}\right] & =\mathrm{Var}\left(\mathbb{E}\left[\tilde{y}\mid\theta,y_{1:n}\right]\mid y_{1:n}\right)+\mathbb{E}\left[\mathrm{Var}\left(\tilde{y}\mid\theta,y_{1:n}\right)\mid y_{1:n}\right]\\ & =\mathrm{Var}\left(\theta\mid y_{1:n}\right)+\sigma^{2}\\ & =\tau_{n}^{2}+\sigma^{2} \end{align*}\]

Therefore, \[ \tilde{y}\sim N\left(\mu_{n},\tau_{n}^{2}+\sigma^{2}\right) \]

   

3. \(\mathbf{Solution.}\qquad\) We can construct the interval for \(\theta\) by standardizing \(\theta\), \[ \mathbb{P}\left(z_{\frac{0.05}{2}}\leq\frac{\theta-\mu_{n}}{\tau_{n}}\leq z_{1-\frac{0.05}{2}}\right)=0.95 \]

then, \[ \mu_{n}-1.96\tau_{n}\leq\theta\leq\mu_{n}+1.96\tau_{n} \]

Similarly, the interval for \(\tilde{y}\) is, \[ \mu_{n}-1.96\sqrt{\tau_{n}^{2}+\sigma^{2}}\leq\tilde{y}\leq\mu_{n}+1.96\sqrt{\tau_{n}^{2}+\sigma^{2}} \]

Below we use the following R code to compute the 95% confidence intervals.

For \(n=10, \theta \in\) [138.49,192.24] and \(\tilde{y} \in\) [109.66,191.8].

For \(n=100, \theta \in\) [146.16,183.92] and \(\tilde{y} \in\) [110.68,189.47].

Nonconjugate single parameter model

Suppose you observe \(y=285\) from the model Binomial \((500, \theta),\) where \(\theta\) is an unknown parameter. Assume the prior on \(\theta\) has the following form \[ p(\theta)=\left\{\begin{array}{ll} {8 \theta,} & {0 \leq \theta<0.25} \\ {\frac{8}{3}-\frac{8}{3} \theta,} & {0.25 \leq \theta \leq 1} \\ {0,} & {\text { otherwise }} \end{array}\right. \]

  1. Compute the unnormalized posterior density function on a grid of \(m\) points for some large integer \(m .\) Using the grid approximation, compute and plot the normalized posterior density function \(p(\theta | y),\) as a function of \(\theta\)
  2. Sample 10000 draws of \(\theta\) from the posterior density and plot a histogram of the draws.

1. \(\mathbf{Solution.}\qquad\) Below we show R code to compute the unnormalized density posterior density function and we plot the normalized posterior density as a function of \(\theta\).

   

2. \(\mathbf{Solution.}\qquad\) Below we show R code for sampling \(\theta\) from the posterior density and plotting a histogram.