STATS 531: Homework 6

Installing the pomp package

Computation time is an unavoidable consideration when working with simulation-based inference, for all but small datasets and simple models.
The pomp package therefore allows you to specify the most computationally intensive steps—usually, simulation of the stochastic dynamic system, and evaluation of the measurement density—as snippets of C code.
Consequently, to use pomp, your R program must have access to a C compiler. In addition, pomp takes advantage of some Fortran code and therefore requires a Fortran compiler.
Installing the necessary compilers should be fairly routine, but does involve an extra step beyond the usual installation of an R package, unless you are running the Linux operating system for which they are usually installed by default. Given how fundamental C and Fortran are to scientific computing, it is unfortunate that Mac and Windows do not provide these compilers by default.
Detailed instructions for installing pomp and other software that we will use with it are provided in the following places:
- The pomp website installation page
- Additional instructions on our course website

Homework questions

Please submit your solutions to Canvas as an Rmarkdown (.Rmd) file which the GSIs will compile into an HTML document. Your Rmd file can read in the Parus major data from the internet, e.g., by

dat <- read.csv("https://ionides.github.io/531w20/10/parus.csv")

Note: you will be using pomp version 2.x. There are some differences from pomp 1.x that may require attention if and when you look on the internet for pomp code.

Question 6.1 Reformulating the Ricker model

The Ricker equation can be reparameterized so that the scaling of \(P_n\) is explicit: \[ P_{n+1} = r\,P_{n}\,\exp\left(-\frac{P_{n}}{k}\right). \]

Modify the pomp object created in the notes to reflect this reparameterization. Also, Modify the measurement model so that the data \(y_n\) is modeled as \[ Y_n \mid P_n \sim \textrm{Negbin}(\phi\,P_n,\psi). \]

Here, \(\mathrm{Negbin}(\mu,\psi)\) is the negative binomial distribution with mean \(\mu\) and probability parameter \(\psi\), and therefore variance \(\mu/\psi\). This parameterization corresponds in R to rbinom(...,mu,prob). See ?rnbinom for documentation on the negative binomial distribution and the R Extensions Manual section on distribution functions for information on how to access these in C.

Try simulating from a few choices of the parameters, and present one simulation from a set of parameters that shows oscillatory behavior.

\(\mathbf{Solution.}\qquad\) First we show a time-series plot of the Great Tit population. We see that the population shows cyclical behavior over time.

# Question 6.1 -----------------------------------------------------------
dat = read.csv("https://ionides.github.io/531w20/10/parus.csv")
set.seed(531)

# Build trajectory of deterministic process
skel = Csnippet("DN=r*N*exp(-N/k);")

parus = pomp(
  data = dat,
  times="year", 
  t0=1959,
  skeleton = map(skel),
  paramnames=c("r","k"),
  statenames=c("N")
)

# Plot the data
ggplot(dat, aes(x = year, y = pop)) + 
  geom_line(colour = "steelblue") + 
  ggtitle("Annual Great Tit Population")

Figure 1: Annual Great Tit Population.

As in lecture 10 notes, we plot the trajectory of the deterministic skeleton. With \(k=1\), we have the same model as in the notes.

traj = trajectory(parus, params=c(N.0=1,r=12,k=1), format = "data.frame")
ggplot(data = traj, aes(x = year, y = N)) + 
  geom_line(colour = "steelblue") + 
  ylab(TeX("$P_n$")) + 
  ggtitle("Trajectory of Determnistic Process")

Figure 2: Trajectory of Determnistic Process.

Next we add the stochastic Ricker model to parus. Below we plot one possible realization of the process.

# Simulate the stochastic Ricker Model
stochStep = Csnippet("
                     e = rnorm(0, sigma); 
                     N = r*N*exp(-N/k + e);"
                     )
parus = pomp(
  data = dat,
  times="year", 
  t0 = 1959,
  skeleton = map(skel),
  rprocess=discrete_time(step.fun=stochStep, delta.t=1),
  paramnames=c("r","k","sigma"),
  statenames=c("N","e")
)

sim = simulate(parus, params = c(N.0=dat$pop[1], e.0=0, r=6, k=100, sigma=0.5),
                format = c("data.frame"))

ggplot(data = sim, aes(x = year, y = N)) + 
  geom_line(colour = "steelblue") + 
  ylab(TeX("$P_n$")) + 
  ggtitle("Simulating Ricker Equation") + 
  geom_point(shape = 1)

Figure 3: Simulating Ricker Equation.

Finally, we complete the specification of the POMP by specifying the measurement model.

# Add measurement model
rmeas = Csnippet("pop = rnbinom(phi*N*psi / (1-psi), psi);")
dmeas = Csnippet("lik = dnbinom(pop, phi*N*psi / (1-psi), psi, give_log);")

parus = pomp(data = dat,
             times="year", 
             t0=1959,
             skeleton = map(skel),
             rprocess=discrete_time(step.fun=stochStep, delta.t=1),
             rmeasure = rmeas,
             dmeasure = dmeas,
             statenames = c("N","e"),
             paramnames = c("r","k","sigma","phi","psi"),
             params = c(N.0= dat$pop[1], e.0=0, r=6, k=100, sigma=0.05, 
                        phi=1, psi= mean(dat$pop) / var(dat$pop)))

sims = simulate(parus, nsim=3, format = c("data.frame"), include.data=TRUE)
ggplot(data = sims, mapping = aes(x = year, y = pop)) + 
  geom_line(colour = "steelblue") + 
  facet_wrap(~.id)

Figure 4: Plot of simulations.

We set the initial value of N.0 equal to the initial value of the population. Next, we set \(\phi=1\) so that the mean of \(Y_n\) is equal to mean of \(P_n\). We determine \(\psi\) by setting it equal to the \(\frac{\mu_{pop}}{\sigma^2_{pop}}\). Finally, \(r\) and \(\sigma\) seem to control the volatility, so we set \(r=6\) and \(\sigma=0.05\), while \(k=100\) alters the level of the population.

Question 6.2 Coding a new POMP model

Construct a pomp object for the Parus major data modeled using the stochastic Beverton-Holt model, \[ P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n, \] where \(a\) and \(b\) are parameters and \[ \varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2). \] Assume the same measurement model as we used for the Ricker model. Try simulating from a few choices of the parameters. What are the similarities and differences between simulations you obtain from the Beverton-Holt model and those from the Ricker model? Present one simulation to support your comments.

\(\mathbf{Solution.}\qquad\) Below we construct pomp object using the stochastic Beverton-Holt model and simulate one possible realization.

# Simulate the stochastic Ricker Model
stochStep <- Csnippet("e = rlnorm((-1)*sigma*sigma/2, sigma);
                      N = (a*N/(1+b*N))*e;")
parus = pomp(data = dat,
             times="year", 
             t0=1959,
             rprocess=discrete_time(step.fun=stochStep, delta.t=1),
             paramnames = c("a","b","sigma"),
             statenames = c("N","e"))


sim = simulate(parus,
               params=c(N.0=dat$pop[1], e.0=1, a=6, b=.03, sigma=0.01),
               format = "data.frame")

ggplot(data = sim, aes(x = year, y = N)) + 
  geom_line(colour = "steelblue") + 
  ylab(TeX("$P_n$")) + 
  ggtitle("Simulating Beverton-Holt Model") + 
  geom_point(shape = 1)

Figure 5: Simulating Beverton-Holt Model.

Finally we add the measurment model.

# Add measurement model
rmeas <- Csnippet("pop = rnbinom(phi*N*psi/(1-psi),psi);")
dmeas <- Csnippet("lik = dnbinom(pop,phi*N*psi/(1-psi),psi,give_log);")

parus = pomp(data = dat,
             times="year", 
             t0=1959,
             rprocess=discrete_time(step.fun=stochStep, delta.t=1),
             rmeasure=rmeas,
             dmeasure=dmeas,
             statenames=c("N", "e"),
             paramnames=c("a", "b", "sigma","phi", "psi"),
             params = c(N.0=dat$pop[1], e.0=0, a=6, b=0.03,sigma=0.01, 
                        phi=1, psi=mean(dat$pop) / var(dat$pop)))

sims = simulate(parus, nsim=3, format = "data.frame", include.data=TRUE)
ggplot(data=sims, mapping=aes(x=year,y=pop)) + 
  geom_line(colour = "steelblue") +
  facet_wrap(~.id)

For both models we try initialize N.0, \(\phi\) and \(\psi\) in the same manner. However, we need to tune \(a\) and \(b\) differently so that the level of the population is estimated correctly.

Question 6.3 This feedback question is worth credit

Explain which parts of your responses above made use of a source, meaning anything or anyone you consulted (including classmates or office hours) to help you write or check your answers. All sources are permitted. To encourage responsible use of these sources while maintaining class integrity, we require a response to this question, even if this may occasionally just say that you worked out everything entirely by yourself. See the syllabus for additional information on grading.
How long did this homework take? Report on any technical difficulties that arose.

This homework is conceptually quite simple, but involves overcoming various technical hurdles. The hurdles may be overcome quite quickly, or could turn into a longer battle.
To make progress on statistical inference for POMP models, we have to solve these underlying computational issues.
If you get stuck, ask for help from your peers and/or the GSIs and/or me. Please report how much time this homework ends up taking, to help me monitor how many difficulties are encountered.

I used lecture 10 notes as a source, and the user guide for pomp2 package.
The homework took me about 5 hours, by reading through the lecture notes and pomp2 guide, fixing coding issues, and typing up the homework. I noticed that pomp package does not work well if files are opened from Google Drive File Stream. In other words, I had to move the homework documents to a local folder on my computer and run them from there.

Acknowledgements

The questions derive from material in a short course on Simulation-based Inference for Epidemiological Dynamics

Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.