Computation time is an unavoidable consideration when working with simulation-based inference, for all but small datasets and simple models.
The pomp package therefore allows you to specify the most computationally intensive steps—usually, simulation of the stochastic dynamic system, and evaluation of the measurement density—as snippets of C code.
Consequently, to use pomp, your R program must have access to a C compiler. In addition, pomp takes advantage of some Fortran code and therefore requires a Fortran compiler.
Installing the necessary compilers should be fairly routine, but does involve an extra step beyond the usual installation of an R package, unless you are running the Linux operating system for which they are usually installed by default. Given how fundamental C and Fortran are to scientific computing, it is unfortunate that Mac and Windows do not provide these compilers by default.
Detailed instructions for installing pomp and other software that we will use with it are provided in the following places:
Additional instructions on our course website
Please submit your solutions to Canvas as an Rmarkdown (.Rmd) file which the GSIs will compile into an HTML document. Your Rmd file can read in the Parus major data from the internet, e.g., by
dat <- read.csv("https://ionides.github.io/531w20/10/parus.csv")
Note: you will be using pomp version 2.x. There are some differences from pomp 1.x that may require attention if and when you look on the internet for pomp code.
The Ricker equation can be reparameterized so that the scaling of \(P_n\) is explicit: \[ P_{n+1} = r\,P_{n}\,\exp\left(-\frac{P_{n}}{k}\right). \]
Modify the pomp
object created in the notes to reflect this reparameterization. Also, Modify the measurement model so that the data \(y_n\) is modeled as
\[
Y_n \mid P_n \sim \textrm{Negbin}(\phi\,P_n,\psi).
\]
Here, \(\mathrm{Negbin}(\mu,\psi)\) is the negative binomial distribution with mean \(\mu\) and probability parameter \(\psi\), and therefore variance \(\mu/\psi\). This parameterization corresponds in R to rbinom(...,mu,prob)
. See ?rnbinom
for documentation on the negative binomial distribution and the R Extensions Manual section on distribution functions for information on how to access these in C.
Try simulating from a few choices of the parameters, and present one simulation from a set of parameters that shows oscillatory behavior.
\(\mathbf{Solution.}\qquad\) First we show a time-series plot of the Great Tit population. We see that the population shows cyclical behavior over time.
# Question 6.1 -----------------------------------------------------------
dat = read.csv("https://ionides.github.io/531w20/10/parus.csv")
set.seed(531)
# Build trajectory of deterministic process
skel = Csnippet("DN=r*N*exp(-N/k);")
parus = pomp(
data = dat,
times="year",
t0=1959,
skeleton = map(skel),
paramnames=c("r","k"),
statenames=c("N")
)
# Plot the data
ggplot(dat, aes(x = year, y = pop)) +
geom_line(colour = "steelblue") +
ggtitle("Annual Great Tit Population")
As in lecture 10 notes, we plot the trajectory of the deterministic skeleton. With \(k=1\), we have the same model as in the notes.
traj = trajectory(parus, params=c(N.0=1,r=12,k=1), format = "data.frame")
ggplot(data = traj, aes(x = year, y = N)) +
geom_line(colour = "steelblue") +
ylab(TeX("$P_n$")) +
ggtitle("Trajectory of Determnistic Process")
Next we add the stochastic Ricker model to parus. Below we plot one possible realization of the process.
# Simulate the stochastic Ricker Model
stochStep = Csnippet("
e = rnorm(0, sigma);
N = r*N*exp(-N/k + e);"
)
parus = pomp(
data = dat,
times="year",
t0 = 1959,
skeleton = map(skel),
rprocess=discrete_time(step.fun=stochStep, delta.t=1),
paramnames=c("r","k","sigma"),
statenames=c("N","e")
)
sim = simulate(parus, params = c(N.0=dat$pop[1], e.0=0, r=6, k=100, sigma=0.5),
format = c("data.frame"))
ggplot(data = sim, aes(x = year, y = N)) +
geom_line(colour = "steelblue") +
ylab(TeX("$P_n$")) +
ggtitle("Simulating Ricker Equation") +
geom_point(shape = 1)
Finally, we complete the specification of the POMP by specifying the measurement model.
# Add measurement model
rmeas = Csnippet("pop = rnbinom(phi*N*psi / (1-psi), psi);")
dmeas = Csnippet("lik = dnbinom(pop, phi*N*psi / (1-psi), psi, give_log);")
parus = pomp(data = dat,
times="year",
t0=1959,
skeleton = map(skel),
rprocess=discrete_time(step.fun=stochStep, delta.t=1),
rmeasure = rmeas,
dmeasure = dmeas,
statenames = c("N","e"),
paramnames = c("r","k","sigma","phi","psi"),
params = c(N.0= dat$pop[1], e.0=0, r=6, k=100, sigma=0.05,
phi=1, psi= mean(dat$pop) / var(dat$pop)))
sims = simulate(parus, nsim=3, format = c("data.frame"), include.data=TRUE)
ggplot(data = sims, mapping = aes(x = year, y = pop)) +
geom_line(colour = "steelblue") +
facet_wrap(~.id)
We set the initial value of N.0
equal to the initial value of the population. Next, we set \(\phi=1\) so that the mean of \(Y_n\) is equal to mean of \(P_n\). We determine \(\psi\) by setting it equal to the \(\frac{\mu_{pop}}{\sigma^2_{pop}}\). Finally, \(r\) and \(\sigma\) seem to control the volatility, so we set \(r=6\) and \(\sigma=0.05\), while \(k=100\) alters the level of the population.
Construct a pomp object for the Parus major data modeled using the stochastic Beverton-Holt model, \[ P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n, \] where \(a\) and \(b\) are parameters and \[ \varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2). \] Assume the same measurement model as we used for the Ricker model. Try simulating from a few choices of the parameters. What are the similarities and differences between simulations you obtain from the Beverton-Holt model and those from the Ricker model? Present one simulation to support your comments.
\(\mathbf{Solution.}\qquad\) Below we construct pomp object using the stochastic Beverton-Holt model and simulate one possible realization.
# Simulate the stochastic Ricker Model
stochStep <- Csnippet("e = rlnorm((-1)*sigma*sigma/2, sigma);
N = (a*N/(1+b*N))*e;")
parus = pomp(data = dat,
times="year",
t0=1959,
rprocess=discrete_time(step.fun=stochStep, delta.t=1),
paramnames = c("a","b","sigma"),
statenames = c("N","e"))
sim = simulate(parus,
params=c(N.0=dat$pop[1], e.0=1, a=6, b=.03, sigma=0.01),
format = "data.frame")
ggplot(data = sim, aes(x = year, y = N)) +
geom_line(colour = "steelblue") +
ylab(TeX("$P_n$")) +
ggtitle("Simulating Beverton-Holt Model") +
geom_point(shape = 1)
Finally we add the measurment model.
# Add measurement model
rmeas <- Csnippet("pop = rnbinom(phi*N*psi/(1-psi),psi);")
dmeas <- Csnippet("lik = dnbinom(pop,phi*N*psi/(1-psi),psi,give_log);")
parus = pomp(data = dat,
times="year",
t0=1959,
rprocess=discrete_time(step.fun=stochStep, delta.t=1),
rmeasure=rmeas,
dmeasure=dmeas,
statenames=c("N", "e"),
paramnames=c("a", "b", "sigma","phi", "psi"),
params = c(N.0=dat$pop[1], e.0=0, a=6, b=0.03,sigma=0.01,
phi=1, psi=mean(dat$pop) / var(dat$pop)))
sims = simulate(parus, nsim=3, format = "data.frame", include.data=TRUE)
ggplot(data=sims, mapping=aes(x=year,y=pop)) +
geom_line(colour = "steelblue") +
facet_wrap(~.id)
For both models we try initialize N.0, \(\phi\) and \(\psi\) in the same manner. However, we need to tune \(a\) and \(b\) differently so that the level of the population is estimated correctly.
This homework is conceptually quite simple, but involves overcoming various technical hurdles. The hurdles may be overcome quite quickly, or could turn into a longer battle.
To make progress on statistical inference for POMP models, we have to solve these underlying computational issues.
If you get stuck, ask for help from your peers and/or the GSIs and/or me. Please report how much time this homework ends up taking, to help me monitor how many difficulties are encountered.
The questions derive from material in a short course on Simulation-based Inference for Epidemiological Dynamics
Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.