---
title: "jagshelper-vignette"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{jagshelper-vignette}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=7,
fig.height=7
)
```
```{r setup}
library(jagshelper)
```
# jagshelper
The goal of `jagshelper` is to streamline Bayesian analysis in JAGS
using the `jagsUI` package.\
Functions are provided for extracting output in a simpler form,
assessing model convergence, and plotting model output. Also included is
a function giving a template model in JAGS syntax with the associated
`jagsUI` code.
Some of the `jagshelper` functionality is illustrated below, with steps
approximately corresponding to those of a typical Bayesian analysis
using JAGS.
## Model template
When starting a Bayesian analysis in JAGS from scratch, I can never
remember the exact structure for doing so, and sometimes need reminding
of basic JAGS model syntax.
The `skeleton()` function prints a JAGS model template to the screen,
along with code to simulate a corresponding dataset.
```{r}
library(jagshelper)
skeleton("EXAMPLE")
```
## Output summary & Assessing convergence
### A simple model
Having run a model for the first time, it can be useful to see how many
parameter nodes have been saved in total, or how many nodes exist for
each named parameter. This can aid in deciding the appropriate strategy
for assessing convergence: for example, whether trace plots can be
feasibly assessed for all parameter nodes in the model, or just a
subset.
In the example below, there are relatively few parameters saved and all
trace plots can be examined.
```{r}
nparam(asdf_jags_out) # how many parameters in total
nbyname(asdf_jags_out) # how many parameters (or dimensions) per parameter name
tracedens_jags(asdf_jags_out, parmfrow=c(3,3)) # trace plots for all parameters
check_Rhat(asdf_jags_out) # proportion of Rhats below a threshold of 1.1
asdf_jags_out$Rhat # Rhat values
```
### A more complex model
In the example below, there are many more parameters saved, and it is
perhaps more illustrative to examine the trace plots associated with the
least- converged parameter nodes, as measured by `Rhat` value (Gelman &
Rubin 1992).
```{r}
nparam(SS_out) # how many parameters in total
nbyname(SS_out) # how many parameters (or dimensions) per parameter name
traceworstRhat(SS_out, parmfrow=c(3,2)) # trace plots for least-converged nodes
check_Rhat(SS_out) # proportion of Rhats below a threshold of 1.1
```
```{r,fig.width=6,fig.height=5}
plotRhats(SS_out) # plotting Rhat values
```
### Additional functionality
- The function `check_neff()` behaves very similarly to
`check_Rhat()`, but makes comparisons based on `n.eff` (a crude
measure of effective sample size) rather than Gelman-Rubin
convergence diagnostic `Rhat`.
- The functions `traceworstRhat()` and `check_Rhat()` both contain an
optional `n.eff=` argument. When set to `TRUE`, the functions will
compare or plot based on the value of `n.eff` rather than `Rhat`.
- The function `tracedens_jags()` is likely to be the most useful and
concise trace plot version. However, if only line trace plots or
by-chain kernel density plots are desired (rather than both), older
versions `trace_jags()` and `chaindens_jags()` are preserved.
- The function `qq_postpred()` produces a quantile-quantile plot from
the posterior predictive distributions associated with a vector of
data. This can be visually interpreted in a similar manner to a
traditional Q-Q plot, with an appropriately-specified model
producing a plot that falls along the x=y line. While not intended
as an omnibus posterior predictive check, this plot might be useful
in detecting overparameterization, poor convergence, or a
mis-specified error model. It should be noted that this function
depends on the existence of a matrix of posterior predictive
samples, which is up to the user. This can be specified within JAGS,
or via appropriate simulation from the posterior samples.
- The function `ts_postpred()` provides a similar utility in detecting
possible overparameterization, as well as features in a dataset with
respect to the corresponding posterior predictive distributions.
It produces an envelope plot of the centered posterior predictive
distribution, defined as the difference between the posterior predictive
and the posterior predictive median. The centered time series is overlayed,
similarly defined as the difference between the time series and the
posterior predictive median.
```{r, fig.width=7, fig.height=4}
old_parmfrow <- par("mfrow") # storing previous graphics state
par(mfrow=c(1, 2))
qq_postpred(ypp=SS_out, p="ypp", y=SS_data$y)
ts_postpred(ypp=SS_out, p="ypp", y=SS_data$y)
par(mfrow=old_parmfrow) # restoring previous graphics state
```
- The function `plot_postpred()` is a wrapper that produces a sequence of
plots: first, an `envelope()` plot of the posterior predictive distribution overlayed
with raw data values, then a call to `ts_postpred()` giving a time series of the
posterior predictive residual distribution overlayed with the data residuals, then
a plot of the residual standard deviation calculated over a moving window. By
default, these three plots are given with respect to the data sequence, then with respect to the data supplied for x, then with respect to fitted values.
- The function `pairstrace_jags()` gives methods for plotting
two-dimensional trace plots, scatter plots, or contour plots, in
which each possible pairing of parameter nodes are plotted with
respect to one another. In addition to convergence, this may provide
a graphical check for correlation between parameter nodes, or
problematic posterior surface shapes. An example is shown below.
```{r, fig.width=6}
pairstrace_jags(asdf_jags_out, p=c("a","sig_a"), points=TRUE, parmfrow=c(3,2))
```
- The functions `cor_jags()` and `plotcor_jags()` respectively return
and plot correlation matrices of all or a subset of parameters,
which may be useful in directly assessing correlation between
parameters. In the case of multiple nodes per parameter (as in a
vector or array of nodes), the full collection of nodes per
parameter name is given one axis tick. This is intended to reduce
graphical clutter as well as giving greater visual weight to single
parameters. An example is given below.
```{r, fig.width=7, fig.height=6}
plotcor_jags(SS_out, p=c("trend","rate","sig"))
```
## Extracting output as data.frame
The `jags_df()` function extracts all posterior samples from an output
object returned by `jagsUI::jags()` as a `data.frame`, which may be
preferable for some users.
```{r}
out_df <- jags_df(asdf_jags_out)
str(out_df)
```
### Additional functionality
- Functions for trace plots and by-chain kernel density are available
with `trace_df()` and `chaindens_df()`, respectively, if model
output is only saved in `data.frame` form. Note that both functions
contain an additional `nline=` argument corresponding to the number
of MCMC chains that were run.
- Functions `trace_line()` and `chaindens_line()` are also included
for completeness, which take a single vector of MCMC iterations
(associated with a single parameter) as input. Note that both
functions contain an additional `nline=` argument corresponding to
the number of MCMC chains that were run.
## Plotting a matrix associated with a vector of parameter nodes
### A single matrix
The `caterpillar()` and `envelope()` functions plot output associated
with a vector of parameter nodes. This is often expressed as a
two-dimensional matrix or `data.frame`, with a column for each parameter
node and a row for each MCMC iteration. Both `caterpillar()` and
`envelope()` were originally written to accept such `data.frame`s as
inputs, but now also accept a `jagsUI` output object and parameter name.
It is anticipated that `caterpillar()` could be used for effect sizes
associated with a categorical variable, in which plotting order may or
may not matter.
By contrast, `envelope()` is intended for a matrix associated with a
sequence of parameter nodes, such as in a time series.
For a simpler case, `plotdens()` produces a kernel density plot of a
single parameter node, or overlays multiple parameter nodes from a list.
Alternately (shown below) it overlays kernel density plots of a vector
of parameter nodes.
```{r,fig.width=5,fig.height=8}
old_parmfrow <- par("mfrow") # storing old graphics state
par(mfrow=c(3,1))
caterpillar(asdf_jags_out, "a")
envelope(SS_out, "trend", x=SS_data$x)
plotdens(asdf_jags_out, "a")
par(mfrow=old_parmfrow) # resetting graphics state
```
### Multiple matrices or multiple models
It may be appropriate to make by-element comparisons of multiple such
matrices, perhaps between multiple candidate models.
Function `comparecat()` produces interleaved caterpillar plots for a
list of `jagsUI` output objects and an optional list of parameters,
plotting parameters common to a set of models adjacent to one another.
The example below uses the same output object three times, but will show
functionality.
Function `comparedens()` behaves similarly, but produces left- and
right-facing kernel density plots for TWO `jagsUI` output objects and an
optional list of parameters. The example below uses the same output
object twice, but will show functionality.
```{r,fig.width=5,fig.height=8}
old_parmfrow <- par("mfrow") # storing old graphics state
par(mfrow=c(2,1))
comparecat(x=list(asdf_jags_out, asdf_jags_out, asdf_jags_out),
p=c("a","b","sig"))
comparedens(x1=asdf_jags_out, x2=asdf_jags_out, p=c("a","b","sig"))
par(mfrow=old_parmfrow) # resetting graphics state
```
Function `overlayenvelope()` will automatically overlay multiple
`envelope()` plots, and may be used with a variety of input structures:
- A `list()` of 2-dimensional posterior `data.frame`s or matrices
- A 3-dimensional `array`, in which multiple 2-dimensional posterior
matrices are joined along the third dimension
- A `list()` of `jagsUI` output objects, plus a parameter name
- A single `jagsUI` output objects, plus a vector of parameter names
```{r}
par(mfrow=c(2,2))
## usage with list of input data.frames
overlayenvelope(df=list(SS_out$sims.list$cycle_s[,,1],
SS_out$sims.list$cycle_s[,,2]))
## usage with a 3-d input array
overlayenvelope(df=SS_out$sims.list$cycle_s)
## usage with a jagsUI output object and parameter name (2-d parameter)
overlayenvelope(df=SS_out, p="cycle_s")
## usage with a single jagsUI output object and multiple parameters
overlayenvelope(df=SS_out, p=c("trend","rate"))
```
Function `crossplot()` plots corresponding pairs of parameter densities on the X-
and Y-axes. Three plotting methods are provided, that may be overlayed if desired:
- If drawcross == TRUE, caterpillar-like plots will be produced, with quantile intervals in the x- and y- directions.
- If drawx == TRUE, caterpillar-like plots will be produced, but rotated along the standardized principal component axes. This may be useful to draw if correlation is present.
- If drawblob == TRUE, smoothed polygons will be produced, each containing approximately ci= x100% of the associated MCMC samples.
This function may be used with vectors or matrices of MCMC samples, or with a `jagsUI`
object and a vector of parameter names.
```{r}
## Usage with single vectors (or data.frames or 2d matrices)
xx <- SS_out$sims.list$trend[,41]
yy <- SS_out$sims.list$cycle[,41]
## Showing possible geometries
par(mfrow = c(2, 2))
plot(xx, yy, col=adjustcolor(1, alpha.f=.1), pch=16, main="Cross Geometry")
crossplot(xx, yy, add=TRUE, col=1)
plot(xx, yy, col=adjustcolor(1, alpha.f=.1), pch=16, main="G Geometry")
crossplot(xx, yy, add=TRUE, col=1,
drawcross=FALSE, drawx=TRUE)
plot(xx, yy, col=adjustcolor(1, alpha.f=.1), pch=16, main="Blob Geometry")
crossplot(xx, yy, add=TRUE, col=1,
drawcross=FALSE, drawblob=TRUE)
plot(xx, yy, col=adjustcolor(1, alpha.f=.1), pch=16, main="Blob Outlines")
crossplot(xx, yy, add=TRUE, col=1,
drawcross=FALSE, drawblob=TRUE, outline=TRUE)
## Usage with jagsUI object and parameter names, plus addl functionality
par(mfrow = c(1, 1))
crossplot(SS_out, p=c("trend","cycle"),
labels=SS_data$x, labelpos=1, link=TRUE, drawblob=TRUE,
col="random")
```
### Comparison between priors and posteriors
Function `comparepriors()` is a wrapper for `comparedens()`, and plots side-by-side
kernel densities of all parameters with names ending in `"_prior"`, along with
the respective posterior densities. It should be noted that additional parameters
must be included in the JAGS model to provide samples of the prior distributions,
as is shown in the example below.
```{r, eval=FALSE}
...
sig ~ dunif(0, 10) # this is the parameter that is used elsewhere in the model
sig_prior ~ dunif(0, 10) # this is only used to give samples of the prior
...
```
```{r}
comparepriors(asdf_prior_jags_out, parmfrow=c(2,3))
```