The **disaggregation** package contains functions to run
Bayesian disaggregation models. Aggregated response data over large
heterogeneous regions can be used alongside fine-scale covariate
information to predict fine-scale response across the region. The github
page for this package can be found here.

Install **disaggregation** using:

or from github using

The key functions are `prepare_data`

,
`fit_model`

and `predict`

. The
`prepare_data`

function takes the aggregated data and
covariate data to be used in the model and produces an object to be use
by `fit_model`

. This functions runs the disaggregation model
and the out can be passed to `predict`

to produce fine-scale
predicted maps of the response variable.

To use the disaggregation `prepare_data`

function, you
must have the aggregated data as a `sf`

object and a
`SpatRaster`

of the covariate data to be used in the
model.

We will demonstrate an example of the **disaggregation**
package using areal data of leukemia incidence in New York, using data
from the package `SpatialEpi`

.

```
library(SpatialEpi, quietly = TRUE)
library(dplyr, quietly = TRUE)
library(disaggregation, quietly = TRUE)
library(ggplot2)
library(sf)
library(terra)
polygons <- sf::st_as_sf(NYleukemia$spatial.polygon)
df <- cbind(polygons, NYleukemia$data)
ggplot() + geom_sf(data = df, aes(fill = cases / population)) + scale_fill_viridis_c(lim = c(0, 0.003))
```

Now we simulate two covariate rasters for the area of interest and
make a two-layered `SpatRaster`

. They are simulated at the
resolution of approximately 1km^{2}.

```
bbox <- sf::st_bbox(df)
extent_in_km <- 111*(bbox[c(3, 4)] - bbox[c(1, 2)])
n_pixels_x <- floor(extent_in_km[[1]])
n_pixels_y <- floor(extent_in_km[[2]])
r <- terra::rast(ncols = n_pixels_x, nrows = n_pixels_y)
terra::ext(r) <- terra::ext(df)
data_generate <- function(x){
rnorm(1, ifelse(x %% n_pixels_x != 0, x %% n_pixels_x, n_pixels_x), 3)
}
terra::values(r) <- sapply(seq(terra::ncell(r)), data_generate)
r2 <- terra::rast(ncol = n_pixels_x, nrow = n_pixels_y)
terra::ext(r2) <- terra::ext(df)
terra::values(r2) <- sapply(seq(terra::ncell(r2)),
function(x) rnorm(1, ceiling(x/n_pixels_y), 3))
cov_stack <- terra::rast(list(r, r2))
cov_stack <- terra::scale(cov_stack)
names(cov_stack) <- c('layer1', 'layer2')
```

We also create a population raster. This is to allow the model to correctly aggregated the pixel values to the polygon level. For this simple example we assume that the population within each polygon is uniformly distributed.

```
extracted <- terra::extract(r, terra::vect(df$geometry), fun = sum)
n_cells <- terra::extract(r, terra::vect(df$geometry), fun = length)
df$pop_per_cell <- df$population/n_cells$lyr.1
pop_raster <- terra::rasterize(terra::vect(df), cov_stack, field = 'pop_per_cell')
```

To correct small inconsistencies in the polygon geometry, we run the code below.

Now we have setup the data we can use the `prepare_data`

function to create the objects needed to run the disaggregation model.
The name of the response variable and id variable in the `sf`

object should be specified.

The user can also control the parameters of the mesh that is used to
create the spatial field. The mesh is created by finding a tight
boundary around the polygon data, and creating a fine mesh within the
boundary and a coarser mesh outside. This speeds up computation time by
only having a very fine mesh within the area of interest and having a
small region outside with a coarser mesh to avoid edge effects. The mesh
parameters: `concave`

, `convex`

and
`resolution`

refer to the parameters used to create the mesh
boundary using the fm_nonconvex_hull_inla
function, while the mesh parameters `max.edge`

,
`cut`

and `offset`

refer to the parameters used to
create the mesh using the fm_mesh_2d
function.

```
data_for_model <- prepare_data(polygon_shapefile = df,
covariate_rasters = cov_stack,
aggregation_raster = pop_raster,
response_var = 'cases',
id_var = 'censustract.FIPS',
mesh.args = list(cut = 0.01,
offset = c(0.1, 0.5),
max.edge = c(0.1, 0.2),
resolution = 250),
na.action = TRUE)
```

Now we have our data object we are ready to run the model. Here we can specify the likelihood function as Gaussian, binomial or poisson, and we can specify the link function as logit, log or identity. The disaggregation model makes predictions at the pixel level:

\(link(pred_i) = \beta_0 + \beta X + GP(s_i) + u_i\)

where \(X\) are the covariates, \(GP\) is the Gaussian random field and \(u_i\) is the iid random effect. The pixel predictions are then aggregated to the polygon level using the weighted sum (via the aggregation raster, \(agg_i\)):

\(cases_j = \sum_{i \epsilon j} pred_i \times agg_i\)

\(rate_j = \frac{\sum_{i \epsilon j} pred_i \times agg_i}{\sum_{i \epsilon j} agg_i}\)

The different likelihood correspond to slightly different models (\(y_j\) is the response count data):

**Gaussian** (\(\sigma_j\) is the dispersion of the polygon
data),

\(dnorm(y_j/\sum agg_i, rate_j, \sigma_j)\)

Here \(\sigma_j = \sigma \sqrt{\sum agg_i^2} / \sum agg_i\), where \(\sigma\) is the dispersion of the pixel data, a parameter learnt by the model.

**Binomial** (For a survey in polygon j, \(y_j\) is the number positive and \(N_j\) is the number tested)

\(dbinom(y_j, N_j, rate_j)\)

**Poisson** (predicts incidence count)

\(dpois(y_j, cases_j)\)

The user can also specify the priors for the regression parameters. For the field, the user specifies the pc priors for the range, \(\rho_{min}\) and \(\rho_{prob}\), where \(P(\rho < \rho_{min}) = \rho_{prob}\), and the variation, \(\sigma_{min}\) and \(\sigma_{prob}\), where \(P(\sigma > \sigma_{min}) = \sigma_{prob}\), in the field. For the iid effect, the user also specifies pc priors.

By default the model contains a spatial field and a polygon iid
effect. These can be turned off in the `disag_model`

function, using `field = FALSE`

or
`iid = FALSE`

.

```
model_result <- disag_model(data_for_model,
iterations = 1000,
family = 'poisson',
link = 'log',
priors = list(priormean_intercept = 0,
priorsd_intercept = 2,
priormean_slope = 0.0,
priorsd_slope = 0.4,
prior_rho_min = 3,
prior_rho_prob = 0.01,
prior_sigma_max = 1,
prior_sigma_prob = 0.01,
prior_iideffect_sd_max = 0.05,
prior_iideffect_sd_prob = 0.01))
#> Fitting model. This may be slow.
```

Now we have the results from the model of the fitted parameters, we
can predict Leukemia incidence rate at fine-scale (the scale of the
covariate data) across New York. The `predict`

function takes
the model result and predicts both the mean raster surface and predicts
and summarises `N`

parameter draws, where `N`

is
set by the user (default 100). The uncertainty is summarised via the
confidence interval set by the user (default 95% CI).