Estimating disease severity while correcting for reporting delays

Understanding disease severity, and especially the case fatality risk (CFR), is key to outbreak response. During an outbreak there is often a delay between cases being reported and the outcomes (for CFR, deaths) of those cases being known. Simply dividing total deaths to date by total cases to date can underestimate the CFR in real time, because the outcomes of many cases are not yet known.

Knowing the distribution of these delays from previous outbreaks of the same (or similar) diseases, and accounting for them, can therefore help ensure less biased estimates of disease severity. See the Concept section at the end of this vignette for more on how reporting delays bias CFR estimates.

The severity of a disease can be estimated while correcting for delays in reporting using the methods outlined in Nishiura et al. (2009), which are implemented in the cfr package.

Use case

A disease outbreak is underway. We want to know how severe the disease is in terms of the case fatality risk (CFR), but there is a delay between cases being reported and the outcomes of those cases (whether recovery or death) being known. This is the reporting delay, and it can be accounted for using the distribution of reporting delays from past outbreaks.

What we have

What we assume

First we load the cfr package.

# load cfr
library(cfr)

Case and death data

Data on cases and deaths may be obtained from a number of publicly accessible sources, such as the global Covid-19 dataset curated by Our World in Data, a similar dataset made available through the R package covidregionaldata (Palmer et al. 2021), or data on outbreaks of other infections made available in outbreaks.

In an outbreak response scenario, such data may also be compiled and shared locally. See the vignette on working with data from incidence2, which covers a common format for incidence data that can help interoperability with other formats.

The cfr package requires only a data frame with three columns, “date”, “cases”, and “deaths”, giving the daily number of reported cases and deaths.
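As a minimal sketch of this input format, the data frame below is built by hand in base R. The dates and counts are hypothetical and chosen only for illustration; the object name outbreak is ours, and only the three column names come from the requirement above.

```r
# hypothetical daily counts, for illustration only
outbreak <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  cases = c(1, 0, 3, 5, 2),
  deaths = c(0, 0, 0, 1, 1)
)

# check the structure: a Date column and two numeric count columns
str(outbreak)
```

Days with no reported cases or deaths are kept as explicit zeros, as in the Ebola data shown further on, so that the series has one row per day.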

Here, we use some data from the first Ebola outbreak, in the Democratic Republic of the Congo in 1976, that is included with this package (Camacho et al. 2014).


# load and view the ebola dataset
data("ebola1976")
head(ebola1976)
#>         date cases deaths
#> 1 1976-08-25     1      0
#> 2 1976-08-26     0      0
#> 3 1976-08-27     0      0
#> 4 1976-08-28     0      0
#> 5 1976-08-29     0      0
#> 6 1976-08-30     0      0

Obtaining data on reporting delays

We obtain the disease’s onset-to-death distribution from a more recent Ebola outbreak, reported in Barry et al. (2018). The onset-to-death distribution is considered to be Gamma distributed, with a shape \(k\) = 2.40 and a scale of \(\theta\) = 3.33.
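To get a feel for what these parameters imply, the base R sketch below summarises the Gamma distribution. The mean of a Gamma distribution is shape times scale, and its standard deviation is the square root of the shape times the scale; the variable names are ours and are used only for illustration.

```r
# onset-to-death parameters quoted above
shape <- 2.40
scale <- 3.33

# summary statistics of the Gamma distribution, in days
mean_delay <- shape * scale
sd_delay <- sqrt(shape) * scale
median_delay <- qgamma(0.5, shape = shape, scale = scale)

round(c(mean = mean_delay, sd = sd_delay, median = median_delay), 2)
```

The mean delay is close to 8 days, and the distribution is right-skewed, so the median is lower than the mean.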

Note that while we use a continuous distribution here, a discrete distribution would be more appropriate because we are working with daily data.
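One simple way to discretise the Gamma distribution is to take the probability mass in each one-day interval, using differences of the cumulative distribution function. The sketch below does this in base R; the function name discrete_delay is hypothetical, and other discretisation schemes exist, so treat this as one possible choice rather than a recommended one.

```r
# probability that the delay falls in the interval [x, x + 1) days
discrete_delay <- function(x, shape = 2.40, scale = 3.33) {
  pgamma(x + 1, shape = shape, scale = scale) -
    pgamma(x, shape = shape, scale = scale)
}

# probability mass for delays of 0 to 60 days
days <- 0:60
pmf <- discrete_delay(days)

# the masses are non-negative and sum to very nearly 1
sum(pmf)
```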

Note also that we use the central estimates for each distribution parameter, and by ignoring uncertainty in these parameters the uncertainty in the resulting CFR is likely to be underestimated.

The forthcoming epiparameter package aims to be a library of epidemiological delay distributions, which can be accessed easily from within workflows. See the vignette on using delay distributions for more information on how to use this and other distribution objects supported by R to prepare delay density functions.

Estimate disease severity

We use the function cfr_static() to calculate overall disease severity at the latest date of the outbreak.

cfr_static(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> Total cases = 245 and p = 0.959: using Normal approximation to binomial likelihood.
#>   severity_estimate severity_low severity_high
#> 1            0.9742       0.8356        0.9877
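The idea behind the delay correction can be sketched in base R. This is a simplified illustration in the spirit of Nishiura et al. (2009), not the code used inside cfr: the case counts, the assumed true CFR of 0.5, and all object names are hypothetical, and deaths are taken as expected values rather than simulated counts.

```r
# hypothetical daily cases and an assumed true CFR
cases <- c(10, 20, 40, 60, 100, 80, 50, 30, 10, 5)
true_cfr <- 0.5
n <- length(cases)

# daily probability mass of the onset-to-death delay (days 0 to n - 1)
delay_pmf <- diff(pgamma(0:n, shape = 2.40, scale = 3.33))

# expected number of cases whose outcome becomes known on each day
known_outcomes <- vapply(seq_len(n), function(t) {
  lags <- 0:(t - 1)
  sum(cases[t - lags] * delay_pmf[lags + 1])
}, numeric(1))

# expected deaths observed so far
deaths <- true_cfr * known_outcomes

# naive estimate divides by all cases; the corrected estimate divides
# only by cases whose outcomes are expected to be known
naive_cfr <- sum(deaths) / sum(cases)
corrected_cfr <- sum(deaths) / sum(known_outcomes)
c(naive = naive_cfr, corrected = corrected_cfr)
```

In this idealised setting the corrected estimate recovers the assumed CFR, while the naive estimate is lower because many recent cases have not yet had time to reach an outcome.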

The cfr_static() function is well suited to small outbreaks where there are relatively few events and the time period under consideration is relatively brief, so the severity is unlikely to have changed over time.

To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is, however, not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how the CFR changes. More on this can be found in the vignette on estimating how disease severity varies over the course of an outbreak.

Estimate ascertainment ratio

It is important to know what proportion of cases in an outbreak are being ascertained to muster the appropriate response, and to estimate the overall burden of the outbreak.

Note that the ascertainment ratio may be affected by a number of factors. When the main factor in low ascertainment is the lack of (access to) testing capacity, we refer to this as reporting or under-reporting.

The estimate_ascertainment() function estimates the ascertainment ratio using daily case and death data, the known severity of the disease from previous outbreaks, and optionally a delay distribution of onset-to-death.

Here, we estimate reporting in the 1976 Ebola outbreak in the Congo, assuming that Ebola virus disease (at that time) had a baseline severity of about 0.7 (70% of cases result in deaths), based on CFR values estimated in later, larger datasets. We use the onset-to-death distribution from Barry et al. (2018).

# estimate reporting with a baseline severity of 70%
estimate_ascertainment(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33),
  severity_baseline = 0.7
)
#> Total cases = 245 and p = 0.959: using Normal approximation to binomial likelihood.
#>   ascertainment_estimate ascertainment_low ascertainment_high
#> 1              0.7185383         0.7087172          0.8377214

This analysis suggests that between 71% and 84% of cases were reported in this outbreak.
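The printed ascertainment values are numerically consistent with dividing the baseline severity by the delay-corrected severity estimates shown earlier. The base R check below reproduces them from the printed numbers; this is an observation about the output, not a description of the function's internals, and the object names are ours.

```r
# baseline severity and the severity estimates printed by cfr_static()
severity_baseline <- 0.7
severity <- c(estimate = 0.9742, low = 0.8356, high = 0.9877)

# ratio of baseline to estimated severity
ratio <- severity_baseline / severity

# a higher severity estimate implies lower ascertainment,
# so the lower and upper bounds swap places
ascertainment <- c(
  estimate = ratio[["estimate"]],
  low = ratio[["high"]],
  high = ratio[["low"]]
)
round(ascertainment, 4)
```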

More details can be found in the vignette on estimating the proportion of cases that are reported during an outbreak.

Concept: How reporting delays bias CFR estimates

Simply dividing the number of deaths by the number of cases gives a naive estimator of the true CFR.

Suppose 10 people start showing symptoms of a disease on a given day, and at the end of that day all remain alive. Suppose that for the next 5 days, the numbers of new cases continue to rise until they reach 100 new cases on day 5. However, suppose that by day 5, all infected individuals remain alive.

The naive estimate of the CFR calculated at the end of the first 5 days would be zero, because there would have been zero deaths in total at that point. That is to say, the outcomes of those cases (deaths) would not yet be known.

Even after deaths begin to occur, this lag between the ascertainment of a case or hospitalisation and outcome leads to a consistently biased estimate. Hence, adjusting for such delays using an appropriate delay distribution is essential for accurate estimates of severity.
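The persistence of this bias can be illustrated with a small base R sketch. The outbreak below is hypothetical: cases rise for 10 days and then stop, the true CFR is assumed to be 0.5, and deaths are taken as expected values under the Gamma onset-to-death delay used earlier in this vignette. All object names are ours.

```r
# hypothetical outbreak: cases rise for 10 days, then no new cases
cases <- c(seq(10, 100, by = 10), rep(0, 40))
true_cfr <- 0.5
n <- length(cases)

# daily probability mass of the onset-to-death delay (days 0 to n - 1)
delay_pmf <- diff(pgamma(0:n, shape = 2.40, scale = 3.33))

# expected deaths on each day, given earlier cases and the delay
expected_deaths <- vapply(seq_len(n), function(t) {
  lags <- 0:(t - 1)
  true_cfr * sum(cases[t - lags] * delay_pmf[lags + 1])
}, numeric(1))

# naive CFR as it would be calculated at the end of each day
naive_cfr <- cumsum(expected_deaths) / cumsum(cases)
round(naive_cfr[c(5, 10, 20, 50)], 3)
```

While cases are still rising, the naive estimate sits well below the assumed CFR, and it only approaches the true value once nearly all cases have had time to reach an outcome.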


Barry, Ahmadou, Steve Ahuka-Mundeke, Yahaya Ali Ahmed, Yokouide Allarangar, Julienne Anoko, Brett Nicholas Archer, Aaron Aruna Abedi, et al. 2018. “Outbreak of Ebola virus disease in the Democratic Republic of the Congo, April–May, 2018: an epidemiological study.” The Lancet 392 (10143): 213–21.
Camacho, A., A. J. Kucharski, S. Funk, J. Breman, P. Piot, and W. J. Edmunds. 2014. “Potential for Large Outbreaks of Ebola Virus Disease.” Epidemics 9 (December): 70–78.
Nishiura, Hiroshi, Don Klinkenberg, Mick Roberts, and Johan A. P. Heesterbeek. 2009. “Early Epidemiological Assessment of the Virulence of Emerging Infectious Diseases: A Case Study of an Influenza Pandemic.” PLOS ONE 4 (8): e6852.
Palmer, Joseph, Katharine Sherratt, Richard Martin-Nielsen, Jonnie Bevan, Hamish Gibbs, Cmmid Group, Sebastian Funk, and Sam Abbott. 2021. “Covidregionaldata: Subnational Data for COVID-19 Epidemiology.” Journal of Open Source Software 6 (63): 3290.