This is an initial attempt to enable easy calculation and visualization of study designs from R/gap, whose functions have been benchmarked against relevant publications; eventually the app should produce more generic results.

One can run the app from an installed copy of R/gap or, alternatively, from source using gap/inst/shinygap. In fact, both routes are conveniently wrapped up in the runshinygap() function.
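The installed-package route can be sketched as below. This is a minimal sketch, assuming the shiny package is available and that the contents of gap/inst/shinygap are installed as the shinygap directory of the package (the usual R behaviour for inst/). The helper name launch_shinygap is hypothetical and not part of gap.

```r
# Hypothetical helper (not part of gap): launch the app bundled with an
# installed copy of the package.
launch_shinygap <- function() {
  # Assumption: inst/shinygap is installed as <library>/gap/shinygap.
  app_dir <- system.file("shinygap", package = "gap")
  if (!nzchar(app_dir)) {
    stop("shinygap directory not found; is the gap package installed?")
  }
  shiny::runApp(app_dir)
}

# Usage, in an interactive session:
# launch_shinygap()
```

In practice runshinygap() is the simpler entry point; the sketch only makes the underlying step explicit.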

To set the default parameters, some compromises need to be made, e.g., Kp=[1e-5, 0.4], MAF=[1e-3, 0.8], alpha=[1e-8, 0.05], beta=[0.01, 0.4]. The slider inputs provide upper bounds of parameters.

1 Family-based study

This is a call to fbsize().

2 Population-based study

This is a call to pbsize().

3 Case-cohort study

This is a call to ccsize(), whose power argument indicates whether a power (TRUE) or sample size (FALSE) calculation is performed.

4 Two-stage case-control design

We implement it in a function whose format is

tscc(model, GRR, p1, n1, n2, M, alpha.genome, pi.samples, pi.markers, K)

which requires specification of the disease model (multiplicative, additive, dominant, recessive), the genotypic relative risk (GRR), the estimated risk allele frequency in cases (\(p_1\)), the total number of cases (\(n_1\)), the total number of controls (\(n_2\)), the total number of markers (\(M\)), the false positive rate at genome level (\(\alpha_\mathit{genome}\)), the proportion of samples to be genotyped at stage 1 (\(\pi_\mathit{samples}\)), the proportion of markers to be selected (\(\pi_\mathit{markers}\), also used as the false positive rate at stage 1) and the population prevalence (\(K\)).
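A call might look like the sketch below. The argument names are taken from the format given above; the numeric values are purely illustrative assumptions, and the call is guarded so that it only runs when gap is installed.

```r
# Illustrative argument values (assumptions, not recommendations).
tscc_args <- list(
  model        = "multiplicative",  # disease model
  GRR          = 1.4,               # genotypic relative risk
  p1           = 0.4,               # risk allele frequency in cases
  n1           = 1000,              # total number of cases
  n2           = 1000,              # total number of controls
  M            = 300000,            # total number of markers
  alpha.genome = 0.05,              # genome-level false positive rate
  pi.samples   = 0.2,               # proportion of samples at stage 1
  pi.markers   = 0.1,               # proportion of markers selected
  K            = 0.1                # population prevalence
)

# Only attempt the call when the gap package is available.
if (requireNamespace("gap", quietly = TRUE)) {
  res <- do.call(gap::tscc, tscc_args)
}
```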

Appendix: Theory

A. Family-based and population-based designs

This is detailed in the package vignettes gap or jss [1].

B. Case-cohort design

Our implementation covers two aspects [2].

B.1 Power

The power is given by \[\Phi\left(Z_\alpha+\tilde{n}^\frac{1}{2}\theta\sqrt{\frac{p_1p_2p_D}{q+(1-q)p_D}}\right)\] where \(\Phi\) is the standard normal distribution function, \(Z_\alpha\) is its \(\alpha\)-quantile, \(\alpha\) is the significance level, \(\theta\) is the log-hazard ratio for two groups, \(p_j, j = 1, 2\), are the proportions of the two groups in the population (\(p_1 + p_2 = 1\)), \(\tilde{n}\) is the total number of subjects in the subcohort, \(p_D\) is the proportion of failures in the full cohort, and \(q\) is the sampling fraction of the subcohort.
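The power expression can be evaluated directly in base R. The following is a minimal sketch rather than the ccsize() implementation; the function name cc_power and the input values are assumptions made for illustration.

```r
# Power of a case-cohort design, following the expression in B.1.
# n_sub: subcohort size (n-tilde); q: sampling fraction of the subcohort;
# pD: proportion of failures in the full cohort; p1: proportion in group 1;
# theta: log-hazard ratio; alpha: significance level.
cc_power <- function(n_sub, q, pD, p1, theta, alpha) {
  p2 <- 1 - p1
  z_alpha <- qnorm(alpha)  # alpha-quantile of the standard normal
  pnorm(z_alpha + sqrt(n_sub) * theta *
          sqrt(p1 * p2 * pD / (q + (1 - q) * pD)))
}

# Illustrative inputs: a subcohort of 1000 sampled from a cohort of 5000.
cc_power(n_sub = 1000, q = 0.2, pD = 0.05, p1 = 0.3,
         theta = log(1.5), alpha = 0.05)
```

With theta = 0 the expression reduces to alpha, which gives a quick sanity check on any implementation.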

B.2 Sample size

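The required subcohort size is \[\tilde{n}=\frac{nBp_D}{n-B(1-p_D)}\] where \(B=\frac{(Z_{1-\alpha}+Z_{1-\beta})^2}{\theta^2p_1p_2p_D}\), \(\beta\) is the type II error rate, and \(n\) is the whole cohort size. This follows from setting the power in B.1 to \(1-\beta\) with \(q=\tilde{n}/n\) and solving for \(\tilde{n}\).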
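The sample size calculation can be sketched in base R as follows. This is a sketch under stated assumptions, not the ccsize() implementation: B is taken to carry the squared sum of the upper \(\alpha\) and \(\beta\) quantiles, which is what inverting the power expression in B.1 with \(q=\tilde{n}/n\) gives, and the function name cc_subcohort_size and the input values are hypothetical.

```r
# Subcohort size for a case-cohort design, following B.2.
# n: whole cohort size; pD: proportion of failures; p1: proportion in group 1;
# theta: log-hazard ratio; alpha: significance level; beta: type II error rate.
cc_subcohort_size <- function(n, pD, p1, theta, alpha, beta) {
  p2 <- 1 - p1
  # Assumption: squared sum of upper quantiles, from inverting the power formula.
  B <- (qnorm(1 - alpha) + qnorm(1 - beta))^2 / (theta^2 * p1 * p2 * pD)
  # The formula has no positive solution when the cohort is too small.
  if (n <= B * (1 - pD)) return(NA_real_)
  n * B * pD / (n - B * (1 - pD))
}

# Illustrative inputs; round up to obtain a whole number of subjects.
ceiling(cc_subcohort_size(n = 25000, pD = 0.05, p1 = 0.3,
                          theta = log(1.5), alpha = 0.05, beta = 0.2))
```

Returning NA when \(n \le B(1-p_D)\) is a design choice of this sketch, flagging that no subcohort of the given cohort reaches the target power.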

C. Two-stage case-control design

Tests of allele frequency differences between cases and controls in a two-stage design are described by Skol et al. [3]. The usual test of proportions can be written as \[z(p_1,p_2,n_1,n_2,\pi_\mathit{samples})=\frac{p_1-p_2}{\sqrt{\frac{p_1(1-p_1)}{2n_1\pi_\mathit{samples}}+\frac{p_2(1-p_2)}{2n_2\pi_\mathit{samples}}}}\] where \(p_1\) and \(p_2\) are the allele frequencies in cases and controls, \(n_1\) and \(n_2\) are the corresponding sample sizes, and \(\pi_\mathit{samples}\) is the proportion of samples to be genotyped at stage 1. The test statistics for stage 1, for stage 2 as replication, and for stages 1 and 2 in a joint analysis are then \(z_1 = z(\hat p_1,\hat p_2,n_1,n_2,\pi_\mathit{samples})\), \(z_2 = z(\hat p_1,\hat p_2,n_1,n_2,1-\pi_\mathit{samples})\), and \(z_j = \sqrt{\pi_\mathit{samples}}z_1+\sqrt{1-\pi_\mathit{samples}}z_2\), respectively. Let \(C_1\), \(C_2\), and \(C_j\) be the thresholds for these statistics; the false positive rates can then be obtained from \(P(|z_1|>C_1)\,P(|z_2|>C_2,\ \mathrm{sign}(z_1)=\mathrm{sign}(z_2))\) and \(P(|z_1|>C_1)\,P(|z_j|>C_j \mid |z_1|>C_1)\) for replication-based and joint analyses, respectively.
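The three statistics can be computed with a few lines of base R. This is a minimal sketch, not the tscc() implementation; the function name z_prop and the numeric inputs are assumptions for illustration, and for simplicity the same allele frequency estimates are plugged into both stages, whereas in practice each stage would use its own estimates.

```r
# Test of proportions for allele frequencies, following the formula above.
# p1, p2: allele frequencies in cases and controls; n1, n2: sample sizes;
# pi_samples: proportion of samples genotyped at the stage in question.
z_prop <- function(p1, p2, n1, n2, pi_samples) {
  (p1 - p2) / sqrt(p1 * (1 - p1) / (2 * n1 * pi_samples) +
                   p2 * (1 - p2) / (2 * n2 * pi_samples))
}

# Illustrative inputs (assumed values).
p1_hat <- 0.45; p2_hat <- 0.40
n1 <- 1000; n2 <- 1000
pi_samples <- 0.3

z1 <- z_prop(p1_hat, p2_hat, n1, n2, pi_samples)       # stage 1
z2 <- z_prop(p1_hat, p2_hat, n1, n2, 1 - pi_samples)   # stage 2 (replication)
zj <- sqrt(pi_samples) * z1 + sqrt(1 - pi_samples) * z2  # joint analysis
c(z1 = z1, z2 = z2, zj = zj)
```

When both stages share the same estimates, as here, the joint statistic coincides with the single-stage statistic on the full sample, which illustrates why the weights \(\sqrt{\pi_\mathit{samples}}\) and \(\sqrt{1-\pi_\mathit{samples}}\) are used.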


1. Zhao, J. H. gap: Genetic analysis package. Journal of Statistical Software 23, 1–18 (2007).
2. Cai, J. & Zeng, D. Sample size/power calculation for case-cohort studies. Biometrics 60, 1015–24 (2004).
3. Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 38, 209–13 (2006).