GIGWA Example

Khaled Al-Shamaa



The QBMS R package helps breeders link their data systems to their analytic pipelines, a crucial step in digitizing breeding processes. It supports querying and retrieving phenotypic and genotypic data from systems such as EBS, BMS, BreedBase, and GIGWA through BrAPI calls. Additional helper functions give access to environmental data sources, including TerraClimate and the FAO HWSDv2 soil database.


GIGWA is a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data. Variants can be filtered not only on variant features, including functional annotations, but also on genotype patterns. Data storage relies on MongoDB, which offers good prospects for scalability. GIGWA can handle multiple databases and may be deployed in either single-user or multi-user mode. It also exports data in a wide range of popular formats.


The Breeding API (BrAPI) project is an effort to enable interoperability among plant breeding databases. BrAPI is a standardized RESTful web service API specification for communicating plant breeding data. This community-driven standard is free to use for anyone interested in plant breeding data management.


if (!require("remotes")) install.packages("remotes")


# load the QBMS library
library(QBMS)

# The public GIGWA testing server requires no authentication. If your GIGWA server 
# requires authentication, set the no_auth parameter to FALSE.
# IMPORTANT NOTE: QBMS requires GIGWA version 2.4.1 or higher
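gigwa_url <- "<your GIGWA server URL>"  # placeholder: replace with your server address
set_qbms_config(gigwa_url, 
                time_out = 300, engine = "gigwa", no_auth = TRUE)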

# If login is required, then you can use your GIGWA account (interactive mode)
# or pass your GIGWA username and password as parameters (batch mode)
# login_gigwa()
# login_gigwa("gigwadmin", "nimda")

# list existing databases in the current GIGWA server
gigwa_list_dbs()

# select a database by name

# list all projects in the selected database
gigwa_list_projects()

# select a project by name

# list all runs in the selected project
gigwa_list_runs()

# select a specific run by name

# get a list of all samples in the selected run
samples <- gigwa_get_samples()

# show the first 6 individuals on the list of samples
head(samples)

# query the variants (e.g., SNP markers) in the selected run 
# that match the given criteria:
# - max_missing: maximum missing ratio (by sample) [0-1] (default is 1 for 100%) 
# - min_maf: minimum Minor Allele Frequency (MAF) [0-1] (default is 0 for 0%) 
# - start: start position of region (zero-based, inclusive) (e.g., 19750802)
# - end: end position of region (zero-based, exclusive) (e.g., 19850125)
# - referenceName: reference sequence name (e.g., '6H' in the Barley LI-AM)
# - samples: a subset of sample names (default is NULL, which retrieves all samples) 
marker_matrix <- gigwa_get_variants(max_missing = 0.2, 
                                    min_maf = 0.05, 
                                    start = 100000,
                                    end = 500000,
                                    samples = c("ind1", "ind3", "ind7"))
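The `start` and `end` bounds used above are described as a zero-based, half-open interval: `start` is included and `end` is excluded. A minimal local sketch of that convention follows. It uses made-up positions rather than a GIGWA call, and the `in_region` helper is hypothetical and not part of QBMS.

```r
# Hypothetical helper (not part of QBMS): TRUE for positions that fall inside
# the half-open interval [start, end)
in_region <- function(pos, start, end) {
  pos >= start & pos < end
}

# Made-up variant positions, for illustration only
pos <- c(99999, 100000, 250000, 499999, 500000)

keep <- in_region(pos, start = 100000, end = 500000)

# Keeps 100000, 250000 and 499999; 99999 and 500000 fall outside the interval
pos[keep]
```

Under this convention a region spans exactly `end - start` positions, and two adjacent regions that share a boundary value do not overlap.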

# The data are returned as a data.frame. The first 4 columns describe attributes of the SNP: 
# - rs#: variant name
# - alleles: reference allele / alternative allele
# - chrom: chromosome name
# - pos: position in bp
# Each of the following columns holds the SNP calls for a single sample, coded 
# numerically as 0, 1, or 2 for reference, heterozygous, and alternative/minor alleles.
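To make the layout described above concrete, the following self-contained sketch builds a toy data.frame of the same shape and derives a per-marker missing ratio and minor allele frequency from the 0/1/2 codes. Nothing here calls GIGWA: all values, the marker names, the sample `ind9`, and the object names (`toy`, `geno`, `missing_ratio`, `alt_freq`, `maf`) are invented for illustration. It also assumes that missing calls appear as `NA` and that the missing ratio is the share of samples without a call for each marker, which is one reading of the `max_missing` description.

```r
# Toy data in the layout described above (all values are made up)
toy <- data.frame(
  "rs#"   = c("m1", "m2", "m3"),
  alleles = c("A/G", "C/T", "G/T"),
  chrom   = c("1", "1", "2"),
  pos     = c(1200, 5400, 800),
  ind1    = c(0, 2, NA),
  ind3    = c(1, 2, 0),
  ind7    = c(2, 2, 0),
  ind9    = c(1, NA, 0),
  check.names = FALSE,       # keep the "rs#" column name unchanged
  stringsAsFactors = FALSE
)

# Split the marker attributes (first 4 columns) from the genotype calls
marker_info <- toy[, 1:4]
geno <- as.matrix(toy[, -(1:4)])
rownames(geno) <- toy[["rs#"]]

# Share of samples with a missing call, per marker (assumes missing = NA)
missing_ratio <- rowMeans(is.na(geno))

# Alternative-allele frequency: each call counts 0, 1, or 2 alternative
# alleles out of 2, so the frequency is the mean code divided by 2
alt_freq <- rowMeans(geno, na.rm = TRUE) / 2

# Minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(alt_freq, 1 - alt_freq)

# One row per marker with its attributes and the two summary statistics
data.frame(marker_info, missing_ratio, maf, check.names = FALSE)
```

Under these assumptions, only `m1` in the toy table would pass thresholds such as `max_missing = 0.2` and `min_maf = 0.05`: `m2` and `m3` each have a quarter of their calls missing and carry a single allele.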

# get the metadata associated with the samples in the current active run
metadata <- gigwa_get_metadata()