Genome-wide gene-environment interaction (GxE) studies

SPAGxECCT package gives generic GxE analytical frameworks to analyze a wide variaty of phenotypes.

Software dependencies and operating systems

SPAGxECCT has been thoroughly examined and validated on both Linux and Windows operating systems.

  • Currently, R package SPAGxECCT supports three formats for genotype input: the R matrix (Rdata) format, the PLINK format, and the BGEN format.

  • In the near future, R package SPAGxECCT is planned to be rewritten using Rcpp code to improve its performance and efficiency.

How to install and load R package SPAGxECCT

library(devtools)  # author version: 2.4.5
install_github("YuzhuoMa97/SPAGxECCT")
library(SPAGxECCT)
?SPAGxECCT  # manual of SPAGxECCT package
  • Current version is 1.1.0. For older version and version update information, plesase refer to OldVersions.

Summary and comparison of main features for efficient G×E analysis methods

Method Trait Prospective/Retrospective Account for population admixture Account for local ancestry Account for family relatedness Account for unbalanced phenotypic distribution
SPAGE Binary Prospective       YES
SPAGxE Quantitative/Binary/Survival/Ordinal/Others Retrospective       YES
SPAGxEWald Quantitative/Binary/Survival/Ordinal/Others Retrospective       YES
SPAGxECCT Quantitative/Binary/Survival/Ordinal/Others Retrospective       YES
SPAGxE+ Quantitative/Binary/Survival/Ordinal/Others Retrospective     YES YES
SPAGxEmixCCT Quantitative/Binary/Survival/Ordinal/Others Retrospective YES     YES
SPAGxEmixCCT-local Quantitative/Binary/Survival/Ordinal/Others Retrospective YES YES   YES
SPAGxEmixCCT-local-global Quantitative/Binary/Survival/Ordinal/Others Retrospective YES YES   YES
SPAGxEmix+ Quantitative/Binary/Survival/Ordinal/Others Retrospective YES YES YES YES

Quick start-up examples (genotype input using R matrix format)

The following example illustrates how to use SPAGxECCT to analyze a binary trait, with genotype data input provided in the R matrix format.

Step 1. Read in data and fit a genotype-independent model

library(SPAGxECCT)
# Simulate phenotype and genotype
N = 10000                                        # sample size
nSNP = 100                                       # number of SNPs
MAF = 0.1                                        # minor allele frequency
Geno.mtx = matrix(rbinom(N*nSNP,2,MAF),N,nSNP)   # genotype matrix
# NOTE: The row and column names of genotype matrix are required.
rownames(Geno.mtx) = paste0("IID-",1:N)
colnames(Geno.mtx) = paste0("SNP-",1:nSNP)

# phenotype data
Phen.mtx = data.frame(ID = paste0("IID-",1:N),
                      Y=rbinom(N,1,0.5),
                      Cov1=rnorm(N),
                      Cov2=rbinom(N,1,0.5),
                      E = rnorm(N))

Cova.mtx = Phen.mtx[,c("Cov1","Cov2")]    # covariates dataframe excluding environmental factor E
E = Phen.mtx$E                            # environmental factor E

# fit a genotype-independent model
R = SPA_G_Get_Resid("binary",
                    glm(formula = Y ~ Cov1+Cov2+E, data = Phen.mtx, family = "binomial"),
                    data=Phen.mtx,
                    pIDs=Phen.mtx$ID,
                    gIDs=paste0("IID-",1:N))

Step 2. Conduct a marker-level association study

# calculate p values
binary.res = SPAGxE_CCT(traits = "binary",                       # trait type
                        Geno.mtx = Geno.mtx,                     # genotype matrix
                        R = R,                                   # residuals from genotype-independent model (null model in which marginal genetic effect and GxE effect are 0)
                        E = E,                                   # environmental factor
                        Phen.mtx = Phen.mtx,                     # phenotype dataframe
                        Cova.mtx = Cova.mtx)                     # covariates dataframe excluding environmental factor E

# we recommand using column of 'p.value.spaGxE.CCT.Wald' to associate genotype with binary phenotypes
head(binary.res)

The following example illustrates how to use SPAGxECCT to analyze a binary trait, with genotype data input provided in PLINK file format.

Step 1. Read in data and fit a genotype-independent model

library(SPAGxECCT)
# Simulate phenotype and genotype
N = 10000 # sample size

# PLINK format for genotype data
GenoFile = system.file("", "GenoMat_SPAGxE.bed", package = "SPAGxECCT")

# phenotype data
Phen.mtx = data.frame(ID = paste0("IID-",1:N),
                      Y=rbinom(N,1,0.5),
                      Cov1=rnorm(N),
                      Cov2=rbinom(N,1,0.5),
                      E = rnorm(N))

Cova.mtx = Phen.mtx[,c("Cov1","Cov2")]    # covariates dataframe excluding environmental factor E  
E = Phen.mtx$E                            # environmental factor E

# fit a genotype-independent model
R = SPA_G_Get_Resid("binary",
                    glm(formula = Y ~ Cov1+Cov2+E, data = Phen.mtx, family = "binomial"),
                    data=Phen.mtx,
                    pIDs=Phen.mtx$ID,
                    gIDs=paste0("IID-",1:N))

Step 2. Conduct a marker-level association study

# calculate p values
binary.res = SPAGxE_CCT(traits = "binary",                       # trait type
                        GenoFile = GenoFile,                     # a character of genotype file
                        R = R,                                   # residuals from genotype-independent model (null model in which marginal genetic effect and GxE effect are 0)
                        E = E,                                   # environmental factor
                        Phen.mtx = Phen.mtx,                     # phenotype dataframe
                        Cova.mtx = Cova.mtx)                     # a covariate matrix excluding the environmental factor E


# we recommand using column of 'p.value.spaGxE.CCT.Wald' to associate genotype with binary phenotypes
head(binary.res)

Quick start-up examples (genotype input using BGEN file format)

The following example illustrates how to use SPAGxECCT to analyze a binary trait, with genotype data input provided in BGEN file format.

Step 1. Read in data and fit a genotype-independent model

library(SPAGxECCT)
# Simulate phenotype and genotype
N = 10000

# BGEN format for genotype data
GenoFile = system.file("", "GenoMat_SPAGxE.bgen", package = "SPAGxECCT")
GenoFileIndex = c(system.file("", "GenoMat_SPAGxE.bgen.bgi", package = "SPAGxECCT"),
                  system.file("", "GenoMat_SPAGxE.sample", package = "SPAGxECCT"))

# phenotype data
Phen.mtx = data.frame(ID = paste0("IID-",1:N),
                      Y=rbinom(N,1,0.5),
                      Cov1=rnorm(N),
                      Cov2=rbinom(N,1,0.5),
                      E = rnorm(N))

Cova.mtx = Phen.mtx[,c("Cov1","Cov2")]    # covariates dataframe excluding environmental factor E  
E = Phen.mtx$E                            # environmental factor E

# fit a genotype-independent model
R = SPA_G_Get_Resid("binary",
                    glm(formula = Y ~ Cov1+Cov2+E, data = Phen.mtx, family = "binomial"),
                    data=Phen.mtx,
                    pIDs=Phen.mtx$ID,
                    gIDs=paste0("IID-",1:N))

Step 2. Conduct a marker-level association study

# calculate p values
binary.res = SPAGxE_CCT(traits = "binary",                       # trait type
                        GenoFile = GenoFile,                     # a character of genotype file
                        GenoFileIndex = GenoFileIndex,           # additional index file(s) corresponding to GenoFile
                        R = R,                                   # residuals from genotype-independent model (null model in which marginal genetic effect and GxE effect are 0)
                        E = E,                                   # environmental factor
                        Phen.mtx = Phen.mtx,                     # phenotype dataframe
                        Cova.mtx = Cova.mtx)                     # a covariate matrix excluding the environmental factor E


# we recommand using column of 'p.value.spaGxE.CCT.Wald' to associate genotype with binary phenotypes
head(binary.res)

Note: choose traits

Argument traits is to specify the type of phenotype data. Currently, SPAGxECCT package supports the below combinations

phenotype traits
binary trait “binary”
quantitative trait “quantitative”
ordinal categorical trait “categorical”
time-to-event trait “time-to-event”
Other trait (e.g. longitudinal trait) “Resid”

Table of contents