Pruning of SNPs in Linkage Disequilibrium

Runs plink --indep-pairwise to remove SNPs in linkage disequilibrium. It excludes variants that are found in high linkage disequilibrium loci.

Usage

pruning_ld(
  indir,
  name,
  qcdir = indir,
  path2plink = NULL,
  filter_high_ldregion = TRUE,
  high_ldregion_file = NULL,
  genomebuild = "hg38",
  window_size = 50,
  step_size = 5,
  r_2 = 0.2,
  showPlinkOutput = TRUE,
  keep_individuals = NULL,
  remove_individuals = NULL,
  exclude_markers = NULL,
  extract_markers = NULL,
  verbose = FALSE
)

Arguments

indir: [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam.
name: [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam, name.genome and name.imiss.
qcdir: [character] /path/to/directory where name.genome as returned by plink --genome will be saved. Per default qcdir=indir. If run.check_relatedness is FALSE, it is assumed that plink --missing and plink --genome have been run and qcdir/name.imiss and qcdir/name.genome exist. The user needs write permission to qcdir.
path2plink: [character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe; for Unix platforms, path/plink. If not provided, it is assumed that the PATH is set up so that PLINK is found by exec('plink').
filter_high_ldregion: [logical] Should high LD regions be filtered before IBD estimation? Carried out per default, using the high LD regions of the genome build selected via genomebuild. For genome builds that are not provided, or for non-human data, a high LD regions file can be supplied via high_ldregion_file.
high_ldregion_file: [character] Path to file with high LD regions used for filtering before IBD estimation if filter_high_ldregion == TRUE, otherwise ignored; for human genome data, high LD region files are provided and can simply be chosen via genomebuild. Files have to be space-delimited, without column names, and contain the following columns: chromosome, region-start, region-end, region number. Chromosomes are specified without 'chr' prefix. For instance, a file describing two regions consists of the line "1 48000000 52000000 1" followed by the line "2 86000000 100500000 2".
genomebuild: [character] Name of the genome build of the PLINK file annotations, i.e. the mappings in the name.bim file. Will be used to remove high-LD regions based on the coordinates of the respective build. Options are hg18, hg19 and hg38.
window_size: [integer] The size of the window (in variant count) within which variants are pruned.
step_size: [integer] The number of variants by which the window is shifted at each step.
r_2: [float] The pairwise r^2 threshold: for variant pairs in the window with a squared correlation above this threshold, variants are pruned until no such pairs remain.
showPlinkOutput: [logical] If TRUE, plink log and error messages are printed to standard out.
keep_individuals: [character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.
remove_individuals: [character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.
exclude_markers: [character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.
extract_markers: [character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.
verbose: [logical] If TRUE, progress info is printed to standard out.
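
To illustrate how window_size, step_size and r_2 interact, the following is a minimal sketch of window-based pairwise pruning in plain R. It is not PLINK's implementation: the helper prune_pairwise and the toy genotype matrix are hypothetical, and the rule for which variant of a correlated pair is dropped (here, always the later one) is an assumption made for simplicity.

```r
# Simplified window-based pairwise LD pruning (illustration only).
# geno: numeric matrix, rows = samples, columns = variants (0/1/2 allele counts).
# Returns a logical vector marking the variants that survive pruning.
prune_pairwise <- function(geno, window_size = 50, step_size = 5, r_2 = 0.2) {
  n_var <- ncol(geno)
  stopifnot(n_var >= 1, window_size >= 1, step_size >= 1,
            step_size <= window_size)
  keep <- rep(TRUE, n_var)
  start <- 1
  repeat {
    end <- min(start + window_size - 1, n_var)
    idx <- (start:end)[keep[start:end]]      # surviving variants in this window
    if (length(idx) > 1) {
      r2 <- cor(geno[, idx, drop = FALSE])^2 # pairwise squared correlations
      for (a in seq_len(length(idx) - 1)) {
        if (!keep[idx[a]]) next
        for (b in seq(a + 1, length(idx))) {
          # Assumption: drop the later variant of a pair exceeding the threshold
          if (keep[idx[b]] && isTRUE(r2[a, b] > r_2)) keep[idx[b]] <- FALSE
        }
      }
    }
    if (end >= n_var) break
    start <- start + step_size               # shift the window
  }
  keep
}

# Toy data: two uncorrelated variants, each followed by an exact duplicate.
v1 <- rep(c(0, 1, 2, 1), 50)
v2 <- rep(c(0, 0, 1, 1, 2, 2, 1, 1), 25)
geno <- cbind(a = v1, a_copy = v1, b = v2, b_copy = v2)

kept <- prune_pairwise(geno, window_size = 50, step_size = 5, r_2 = 0.2)
print(colnames(geno)[kept])
```

In this toy matrix the duplicated columns are perfectly correlated with their originals, so only one variant of each pair is retained, while the two uncorrelated variants both survive.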

Value

Files with a .pruned extension containing the pruned SNPs.

Examples

if (FALSE) { # \dontrun{
indir <- system.file("extdata", package="plinkQC")
name <- 'data'
path2plink <- "path/to/plink"

# whole dataset
relatednessQC <- check_relatedness(indir=indir, name=name, interactive=FALSE,
    run.check_relatedness=FALSE, path2plink=path2plink)

# subset of dataset
remove_individuals_file <- system.file("extdata", "remove_individuals",
    package="plinkQC")
fail_relatedness <- check_relatedness(indir=indir, name=name,
    remove_individuals=remove_individuals_file, path2plink=path2plink)
} # }
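
The examples above exercise check_relatedness. A call to pruning_ld itself, using only the arguments documented on this page, might look like the following sketch. The region coordinates are the two illustrative rows from the high_ldregion_file description, the temporary file paths are placeholders, and the call is only attempted if plinkQC (with pruning_ld exported) and a plink executable on the PATH are found. Whether the bundled example data are suited to these settings is an assumption.

```r
# Write a custom high-LD region file in the documented format:
# space-delimited, no header; chromosome, region start, region end, region number.
# Integer literals (L suffix) keep write.table from emitting scientific notation
# such as 4.8e+07.
regions <- data.frame(
  chrom  = c(1L, 2L),
  start  = c(48000000L, 86000000L),
  end    = c(52000000L, 100500000L),
  region = c(1L, 2L)
)
region_file <- tempfile(fileext = ".txt")
write.table(regions, region_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Arguments as documented above; values other than the defaults are placeholders.
pruning_args <- list(
  indir = system.file("extdata", package = "plinkQC"),
  name = "data",
  qcdir = tempdir(),
  filter_high_ldregion = TRUE,
  high_ldregion_file = region_file,
  window_size = 50,
  step_size = 5,
  r_2 = 0.2,
  verbose = TRUE
)

# Only attempt the call when both the package function and PLINK are available.
can_run <- requireNamespace("plinkQC", quietly = TRUE) &&
  "pruning_ld" %in% getNamespaceExports("plinkQC") &&
  nzchar(Sys.which("plink"))
if (can_run) {
  do.call(plinkQC::pruning_ld, pruning_args)
}
```

Leaving path2plink unset relies on the documented fallback of finding plink on the PATH; pass an explicit path instead if that does not apply to your setup.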