Run plink --pca to calculate the principal components on merged genotypes of the study and reference dataset.
run_check_ancestry(
  indir,
  prefixMergedDataset,
  qcdir = indir,
  verbose = FALSE,
  path2plink = NULL,
  keep_individuals = NULL,
  remove_individuals = NULL,
  exclude_markers = NULL,
  extract_markers = NULL,
  showPlinkOutput = TRUE
)
Argument | Description |
---|---|
indir | [character] /path/to/directory containing the basic PLINK data files prefixMergedDataset.bim, prefixMergedDataset.fam and prefixMergedDataset.bed. |
prefixMergedDataset | [character] Prefix of merged study and reference data files, i.e. prefixMergedDataset.bed, prefixMergedDataset.bim, prefixMergedDataset.fam. |
qcdir | [character] /path/to/directory to save prefixMergedDataset.eigenvec as returned by plink --pca. The user needs write permission for qcdir. By default, qcdir=indir. |
verbose | [logical] If TRUE, progress info is printed to standard out. |
path2plink | [character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: on Windows this means path/plink.exe, on Unix platforms path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by |
keep_individuals | [character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
remove_individuals | [character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
exclude_markers | [character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
extract_markers | [character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
showPlinkOutput | [logical] If TRUE, plink log and error messages are printed to standard out. |
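The filter files described for keep_individuals, remove_individuals, exclude_markers and extract_markers can be written from R before the call. The following is a minimal sketch using base R only; the sample IDs, variant IDs and file names are hypothetical and serve purely as illustration.

```r
# Hypothetical sample IDs: family ID in column 1, within-family ID in column 2.
keep <- data.frame(FID = c("FAM1", "FAM2"), IID = c("IND1", "IND2"))
keep_file <- file.path(tempdir(), "keep_individuals.txt")

# Tab-delimited, written without a header row and without quoting.
write.table(keep, keep_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Hypothetical variant IDs, one per line.
markers <- c("rs0001", "rs0002", "rs0003")
extract_file <- file.path(tempdir(), "extract_markers.txt")
writeLines(markers, extract_file)
```

The resulting paths could then be supplied as keep_individuals = keep_file and extract_markers = extract_file.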
Both run_check_ancestry and its evaluation by evaluate_check_ancestry can simply be invoked by check_ancestry.
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
prefixMergedDataset <- 'data.HapMapIII'
path2plink <- 'path/to/plink'

# the following code is not run on package build, as the path2plink on the
# user system is not known.
if (FALSE) {
# ancestry check on all individuals in dataset
run <- run_check_ancestry(indir=indir, qcdir=qcdir,
                          prefixMergedDataset=prefixMergedDataset,
                          path2plink=path2plink)

# ancestry check on subset of dataset
remove_individuals_file <- system.file("extdata", "remove_individuals",
                                       package="plinkQC")
run <- run_check_ancestry(indir=indir, qcdir=qcdir,
                          prefixMergedDataset=prefixMergedDataset,
                          remove_individuals=remove_individuals_file,
                          path2plink=path2plink)
}
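Once plink --pca has written prefixMergedDataset.eigenvec to qcdir, the file can be inspected in R. The sketch below assumes the default PLINK 1.9 layout (no header row, family ID and within-family ID in the first two columns, followed by the principal components). To stay runnable without PLINK, it reads a small mock file with made-up values and only two components; the file content and column names assigned here are assumptions for illustration.

```r
# Mock .eigenvec file standing in for real PLINK output (values are made up).
eigenvec_file <- file.path(tempdir(), "data.HapMapIII.eigenvec")
writeLines(c("FAM1 IND1 0.012 -0.034",
             "FAM2 IND2 -0.021 0.008"), eigenvec_file)

# Whitespace-delimited, no header row assumed.
pca <- read.table(eigenvec_file, header = FALSE, stringsAsFactors = FALSE)

# Name the ID columns and number the remaining columns as components.
n_pcs <- ncol(pca) - 2
colnames(pca) <- c("FID", "IID", paste0("PC", seq_len(n_pcs)))
```

With real output, eigenvec_file would instead be file.path(qcdir, paste0(prefixMergedDataset, ".eigenvec")).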