Evaluates and depicts results from plink --genome on the LD-pruned dataset (via run_check_relatedness or externally conducted IBD estimation). plink --genome calculates identity by state (IBS) for each pair of individuals based on the average proportion of alleles shared at genotyped SNPs. The degree of recent shared ancestry, i.e. the identity by descent (IBD), can be estimated from the genome-wide IBS. The proportion of IBD between two individuals is returned by --genome as PI_HAT.
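To illustrate the IBS quantity (not PLINK's implementation), the sketch below computes the IBS distance DST for one pair of individuals. It assumes biallelic SNPs coded as allele counts (0, 1, 2) and no missing genotypes; the helper name ibs_distance is hypothetical. At each SNP the pair shares 2 - |g1 - g2| alleles IBS, and DST = (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2), as in the PLINK .genome format.

```r
# Hypothetical helper: IBS distance (DST) for one pair of individuals.
# Assumes biallelic SNPs coded as allele counts (0, 1, 2) and no missing genotypes.
ibs_distance <- function(g1, g2) {
  stopifnot(length(g1) == length(g2), all(c(g1, g2) %in% 0:2))
  # Number of alleles shared identical by state at each SNP: 0, 1 or 2.
  shared <- 2 - abs(g1 - g2)
  ibs0 <- sum(shared == 0)
  ibs1 <- sum(shared == 1)
  ibs2 <- sum(shared == 2)
  (ibs2 + 0.5 * ibs1) / (ibs0 + ibs1 + ibs2)
}

ibs_distance(c(0, 1, 2, 2), c(0, 2, 0, 2))  # 0.625
```

DST is simply the mean number of IBS-shared alleles divided by two.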
evaluate_check_relatedness finds pairs of samples whose proportion of IBD is larger than the specified highIBDTh. Subsequently, for pairs of individuals that do not have additional relatives in the dataset, the individual with the greater genotype missingness rate is selected and returned as the individual failing the relatedness check. For more complex family structures, the unrelated individuals per family are selected to be kept (e.g. in a parents-offspring trio, the offspring will be marked as fail, while the parents will be kept in the analysis).
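The selection rule above can be approximated with a simple greedy procedure. This is a sketch under stated assumptions, not plinkQC's implementation: it repeatedly flags the individual with the most remaining above-threshold relatives, breaks ties by the higher missingness rate (the F_MISS column of plink --missing output), and stops when no related pair is left. For an isolated pair this flags the individual with more missing genotypes, and in a parents-offspring trio it flags the offspring; for larger pedigrees it need not reproduce plinkQC's choices. The helper name select_related_fails is hypothetical.

```r
# Hypothetical helper sketching the relatedness filter with a greedy heuristic.
# genome: data.frame with FID1, IID1, FID2, IID2, PI_HAT (as from plink --genome)
# imiss:  data.frame with FID, IID, F_MISS (as from plink --missing)
select_related_fails <- function(genome, imiss, highIBDTh = 0.1875) {
  # Only pairs whose PI_HAT exceeds the threshold count as related.
  related <- genome[genome$PI_HAT > highIBDTh, ]
  if (nrow(related) == 0) return(character(0))
  id1 <- paste(related$FID1, related$IID1, sep = "/")
  id2 <- paste(related$FID2, related$IID2, sep = "/")
  miss <- setNames(imiss$F_MISS, paste(imiss$FID, imiss$IID, sep = "/"))
  stopifnot(all(c(id1, id2) %in% names(miss)))

  failed <- character(0)
  active <- rep(TRUE, length(id1))
  while (any(active)) {
    # Number of remaining related partners per individual.
    degree <- table(c(id1[active], id2[active]))
    candidates <- names(degree)[degree == max(degree)]
    # Among the most-connected individuals, flag the one with most missingness.
    worst <- candidates[which.max(miss[candidates])]
    failed <- c(failed, worst)
    # Drop every pair that involves the flagged individual.
    active <- active & id1 != worst & id2 != worst
  }
  failed
}
```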
evaluate_check_relatedness depicts all pair-wise IBD estimates as histograms stratified by value of PI_HAT.
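A base-R sketch of the idea behind this plot, assuming two strata split at highIBDTh. The exact stratification plinkQC uses is not specified here, and the returned p_IBD is a ggplot2 object rather than base graphics; the helper name plot_ibd_strata and the bin width of 0.05 are hypothetical choices.

```r
# Hypothetical sketch: histograms of pair-wise PI_HAT, split at highIBDTh.
plot_ibd_strata <- function(pi_hat, highIBDTh = 0.1875, draw = TRUE) {
  # Label each pair by whether its PI_HAT exceeds the threshold.
  stratum <- ifelse(pi_hat > highIBDTh, "related", "unrelated")
  groups <- split(pi_hat, stratum)
  # Bin PI_HAT in steps of 0.05 over its full range [0, 1]; no drawing yet.
  hists <- lapply(groups, graphics::hist, breaks = seq(0, 1, by = 0.05),
                  plot = FALSE)
  if (draw) {
    old <- graphics::par(mfrow = c(1, length(hists)))
    on.exit(graphics::par(old))
    for (s in names(hists)) plot(hists[[s]], main = s, xlab = "PI_HAT")
  }
  # Return the bin counts per stratum.
  invisible(lapply(hists, `[[`, "counts"))
}
```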
evaluate_check_relatedness( qcdir, name, highIBDTh = 0.1875, imissTh = 0.03, interactive = FALSE, legend_text_size = 5, legend_title_size = 7, axis_text_size = 5, axis_title_size = 7, title_size = 9, verbose = FALSE )
Argument | Description |
---|---|
qcdir | [character] path/to/directory/with/QC/results containing name.imiss and name.genome results as returned by plink --missing and plink --genome. |
name | [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam, name.genome and name.imiss. |
highIBDTh | [double] Threshold for acceptable proportion of IBD between a pair of individuals. |
imissTh | [double] Threshold for acceptable missing genotype rate in any individual; has to be a proportion in (0,1). |
interactive | [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding/graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object p_IBD via ggplot2::ggsave(plot=p_IBD, other_arguments) or pdf(outfile); print(p_IBD); dev.off(). |
legend_text_size | [integer] Size for legend text. |
legend_title_size | [integer] Size for legend title. |
axis_text_size | [integer] Size for axis text. |
axis_title_size | [integer] Size for axis title. |
title_size | [integer] Size for plot title. |
verbose | [logical] If TRUE, progress info is printed to standard out. |
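The pdf()/print()/dev.off() route for saving p_IBD without a graphical interface can be wrapped in a small helper. This is a sketch: save_plot_pdf is a hypothetical name, and it relies only on grDevices::pdf(), print() and grDevices::dev.off(), so it works for any object whose print method draws on the current device, including ggplot2 objects. With ggplot2 installed, ggplot2::ggsave(filename, plot = p_IBD) is the one-line alternative.

```r
# Hypothetical helper: write a printable plot object (e.g. p_IBD) to a PDF file
# without needing an interactive graphics device.
save_plot_pdf <- function(p, outfile, width = 7, height = 5) {
  grDevices::pdf(outfile, width = width, height = height)
  # Close the PDF device even if printing fails.
  on.exit(grDevices::dev.off())
  print(p)
  invisible(outfile)
}

# Usage, assuming relatednessQC was returned by evaluate_check_relatedness():
# save_plot_pdf(relatednessQC$p_IBD, "relatedness.pdf")
```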
a named [list] with i) fail_high_IBD, a [data.frame] with the IIDs and FIDs of individuals who fail the highIBDTh in columns FID1 and IID1. In addition, the following columns are returned (as originally obtained by plink --genome): FID2 (Family ID for second sample), IID2 (Individual ID for second sample), RT (Relationship type inferred from .fam/.ped file), EZ (IBD sharing expected value, based on just .fam/.ped relationship), Z0 (P(IBD=0)), Z1 (P(IBD=1)), Z2 (P(IBD=2)), PI_HAT (Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)), PHE (Pairwise phenotypic code (1, 0, -1 = AA, AU, and UU pairs, respectively)), DST (IBS distance, i.e. (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2)), PPC (IBS binomial test), RATIO (HETHET : IBS0 SNP ratio (expected value 2)); ii) failIDs, a [data.frame] with individual IDs [IID] and family IDs [FID] of individuals failing the highIBDTh; iii) p_IBD, a ggplot2 object containing all pair-wise IBD estimates as histograms stratified by value of PI_HAT, which can be shown by print(p_IBD); and iv) plot_data, a data.frame with the data visualised in p_IBD (iii).
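The PI_HAT definition above can be checked directly on the returned columns. A sketch, assuming fail_high_IBD (or any data.frame read from a .genome file) holds numeric Z1, Z2 and PI_HAT columns; the helper name check_pi_hat is hypothetical, and the tolerance allows for the rounding applied when PLINK writes the file.

```r
# Hypothetical helper: recompute PI_HAT = P(IBD=2) + 0.5 * P(IBD=1) from the
# Z1 and Z2 columns and report, per pair, whether it matches the stored PI_HAT.
check_pi_hat <- function(genome, tol = 1e-3) {
  recomputed <- genome$Z2 + 0.5 * genome$Z1
  abs(recomputed - genome$PI_HAT) <= tol
}

# Usage: all(check_pi_hat(relatednessQC$fail_high_IBD))
```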
Both run_check_relatedness and evaluate_check_relatedness can simply be invoked by check_relatedness.
For details on the output data.frame fail_high_IBD, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#genome.
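When IBD estimation was conducted externally, the .genome file can be inspected directly before calling evaluate_check_relatedness. A minimal base-R sketch, assuming the whitespace-delimited layout with a header line described on the PLINK format page; the helper name read_genome is hypothetical.

```r
# Hypothetical helper: read a PLINK .genome file and keep pairs above a PI_HAT threshold.
read_genome <- function(path, highIBDTh = 0.1875) {
  # Columns are separated by runs of whitespace; the first line is a header.
  genome <- utils::read.table(path, header = TRUE, stringsAsFactors = FALSE)
  genome[genome$PI_HAT > highIBDTh, ]
}
```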
# Location of the example data shipped with plinkQC
qcdir <- system.file("extdata", package="plinkQC")
name <- 'data'
# Not run
if (FALSE) {
  relatednessQC <- evaluate_check_relatedness(qcdir=qcdir, name=name,
                                              interactive=FALSE)
}