Evaluates and depicts results from plink --check-sex (via run_check_sex or an externally conducted sex check). Takes the file qcdir/name.sexcheck and returns the IIDs of samples whose SNPSEX != PEDSEX (where SNPSEX is determined by the heterozygosity rate across X-chromosomal variants). Mismatches between SNPSEX and PEDSEX can indicate plating errors, sample mix-ups or generally poor genotyping; in the latter case, these samples are likely to fail other QC steps as well. Optionally, an extra data.frame (externalSex) with sample IDs and sex can be provided to double-check whether external and PEDSEX data (often processed at different centers) match. If a mismatch between PEDSEX and SNPSEX is detected while SNPSEX == Sex, the PEDSEX of these individuals can optionally be updated (fixMixup=TRUE). evaluate_check_sex depicts the X-chromosomal heterozygosity (SNPSEX) of the samples, split by their PEDSEX.
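The core comparison described above can be illustrated on a toy table. This is only a minimal sketch of the SNPSEX != PEDSEX check, not the package's implementation: the data.frame below is a made-up stand-in for the contents of name.sexcheck, restricted to the columns needed for the comparison, and all sample IDs are hypothetical.

```r
# Toy stand-in for name.sexcheck (values are made up for illustration).
# PLINK sex codes: 1 = male, 2 = female, 0 = unknown.
sexcheck <- data.frame(
  FID = c("fam1", "fam2", "fam3", "fam4"),
  IID = c("id1", "id2", "id3", "id4"),
  PEDSEX = c(1, 2, 2, 1),
  SNPSEX = c(1, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Keep the samples whose genotype-derived sex disagrees with the pedigree sex
mismatch <- sexcheck[sexcheck$PEDSEX != sexcheck$SNPSEX, ]
mismatch_ids <- mismatch$IID
print(mismatch_ids)
```

In this toy table, id3 and id4 are flagged: id3 because PEDSEX and SNPSEX disagree outright, id4 because no sex could be assigned from the genotypes.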

evaluate_check_sex(
  qcdir,
  name,
  maleTh = 0.8,
  femaleTh = 0.2,
  externalSex = NULL,
  fixMixup = FALSE,
  indir = qcdir,
  externalFemale = "F",
  externalMale = "M",
  externalSexSex = "Sex",
  externalSexID = "IID",
  verbose = FALSE,
  label_fail = TRUE,
  highlight_samples = NULL,
  highlight_type = c("text", "label", "color", "shape"),
  highlight_text_size = 3,
  highlight_color = "#c51b8a",
  highlight_shape = 17,
  highlight_legend = FALSE,
  legend_text_size = 5,
  legend_title_size = 7,
  axis_text_size = 5,
  axis_title_size = 7,
  title_size = 9,
  path2plink = NULL,
  keep_individuals = NULL,
  remove_individuals = NULL,
  exclude_markers = NULL,
  extract_markers = NULL,
  showPlinkOutput = TRUE,
  interactive = FALSE
)

Arguments

qcdir

[character] /path/to/directory containing name.sexcheck as returned by plink --check-sex.

name

[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam and name.sexcheck.

maleTh

[double] Threshold of X-chromosomal heterozygosity rate for males.

femaleTh

[double] Threshold of X-chromosomal heterozygosity rate for females.

externalSex

[data.frame, optional] with sample IDs [externalSexID] and sex [externalSexSex] to double check if external and PEDSEX data (often processed at different centers) match.

fixMixup

[logical] Should the PEDSEX of individuals with a mismatch between PEDSEX and Sex, but with Sex==SNPSEX, be corrected automatically? This will directly change the name.bim/.bed/.fam files!

indir

[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam; only required if fixMixup==TRUE. The user needs write permission for indir.

externalFemale

[integer/character] Identifier for 'female' in externalSex.

externalMale

[integer/character] Identifier for 'male' in externalSex.

externalSexSex

[character] Column identifier for column containing sex information in externalSex.

externalSexID

[character] Column identifier for column containing ID information in externalSex.

verbose

[logical] If TRUE, progress info is printed to standard out.

label_fail

[logical] Set to TRUE to add fail IDs as text labels in the scatter plot.

highlight_samples

[character vector] Vector of sample IIDs to highlight in the plot (p_sexcheck); all highlight_samples IIDs have to be present in the IIDs of the name.fam file.

highlight_type

[character] Type of sample highlight: labeling by IID ("text"/"label") and/or highlighting data points in a different "color" and/or "shape". "text" and "label" use ggrepel for minimal overlap of text labels ("text") or label boxes ("label"). Only one of "text" and "label" can be specified. Text/label size can be specified with highlight_text_size, highlight color with highlight_color, and highlight shape with highlight_shape.

highlight_text_size

[integer] Text/Label size for samples specified to be highlighted (highlight_samples) by "text" or "label" (highlight_type).

highlight_color

[character] Color for samples specified to be highlighted (highlight_samples) by "color" (highlight_type).

highlight_shape

[integer] Shape for samples specified to be highlighted (highlight_samples) by "shape" (highlight_type). Possible shapes and their encoding can be found at: https://ggplot2.tidyverse.org/articles/ggplot2-specs.html#sec:shape-spec

highlight_legend

[logical] Should a separate legend for the highlighted samples be provided; only relevant for highlight_type == "color" or highlight_type == "shape".

legend_text_size

[integer] Size for legend text.

legend_title_size

[integer] Size for legend title.

axis_text_size

[integer] Size for axis text.

axis_title_size

[integer] Size for axis title.

title_size

[integer] Size for plot title.

path2plink

[character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe; for Unix platforms, path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by exec('plink').

keep_individuals

[character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.

remove_individuals

[character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals.

exclude_markers

[character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.

extract_markers

[character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers.

showPlinkOutput

[logical] If TRUE, plink log and error messages are printed to standard out.

interactive

[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_sexcheck) via ggplot2::ggsave(plot=p_sexcheck, other_arguments) or pdf(outfile); print(p_sexcheck); dev.off().
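As a rough illustration of the optional inputs described in the arguments above, the following sketch builds an externalSex data.frame and a keep_individuals file. All sample and family IDs are hypothetical, and the column names and sex identifiers simply follow the documented defaults (externalSexID = "IID", externalSexSex = "Sex", externalFemale = "F", externalMale = "M").

```r
# Hypothetical external sex records using the documented default
# column names ("IID", "Sex") and identifiers ("F", "M").
externalSex <- data.frame(
  IID = c("id1", "id2", "id3"),
  Sex = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical keep_individuals file: family IDs in the first column,
# within-family IDs in the second, tab-delimited, no header row.
keep <- data.frame(
  FID = c("fam1", "fam2"),
  IID = c("id1", "id2"),
  stringsAsFactors = FALSE
)
keep_file <- tempfile(fileext = ".txt")
write.table(keep, keep_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the two-column layout
keep_check <- read.table(keep_file, header = FALSE, stringsAsFactors = FALSE)
print(keep_check)
```

Objects prepared this way would then be passed as externalSex = externalSex and keep_individuals = keep_file; a file for remove_individuals follows the same two-column layout.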

Value

Named list with i) fail_sex: data.frame with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals failing the sex check; ii) mixup: data.frame with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals whose PEDSEX != Sex and Sex == SNPSEX; iii) p_sexcheck: a ggplot2 object containing a scatter plot of the X-chromosomal heterozygosity (SNPSEX) of the individuals, split by their PEDSEX, which can be shown by print(p_sexcheck); and iv) plot_data: a data.frame with the data visualised in p_sexcheck (iii).
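The conditions that define fail_sex and mixup can be sketched on toy data. This is an illustration of the stated conditions only, under the assumption that external sex labels are recoded to PLINK's numeric codes before comparison; it is not the package's own code, and all IDs and values are made up.

```r
# Toy sex-check results (PLINK codes: 1 = male, 2 = female, 0 = unknown)
sexcheck <- data.frame(
  FID = c("fam1", "fam2", "fam3"),
  IID = c("id1", "id2", "id3"),
  PEDSEX = c(1, 1, 2),
  SNPSEX = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical external records with the default "M"/"F" identifiers
externalSex <- data.frame(
  IID = c("id1", "id2", "id3"),
  Sex = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Assumption for this sketch: recode external labels to PLINK codes
externalSex$Sex <- ifelse(externalSex$Sex == "M", 1,
                          ifelse(externalSex$Sex == "F", 2, 0))

# Combine both sources by sample ID
combined <- merge(sexcheck, externalSex, by = "IID")
combined <- combined[, c("FID", "IID", "PEDSEX", "SNPSEX", "Sex")]

# Samples whose pedigree sex disagrees with the genotype-derived sex
fail_sex <- combined[combined$PEDSEX != combined$SNPSEX, ]

# Likely mix-ups: the external record agrees with the genotypes
# but not with the pedigree file
mixup <- combined[combined$PEDSEX != combined$Sex &
                    combined$Sex == combined$SNPSEX, ]
print(fail_sex$IID)
print(mixup$IID)
```

Here id2 and id3 both fail the comparison, but only id2 counts as a mix-up, because its external record matches SNPSEX while contradicting PEDSEX; for id3 the external record agrees with PEDSEX, so the pedigree entry is not the obvious culprit.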

Details

Both run_check_sex and evaluate_check_sex can simply be invoked by check_sex.

For details on the output data.frame fail_sex, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#sexcheck.

Examples

qcdir <- system.file("extdata", package="plinkQC")
name <- "data"
path2plink <- '/path/to/plink'
if (FALSE) {
fail_sex <- evaluate_check_sex(qcdir=qcdir, name=name, interactive=FALSE,
verbose=FALSE, path2plink=path2plink)

# highlight samples
highlight_samples <- read.table(system.file("extdata", "keep_individuals",
package="plinkQC"))
fail_sex <- evaluate_check_sex(qcdir=qcdir, name=name, interactive=FALSE,
verbose=FALSE, path2plink=path2plink,
highlight_samples = highlight_samples[,2],
highlight_type = c("label", "color"), highlight_color = "darkgreen")
}