# Marker Missingness Module

The usage of the standalone module is shown below:

    $ pyGenClean_snp_missingness --help
    usage: pyGenClean_snp_missingness [-h] [-v] --bfile FILE [--geno FLOAT]
                                      [--out FILE]

    Computes marker missingness using Plink.

    optional arguments:
      -h, --help     show this help message and exit
      -v, --version  show program's version number and exit

    Input File:
      --bfile FILE   The input file prefix (will find the plink binary files by
                     appending the prefix to the .bim, .bed and .fam files,
                     respectively.

    Options:
      --geno FLOAT   The missingness threshold (remove SNPs with more than x
                     percent missing genotypes). [Default: 0.020]

    Output File:
      --out FILE     The prefix of the output files. [default: clean_geno]
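
Although the help text speaks of "x percent", the default of 0.020 is a proportion (2% of genotypes), which is how PLINK's `--geno` option interprets its value. The following is a minimal sketch of that threshold rule on toy data. The function names and the use of `None` for a missing call are assumptions made for illustration; the module itself delegates this computation to Plink.

```python
def missing_rate(genotypes):
    """Return the fraction of missing calls for one marker.

    `genotypes` is a list of genotype calls; None stands for a missing call
    (a convention chosen for this sketch only).
    """
    if not genotypes:
        raise ValueError("no genotype calls for this marker")
    return sum(g is None for g in genotypes) / len(genotypes)


def markers_to_remove(calls_by_marker, geno=0.02):
    """Return the sorted marker names whose missing rate exceeds `geno`."""
    return sorted(
        name
        for name, calls in calls_by_marker.items()
        if missing_rate(calls) > geno
    )


# Toy data: 50 samples per marker (hypothetical marker names).
calls = {
    "rs1": ["AA"] * 50,               # 0/50 missing: kept
    "rs2": ["AG"] * 49 + [None],      # 1/50 = 0.02: kept (not above 0.02)
    "rs3": ["GG"] * 48 + [None] * 2,  # 2/50 = 0.04: removed
}
print(markers_to_remove(calls))  # prints ['rs3']
```

A marker sitting exactly at the threshold is kept in this sketch, which matches the "more than" wording of the help text.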


## Input Files

This module uses PLINK’s binary file format (bed, bim and fam files) for the source data set (the data of interest).
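
Since `--bfile` takes a prefix rather than a file name, it can help to confirm that all three binary files are present before running the module. A minimal sketch follows; the helper name `missing_plink_files` is hypothetical and not part of pyGenClean.

```python
from pathlib import Path


def missing_plink_files(prefix):
    """Return the binary PLINK files that are absent for `prefix`.

    The extension is appended to the prefix as plain text, so prefixes that
    already contain dots (e.g. "study.v2") are handled correctly.
    """
    expected = [Path(f"{prefix}{ext}") for ext in (".bed", ".bim", ".fam")]
    return [path for path in expected if not path.is_file()]
```

An empty list means the `bed`, `bim` and `fam` files were all found for the given prefix.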

## Procedure

Here are the steps performed by the module:

1. Runs Plink to remove markers with a missing rate above a user-defined threshold.
2. Finds the markers that were removed (those that have a missing rate above the user-defined threshold).
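
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual source: the exact Plink command line is assumed (it uses the standard `--bfile`, `--geno`, `--make-bed` and `--out` options), and the removed markers are assumed to be found by comparing the marker names (second column) of the original and cleaned `bim` files. All function names are hypothetical.

```python
def plink_geno_command(bfile, geno=0.02, out="clean_geno"):
    """Build an assumed Plink command line for step 1 (not executed here)."""
    return ["plink", "--bfile", bfile, "--geno", str(geno),
            "--make-bed", "--out", out]


def read_marker_names(bim_path):
    """Read marker names from a bim file (second whitespace-separated column)."""
    with open(bim_path) as handle:
        return [line.split()[1] for line in handle if line.strip()]


def removed_markers(original_bim, cleaned_bim):
    """Step 2: markers present in the original bim but not in the cleaned one.

    The order of the original file is preserved.
    """
    kept = set(read_marker_names(cleaned_bim))
    return [name for name in read_marker_names(original_bim)
            if name not in kept]
```

With the default prefix, the command could be run with `subprocess.check_call(plink_geno_command("data"))`, after which `removed_markers("data.bim", "clean_geno.bim")` would list the markers that Plink dropped.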

## Output Files

The output files of each of the steps described above are as follows (the output prefix shown is the default one, i.e. clean_geno):

1. One set of Plink output files:
   • clean_geno.fam: the dataset with markers having a high missing rate removed (according to a user-defined threshold).
2. One custom file:
   • clean_geno.removed_snps: a list of markers that have a high missing rate (above a user-defined threshold).

## The Algorithm

For more information about the actual algorithms and source code, refer to the following page.