.. _sample_missingness_label:

Sample Missingness Module
=========================

The usage of the standalone module is shown below:

.. code-block:: console

    $ pyGenClean_sample_missingness --help
    usage: pyGenClean_sample_missingness [-h] [-v] --ifile FILE [--is-bfile]
                                         [--mind FLOAT] [--out FILE]

    Computes sample missingness using Plink.

    optional arguments:
      -h, --help     show this help message and exit
      -v, --version  show program's version number and exit

    Input File:
      --ifile FILE   The input file prefix (by default, this input file must be a
                     tfile. If options --is-bfile is used, the input file must be
                     a bfile).

    Options:
      --is-bfile     The input file (--ifile) is a bfile instead of a tfile.
      --mind FLOAT   The missingness threshold (remove samples with more than x
                     percent missing genotypes). [Default: 0.100]

    Output File:
      --out FILE     The prefix of the output files (wich will be a Plink binary
                     file). [default: clean_mind]


Input Files
-----------

This module uses either PLINK's binary file format (``bed``, ``bim`` and ``fam``
files) or the transposed pedfile format separated by tabulations (``tped`` and
``tfam``) for the source data set (the data of interest).

Procedure
---------

Here are the steps performed by the module:

1.  Uses Plink to remove samples with a high missing rate (above a user defined
    threshold).

Output Files
------------

The output files of each of the steps described above are as follow (note that
the output prefix shown is the one by default [*i.e.* ``clean_geno``]):

1.  One set of PLINK's output and result files:

    *   ``clean_mind``: the new dataset with samples having a high missing rate
        removed (above a user defined threshold). The file ``clean_mind.irem``
        contains a list of samples that were removed.

The Algorithm
-------------

For more information about the actual algorithms and source codes, refer to the
following page.

* :py:mod:`pyGenClean.SampleMissingness.sample_missingness`