pyGenClean.NoCallHetero package
For more information about how to use this module, refer to the Clean No Call and Only Heterozygous Markers Module.
Module contents
Submodules
pyGenClean.NoCallHetero.clean_noCall_hetero_snps module
-
exception pyGenClean.NoCallHetero.clean_noCall_hetero_snps.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
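The exact checks performed by checkArgs() are not listed here. As a rough illustration only, the sketch below assumes the check is that the two files implied by the --tfile prefix exist. The names check_args_sketch and the local ProgramError stand-in are hypothetical, and the code targets Python 3.

```python
import os


class ProgramError(Exception):
    """Stand-in for the module's ProgramError (hypothetical re-implementation)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_args_sketch(args):
    """Return True if the input files exist, otherwise raise ProgramError."""
    # Assumption: a Plink tfile prefix implies <prefix>.tped and <prefix>.tfam.
    for suffix in (".tped", ".tfam"):
        file_name = args.tfile + suffix
        if not os.path.isfile(file_name):
            raise ProgramError("{}: no such file".format(file_name))
    return True
```

A caller would typically catch ProgramError, print its message to sys.stderr, and exit with a non-zero code.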
-
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Reads the tfam and tped files and finds all heterozygous and all failed markers (processTPEDandTFAM()).
-
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options   Type    Description
--tfile   string  The input file prefix (Plink tfile).
--out     string  The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
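The option table above maps naturally onto argparse. The following is a minimal sketch, not the module's actual parser: the function name parse_args_sketch, the required flag on --tfile, the help strings, and the default value of --out are all assumptions made for illustration.

```python
import argparse


def parse_args_sketch(arg_string=None):
    """Parse --tfile and --out; arg_string=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Sketch of the options listed in the table above.",
    )
    # Assumption: the input prefix is mandatory.
    parser.add_argument("--tfile", type=str, metavar="FILE", required=True,
                        help="The input file prefix (Plink tfile).")
    # Assumption: "output" is a hypothetical default, not the module's.
    parser.add_argument("--out", type=str, metavar="FILE", default="output",
                        help="The prefix of the output files.")
    return parser.parse_args(arg_string)
```

Passing a list such as ["--tfile", "data"] instead of relying on sys.argv makes the parser easy to call from other code, which matches the argString parameter described above.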
-
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.processTPEDandTFAM(tped, tfam, prefix)
Process the TPED and TFAM files.
Copies the original tfam file into prefix.tfam. Then, it reads the tped file and keeps in memory two sets containing the markers that are all failed or that contain only heterozygous genotypes.
It creates two output files, prefix.allFailed and prefix.allHetero, containing the markers that are all failed and all heterozygous, respectively.
Note
Markers that are all heterozygous but located on the mitochondrial chromosome are not removed.
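The classification described above can be sketched for a single TPED line. This is an illustration under several assumptions, not the module's code: the helper classify_tped_line is hypothetical, a genotype with a "0" allele is treated as a no call, no calls are ignored when deciding whether a marker is all heterozygous, and the mitochondrial chromosome is assumed to be coded "26" or "MT".

```python
# Assumption: Plink-style codes for the mitochondrial chromosome.
MITO_CODES = {"26", "MT"}


def classify_tped_line(line):
    """Return (marker_name, status) for one whitespace-separated TPED line.

    status is "all_failed", "all_hetero" or "keep".
    """
    fields = line.split()
    chrom, name = fields[0], fields[1]
    # The first four columns are chromosome, name, genetic distance, position.
    alleles = fields[4:]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    # A genotype with a missing ("0") allele is treated as a no call.
    called = [g for g in genotypes if "0" not in g]
    if not called:
        return name, "all_failed"
    if all(a != b for a, b in called) and chrom not in MITO_CODES:
        return name, "all_hetero"
    return name, "keep"
```

Markers returned as "all_failed" or "all_hetero" would go to the two sets that end up in prefix.allFailed and prefix.allHetero.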
pyGenClean.NoCallHetero.heterozygosity_plot module
-
exception pyGenClean.NoCallHetero.heterozygosity_plot.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.NoCallHetero.heterozygosity_plot.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
-
pyGenClean.NoCallHetero.heterozygosity_plot.compute_heterozygosity(in_prefix, nb_samples)
Computes the heterozygosity ratio of samples (from tped).
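The definition of the ratio is not spelled out here. One plausible reading, sketched below, is the number of heterozygous genotypes divided by the number of called genotypes for each sample. The function heterozygosity_sketch is hypothetical: it takes TPED lines directly rather than a file prefix, returns a plain list instead of a numpy array, and returns NaN for a sample with no called genotype.

```python
def heterozygosity_sketch(tped_lines, nb_samples):
    """Return one heterozygosity ratio per sample from TPED lines."""
    nb_hetero = [0] * nb_samples
    nb_called = [0] * nb_samples
    for line in tped_lines:
        # Skip chromosome, marker name, genetic distance and position.
        alleles = line.split()[4:]
        for i in range(nb_samples):
            first, second = alleles[2 * i], alleles[2 * i + 1]
            # Assumption: a "0" allele marks a no call, which is not counted.
            if first == "0" or second == "0":
                continue
            nb_called[i] += 1
            if first != second:
                nb_hetero[i] += 1
    return [h / c if c else float("nan")
            for h, c in zip(nb_hetero, nb_called)]
```

Because each TPED line holds one marker for all samples, the counters are indexed by sample and updated once per line.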
-
pyGenClean.NoCallHetero.heterozygosity_plot.compute_nb_samples(in_prefix)
Check the number of samples.
Parameters: in_prefix (str) – the prefix of the input file.
Returns: the number of samples in prefix.fam.
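Since a Plink family file has one line per sample, counting samples amounts to counting lines. The sketch below assumes exactly that. The helper count_samples is hypothetical and takes the full file name rather than a prefix, so it does not commit to a particular file extension.

```python
def count_samples(fam_file_name):
    """Return the number of non-empty lines (one per sample) in the file."""
    with open(fam_file_name, "r") as fam_file:
        # Assumption: blank lines, if any, do not describe a sample.
        return sum(1 for line in fam_file if line.strip())
```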
-
pyGenClean.NoCallHetero.heterozygosity_plot.is_heterozygous(genotype)
Tells if a genotype “A B” is heterozygous.
Parameters: genotype (str) – the genotype to test for heterozygosity.
Returns: True if the genotype is heterozygous, False otherwise.
The genotype must contain two alleles, separated by a space. The function compares the first allele (genotype[0]) with the last one (genotype[-1]).
>>> is_heterozygous("A A")
False
>>> is_heterozygous("G C")
True
>>> is_heterozygous("0 0")  # No call is not heterozygous.
False
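The comparison described above fits in one line. The following is a minimal re-implementation written from that description, not a copy of the module's source.

```python
def is_heterozygous(genotype):
    """Return True if the first and last characters of "A B" differ."""
    # Only the first and last characters are compared, so this relies on
    # single-character alleles separated by a space.
    return genotype[0] != genotype[-1]
```

A no call such as "0 0" has identical first and last characters, which is why it is reported as not heterozygous.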
-
pyGenClean.NoCallHetero.heterozygosity_plot.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Checks the number of samples in the tfam file (compute_nb_samples()).
- Computes the heterozygosity rate (compute_heterozygosity()).
- Saves the heterozygosity data (in out.het).
- Plots the heterozygosity rate (plot_heterozygosity()).
-
pyGenClean.NoCallHetero.heterozygosity_plot.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options    Type    Description
--tfile    string  The prefix of the transposed file.
--boxplot  bool    Draw a boxplot instead of a histogram.
--format   string  The output file format.
--bins     int     The number of bins for the histogram.
--xlim     float   The limit of the x axis.
--ymax     float   The maximal Y value.
--out      string  The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
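As an illustration of how the table above could translate to argparse, here is a hedged sketch. The function name parse_args_sketch is hypothetical, no defaults are set because none are documented here, --boxplot is assumed to be a flag, and --xlim is assumed to take a single value since the table does not say otherwise.

```python
import argparse


def parse_args_sketch(arg_string=None):
    """Parse the plotting options; arg_string=None uses sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Sketch of the options listed in the table above.",
    )
    parser.add_argument("--tfile", type=str, metavar="FILE",
                        help="The prefix of the transposed file.")
    # Assumption: a boolean option is a flag that is False unless given.
    parser.add_argument("--boxplot", action="store_true",
                        help="Draw a boxplot instead of a histogram.")
    parser.add_argument("--format", type=str, metavar="FORMAT",
                        help="The output file format.")
    parser.add_argument("--bins", type=int, metavar="INT",
                        help="The number of bins for the histogram.")
    # Assumption: a single float; the real option may differ.
    parser.add_argument("--xlim", type=float, metavar="FLOAT",
                        help="The limit of the x axis.")
    parser.add_argument("--ymax", type=float, metavar="FLOAT",
                        help="The maximal Y value.")
    parser.add_argument("--out", type=str, metavar="FILE",
                        help="The prefix of the output files.")
    return parser.parse_args(arg_string)
```

Options that are not supplied come back as None (or False for --boxplot), which leaves validation to a separate step such as checkArgs().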
-
pyGenClean.NoCallHetero.heterozygosity_plot.plot_heterozygosity(heterozygosity, options)
Plots the heterozygosity rate distribution.
Parameters:
- heterozygosity (numpy.array) – the heterozygosity data.
- options (argparse.Namespace) – the options.
Plots a histogram or a boxplot of the heterozygosity distribution.
-
pyGenClean.NoCallHetero.heterozygosity_plot.safe_main()
A safe version of the main function (that catches ProgramError).
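The wrapper pattern can be sketched as follows. This is an assumption-laden illustration: the documented safe_main() takes no argument, whereas the hypothetical safe_main_sketch below receives the function to run so that it can be exercised on its own. Printing to sys.stderr and exiting with code 1 mirrors the behaviour described for checkArgs() and is assumed to apply here too.

```python
import sys


class ProgramError(Exception):
    """Stand-in for the module's ProgramError (hypothetical re-implementation)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def safe_main_sketch(main_function):
    """Run main_function and turn a ProgramError into a clean exit."""
    try:
        main_function()
    except ProgramError as error:
        # Report the problem to the user instead of showing a traceback.
        print(error.message, file=sys.stderr)
        sys.exit(1)
```

Wrapping the entry point this way keeps the error handling in one place, so the rest of the code can simply raise ProgramError.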
-
pyGenClean.NoCallHetero.heterozygosity_plot.save_heterozygosity(heterozygosity, samples, out_prefix)
Saves the heterozygosity data.
Parameters:
- heterozygosity (numpy.array) – the heterozygosity data.
- samples (list of tuples of str) – the list of samples.
- out_prefix (str) – the prefix of the output files.
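The layout of the saved file is not described here. The sketch below assumes one tab-separated line per sample, written to out_prefix + ".het", with each sample given as a (family ID, individual ID) tuple. The function name save_heterozygosity_sketch and that layout are assumptions, not the module's documented format.

```python
def save_heterozygosity_sketch(heterozygosity, samples, out_prefix):
    """Write one "family<TAB>individual<TAB>ratio" line per sample."""
    # Assumption: samples and heterozygosity are in the same order.
    with open(out_prefix + ".het", "w") as out_file:
        for (fam_id, ind_id), value in zip(samples, heterozygosity):
            print(fam_id, ind_id, value, sep="\t", file=out_file)
```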