pyGenClean.NoCallHetero package

For more information about how to use this module, refer to the Clean No Call and Only Heterozygous Markers Module.

Module contents

Submodules

pyGenClean.NoCallHetero.clean_noCall_hetero_snps module

exception pyGenClean.NoCallHetero.clean_noCall_hetero_snps.ProgramError(msg)[source]

Bases: exceptions.Exception

An Exception raised in case of a problem.

Parameters:msg (str) – the message to print to the user before exiting.
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.checkArgs(args)[source]

Checks the arguments and options.

Parameters:args (argparse.Namespace) – an object containing the options of the program.
Returns:True if everything was OK.

If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.

pyGenClean.NoCallHetero.clean_noCall_hetero_snps.main(argString=None)[source]

The main function of the module.

Parameters:argString (list) – the options.

These are the steps:

  1. Prints the options.
  2. Reads the tfam and tped files and finds all heterozygous and all failed markers (processTPEDandTFAM()).
pyGenClean.NoCallHetero.clean_noCall_hetero_snps.parseArgs(argString=None)[source]

Parses the command line options and arguments.

Parameters:argString (list) – the options.
Returns:An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options   Type    Description
--tfile   string  The input file prefix (Plink tfile).
--out     string  The prefix of the output files.

Note

No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
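
The option table above can be sketched with argparse as follows. This is a minimal illustration written for Python 3, not the package's own code: the function name parse_args_sketch, the required flag on --tfile and the default value of --out are assumptions made for the example.

```python
import argparse


def parse_args_sketch(arg_string=None):
    """Hypothetical stand-in for parseArgs(); not the package's code."""
    parser = argparse.ArgumentParser(
        description="Sketch of the option parsing described above.")
    # The two documented options. Making --tfile required and giving
    # --out a placeholder default are assumptions of this sketch.
    parser.add_argument("--tfile", type=str, required=True,
                        help="The input file prefix (Plink tfile).")
    parser.add_argument("--out", type=str, default="output_prefix",
                        help="The prefix of the output files.")
    # argparse falls back to sys.argv[1:] when it is given None.
    return parser.parse_args(arg_string)


args = parse_args_sketch(["--tfile", "dataset", "--out", "cleaned"])
print(args.tfile, args.out)
```

Passing a list makes the parser easy to exercise without touching the real command line, which is presumably why argString defaults to None in the documented signature.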

pyGenClean.NoCallHetero.clean_noCall_hetero_snps.processTPEDandTFAM(tped, tfam, prefix)[source]

Processes the TPED and TFAM files.

Parameters:
  • tped (str) – the name of the tped file.
  • tfam (str) – the name of the tfam file.
  • prefix (str) – the prefix of the output files.

Copies the original tfam file into prefix.tfam. Then, it reads the tped file and keeps in memory two sets: the markers for which all genotypes failed, and the markers that contain only heterozygous genotypes.

It creates two output files, prefix.allFailed and prefix.allHetero, containing the markers that are all failed and are all heterozygous, respectively.

Note

All heterozygous markers located on the mitochondrial chromosome are not removed.
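
The classification described above can be sketched as follows (Python 3). This is an illustration only, not the package's implementation. The helper name classify_markers is hypothetical, and three points are assumptions: alleles are whitespace-separated after the four leading tped columns, no-calls ("0 0") are ignored when testing for "only heterozygous", and the mitochondrial chromosome is coded "26" or "MT".

```python
def classify_markers(tped_lines, mito_codes=("26", "MT")):
    """Returns (all_failed, all_hetero) sets of marker names.

    Hypothetical helper: not pyGenClean's implementation.
    """
    all_failed, all_hetero = set(), set()
    for line in tped_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or incomplete lines
        chrom, name = fields[0], fields[1]
        alleles = fields[4:]
        # Pair up consecutive alleles: one (a1, a2) genotype per sample.
        genotypes = list(zip(alleles[0::2], alleles[1::2]))
        if all(a1 == "0" and a2 == "0" for a1, a2 in genotypes):
            all_failed.add(name)
            continue
        # Heterozygosity is judged on called genotypes only (assumption).
        called = [(a1, a2) for a1, a2 in genotypes if "0" not in (a1, a2)]
        if called and all(a1 != a2 for a1, a2 in called):
            # Mitochondrial markers are left out of the all-hetero set.
            if chrom not in mito_codes:
                all_hetero.add(name)
    return all_failed, all_hetero


tped = [
    "1 rs1 0 1000 0 0 0 0 0 0",     # every genotype is a no-call
    "1 rs2 0 2000 A G A G 0 0",     # every called genotype is heterozygous
    "1 rs3 0 3000 A A A G G G",     # mixed genotypes
    "26 rs4 0 4000 C T C T C T",    # all heterozygous, but mitochondrial
]
failed, hetero = classify_markers(tped)
```

With this input, rs1 lands in the all-failed set and rs2 in the all-heterozygous set, while rs4 is kept out because of its chromosome code.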

pyGenClean.NoCallHetero.clean_noCall_hetero_snps.safe_main()[source]

A safe version of the main function (that catches ProgramError).
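
A minimal sketch of this wrapper pattern, written for Python 3. The names main_sketch and safe_main_sketch are hypothetical, the ProgramError class here is a stand-in for the documented one, and the sketch returns the exit status instead of terminating the interpreter so that it can be tested.

```python
import sys


class ProgramError(Exception):
    """Stand-in for the documented exception; carries a user-facing message."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)

    def __str__(self):
        return self.message


def main_sketch(arg_string=None):
    """Hypothetical main(): here it only fails, to exercise the handler."""
    raise ProgramError("dataset.tped: no such file")


def safe_main_sketch(arg_string=None):
    """Runs main_sketch() and turns a ProgramError into an exit status."""
    try:
        main_sketch(arg_string)
    except ProgramError as error:
        # Report on stderr and return a non-zero status instead of a traceback.
        print(error, file=sys.stderr)
        return 1
    return 0


exit_code = safe_main_sketch()
```

A command-line entry point could then call sys.exit(safe_main_sketch()) to propagate the status to the shell.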

pyGenClean.NoCallHetero.heterozygosity_plot module

exception pyGenClean.NoCallHetero.heterozygosity_plot.ProgramError(msg)[source]

Bases: exceptions.Exception

An Exception raised in case of a problem.

Parameters:msg (str) – the message to print to the user before exiting.
pyGenClean.NoCallHetero.heterozygosity_plot.checkArgs(args)[source]

Checks the arguments and options.

Parameters:args (argparse.Namespace) – an object containing the options of the program.
Returns:True if everything was OK.

If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.

pyGenClean.NoCallHetero.heterozygosity_plot.compute_heterozygosity(in_prefix, nb_samples)[source]

Computes the heterozygosity ratio of samples (from tped).
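
One way to compute such a ratio is sketched below (Python 3). The source does not define the ratio, so this sketch assumes it is the number of heterozygous genotypes divided by the number of called genotypes for each sample. The function name heterozygosity_per_sample is hypothetical, it takes tped lines rather than a file prefix, and it returns a plain list rather than a numpy array.

```python
def heterozygosity_per_sample(tped_lines, nb_samples):
    """Returns one ratio per sample: heterozygous calls / non-missing calls."""
    nb_hetero = [0] * nb_samples
    nb_called = [0] * nb_samples
    for line in tped_lines:
        fields = line.split()
        alleles = fields[4:]  # skip chromosome, name, distance, position
        if len(alleles) != 2 * nb_samples:
            raise ValueError("unexpected number of alleles: " + line)
        for i in range(nb_samples):
            a1, a2 = alleles[2 * i], alleles[2 * i + 1]
            if a1 == "0" or a2 == "0":
                continue  # no call: excluded from both counts
            nb_called[i] += 1
            if a1 != a2:
                nb_hetero[i] += 1
    # Samples without any called genotype get None, not a division by zero.
    return [h / c if c else None for h, c in zip(nb_hetero, nb_called)]


tped = [
    "1 rs1 0 1000 A G A A 0 0",
    "1 rs2 0 2000 C T C C 0 0",
    "1 rs3 0 3000 G G G T 0 0",
    "1 rs4 0 4000 T T T T 0 0",
]
ratios = heterozygosity_per_sample(tped, 3)
```

Here the first sample is heterozygous at two of four markers, the second at one of four, and the third has no called genotype at all.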

pyGenClean.NoCallHetero.heterozygosity_plot.compute_nb_samples(in_prefix)[source]

Check the number of samples.

Parameters:in_prefix (str) – the prefix of the input file.
Returns:the number of samples in prefix.fam.
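
Counting samples amounts to counting the rows of the family file, since Plink writes one sample per line. A minimal sketch in Python 3, assuming a hypothetical helper count_samples that takes a full path instead of a prefix:

```python
import os
import tempfile


def count_samples(fam_path):
    """Counts the non-empty lines (one per sample) in a fam/tfam file."""
    with open(fam_path) as fam_file:
        return sum(1 for line in fam_file if line.strip())


# Demo on a throwaway file holding two samples.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.tfam")
    with open(path, "w") as handle:
        handle.write("F1 S1 0 0 1 -9\nF2 S2 0 0 2 -9\n")
    nb_samples = count_samples(path)
```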
pyGenClean.NoCallHetero.heterozygosity_plot.is_heterozygous(genotype)[source]

Tells if a genotype “A B” is heterozygous.

Parameters:genotype (str) – the genotype to test for heterozygosity.
Returns:True if the genotype is heterozygous, False otherwise.

The genotype must contain two alleles, separated by a space. The function then compares the first allele (genotype[0]) with the last one (genotype[-1]).

>>> is_heterozygous("A A")
False
>>> is_heterozygous("G C")
True
>>> is_heterozygous("0 0") # No call is not heterozygous.
False
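
The comparison described above fits in one line. The sketch below (Python 3) follows that description; the name is_heterozygous_sketch is hypothetical and the body is a reading of the documentation, not a copy of the package's source.

```python
def is_heterozygous_sketch(genotype):
    """Compares the first and last characters of an "A B" genotype string."""
    return genotype[0] != genotype[-1]


results = [is_heterozygous_sketch(g) for g in ("A A", "G C", "0 0")]
```

Because only the first and last characters are compared, this rule is meant for single-character alleles. A half-missing genotype such as "A 0" would be reported as heterozygous by this sketch, so callers may want to filter no-calls first.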
pyGenClean.NoCallHetero.heterozygosity_plot.main(argString=None)[source]

The main function of the module.

Parameters:argString (list) – the options.

These are the steps:

  1. Prints the options.
  2. Checks the number of samples in the tfam file (compute_nb_samples()).
  3. Computes the heterozygosity rate (compute_heterozygosity()).
  4. Saves the heterozygosity data (in out.het).
  5. Plots the heterozygosity rate (plot_heterozygosity()).
pyGenClean.NoCallHetero.heterozygosity_plot.parseArgs(argString=None)[source]

Parses the command line options and arguments.

Parameters:argString (list) – the options.
Returns:An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options    Type    Description
--tfile    string  The prefix of the transposed file.
--boxplot  bool    Draw a boxplot instead of a histogram.
--format   string  The output file format.
--bins     int     The number of bins for the histogram.
--xlim     float   The limit of the x axis.
--ymax     float   The maximal Y value.
--out      string  The prefix of the output files.

Note

No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).

pyGenClean.NoCallHetero.heterozygosity_plot.plot_heterozygosity(heterozygosity, options)[source]

Plots the heterozygosity rate distribution.

Parameters:

Plots a histogram or a boxplot of the heterozygosity distribution.

pyGenClean.NoCallHetero.heterozygosity_plot.safe_main()[source]

A safe version of the main function (that catches ProgramError).

pyGenClean.NoCallHetero.heterozygosity_plot.save_heterozygosity(heterozygosity, samples, out_prefix)[source]

Saves the heterozygosity data.

Parameters:
  • heterozygosity (numpy.array) – the heterozygosity data.
  • samples (list of tuples of str) – the list of samples.
  • out_prefix (str) – the prefix of the output files.
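
A possible shape for this function is sketched below (Python 3). The .het extension follows the main() steps listed earlier; everything else is an assumption made for illustration: the tab-separated layout, the column names, the (family ID, individual ID) meaning of each sample tuple, and the helper name save_heterozygosity_sketch. A plain list stands in for the numpy array.

```python
import os
import tempfile


def save_heterozygosity_sketch(heterozygosity, samples, out_prefix):
    """Writes one tab-separated row per sample to out_prefix + ".het"."""
    path = out_prefix + ".het"
    with open(path, "w") as out_file:
        # Column names are an assumption, not taken from the package.
        out_file.write("FID\tIID\tHeterozygosity\n")
        for (fid, iid), value in zip(samples, heterozygosity):
            out_file.write("{}\t{}\t{}\n".format(fid, iid, value))
    return path


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    het_path = save_heterozygosity_sketch(
        [0.5, 0.25], [("F1", "S1"), ("F2", "S2")],
        os.path.join(tmp_dir, "demo"))
    with open(het_path) as het_file:
        rows = het_file.read().splitlines()
```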