pyGenClean.Contamination package

For more information about how to use this module, refer to the Contamination Module.

Module contents

Submodules

pyGenClean.Contamination.contamination module

exception pyGenClean.Contamination.contamination.ProgramError(msg)[source]

Bases: exceptions.Exception

An Exception raised in case of a problem.

Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.Contamination.contamination.checkArgs(args)[source]

Checks the arguments and options.

Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.

If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
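
The check-then-raise pattern described above can be sketched as follows. This is only an illustration written for Python 3: the ProgramError class below is a local stand-in, check_args_sketch is a hypothetical name, and the specific checks (the three PLINK binary files and the raw directory must exist) are assumptions, not the module's actual list of checks.

```python
import argparse
import os


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (illustrative only)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_args_sketch(args):
    """Return True if the options look usable, raise ProgramError otherwise."""
    # Assumed check: the three PLINK binary files share the --bfile prefix.
    for suffix in (".bed", ".bim", ".fam"):
        filename = args.bfile + suffix
        if not os.path.isfile(filename):
            raise ProgramError("{}: no such file".format(filename))

    # Assumed check: the raw data directory must exist.
    if not os.path.isdir(args.raw_dir):
        raise ProgramError("{}: no such directory".format(args.raw_dir))

    return True
```

A caller would build an argparse.Namespace (for example with parseArgs()) and pass it in; reporting the message and exiting is left to whatever catches the exception.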

pyGenClean.Contamination.contamination.check_sample_files(fam_filename, raw_dirname)[source]

Checks the raw sample files.

Parameters:
  • fam_filename (str) – the name of the FAM file.
  • raw_dirname (str) – the name of the directory containing the raw files.
Returns: the set of all the sample files that are compatible with the FAM file.
Return type: set
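
One way to picture the compatibility check is the sketch below. It assumes that a raw file is "compatible" when its name, minus the extension, equals a sample ID found in the second column of the FAM file (the individual ID in the PLINK FAM format). The function name compatible_sample_files is hypothetical, and the module's real matching rules may differ.

```python
import os


def compatible_sample_files(fam_filename, raw_dirname):
    """Return the set of raw files whose base name is a sample ID in the FAM file."""
    # Assumption: the sample ID is the second whitespace-separated FAM column.
    with open(fam_filename) as fam:
        sample_ids = {line.split()[1] for line in fam if line.strip()}

    matching = set()
    for name in os.listdir(raw_dirname):
        path = os.path.join(raw_dirname, name)
        # The file name without its extension is taken as the sample ID.
        sample_id = os.path.splitext(name)[0]
        if os.path.isfile(path) and sample_id in sample_ids:
            matching.add(path)
    return matching
```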

pyGenClean.Contamination.contamination.create_extraction_file(bim_filename, out_prefix)[source]

Creates an extraction file (keeping only markers on autosomes).

Parameters:
  • bim_filename (str) – the name of the BIM file.
  • out_prefix (str) – the prefix for the output file.
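
As a rough illustration of keeping only autosomal markers, the sketch below reads a BIM file (chromosome in the first column, marker name in the second) and writes the names of markers on chromosomes 1 to 22. The function name write_autosomal_markers, the explicit output file name (the real function takes a prefix) and the assumption of human autosomes coded 1 to 22 are all illustrative choices, not taken from the module.

```python
def write_autosomal_markers(bim_filename, out_filename):
    """Write the names of markers on chromosomes 1-22; return how many were kept."""
    # Assumption: human data, with autosomes coded as the strings "1" to "22".
    autosomes = {str(i) for i in range(1, 23)}
    kept = 0
    with open(bim_filename) as bim, open(out_filename, "w") as out:
        for line in bim:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            chrom, marker = fields[0], fields[1]
            if chrom in autosomes:
                out.write(marker + "\n")
                kept += 1
    return kept
```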
pyGenClean.Contamination.contamination.main(argString=None)[source]

The main function of the module.

Parameters: argString (list) – the options.

These are the steps:

  1. Prints the options.
  2. Computes the frequency using Plink.
  3. Runs bafRegress.
pyGenClean.Contamination.contamination.parseArgs(argString=None)[source]

Parses the command line options and arguments.

Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options:
  • --bfile (string) – The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
  • --raw-dir (string) – Directory containing the raw data (one file per sample, where the name of the file, minus the extension, is the sample identification number).
  • --colsample (string) – The sample column.
  • --colmarker (string) – The marker column.
  • --colbaf (string) – The B allele frequency column.
  • --colab1 (string) – The AB Allele 1 column.
  • --colab2 (string) – The AB Allele 2 column.
  • --out (string) – The prefix of the output files.
  • --sge (bool) – Use SGE for parallelization.
  • --sge-walltime (string) – The walltime for the job to run on the cluster. Do not use if you are not required to specify a walltime for your jobs on your cluster (e.g. ‘qsub -lwalltime=1:0:0’ on the cluster).
  • --sge-nodes (int) – The number of nodes and the number of processors per node to use (e.g. ‘qsub -lnodes=X:ppn=Y’ on the cluster, where X is the number of nodes and Y is the number of processors to use). Do not use if you are not required to specify the number of nodes for your jobs on the cluster.
  • --sample-per-run-for-sge (int) – The number of samples to run for a single SGE job.

Note

No option checking is done here (except for the checks done automatically by argparse). Any further checks need to be done elsewhere (see checkArgs()).
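
To show how a few of these options map onto argparse, here is a small sketch. It covers only a subset of the options; which options are required and the default values are placeholders chosen for the example, not the module's real settings, and the names build_parser_sketch and parse_args_sketch are hypothetical.

```python
import argparse


def build_parser_sketch():
    """Build a parser for a subset of the options (placeholder defaults)."""
    parser = argparse.ArgumentParser(description="Contamination check (sketch)")
    parser.add_argument("--bfile", type=str, required=True,
                        help="The input file prefix.")
    parser.add_argument("--raw-dir", type=str, required=True,
                        help="Directory containing the raw data.")
    # Placeholder default, not necessarily the module's.
    parser.add_argument("--out", type=str, default="contamination",
                        help="The prefix of the output files.")
    parser.add_argument("--sge", action="store_true",
                        help="Use SGE for parallelization.")
    # Placeholder default, not necessarily the module's.
    parser.add_argument("--sample-per-run-for-sge", type=int, default=30,
                        help="The number of samples per SGE job.")
    return parser


def parse_args_sketch(argString=None):
    """Parse a list of options, or sys.argv[1:] when argString is None."""
    return build_parser_sketch().parse_args(argString)
```

argparse converts dashes in option names to underscores, so --raw-dir is read back as args.raw_dir and --sample-per-run-for-sge as args.sample_per_run_for_sge.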

pyGenClean.Contamination.contamination.run_bafRegress(filenames, out_prefix, extract_filename, freq_filename, options)[source]

Runs the bafRegress function.

Parameters:
  • filenames (set) – the set of all sample files.
  • out_prefix (str) – the output prefix.
  • extract_filename (str) – the name of the file containing the markers to extract.
  • freq_filename (str) – the name of the file containing the frequency.
  • options (argparse.Namespace) – the other options.
pyGenClean.Contamination.contamination.run_bafRegress_sge(filenames, out_prefix, extract_filename, freq_filename, options)[source]

Runs the bafRegress function using SGE.

Parameters:
  • filenames (set) – the set of all sample files.
  • out_prefix (str) – the output prefix.
  • extract_filename (str) – the name of the file containing the markers to extract.
  • freq_filename (str) – the name of the file containing the frequency.
  • options (argparse.Namespace) – the other options.
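
The --sample-per-run-for-sge option suggests that the sample files are split into batches, one batch per SGE job. How the module actually builds and submits those jobs is not described here, so the sketch below covers only the batching step, with the hypothetical helper name chunk_samples.

```python
def chunk_samples(filenames, samples_per_job):
    """Split the sample files into lists of at most samples_per_job items."""
    if samples_per_job < 1:
        raise ValueError("samples_per_job must be at least 1")
    # Sets are unordered, so sort first to get reproducible batches.
    ordered = sorted(filenames)
    return [
        ordered[i:i + samples_per_job]
        for i in range(0, len(ordered), samples_per_job)
    ]
```

Each returned list could then be handed to one job; the last batch may be smaller than the others.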

Runs Plink with the geno option.

Parameters:
  • in_prefix (str) – the input prefix.
  • out_prefix (str) – the output prefix.
  • extract_filename (str) – the name of the file containing markers to extract.
pyGenClean.Contamination.contamination.safe_main()[source]

A safe version of the main function (that catches ProgramError).
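
The idea of a "safe" wrapper can be sketched as follows: run the main function and, if it raises ProgramError, print the message to the error stream and report exit code 1. The ProgramError class here is a local stand-in, safe_main_sketch is a hypothetical name, and returning the exit code (rather than exiting directly) is a choice made for this example, not necessarily what safe_main() does.

```python
import sys


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (illustrative only)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def safe_main_sketch(main_func, stream=None):
    """Run main_func; return 0 on success, or 1 after reporting a ProgramError."""
    if stream is None:
        stream = sys.stderr
    try:
        main_func()
    except ProgramError as e:
        # Report the problem to the user instead of showing a traceback.
        stream.write("ERROR: {}\n".format(e.message))
        return 1
    return 0


# Typical use in a script (not executed here):
#     sys.exit(safe_main_sketch(main))
```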