pyGenClean.Contamination package
For more information about how to use this module, refer to the Contamination Module.
Module contents
Submodules
pyGenClean.Contamination.contamination module
-
exception pyGenClean.Contamination.contamination.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.Contamination.contamination.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
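The behaviour described above (return True on success, raise ProgramError on a bad option) can be illustrated with a minimal sketch. It is written for Python 3 and is not the module's actual implementation: the specific checks shown here (existence of the three Plink binary files and of the raw directory) and the name check_args_sketch are assumptions made for illustration.

```python
import os
from argparse import Namespace


class ProgramError(Exception):
    """Stand-in for the documented exception (message shown to the user)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_args_sketch(args):
    """Hypothetical checks: the binary files and the raw directory must exist."""
    # Assumption: args.bfile is the Plink prefix, args.raw_dir the raw directory.
    for suffix in (".bed", ".bim", ".fam"):
        filename = args.bfile + suffix
        if not os.path.isfile(filename):
            raise ProgramError("{}: no such file".format(filename))
    if not os.path.isdir(args.raw_dir):
        raise ProgramError("{}: no such directory".format(args.raw_dir))
    return True
```

A caller would typically catch ProgramError at the top level, print the message to sys.stderr and exit with code 1, which matches the behaviour stated in the description.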
-
pyGenClean.Contamination.contamination.check_sample_files(fam_filename, raw_dirname)
Checks the raw sample files.
Parameters: - fam_filename – the name of the FAM file.
- raw_dirname – the name of the directory containing the raw sample files.
Returns: the set of all the sample files that are compatible with the FAM file.
Return type: set
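As a rough illustration of what "compatible with the FAM file" could mean, the sketch below keeps the raw files whose name, minus the extension, is a sample identifier listed in the FAM file (the convention described for the --raw-dir option). The function name compatible_sample_files is hypothetical, and the matching rule is an assumption rather than the module's actual logic.

```python
import os


def compatible_sample_files(fam_filename, raw_dirname):
    """Return raw files whose base name is a sample ID found in the FAM file."""
    # In a Plink FAM file, the second whitespace-separated column is the
    # individual (sample) identifier.
    with open(fam_filename) as fam:
        sample_ids = {line.split()[1] for line in fam if line.strip()}

    compatible = set()
    for name in os.listdir(raw_dirname):
        path = os.path.join(raw_dirname, name)
        stem = os.path.splitext(name)[0]  # file name minus the extension
        if os.path.isfile(path) and stem in sample_ids:
            compatible.add(path)
    return compatible
```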
-
pyGenClean.Contamination.contamination.create_extraction_file(bim_filename, out_prefix)
Creates an extraction file (keeping only markers on autosomes).
Parameters: - bim_filename – the name of the BIM file.
- out_prefix – the output prefix.
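A minimal sketch of keeping only autosomal markers is shown below. It assumes human data (autosomes coded 1 to 22 in the first BIM column) and writes one marker name per line; the function name write_autosomal_markers and the ".to_extract" output suffix are hypothetical and may differ from what the module actually produces.

```python
def write_autosomal_markers(bim_filename, out_prefix):
    """Write the names of markers located on chromosomes 1-22."""
    autosomes = {str(i) for i in range(1, 23)}
    out_filename = out_prefix + ".to_extract"  # hypothetical file name
    with open(bim_filename) as bim, open(out_filename, "w") as out:
        for line in bim:
            fields = line.split()
            # BIM columns: chromosome, marker name, genetic distance,
            # base-pair position, allele 1, allele 2.
            if len(fields) >= 2 and fields[0] in autosomes:
                out.write(fields[1] + "\n")
    return out_filename
```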
-
pyGenClean.Contamination.contamination.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Computes frequency using Plink.
- Runs bafRegress.
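The frequency step listed above could be driven from Python roughly as follows. This sketch only builds the command line; it relies on the standard Plink options --bfile, --extract, --freq and --out, but the exact options the module passes are not documented here, and the helper name build_plink_freq_command is hypothetical.

```python
def build_plink_freq_command(bfile, extract_filename, out_prefix):
    """Build a Plink command computing allele frequencies on a marker subset."""
    return [
        "plink",
        "--bfile", bfile,                # prefix of the .bed/.bim/.fam files
        "--extract", extract_filename,   # markers to keep (one name per line)
        "--freq",                        # compute allele frequencies
        "--out", out_prefix,             # Plink writes <out_prefix>.frq
    ]
```

The resulting list can be handed to subprocess.check_call, which raises an error if Plink exits with a non-zero status.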
-
pyGenClean.Contamination.contamination.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options                    Type    Description
--bfile                    string  The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--raw-dir                  string  Directory containing the raw data (one file per sample, where the name of the file (minus the extension) is the sample identification number).
--colsample                string  The sample column.
--colmarker                string  The marker column.
--colbaf                   string  The B allele frequency column.
--colab1                   string  The AB Allele 1 column.
--colab2                   string  The AB Allele 2 column.
--out                      string  The prefix of the output files.
--sge                      bool    Use SGE for parallelization.
--sge-walltime             string  The walltime for the job to run on the cluster. Do not use if you are not required to specify a walltime for your jobs on your cluster (e.g. ‘qsub -lwalltime=1:0:0’ on the cluster).
--sge-nodes                int     The number of nodes and the number of processors per node to use (e.g. ‘qsub -lnodes=X:ppn=Y’ on the cluster, where X is the number of nodes and Y is the number of processors to use). Do not use if you are not required to specify the number of nodes for your jobs on the cluster.
--sample-per-run-for-sge   int     The number of samples to run for a single SGE job.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
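The option table can be mirrored with argparse as in the sketch below. Only the option names and types come from the table; the absence of default values, the help strings and the function name parse_args_sketch are assumptions, and the real parser may define defaults or option groups that are not shown here.

```python
import argparse


def parse_args_sketch(arg_list=None):
    """Parse the documented options (defaults are left unset in this sketch)."""
    parser = argparse.ArgumentParser(description="Contamination check (sketch).")
    parser.add_argument("--bfile", type=str, help="Input file prefix.")
    parser.add_argument("--raw-dir", type=str, help="Raw data directory.")
    parser.add_argument("--colsample", type=str, help="Sample column.")
    parser.add_argument("--colmarker", type=str, help="Marker column.")
    parser.add_argument("--colbaf", type=str, help="B allele frequency column.")
    parser.add_argument("--colab1", type=str, help="AB Allele 1 column.")
    parser.add_argument("--colab2", type=str, help="AB Allele 2 column.")
    parser.add_argument("--out", type=str, help="Prefix of the output files.")
    parser.add_argument("--sge", action="store_true", help="Use SGE.")
    parser.add_argument("--sge-walltime", type=str, help="Job walltime.")
    parser.add_argument("--sge-nodes", type=int, help="Nodes to request.")
    parser.add_argument("--sample-per-run-for-sge", type=int,
                        help="Number of samples per SGE job.")
    # With arg_list=None, argparse reads the options from sys.argv[1:].
    return parser.parse_args(arg_list)
```

Note that argparse replaces dashes with underscores in attribute names, so --raw-dir is read as args.raw_dir and --sample-per-run-for-sge as args.sample_per_run_for_sge.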
-
pyGenClean.Contamination.contamination.run_bafRegress(filenames, out_prefix, extract_filename, freq_filename, options)
Runs the bafRegress function.
Parameters: - filenames (set) – the set of all sample files.
- out_prefix (str) – the output prefix.
- extract_filename (str) – the name of the file containing the markers to extract.
- freq_filename (str) – the name of the file containing the frequency.
- options (argparse.Namespace) – the other options.
-
pyGenClean.Contamination.contamination.run_bafRegress_sge(filenames, out_prefix, extract_filename, freq_filename, options)
Runs the bafRegress function using SGE.
Parameters: - filenames (set) – the set of all sample files.
- out_prefix (str) – the output prefix.
- extract_filename (str) – the name of the file containing the markers to extract.
- freq_filename (str) – the name of the file containing the frequency.
- options (argparse.Namespace) – the other options.
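Since --sample-per-run-for-sge sets the number of samples handled by a single SGE job, the set of sample files presumably has to be split into groups of that size before submission. The sketch below shows one way to do the split; the helper name chunk_samples is hypothetical, and sorting the files first is an assumption made only to get reproducible groups.

```python
def chunk_samples(filenames, samples_per_job):
    """Split the sample files into lists of at most samples_per_job items."""
    if samples_per_job < 1:
        raise ValueError("samples_per_job must be at least 1")
    ordered = sorted(filenames)  # sets are unordered; sort for reproducibility
    return [
        ordered[i:i + samples_per_job]
        for i in range(0, len(ordered), samples_per_job)
    ]
```

Each resulting list would then correspond to one job submitted to the cluster, with the last job possibly holding fewer samples than the others.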
