pyGenClean.Contamination package
For more information about how to use this module, refer to the Contamination Module.
Module contents
Submodules
pyGenClean.Contamination.contamination module
exception pyGenClean.Contamination.contamination.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.Contamination.contamination.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
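The validate-or-raise behaviour described above can be sketched as follows. The individual checks are not listed in this reference, so the ones shown (presence of the three Plink binary files and of the raw directory) are assumptions, and both `check_args_sketch` and the local `ProgramError` class are hypothetical stand-ins rather than the package's own code.

```python
import os
from argparse import Namespace


class ProgramError(Exception):
    """Stand-in for the module's exception (hypothetical re-implementation)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_args_sketch(args):
    """Return True if the options look usable, otherwise raise ProgramError."""
    # Assumption: the Plink binary files are located from the --bfile prefix.
    for suffix in (".bed", ".bim", ".fam"):
        filename = args.bfile + suffix
        if not os.path.isfile(filename):
            raise ProgramError("{}: no such file".format(filename))

    # Assumption: the raw data directory must already exist.
    if not os.path.isdir(args.raw_dir):
        raise ProgramError("{}: no such directory".format(args.raw_dir))

    return True
```

A caller would typically catch `ProgramError`, write its message to `sys.stderr`, and call `sys.exit(1)`, which matches the exit behaviour described for this function.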
pyGenClean.Contamination.contamination.check_sample_files(fam_filename, raw_dirname)
Checks the raw sample files.
Parameters:
Returns: the set of all the sample files that are compatible with the FAM file.
Return type:
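A minimal sketch of such a compatibility check, assuming that a raw file is "compatible" when its name, minus the extension, matches an individual ID (second column) of the FAM file. That matching rule is inferred from the description of the --raw-dir option, and `check_sample_files_sketch` is a hypothetical name, not the package's implementation.

```python
import os


def check_sample_files_sketch(fam_filename, raw_dirname):
    """Return the set of raw files whose base name is a sample ID in the FAM file."""
    # A Plink FAM file is whitespace-delimited; the second column holds the
    # individual ID (assumed here to be the sample identification number).
    sample_ids = set()
    with open(fam_filename) as fam_file:
        for line in fam_file:
            fields = line.split()
            if len(fields) >= 2:
                sample_ids.add(fields[1])

    compatible = set()
    for name in os.listdir(raw_dirname):
        path = os.path.join(raw_dirname, name)
        # The file name minus its extension should be a known sample ID.
        sample_id = os.path.splitext(name)[0]
        if os.path.isfile(path) and sample_id in sample_ids:
            compatible.add(path)
    return compatible
```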
pyGenClean.Contamination.contamination.create_extraction_file(bim_filename, out_prefix)
Creates an extraction file (keeping only markers on autosomes).
Parameters:
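The autosome filter can be sketched as below, assuming the usual Plink BIM layout (chromosome in the first column, marker name in the second) and chromosomes 1 to 22 as the autosomes. The output file name `out_prefix + ".to_extract"` and the function name are hypothetical choices made for this illustration.

```python
def create_extraction_file_sketch(bim_filename, out_prefix):
    """Write the names of autosomal markers, one per line; return the file name."""
    autosomes = {str(i) for i in range(1, 23)}
    out_filename = out_prefix + ".to_extract"  # hypothetical output name

    with open(bim_filename) as bim_file, open(out_filename, "w") as out_file:
        for line in bim_file:
            fields = line.split()
            # Keep the marker only if its chromosome code is 1..22.
            if len(fields) >= 2 and fields[0] in autosomes:
                out_file.write(fields[1] + "\n")

    return out_filename
```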
pyGenClean.Contamination.contamination.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Computes the frequency using Plink.
- Runs bafRegress.
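For the frequency step, a Plink call along the following lines is one plausible form. The flags `--bfile`, `--extract`, `--freq` and `--out` are standard Plink options, and `--freq` writes a file with the `.frq` extension. Whether the module restricts the computation to the extraction file, and which additional flags it passes, are assumptions here; `build_freq_command` is a hypothetical helper that only builds the command without running it.

```python
def build_freq_command(bfile_prefix, extract_filename, out_prefix):
    """Return the Plink command (as a list) and the expected frequency file name."""
    command = [
        "plink",
        "--bfile", bfile_prefix,        # prefix of the .bed/.bim/.fam files
        "--extract", extract_filename,  # assumption: restrict to listed markers
        "--freq",                       # compute allele frequencies
        "--out", out_prefix,
    ]
    # Plink writes the frequencies to "<out_prefix>.frq".
    return command, out_prefix + ".frq"
```

The returned list can be handed to `subprocess.check_call` on a system where the `plink` executable is on the PATH.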
pyGenClean.Contamination.contamination.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options | Type | Description
--- | --- | ---
--bfile | string | The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--raw-dir | string | Directory containing the raw data (one file per sample, where the name of the file, minus the extension, is the sample identification number).
--colsample | string | The sample column.
--colmarker | string | The marker column.
--colbaf | string | The B allele frequency column.
--colab1 | string | The AB Allele 1 column.
--colab2 | string | The AB Allele 2 column.
--out | string | The prefix of the output files.
--sge | bool | Use SGE for parallelization.
--sge-walltime | string | The walltime for the job to run on the cluster. Do not use if you are not required to specify a walltime for your jobs on your cluster (e.g. ‘qsub -lwalltime=1:0:0’ on the cluster).
--sge-nodes | int | The number of nodes and the number of processors per node to use (e.g. ‘qsub -lnodes=X:ppn=Y’ on the cluster, where X is the number of nodes and Y is the number of processors to use). Do not use if you are not required to specify the number of nodes for your jobs on the cluster.
--sample-per-run-for-sge | int | The number of samples to run for a single SGE job.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
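The options above could be declared with argparse roughly as follows. This is a sketch under stated assumptions: the defaults and the required flags are not given in this reference, so none are set beyond requiring --bfile and --raw-dir, and --sge-nodes is assumed to take two integers (nodes, processors per node). `parse_args_sketch` is a hypothetical name.

```python
import argparse


def parse_args_sketch(arg_string=None):
    """Parse the documented options; no validation beyond argparse's own."""
    parser = argparse.ArgumentParser(description="Contamination check (sketch).")

    # Assumption: the two input options are mandatory.
    parser.add_argument("--bfile", type=str, required=True)
    parser.add_argument("--raw-dir", type=str, required=True)

    # Column names in the raw sample files (defaults are not documented here).
    for option in ("--colsample", "--colmarker", "--colbaf",
                   "--colab1", "--colab2"):
        parser.add_argument(option, type=str)

    parser.add_argument("--out", type=str)

    # SGE-related options.
    parser.add_argument("--sge", action="store_true")
    parser.add_argument("--sge-walltime", type=str)
    # Assumption: two integers, the number of nodes and processors per node.
    parser.add_argument("--sge-nodes", type=int, nargs=2)
    parser.add_argument("--sample-per-run-for-sge", type=int)

    # With arg_string=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(arg_string)
```

Passing a list, for instance `parse_args_sketch(["--bfile", "data", "--raw-dir", "raw"])`, makes the parser easy to call from other code, which is presumably why `argString` defaults to `None` in the documented signature.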
pyGenClean.Contamination.contamination.run_bafRegress(filenames, out_prefix, extract_filename, freq_filename, options)
Runs the bafRegress function.
Parameters:
- filenames (set) – the set of all sample files.
- out_prefix (str) – the output prefix.
- extract_filename (str) – the name of the file containing the markers to extract.
- freq_filename (str) – the name of the file containing the frequency.
- options (argparse.Namespace) – the other options.
pyGenClean.Contamination.contamination.run_bafRegress_sge(filenames, out_prefix, extract_filename, freq_filename, options)
Runs the bafRegress function using SGE.
Parameters:
- filenames (set) – the set of all sample files.
- out_prefix (str) – the output prefix.
- extract_filename (str) – the name of the file containing the markers to extract.
- freq_filename (str) – the name of the file containing the frequency.
- options (argparse.Namespace) – the other options.
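With SGE, the sample files have to be divided among jobs according to --sample-per-run-for-sge. The grouping might look like the sketch below; how the module actually partitions the files and submits the jobs is not described here, so `chunk_samples` is a hypothetical helper and the sorted ordering is an assumption made only to keep the result deterministic.

```python
def chunk_samples(filenames, samples_per_run):
    """Split the sample files into lists of at most samples_per_run items."""
    if samples_per_run < 1:
        raise ValueError("samples_per_run must be at least 1")

    # Sets are unordered; sort so that the same input gives the same chunks.
    ordered = sorted(filenames)
    return [
        ordered[start:start + samples_per_run]
        for start in range(0, len(ordered), samples_per_run)
    ]
```

Each resulting chunk would then correspond to one SGE job that runs bafRegress on its own subset of files.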