pyGenClean.Contamination package

For more information about how to use this module, refer to the Contamination Module.

Module contents

Submodules

pyGenClean.Contamination.contamination module

exception pyGenClean.Contamination.contamination.ProgramError(msg)

Bases: exceptions.Exception

An Exception raised in case of a problem.

Parameters:	msg (str) – the message to print to the user before exiting.

pyGenClean.Contamination.contamination.checkArgs(args)

Checks the arguments and options.

Parameters:	args (argparse.Namespace) – an object containing the options of the program.
Returns:	`True` if everything was OK.

If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
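
The check-and-raise pattern described here can be sketched as follows. This is a minimal illustration, not the module's actual code: `ProgramError` is a local stand-in, `check_args_sketch` is a hypothetical name, and the two checks (the Plink binary files and the raw directory must exist) are assumptions based on the `--bfile` and `--raw-dir` options.

```python
import os
from argparse import Namespace


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (illustrative only)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_args_sketch(args):
    """Hypothetical option check: raise ProgramError on the first problem."""
    # Assumed check: the Plink binary files must exist for the given prefix.
    for suffix in (".bed", ".bim", ".fam"):
        path = args.bfile + suffix
        if not os.path.isfile(path):
            raise ProgramError("{}: no such file".format(path))
    # Assumed check: the raw data directory must exist.
    if not os.path.isdir(args.raw_dir):
        raise ProgramError("{}: no such directory".format(args.raw_dir))
    return True
```

Calling `check_args_sketch(Namespace(bfile="prefix", raw_dir="raw"))` returns `True` only when every assumed check passes; otherwise the caller is expected to catch the exception and report it.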

pyGenClean.Contamination.contamination.check_sample_files(fam_filename, raw_dirname)

Checks the raw sample files.

Parameters:	fam_filename (str) – the name of the FAM file.
	raw_dirname (str) – the name of the directory containing the raw files.
Returns:	the set of all the sample files that are compatible with the FAM file.
Return type:	set
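
One way to picture this matching step is the sketch below. It is not the module's implementation: `compatible_sample_files` is a hypothetical name, and the sketch assumes that the sample identifier is the second whitespace-separated column of the FAM file and that each raw file is named after its sample (file name minus its last extension), as the `--raw-dir` option describes.

```python
import os


def compatible_sample_files(fam_filename, raw_dirname):
    """Return the raw files whose base name is a sample ID in the FAM file."""
    # Assumption: the second column of each FAM line is the sample ID.
    with open(fam_filename) as fam:
        sample_ids = {line.split()[1] for line in fam if line.strip()}

    found = set()
    for name in os.listdir(raw_dirname):
        path = os.path.join(raw_dirname, name)
        # Strip only the last extension, e.g. "S1.txt" -> "S1".
        sample_id = os.path.splitext(name)[0]
        if os.path.isfile(path) and sample_id in sample_ids:
            found.add(path)
    return found
```

Files with compound extensions (for example `.txt.gz`) would need extra handling that this sketch leaves out.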

pyGenClean.Contamination.contamination.create_extraction_file(bim_filename, out_prefix)

Creates an extraction file (keeping only markers on autosomes).

Parameters:	bim_filename (str) – the name of the BIM file.
	out_prefix (str) – the prefix for the output file.
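
Keeping only autosomal markers can be sketched as below. The function name `write_autosomal_markers`, the explicit output file name and the returned count are all hypothetical; the sketch assumes the standard BIM layout (chromosome in the first column, marker name in the second) and human autosomes coded 1 to 22.

```python
# Assumption: human data, autosomes coded as "1" .. "22" in the BIM file.
AUTOSOMES = {str(i) for i in range(1, 23)}


def write_autosomal_markers(bim_filename, out_filename):
    """Write one autosomal marker name per line; return how many were kept."""
    kept = 0
    with open(bim_filename) as bim, open(out_filename, "w") as out:
        for line in bim:
            fields = line.split()
            # fields[0] is the chromosome, fields[1] the marker name.
            if len(fields) >= 2 and fields[0] in AUTOSOMES:
                out.write(fields[1] + "\n")
                kept += 1
    return kept
```

The resulting file has the one-name-per-line shape that Plink's `--extract` option reads.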

pyGenClean.Contamination.contamination.main(argString=None)

The main function of the module.

Parameters:	argString (list) – the options.

These are the steps:

1. Prints the options.
2. Computes frequency using Plink.
3. Runs bafRegress.

pyGenClean.Contamination.contamination.parseArgs(argString=None)

Parses the command line options and arguments.

Parameters:	argString (list) – the options.
Returns:	An `argparse.Namespace` object created by the `argparse` module. It contains the values of the different options.

Options	Type	Description
`--bfile`	string	The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
`--raw-dir`	string	Directory containing the raw data (one file per sample, where the name of the file, minus the extension, is the sample identification number).
`--colsample`	string	The sample column.
`--colmarker`	string	The marker column.
`--colbaf`	string	The B allele frequency column.
`--colab1`	string	The AB Allele 1 column.
`--colab2`	string	The AB Allele 2 column.
`--out`	string	The prefix of the output files.
`--sge`	bool	Use SGE for parallelization.
`--sge-walltime`	string	The walltime for the job to run on the cluster (e.g. ‘qsub -lwalltime=1:0:0’ on the cluster). Do not use if you are not required to specify a walltime for your jobs on your cluster.
`--sge-nodes`	int	The number of nodes and the number of processors per node to use (e.g. ‘qsub -lnodes=X:ppn=Y’ on the cluster, where X is the number of nodes and Y is the number of processors to use). Do not use if you are not required to specify the number of nodes for your jobs on the cluster.
`--sample-per-run-for-sge`	int	The number of samples to run for a single SGE job.

Note

No option checks are done here (except those automatically performed by argparse). They need to be done elsewhere (see checkArgs()).
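
To show how the option names map onto attributes of the returned namespace, here is a small sketch covering a subset of the options. It is not the module's parser: `build_parser_sketch` is a hypothetical name, and the choice of which options are required, and the absence of defaults, are assumptions.

```python
import argparse


def build_parser_sketch():
    """Hypothetical parser covering a few of the documented options."""
    parser = argparse.ArgumentParser(description="Illustrative subset only.")
    # Assumption: the two input options are mandatory.
    parser.add_argument("--bfile", type=str, required=True)
    parser.add_argument("--raw-dir", type=str, required=True)
    parser.add_argument("--out", type=str)
    parser.add_argument("--sge", action="store_true")
    parser.add_argument("--sample-per-run-for-sge", type=int)
    return parser


# Passing a list mirrors the argString parameter; None would read sys.argv.
args = build_parser_sketch().parse_args(
    ["--bfile", "data", "--raw-dir", "raw", "--sge"]
)
```

Dashes inside option names become underscores, so `--raw-dir` is read back as `args.raw_dir` and `--sample-per-run-for-sge` as `args.sample_per_run_for_sge`.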

pyGenClean.Contamination.contamination.run_bafRegress(filenames, out_prefix, extract_filename, freq_filename, options)

Runs the bafRegress function.

Parameters:	filenames (set) – the set of all sample files.
	out_prefix (str) – the output prefix.
	extract_filename (str) – the name of the markers to extract.
	freq_filename (str) – the name of the file containing the frequency.
	options (argparse.Namespace) – the other options.

pyGenClean.Contamination.contamination.run_bafRegress_sge(filenames, out_prefix, extract_filename, freq_filename, options)

Runs the bafRegress function using SGE.

Parameters:	filenames (set) – the set of all sample files.
	out_prefix (str) – the output prefix.
	extract_filename (str) – the name of the markers to extract.
	freq_filename (str) – the name of the file containing the frequency.
	options (argparse.Namespace) – the other options.
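
The `--sample-per-run-for-sge` option implies that the sample files are split into batches, one batch per SGE job. A minimal sketch of such a split is shown below; `chunk_samples` is a hypothetical helper, and sorting the file names first is an assumption made only to keep the batches reproducible.

```python
def chunk_samples(filenames, samples_per_job):
    """Split a set of file names into lists of at most samples_per_job items."""
    if samples_per_job < 1:
        raise ValueError("samples_per_job must be at least 1")
    # Sets are unordered, so sort to get the same batches on every run.
    ordered = sorted(filenames)
    return [
        ordered[i:i + samples_per_job]
        for i in range(0, len(ordered), samples_per_job)
    ]
```

Each returned list would then correspond to one submitted job; the submission itself is outside the scope of this sketch.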

pyGenClean.Contamination.contamination.run_plink(in_prefix, out_prefix, extract_filename)

Runs Plink with the geno option.

Parameters:	in_prefix (str) – the input prefix.
	out_prefix (str) – the output prefix.
	extract_filename (str) – the name of the file containing markers to extract.
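
For orientation, a Plink command line taking these three parameters could be assembled as in the sketch below. The source does not show the actual command: `build_plink_command` is a hypothetical helper, and the use of `--freq` is an assumption based on main() describing this step as computing frequencies. The command is only built here, not executed.

```python
def build_plink_command(in_prefix, out_prefix, extract_filename):
    """Return a Plink argument list (assumed flags; not run by this sketch)."""
    return [
        "plink",
        "--bfile", in_prefix,            # input binary file prefix
        "--extract", extract_filename,   # restrict to the listed markers
        "--freq",                        # assumption: frequency computation
        "--out", out_prefix,             # output file prefix
    ]
```

A list in this form can be handed to `subprocess.check_call` on a machine where Plink is installed.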

pyGenClean.Contamination.contamination.safe_main()

A safe version of the main function (that catches ProgramError).
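
The idea of a wrapper that catches ProgramError can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: `ProgramError` is a local stand-in, `failing_main` and `safe_main_sketch` are hypothetical names, and returning an exit code (instead of exiting directly) is a choice made to keep the example easy to test.

```python
import sys


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (illustrative only)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def failing_main():
    """Hypothetical main function that runs into a problem."""
    raise ProgramError("example failure")


def safe_main_sketch(main_func):
    """Run main_func; report a ProgramError on stderr and return code 1."""
    try:
        main_func()
    except ProgramError as e:
        print(e.message, file=sys.stderr)
        return 1
    return 0


# The message goes to stderr and the wrapper hands back the exit code.
exit_code = safe_main_sketch(failing_main)
```

A script entry point could pass the returned value to `sys.exit` so that a failure surfaces as exit code 1, matching the behaviour described for checkArgs().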
