pyGenClean.Misc package
Submodules
pyGenClean.Misc.compare_gold_standard module
-
exception pyGenClean.Misc.compare_gold_standard.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.Misc.compare_gold_standard.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – a Namespace object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
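The error-handling behaviour described for checkArgs can be sketched as follows. This is only an illustration: the ProgramError class below is a stand-in for the documented one, and check_prefix and safe_check are hypothetical helpers, not functions of this module.

```python
import sys


class ProgramError(Exception):
    """Stand-in for the documented ProgramError (assumed layout)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)

    def __str__(self):
        return self.message


def check_prefix(prefix):
    """Hypothetical option check: reject an empty file prefix."""
    if not prefix:
        raise ProgramError("no input prefix given")
    return True


def safe_check(prefix):
    """Print a ProgramError to sys.stderr and exit with code 1."""
    try:
        return check_prefix(prefix)
    except ProgramError as error:
        print(error, file=sys.stderr)
        sys.exit(1)
```

Calling safe_check("data") returns True, while safe_check("") prints the message to sys.stderr and exits with code 1.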
-
pyGenClean.Misc.compare_gold_standard.check_fam_for_samples(required_samples, source, gold)
Check fam files for required_samples.
-
pyGenClean.Misc.compare_gold_standard.computeFrequency(prefix, outPrefix)
Compute the frequency using Plink.
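As a rough sketch of what computing the frequency with Plink might look like, the helper below builds a command line from the --bfile, --freq and --out flags of Plink. The function names are hypothetical, and the exact options used by computeFrequency are not documented here, so treat the command as an assumption.

```python
import subprocess


def build_freq_command(prefix, out_prefix):
    """Build a Plink command computing allele frequencies (assumed options)."""
    return ["plink", "--bfile", prefix, "--freq", "--out", out_prefix]


def run_command(command):
    """Run the command; a non-zero exit status raises CalledProcessError."""
    subprocess.check_call(command)
```

Building the command separately from running it makes the sketch easy to inspect without having Plink installed.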
-
pyGenClean.Misc.compare_gold_standard.compute_statistics(out_dir, gold_prefix, source_prefix, same_samples, use_sge, final_out_prefix)
Compute the statistics.
-
pyGenClean.Misc.compare_gold_standard.exclude_SNPs_samples(inPrefix, outPrefix, exclusionSNP=None, keepSample=None, transpose=False)
Exclude some SNPs and keep some samples using Plink.
-
pyGenClean.Misc.compare_gold_standard.extractSNPs(prefixes, snpToExtractFileNames, outPrefixes, runSGE)
Extract a list of SNPs using Plink.
-
pyGenClean.Misc.compare_gold_standard.findFlippedSNPs(goldFrqFile1, sourceAlleles, outPrefix)
Find flipped SNPs and flip them in the data1.
-
pyGenClean.Misc.compare_gold_standard.findOverlappingSNPsWithGoldStandard(prefix, gold_prefixe, out_prefix, use_marker_names=False)
Find the overlapping SNPs in 4 different data sets.
-
pyGenClean.Misc.compare_gold_standard.flipSNPs(inPrefix, outPrefix, flipFileName)
Flip SNPs using Plink.
-
pyGenClean.Misc.compare_gold_standard.illumina_to_snp(strand, snp)
Return the TOP strand of the marker.
Function that takes a strand (TOP or BOT) and a SNP (e.g. [A/C]) and returns a space-separated AlleleA[space]AlleleB string.
Returns: The nucleotide for allele A and the nucleotide for allele B (space separated).
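A minimal sketch of the conversion described above is given here. It assumes that a BOT marker is brought to the TOP strand by complementing both alleles, which the text does not confirm. The name illumina_to_snp_sketch and the COMPLEMENT table are illustrative and are not taken from the module.

```python
# Complement of each nucleotide (used here only for BOT markers).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def illumina_to_snp_sketch(strand, snp):
    """Return 'AlleleA AlleleB' for a SNP written as '[A/C]'.

    Assumption: BOT alleles are complemented, TOP alleles are kept as-is.
    """
    allele_a, allele_b = snp.strip("[]").split("/")
    if strand == "TOP":
        return allele_a + " " + allele_b
    if strand == "BOT":
        return COMPLEMENT[allele_a] + " " + COMPLEMENT[allele_b]
    raise ValueError("unknown strand: " + strand)
```

Under these assumptions, a TOP marker [A/C] gives "A C", and a BOT marker [T/G] also gives "A C".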
-
pyGenClean.Misc.compare_gold_standard.keepSamples(prefixes, samplesToExtractFileNames, outPrefixes, runSGE, transpose=False)
Keep a list of samples using Plink.
-
pyGenClean.Misc.compare_gold_standard.parseArgs(argString=None)
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options              Type     Description
--bfile              string   The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--gold-bfile         string   The input file prefix for the Gold Standard (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--same-samples       string   A file containing samples which are present in both the Gold Standard and the source panel. One identity per line, tab-separated. For each row, the first sample is from the Gold Standard and the second is from the source panel.
--source-manifest    string   The Illumina marker manifest. This file should use tabs as field separators. There should be no lines before the main header line and no lines after the last data line.
--source-alleles     string   A file containing the source alleles (TOP). Two tab-separated columns: one with the marker name, the other with the alleles (separated by a space). No header.
--sge                boolean  Use SGE for parallelization.
--do-not-flip        boolean  Do not flip SNPs. WARNING: use this option only if the Gold Standard was generated using the same chip (hence, flipping is unnecessary).
--use-marker-names   boolean  Use marker names instead of (chr, position). WARNING: use this option only if the Gold Standard was generated using the same chip (hence, they have the same marker names).
--out                string   The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
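The options listed above could be declared with argparse roughly as follows. This is a sketch under assumptions: the help strings are shortened, no defaults or required flags are set because the text does not give them, and build_parser and parse_args_sketch are hypothetical names.

```python
import argparse


def build_parser():
    """Declare the documented options (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Compare a data set with a Gold Standard (sketch)."
    )
    # String options
    parser.add_argument("--bfile", type=str, help="Input file prefix.")
    parser.add_argument("--gold-bfile", type=str,
                        help="Input file prefix for the Gold Standard.")
    parser.add_argument("--same-samples", type=str,
                        help="File of samples present in both data sets.")
    parser.add_argument("--source-manifest", type=str,
                        help="The Illumina marker manifest.")
    parser.add_argument("--source-alleles", type=str,
                        help="File containing the source alleles (TOP).")
    parser.add_argument("--out", type=str,
                        help="The prefix of the output files.")
    # Boolean options
    parser.add_argument("--sge", action="store_true",
                        help="Use SGE for parallelization.")
    parser.add_argument("--do-not-flip", action="store_true",
                        help="Do not flip SNPs.")
    parser.add_argument("--use-marker-names", action="store_true",
                        help="Use marker names instead of (chr, position).")
    return parser


def parse_args_sketch(arg_string=None):
    """Parse a list of arguments, or sys.argv when arg_string is None."""
    return build_parser().parse_args(arg_string)
```

argparse stores an option such as --gold-bfile under the attribute gold_bfile, so the resulting Namespace can be handed to a separate checking step.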
-
pyGenClean.Misc.compare_gold_standard.read_same_samples_file(filename, out_prefix)
Reads a file containing same samples.
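The file format given for the --same-samples option (one identity per line, tab-separated, Gold Standard sample first) could be parsed along these lines. The sketch assumes exactly two fields per line and skips empty lines. read_same_samples_sketch is a hypothetical helper and does not reproduce whatever the documented function does with out_prefix.

```python
import csv


def read_same_samples_sketch(filename):
    """Return a list of (gold_sample, source_sample) pairs.

    Assumption: each non-empty line has exactly two tab-separated fields.
    """
    pairs = []
    with open(filename, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip empty lines
            if len(row) != 2:
                raise ValueError("expected 2 fields, got %d" % len(row))
            pairs.append((row[0], row[1]))
    return pairs
```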
-
pyGenClean.Misc.compare_gold_standard.read_source_manifest(file_name)
Reads Illumina manifest.