pyGenClean.Misc package

Submodules

pyGenClean.Misc.compare_gold_standard module

exception pyGenClean.Misc.compare_gold_standard.ProgramError(msg)[source]

Bases: exceptions.Exception

An Exception raised in case of a problem.

Parameters: msg (str) – the message to print to the user before exiting.

pyGenClean.Misc.compare_gold_standard.checkArgs(args)[source]

Checks the arguments and options.

Parameters: args (argparse.Namespace) – a Namespace object containing the options of the program.
Returns: True if everything was OK.

If there is a problem with an option, a ProgramError exception is raised, a message is printed to sys.stderr, and the program exits with code 1.
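
As a rough illustration of this kind of validation, the sketch below checks that the three Plink binary files exist for a given prefix. It is not the package's implementation: check_bfile_prefix is a hypothetical helper, ProgramError is redefined locally so the example is self-contained, and the code targets Python 3.

```python
import os


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (assumption)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def check_bfile_prefix(prefix):
    """Raise ProgramError if prefix.bed, prefix.bim or prefix.fam is missing."""
    for extension in (".bed", ".bim", ".fam"):
        file_name = prefix + extension
        if not os.path.isfile(file_name):
            raise ProgramError("{}: no such file".format(file_name))
    # Mirrors the documented behaviour of returning True when all is well.
    return True
```

A caller would typically run such a check once for the --bfile prefix and once for the --gold-bfile prefix.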

pyGenClean.Misc.compare_gold_standard.check_fam_for_samples(required_samples, source, gold)[source]

Check fam files for required_samples.

pyGenClean.Misc.compare_gold_standard.computeFrequency(prefix, outPrefix)[source]

Compute the frequency using Plink.

pyGenClean.Misc.compare_gold_standard.compute_statistics(out_dir, gold_prefix, source_prefix, same_samples, use_sge, final_out_prefix)[source]

Compute the statistics.

pyGenClean.Misc.compare_gold_standard.exclude_SNPs_samples(inPrefix, outPrefix, exclusionSNP=None, keepSample=None, transpose=False)[source]

Exclude some SNPs and keep some samples using Plink.

pyGenClean.Misc.compare_gold_standard.extractSNPs(prefixes, snpToExtractFileNames, outPrefixes, runSGE)[source]

Extract a list of SNPs using Plink.

pyGenClean.Misc.compare_gold_standard.findFlippedSNPs(goldFrqFile1, sourceAlleles, outPrefix)[source]

Find the flipped SNPs and flip them in the data.

pyGenClean.Misc.compare_gold_standard.findOverlappingSNPsWithGoldStandard(prefix, gold_prefixe, out_prefix, use_marker_names=False)[source]

Find the overlapping SNPs in 4 different data sets.

pyGenClean.Misc.compare_gold_standard.flipSNPs(inPrefix, outPrefix, flipFileName)[source]

Flip SNPs using Plink.

pyGenClean.Misc.compare_gold_standard.illumina_to_snp(strand, snp)[source]

Return the TOP strand of the marker.

Function that takes a strand (TOP or BOT) and a SNP (e.g. [A/C]) and returns a space-separated AlleleA[space]AlleleB string.

Parameters:
  • strand (str) – Either “TOP” or “BOT”
  • snp (str) – [A/C], [A/T], [G/C], [T/C], [A/G], [C/G], [T/A] or [T/G].
Returns: The nucleotide for allele A and the nucleotide for allele B (space-separated).
Return type: str
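
The conversion can be sketched as follows. This is one plausible reading of the description, not the package's code: it assumes that a TOP SNP is returned unchanged and that a BOT SNP is converted by complementing each allele. The name illumina_to_snp_sketch and the use of ValueError are illustrative choices.

```python
# Watson-Crick complements, used to move BOT alleles to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# The SNP notations listed in the parameter description.
VALID_SNPS = {"[A/C]", "[A/T]", "[G/C]", "[T/C]",
              "[A/G]", "[C/G]", "[T/A]", "[T/G]"}


def illumina_to_snp_sketch(strand, snp):
    """Return "AlleleA AlleleB" for a strand ("TOP"/"BOT") and a SNP."""
    if strand not in ("TOP", "BOT"):
        raise ValueError("{}: unknown strand".format(strand))
    if snp not in VALID_SNPS:
        raise ValueError("{}: unknown SNP".format(snp))

    # "[A/C]" -> allele A is at index 1, allele B at index 3.
    allele_a, allele_b = snp[1], snp[3]

    # Assumption: BOT alleles are complemented to express them on TOP.
    if strand == "BOT":
        allele_a, allele_b = COMPLEMENT[allele_a], COMPLEMENT[allele_b]

    return "{} {}".format(allele_a, allele_b)
```

Under these assumptions, illumina_to_snp_sketch("TOP", "[A/C]") gives "A C" and illumina_to_snp_sketch("BOT", "[T/C]") gives "A G".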

pyGenClean.Misc.compare_gold_standard.keepSamples(prefixes, samplesToExtractFileNames, outPrefixes, runSGE, transpose=False)[source]

Keep a list of samples using Plink.

pyGenClean.Misc.compare_gold_standard.main(argString=None)[source]

pyGenClean.Misc.compare_gold_standard.parseArgs(argString=None)[source]

Parses the command line options and arguments.

Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options             Type     Description
--bfile             string   The input file prefix (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--gold-bfile        string   The input file prefix for the Gold Standard (the Plink binary files are found by appending .bim, .bed and .fam to the prefix).
--same-samples      string   A file containing the samples that are present in both the Gold Standard and the source panel. One identity per line, tab-separated. In each row, the first sample is from the Gold Standard and the second is from the source panel.
--source-manifest   string   The Illumina marker manifest. This file should use tabs as field separators. There should be no lines before the main header line and no lines after the last data line.
--source-alleles    string   A file containing the source alleles (TOP). Two tab-separated columns: the marker name and the alleles (separated by a space). No header.
--sge               boolean  Use SGE for parallelization.
--do-not-flip       boolean  Do not flip SNPs. WARNING: use this option only if the Gold Standard was generated using the same chip (hence, flipping is unnecessary).
--use-marker-names  boolean  Use marker names instead of (chr, position). WARNING: use this option only if the Gold Standard was generated using the same chip (hence, they have the same marker names).
--out               string   The prefix of the output files.

Note

No option checks are done here (except for those automatically done by argparse). They need to be done elsewhere (see checkArgs()).
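
For orientation, the options listed above could be declared with argparse roughly as follows. This is a sketch rather than the module's parser: the help strings are abbreviated, the description text is invented for the example, and no defaults or required flags are set because the documentation does not state them. The name parse_args_sketch is hypothetical.

```python
import argparse


def parse_args_sketch(arg_list=None):
    """Parse the documented options; arg_list=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Compare a source panel with a Gold Standard (sketch)."
    )

    # Options taking a value (documented type: string).
    parser.add_argument("--bfile", type=str, metavar="FILE",
                        help="Prefix of the source Plink binary files.")
    parser.add_argument("--gold-bfile", type=str, metavar="FILE",
                        help="Prefix of the Gold Standard binary files.")
    parser.add_argument("--same-samples", type=str, metavar="FILE",
                        help="Samples present in both data sets.")
    parser.add_argument("--source-manifest", type=str, metavar="FILE",
                        help="The Illumina marker manifest.")
    parser.add_argument("--source-alleles", type=str, metavar="FILE",
                        help="The source alleles (TOP).")
    parser.add_argument("--out", type=str, metavar="FILE",
                        help="Prefix of the output files.")

    # Flags (documented type: boolean); store_true makes them False if absent.
    parser.add_argument("--sge", action="store_true",
                        help="Use SGE for parallelization.")
    parser.add_argument("--do-not-flip", action="store_true",
                        help="Do not flip SNPs.")
    parser.add_argument("--use-marker-names", action="store_true",
                        help="Use marker names instead of (chr, position).")

    return parser.parse_args(arg_list)
```

argparse turns dashes in option names into underscores, so --gold-bfile is read back as args.gold_bfile and --do-not-flip as args.do_not_flip.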

pyGenClean.Misc.compare_gold_standard.read_same_samples_file(filename, out_prefix)[source]

Reads a file containing same samples.

pyGenClean.Misc.compare_gold_standard.read_source_alleles(file_name)[source]

Reads an allele file.
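
A minimal reader for the file layout described under --source-alleles (two tab-separated columns, marker name then space-separated alleles, no header) might look like the sketch below. The return structure, a dictionary mapping each marker name to a pair of alleles, is an assumption, as are the name read_source_alleles_sketch and the use of ValueError for malformed lines.

```python
def read_source_alleles_sketch(file_name):
    """Return {marker_name: (allele_a, allele_b)} read from file_name."""
    alleles = {}
    with open(file_name, "r") as input_file:
        for line_number, line in enumerate(input_file, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines

            # Column 1: marker name; column 2: "A B" allele pair.
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    "{}: line {}: expected 2 columns".format(
                        file_name, line_number
                    )
                )
            marker, allele_pair = fields

            pair = allele_pair.split(" ")
            if len(pair) != 2:
                raise ValueError(
                    "{}: line {}: expected 2 alleles".format(
                        file_name, line_number
                    )
                )
            alleles[marker] = (pair[0], pair[1])
    return alleles
```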

pyGenClean.Misc.compare_gold_standard.read_source_manifest(file_name)[source]

Reads Illumina manifest.

pyGenClean.Misc.compare_gold_standard.renameSNPs(inPrefix, updateFileName, outPrefix)[source]

Updates the name of the SNPs using Plink.

pyGenClean.Misc.compare_gold_standard.runCommand(command)[source]

Run a command.
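
A helper of this kind is commonly a thin wrapper around the subprocess module. The sketch below shows one way to write it, assuming the command is passed as a list of arguments. It raises RuntimeError as a stand-in for the module's own error type, and run_command_sketch is a hypothetical name.

```python
import subprocess


def run_command_sketch(command):
    """Run command (a list of arguments) and return its output as text."""
    try:
        # stderr is merged into stdout so that error messages are captured.
        return subprocess.check_output(
            command, stderr=subprocess.STDOUT, universal_newlines=True
        )
    except (OSError, subprocess.CalledProcessError) as error:
        # OSError: the executable was not found or could not be started.
        # CalledProcessError: the command returned a non-zero exit code.
        raise RuntimeError(
            "could not run command: {}".format(" ".join(command))
        ) from error
```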

pyGenClean.Misc.compare_gold_standard.safe_main()[source]

A safe version of the main function (that catches ProgramError).
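
The pattern described here, together with the error handling documented for checkArgs(), can be sketched as a wrapper that catches ProgramError, prints the message to sys.stderr and exits with code 1. The sketch is an illustration under those assumptions: ProgramError is redefined locally, and safe_main_sketch takes the function to run as a parameter so that the example can be exercised on its own.

```python
import sys


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (assumption)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)


def safe_main_sketch(main_function):
    """Run main_function, turning a ProgramError into exit code 1."""
    try:
        main_function()
    except ProgramError as error:
        # Report the problem to the user, then stop with a failure code.
        print(error.message, file=sys.stderr)
        sys.exit(1)
```

Wrapping the entry point this way keeps tracebacks away from end users while still signalling failure to the calling shell.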

Module contents