pyGenClean.PlinkUtils package
For more information about how to use this module, refer to the Plink Utils.
Module contents
-
pyGenClean.PlinkUtils.createRowFromPlinkSpacedOutput(line)
Remove leading spaces and change spaces to tabs.
Parameters: line (str) – a line from a Plink report file.
Returns: an array containing each field from the input line.
Plink's output files are usually formatted to be human readable. Hence, instead of separating fields with tabs, Plink pads them with a variable number of spaces to create columns. The fields are split using the re module.
>>> line = " CHR SNP BP A1 A2"
>>> createRowFromPlinkSpacedOutput(line)
['CHR', 'SNP', 'BP', 'A1', 'A2']
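The splitting described above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the helper name split_spaced_line and the regular expression are assumptions.

```python
import re


def split_spaced_line(line):
    """Split a space-aligned report line into fields (hypothetical helper)."""
    # Drop leading/trailing whitespace, then split on runs of whitespace.
    return re.split(r"\s+", line.strip())


row = split_spaced_line("   CHR     SNP      BP  A1  A2")
# The fields can then be joined with tabs if a tab-delimited line is needed.
tabbed = "\t".join(row)
```

Splitting on runs of whitespace rather than on single spaces is what keeps the column padding from producing empty fields.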
-
pyGenClean.PlinkUtils.get_plink_version()
Gets the Plink version from the binary.
Returns: the version of the Plink software. Return type: str
This function uses subprocess.Popen to gather the version of the Plink binary. Since executing the software to gather the version creates an output file, that file is deleted afterwards.
Warning
This function only works as long as the version is returned as | PLINK! | NNN | (where NNN is the version), since a regular expression is used to extract the version number from the standard output of the software.
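The extraction step in the warning above might look like the sketch below. The helper name extract_version, the regular expression, and the sample banner line are all assumptions made for illustration; the function's real pattern may differ.

```python
import re


def extract_version(banner):
    """Return the NNN part of a '| PLINK! | NNN |' banner, or None (hypothetical helper)."""
    match = re.search(r"\|\s+PLINK!\s+\|\s+(\S+)\s+\|", banner)
    return match.group(1) if match else None


# Illustrative banner line; the exact text printed by a given Plink build may differ.
sample = "|        PLINK!       |     v1.07      |   10/Aug/2009     |"
version = extract_version(sample)
```

Returning None when the banner does not match makes the limitation explicit: a binary that prints its version in another layout is not recognized.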
Submodules
pyGenClean.PlinkUtils.compare_bim module
-
exception pyGenClean.PlinkUtils.compare_bim.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.PlinkUtils.compare_bim.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
-
pyGenClean.PlinkUtils.compare_bim.compareSNPs(before, after, outFileName)
Compares two sets of SNPs.
Finds the difference between two sets of markers, and writes them to the outFileName file.
Note
A ProgramError is raised if:
- There are more markers in the after set than in the before set.
- Some markers that are in the after set are not in the before set.
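The comparison and its two error conditions can be sketched with plain set operations. The names below (removed_markers, the stand-in ProgramError) are hypothetical, and writing the result to the output file is left out.

```python
class ProgramError(Exception):
    """Stand-in for the module's ProgramError (illustrative only)."""


def removed_markers(before, after):
    """Return the markers present in `before` but absent from `after`."""
    if len(after) > len(before):
        raise ProgramError("there are more markers after than before")
    if not after.issubset(before):
        raise ProgramError("some markers in 'after' are not in 'before'")
    return before - after


removed = removed_markers({"rs1", "rs2", "rs3"}, {"rs1", "rs3"})
```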
-
pyGenClean.PlinkUtils.compare_bim.main()
The main function of the module.
The purpose of this module is to find markers that were removed by Plink. When Plink excludes some markers from binary files, there is no easy way to find the list of removed markers, except by comparing the two BIM files (before and after modification).
Here are the steps of this module:
- Reads the BIM file before the modification (readBIM()).
- Reads the BIM file after the modification (readBIM()).
- Compares the list of markers before and after modification, and writes the removed markers into a file (compareSNPs()).
Note
This module only finds markers that were removed (since adding markers to a BIM file usually includes a companion file to tell Plink which markers to add).
-
pyGenClean.PlinkUtils.compare_bim.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options    Type    Description
--before   string  The name of the BIM file before modification.
--after    string  The name of the BIM file after modification.
--out      string  The prefix of the output files.
Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
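A parser exposing the three options listed above could be declared as in the following sketch. The function name build_parser, the help strings, and the choice to mark every option as required are assumptions; the module's real defaults are not documented here.

```python
import argparse


def build_parser():
    """Build a parser with the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Compare two BIM files.")
    parser.add_argument("--before", type=str, required=True,
                        help="The name of the BIM file before modification.")
    parser.add_argument("--after", type=str, required=True,
                        help="The name of the BIM file after modification.")
    parser.add_argument("--out", type=str, required=True,
                        help="The prefix of the output files.")
    return parser


args = build_parser().parse_args(
    ["--before", "before.bim", "--after", "after.bim", "--out", "removed"]
)
```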
-
pyGenClean.PlinkUtils.compare_bim.readBIM(fileName)
Reads a BIM file.
Parameters: fileName (str) – the name of the BIM file to read.
Returns: the set of markers in the BIM file.
Reads a Plink BIM file and extracts the names of the markers. There is one marker per line, and the name of the marker is in the second column. There is no header in the BIM file.
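Given the layout described above (no header, marker name in the second column), the extraction can be sketched as below. Unlike readBIM(), this hypothetical helper takes an iterable of lines rather than a file name, so it can be used on an open file or on a list of strings.

```python
def read_bim_markers(lines):
    """Collect marker names (second column) from BIM-formatted lines."""
    markers = set()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            markers.add(fields[1])
    return markers


# Illustrative BIM content: chromosome, marker, genetic distance, position, alleles.
example = [
    "1\trs1\t0\t1000\tA\tG\n",
    "1\trs2\t0\t2000\tC\tT\n",
]
markers = read_bim_markers(example)
```

With a real file, the same helper can be called as read_bim_markers(open_file) inside a with open(...) block.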
pyGenClean.PlinkUtils.plot_MDS module
-
exception pyGenClean.PlinkUtils.plot_MDS.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.PlinkUtils.plot_MDS.addCustomOptions(parser)
Adds custom options to a parser.
Parameters: parser (argparse.parser) – the parser.
-
pyGenClean.PlinkUtils.plot_MDS.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
-
pyGenClean.PlinkUtils.plot_MDS.extractData(fileName, populations)
Extract the C1 and C2 columns for plotting.
Parameters: - fileName (str) – the name of the MDS file.
- populations (dict) – the population of each sample in the MDS file.
Returns: the MDS data with information about the population of each sample. The first element of the returned tuple is a tuple. The last element of the returned tuple is the list of the populations (the order is the same as in the first element). The first element of the first tuple is the C1 data, and the last element is the C2 data.
Note
If a sample in the MDS file is not in the population file, it is skipped.
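The shape of the returned value, ((C1, C2), populations), can be illustrated with the sketch below. Several details are assumptions rather than documented behaviour: the helper name extract_mds, reading from an iterable of lines instead of a file name, a header row naming the FID, IID, C1 and C2 columns, a populations dictionary keyed by (FID, IID), and plain lists in place of arrays.

```python
def extract_mds(lines, populations):
    """Return ((c1, c2), pops) for samples with a known population (hypothetical helper)."""
    rows = iter(lines)
    header = next(rows).split()
    index = {name: i for i, name in enumerate(header)}
    c1, c2, pops = [], [], []
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        sample = (fields[index["FID"]], fields[index["IID"]])
        if sample not in populations:
            continue  # samples without a population are skipped
        c1.append(float(fields[index["C1"]]))
        c2.append(float(fields[index["C2"]]))
        pops.append(populations[sample])
    return (c1, c2), pops


mds_lines = [
    "FID IID SOL C1 C2",
    "f1 i1 0 0.1 0.2",
    "f2 i2 0 0.3 0.4",
]
data, pops = extract_mds(mds_lines, {("f1", "i1"): "CEU"})
```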
-
pyGenClean.PlinkUtils.plot_MDS.main()
The main function of the module.
These are the steps:
- Reads the population file (readPopulations()).
- Extracts the MDS data (extractData()).
- Plots the MDS data (plotMDS()).
-
pyGenClean.PlinkUtils.plot_MDS.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options            Type    Description
--file             string  The MDS file.
--population-file  string  A file containing population information.
--format           string  The output file format.
--title            string  The title of the MDS plot.
--xlabel           string  The label of the X axis.
--ylabel           string  The label of the Y axis.
--out              string  The prefix of the output files.
Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
-
pyGenClean.PlinkUtils.plot_MDS.plotMDS(data, theOrders, theLabels, theColors, theSizes, theMarkers, options)
Plot the MDS data.
Parameters: - data (list of numpy.array) – the data to plot (MDS values).
- theOrders (list) – the order of the populations to plot.
- theLabels (list) – the names of populations to plot.
- theColors (list) – the colors of the populations to plot.
- theSizes (list) – the sizes of the markers for each population to plot.
- theMarkers (list) – the type of markers for each population to plot.
- options (argparse.Namespace) – the options.
pyGenClean.PlinkUtils.plot_MDS_standalone module
-
exception pyGenClean.PlinkUtils.plot_MDS_standalone.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.PlinkUtils.plot_MDS_standalone.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
-
pyGenClean.PlinkUtils.plot_MDS_standalone.extractData(fileName, populations, population_order, xaxis, yaxis)
Extract the C1 and C2 columns for plotting.
Returns: the MDS data with information about the population of each sample. The first element of the returned tuple is a tuple. The last element of the returned tuple is the list of the populations (the order is the same as in the first element). The first element of the first tuple is the C1 data, and the last element is the C2 data.
Note
If a sample in the MDS file is not in the population file, it is skipped.
-
pyGenClean.PlinkUtils.plot_MDS_standalone.main()
The main function of the module.
These are the steps:
- Reads the population file (readPopulations()).
- Extracts the MDS values (extractData()).
- Plots the MDS values (plotMDS()).
-
pyGenClean.PlinkUtils.plot_MDS_standalone.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options               Type    Description
--file                string  The MDS file.
--population-file     string  A file containing population information.
--population-order    string  The order to print the different populations.
--population-colors   string  The population point color in the plot.
--population-sizes    string  The population point size in the plot.
--population-markers  string  The population point marker in the plot.
--population-alpha    string  The population alpha value in the plot.
--format              string  The output file format.
--title               string  The title of the MDS plot.
--xaxis               string  The component to print on the X axis.
--xlabel              string  The label of the X axis.
--yaxis               string  The component to print on the Y axis.
--ylabel              string  The label of the Y axis.
--legend-position     string  The position of the legend.
--legend-size         int     The size of the legend text.
--legend-ncol         int     The number of columns for the legend.
--legend-alpha        float   The alpha value of the legend.
--title-fontsize      int     The font size of the title.
--label-fontsize      int     The font size of the X and Y labels.
--axis-fontsize       int     The font size of the X and Y axis.
--adjust-left         float   Adjust the left margin.
--adjust-right        float   Adjust the right margin.
--adjust-top          float   Adjust the top margin.
--adjust-bottom       float   Adjust the bottom margin.
--out                 string  The prefix of the output files.
Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
-
pyGenClean.PlinkUtils.plot_MDS_standalone.plotMDS(data, theOrders, theLabels, theColors, theAlphas, theSizes, theMarkers, options)
Plot the MDS data.
Parameters: - data (list of numpy.array) – the data to plot (MDS values).
- theOrders (list) – the order of the populations to plot.
- theLabels (list) – the names of the populations to plot.
- theColors (list) – the colors of the populations to plot.
- theAlphas (list) – the alpha value for the populations to plot.
- theSizes (list) – the sizes of the markers for each population to plot.
- theMarkers (list) – the type of marker for each population to plot.
- options (argparse.Namespace) – the options.
pyGenClean.PlinkUtils.subset_data module
-
exception pyGenClean.PlinkUtils.subset_data.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.PlinkUtils.subset_data.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
Note
Only one operation for markers and one operation for samples can be done at a time. Hence, one of --exclude or --extract can be done for markers, and one of --remove or --keep can be done for samples.
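The restriction in the note above amounts to two mutual-exclusion checks, sketched here. The function name check_subset_options, the stand-in ProgramError, and the assumption that unset options are None are illustrative; the actual checkArgs() may validate more (for instance, that the files exist).

```python
import argparse


class ProgramError(Exception):
    """Stand-in for the module's ProgramError (illustrative only)."""


def check_subset_options(args):
    """Reject conflicting marker or sample operations (hypothetical helper)."""
    if args.exclude is not None and args.extract is not None:
        raise ProgramError("use only one of --exclude or --extract")
    if args.remove is not None and args.keep is not None:
        raise ProgramError("use only one of --remove or --keep")
    return True


ok = check_subset_options(
    argparse.Namespace(exclude="bad_snps.txt", extract=None,
                       remove=None, keep="samples.txt")
)
```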
-
pyGenClean.PlinkUtils.subset_data.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
Here are the steps:
- Prints the options.
- Subsets the data (subset_data()).
Note
The type of the output files is determined by the type of the input files (e.g. if the input files are binary files, so will be the output ones).
-
pyGenClean.PlinkUtils.subset_data.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the parameters.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options     Type    Description
--ifile     string  The input file prefix.
--is-bfile  bool    The input file is a bfile.
--is-tfile  bool    The input file is a tfile.
--is-file   bool    The input file is a file.
--exclude   string  A file containing SNPs to exclude from the data set.
--extract   string  A file containing SNPs to extract from the data set.
--remove    string  A file containing samples (FID and IID) to remove from the data set.
--keep      string  A file containing samples (FID and IID) to keep from the data set.
--out       string  The prefix of the output files.
Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
-
pyGenClean.PlinkUtils.subset_data.runCommand(command)
Runs a command.
Parameters: command (list) – the command to run.
If there is a problem, a ProgramError is raised.
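One way to run a command given as a list and convert failures into a ProgramError is sketched below. The use of subprocess.check_output, the helper name run_command, and the error message are assumptions; the module's own implementation is not shown in this reference.

```python
import subprocess
import sys


class ProgramError(Exception):
    """Stand-in for the module's ProgramError (illustrative only)."""


def run_command(command):
    """Run `command` (a list of strings) and return its combined output."""
    try:
        return subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError) as error:
        # Non-zero exit status, or the executable could not be started.
        raise ProgramError("could not run {}: {}".format(" ".join(command), error))


# The Python interpreter is used here only as a command that is known to exist.
output = run_command([sys.executable, "-c", "print('ok')"])
```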
-
pyGenClean.PlinkUtils.subset_data.safe_main()
A safe version of the main function (that catches ProgramError).
-
pyGenClean.PlinkUtils.subset_data.subset_data(options)
Subset the data.
Parameters: options (argparse.Namespace) – the options.
Subset the data using either --exclude or --extract for markers, or --remove or --keep for samples.
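As a rough sketch of how such options could be translated into a Plink command line, consider the helper below. The function name build_plink_command, the flag ordering, the use of --noweb, and the mapping from input type to output flags (--make-bed, --recode, --recode --transpose) are assumptions based on common Plink 1.07 usage, not a description of what subset_data() actually runs.

```python
import argparse


def build_plink_command(options):
    """Assemble a Plink command list from parsed options (hypothetical helper)."""
    command = ["plink", "--noweb"]
    # The output format follows the input format.
    if options.is_bfile:
        command += ["--bfile", options.ifile]
        output_flags = ["--make-bed"]
    elif options.is_tfile:
        command += ["--tfile", options.ifile]
        output_flags = ["--recode", "--transpose"]
    else:
        command += ["--file", options.ifile]
        output_flags = ["--recode"]
    # Marker and sample filters are only added when they were provided.
    for name in ("exclude", "extract", "remove", "keep"):
        value = getattr(options, name, None)
        if value is not None:
            command += ["--" + name, value]
    return command + ["--out", options.out] + output_flags


options = argparse.Namespace(
    ifile="data", is_bfile=True, is_tfile=False, is_file=False,
    exclude="bad_snps.txt", extract=None, remove=None, keep=None, out="subset",
)
command = build_plink_command(options)
```

Building the command as a list keeps it directly usable with a list-based runner such as runCommand().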
