pyGenClean.PlinkUtils package
For more information about how to use this module, refer to the Plink Utils.
Module contents
pyGenClean.PlinkUtils.createRowFromPlinkSpacedOutput(line)
Remove leading spaces and change spaces to tabs.
Parameters: line (str) – a line from a Plink report file.
Returns: a list containing each field of the input line.
Plink's output files are usually formatted to be human readable: instead of separating fields with tabs, Plink pads the columns with a variable number of spaces. The fields are therefore split using the re module.
>>> line = " CHR SNP BP A1 A2"
>>> createRowFromPlinkSpacedOutput(line)
['CHR', 'SNP', 'BP', 'A1', 'A2']
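The splitting step can be sketched with a single regular expression, assuming that fields are separated by one or more whitespace characters. split_plink_line is a hypothetical name used for illustration; it is not the module's actual implementation.

```python
import re

# Matches one or more consecutive whitespace characters (spaces or tabs).
_FIELD_SEPARATOR = re.compile(r"\s+")


def split_plink_line(line):
    """Split a space-padded Plink report line into its fields (hypothetical helper)."""
    stripped = line.strip()  # drop leading padding and the trailing newline
    if not stripped:
        return []  # a blank line has no fields
    return _FIELD_SEPARATOR.split(stripped)
```

For example, split_plink_line("   CHR    SNP   BP  A1  A2\n") gives ['CHR', 'SNP', 'BP', 'A1', 'A2']. Stripping first avoids the empty leading field that re.split would otherwise produce for a line starting with spaces.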
pyGenClean.PlinkUtils.get_plink_version()
Gets the Plink version from the binary.
Returns: the version of the Plink software.
Return type: str
This function uses subprocess.Popen to gather the version of the Plink binary. Since running the software to obtain its version creates an output file, that file is deleted afterwards.
Warning
This function only works as long as the version is reported as | PLINK! | NNN | (where NNN is the version), since a regular expression is used to extract the version number from the software's standard output.
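The regular-expression step described in the warning can be sketched as follows. parse_plink_version is a hypothetical helper that only parses text already captured from Plink's standard output; it does not launch Plink or delete the output file, and the banner line used in the example is illustrative.

```python
import re

# Captures the token between the "| PLINK! |" cell and the next "|".
_VERSION_PATTERN = re.compile(r"\|\s*PLINK!\s*\|\s*(\S+)\s*\|")


def parse_plink_version(stdout_text):
    """Return the version found in Plink's banner, or None (hypothetical helper)."""
    match = _VERSION_PATTERN.search(stdout_text)
    if match is None:
        return None  # banner is not in the expected "| PLINK! | NNN |" layout
    return match.group(1)
```

With a banner such as "|        PLINK!       |     v1.07      |   10/Aug/2009     |", the sketch returns "v1.07". Any output that does not follow this layout yields None, which is exactly the limitation the warning describes.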
Submodules
pyGenClean.PlinkUtils.compare_bim module
exception pyGenClean.PlinkUtils.compare_bim.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.PlinkUtils.compare_bim.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
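A minimal sketch of the check-and-raise pattern, assuming the checks consist of verifying that the --before and --after files exist. Both check_args and this ProgramError definition are illustrative, not copied from pyGenClean; printing the message and exiting are left to the caller.

```python
import os


class ProgramError(Exception):
    """Exception carrying a message meant for the user (illustrative definition)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.message = str(msg)

    def __str__(self):
        return self.message


def check_args(args):
    """Return True if the options look valid, raise ProgramError otherwise (sketch)."""
    # Hypothetical checks: both BIM files must exist on disk.
    for option_name in ("before", "after"):
        file_name = getattr(args, option_name)
        if not os.path.isfile(file_name):
            raise ProgramError("{}: no such file".format(file_name))
    return True
```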
pyGenClean.PlinkUtils.compare_bim.compareSNPs(before, after, outFileName)
Compares two sets of SNPs.
Finds the difference between two sets of markers (before and after) and writes the markers of that difference to the outFileName file.
Note
A ProgramError is raised if:
- There are more markers in the after set than in the before set.
- Some markers that are in the after set are not in the before set.
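The comparison amounts to a set difference guarded by the two checks listed in the note. A minimal sketch, assuming before and after are Python sets of marker names and that the removed markers are written one per line; compare_snps and the local ProgramError are illustrative names.

```python
class ProgramError(Exception):
    """Illustrative stand-in for the module's ProgramError."""


def compare_snps(before, after, out_file_name):
    """Write the markers present in `before` but absent from `after` (sketch)."""
    if len(after) > len(before):
        raise ProgramError("there are more markers after than before")
    unknown = after - before
    if unknown:
        raise ProgramError("{} marker(s) appeared after modification".format(len(unknown)))
    removed = sorted(before - after)  # sorted only to make the output reproducible
    with open(out_file_name, "w") as out_file:
        for marker in removed:
            out_file.write(marker + "\n")
    return removed
```

Sorting is a choice made for this sketch; the actual module may write the markers in another order.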
pyGenClean.PlinkUtils.compare_bim.main()
The main function of the module.
The purpose of this module is to find markers that were removed by Plink. When Plink excludes markers from binary files, there is no easy way to find the list of removed markers, except by comparing the two BIM files (before and after modification).
Here are the steps of this module:
- Reads the BIM file before the modification (readBIM()).
- Reads the BIM file after the modification (readBIM()).
- Compares the list of markers before and after modification, and writes the removed markers into a file (compareSNPs()).
Note
This module only finds markers that were removed (since adding markers to a BIM file usually requires a companion file telling Plink which markers to add).
pyGenClean.PlinkUtils.compare_bim.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options   Type    Description
--before  string  The name of the BIM file before modification.
--after   string  The name of the BIM file after modification.
--out     string  The prefix of the output files.
Note
No option check is done here (except for the ones automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
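A minimal argparse sketch of the three options in the table. The option names and descriptions come from the table. Making every option required, the metavar values, and the prog name are assumptions made for illustration.

```python
import argparse


def parse_args(arg_list=None):
    """Parse --before, --after and --out (sketch of the documented options)."""
    parser = argparse.ArgumentParser(
        prog="compare_bim",  # hypothetical program name
        description="Find markers removed between two BIM files.",
    )
    parser.add_argument("--before", type=str, metavar="FILE", required=True,
                        help="The name of the BIM file before modification.")
    parser.add_argument("--after", type=str, metavar="FILE", required=True,
                        help="The name of the BIM file after modification.")
    parser.add_argument("--out", type=str, metavar="FILE", required=True,
                        help="The prefix of the output files.")
    # Only argparse's own validation happens here; file checks belong in checkArgs().
    return parser.parse_args(arg_list)
```

Passing None parses sys.argv, while passing a list makes the function easy to call from tests or from other Python code.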
pyGenClean.PlinkUtils.compare_bim.readBIM(fileName)
Reads a BIM file.
Parameters: fileName (str) – the name of the BIM file to read.
Returns: the set of markers in the BIM file.
Reads a Plink BIM file and extracts the names of the markers. There is one marker per line, and the name of the marker is in the second column. The BIM file has no header.
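A minimal sketch of this reading step, assuming whitespace-separated columns (BIM files are usually tab-delimited). read_bim is a hypothetical name.

```python
def read_bim(file_name):
    """Return the set of marker names (second column) of a BIM file (sketch)."""
    markers = set()
    with open(file_name) as bim_file:
        for line in bim_file:
            fields = line.split()  # splits on tabs or runs of spaces
            if not fields:
                continue  # ignore blank lines
            markers.add(fields[1])  # column 2 holds the marker name; there is no header
    return markers
```

Because the result is a set, read_bim(before) - read_bim(after) directly gives the markers that Plink removed.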
pyGenClean.PlinkUtils.plot_MDS module
exception pyGenClean.PlinkUtils.plot_MDS.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.PlinkUtils.plot_MDS.addCustomOptions(parser)
Adds custom options to a parser.
Parameters: parser (argparse.ArgumentParser) – the parser.
pyGenClean.PlinkUtils.plot_MDS.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
pyGenClean.PlinkUtils.plot_MDS.extractData(fileName, populations)
Extracts the C1 and C2 columns for plotting.
Parameters:
- fileName (str) – the name of the MDS file.
- populations – the population of each sample in the MDS file.
Returns: the MDS data, with information about the population of each sample. The returned value is a tuple of two elements. The first element is itself a tuple whose first element is the C1 data and whose last element is the C2 data. The second element is the list of populations, in the same order as the data in the first element.
Note
If a sample in the MDS file is not in the population file, it is skipped.
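A minimal sketch of the extraction, assuming a Plink MDS file with a header line containing FID, IID, C1 and C2 columns, and assuming populations is a dict keyed by (FID, IID) tuples. extract_mds_data is a hypothetical name, and plain lists are used instead of numpy arrays to keep the example dependency-free.

```python
def extract_mds_data(file_name, populations):
    """Return ((C1 values, C2 values), population labels) for known samples (sketch)."""
    c1_values, c2_values, labels = [], [], []
    with open(file_name) as mds_file:
        header = mds_file.readline().split()
        # Locate columns by name instead of assuming fixed positions.
        fid_i, iid_i = header.index("FID"), header.index("IID")
        c1_i, c2_i = header.index("C1"), header.index("C2")
        for line in mds_file:
            fields = line.split()
            if not fields:
                continue
            sample = (fields[fid_i], fields[iid_i])
            if sample not in populations:
                continue  # samples missing from the population file are skipped
            c1_values.append(float(fields[c1_i]))
            c2_values.append(float(fields[c2_i]))
            labels.append(populations[sample])
    return (c1_values, c2_values), labels
```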
pyGenClean.PlinkUtils.plot_MDS.main()
The main function of the module.
These are the steps:
- Reads the population file (readPopulations()).
- Extracts the MDS data (extractData()).
- Plots the MDS data (plotMDS()).
pyGenClean.PlinkUtils.plot_MDS.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options            Type    Description
--file             string  The MDS file.
--population-file  string  A file containing population information.
--format           string  The output file format.
--title            string  The title of the MDS plot.
--xlabel           string  The label of the X axis.
--ylabel           string  The label of the Y axis.
--out              string  The prefix of the output files.
Note
No option check is done here (except for the ones automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
pyGenClean.PlinkUtils.plot_MDS.plotMDS(data, theOrders, theLabels, theColors, theSizes, theMarkers, options)
Plot the MDS data.
Parameters: - data (list of numpy.array) – the data to plot (MDS values).
- theOrders (list) – the order of the populations to plot.
- theLabels (list) – the names of populations to plot.
- theColors (list) – the colors of the populations to plot.
- theSizes (list) – the sizes of the markers for each population to plot.
- theMarkers (list) – the type of markers for each population to plot.
- options (argparse.Namespace) – the options.
pyGenClean.PlinkUtils.plot_MDS_standalone module
exception pyGenClean.PlinkUtils.plot_MDS_standalone.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.PlinkUtils.plot_MDS_standalone.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
pyGenClean.PlinkUtils.plot_MDS_standalone.extractData(fileName, populations, population_order, xaxis, yaxis)
Extracts the C1 and C2 columns for plotting.
Parameters:
- fileName – the name of the MDS file.
- populations – the population of each sample in the MDS file.
- population_order – the order of the different populations.
- xaxis – the component to use on the X axis.
- yaxis – the component to use on the Y axis.
Returns: the MDS data, with information about the population of each sample. The returned value is a tuple of two elements. The first element is itself a tuple whose first element is the C1 data and whose last element is the C2 data. The second element is the list of populations, in the same order as the data in the first element.
Note
If a sample in the MDS file is not in the population file, it is skipped.
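With the xaxis and yaxis parameters, any component (for example C1 and C3) can be selected by its header name. The sketch below assumes that populations is a dict keyed by (FID, IID) tuples and that population_order is also used to drop samples from populations that are not listed. Both are assumptions, and extract_axes is a hypothetical name.

```python
def extract_axes(file_name, populations, population_order, xaxis, yaxis):
    """Return ((x values, y values), labels) for the chosen MDS components (sketch)."""
    wanted = set(population_order)
    x_values, y_values, labels = [], [], []
    with open(file_name) as mds_file:
        header = mds_file.readline().split()
        for line in mds_file:
            fields = line.split()
            if not fields:
                continue
            row = dict(zip(header, fields))  # column name -> value
            population = populations.get((row["FID"], row["IID"]))
            if population not in wanted:
                continue  # unknown sample, or population absent from the order
            x_values.append(float(row[xaxis]))  # e.g. xaxis="C1"
            y_values.append(float(row[yaxis]))  # e.g. yaxis="C3"
            labels.append(population)
    return (x_values, y_values), labels
```

In this sketch the output follows the order of the file, not population_order. Only membership in population_order is used.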
pyGenClean.PlinkUtils.plot_MDS_standalone.main()
The main function of the module.
These are the steps:
- Reads the population file (readPopulations()).
- Extracts the MDS values (extractData()).
- Plots the MDS values (plotMDS()).
pyGenClean.PlinkUtils.plot_MDS_standalone.parseArgs()
Parses the command line options and arguments.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options               Type    Description
--file                string  The MDS file.
--population-file     string  A file containing population information.
--population-order    string  The order in which to plot the different populations.
--population-colors   string  The population point color in the plot.
--population-sizes    string  The population point size in the plot.
--population-markers  string  The population point marker in the plot.
--population-alpha    string  The population alpha value in the plot.
--format              string  The output file format.
--title               string  The title of the MDS plot.
--xaxis               string  The component to plot on the X axis.
--xlabel              string  The label of the X axis.
--yaxis               string  The component to plot on the Y axis.
--ylabel              string  The label of the Y axis.
--legend-position     string  The position of the legend.
--legend-size         int     The size of the legend text.
--legend-ncol         int     The number of columns for the legend.
--legend-alpha        float   The alpha value of the legend.
--title-fontsize      int     The font size of the title.
--label-fontsize      int     The font size of the X and Y labels.
--axis-fontsize       int     The font size of the X and Y axes.
--adjust-left         float   Adjust the left margin.
--adjust-right        float   Adjust the right margin.
--adjust-top          float   Adjust the top margin.
--adjust-bottom       float   Adjust the bottom margin.
--out                 string  The prefix of the output files.
Note
No option check is done here (except for the ones automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
pyGenClean.PlinkUtils.plot_MDS_standalone.plotMDS(data, theOrders, theLabels, theColors, theAlphas, theSizes, theMarkers, options)
Plot the MDS data.
Parameters: - data (list of numpy.array) – the data to plot (MDS values).
- theOrders (list) – the order of the populations to plot.
- theLabels (list) – the names of the populations to plot.
- theColors (list) – the colors of the populations to plot.
- theAlphas (list) – the alpha value for the populations to plot.
- theSizes (list) – the sizes of the markers for each population to plot.
- theMarkers (list) – the type of marker for each population to plot.
- options (argparse.Namespace) – the options.
pyGenClean.PlinkUtils.subset_data module
exception pyGenClean.PlinkUtils.subset_data.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.PlinkUtils.subset_data.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
Note
Only one operation for markers and one operation for samples can be done at a time. Hence, only one of --exclude or --extract can be used for markers, and only one of --remove or --keep can be used for samples.
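The "one marker operation, one sample operation" rule can be sketched as a plain check on the parsed options. The attribute names assume argparse's usual conversion of --exclude to args.exclude (and so on), with None meaning "not given". check_subset_options and the local ProgramError are illustrative names.

```python
class ProgramError(Exception):
    """Illustrative stand-in for the module's ProgramError."""


def check_subset_options(args):
    """Allow at most one marker operation and one sample operation (sketch)."""
    marker_ops = [name for name in ("exclude", "extract") if getattr(args, name)]
    sample_ops = [name for name in ("remove", "keep") if getattr(args, name)]
    if len(marker_ops) > 1:
        raise ProgramError("use only one of --exclude or --extract")
    if len(sample_ops) > 1:
        raise ProgramError("use only one of --remove or --keep")
    return True
```

An alternative design is to declare each pair with argparse's add_mutually_exclusive_group(), so that the conflict is rejected at parse time instead of in a separate check.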
pyGenClean.PlinkUtils.subset_data.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
Here are the steps:
- Prints the options.
- Subsets the data (subset_data()).
Note
The type of the output files is determined by the type of the input files (e.g. if the input files are binary files, so will be the output files).
pyGenClean.PlinkUtils.subset_data.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the parameters.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options     Type    Description
--ifile     string  The input file prefix.
--is-bfile  bool    The input file is a bfile.
--is-tfile  bool    The input file is a tfile.
--is-file   bool    The input file is a file.
--exclude   string  A file containing SNPs to exclude from the data set.
--extract   string  A file containing SNPs to extract from the data set.
--remove    string  A file containing samples (FID and IID) to remove from the data set.
--keep      string  A file containing samples (FID and IID) to keep in the data set.
--out       string  The prefix of the output files.
Note
No option check is done here (except for the ones automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
pyGenClean.PlinkUtils.subset_data.runCommand(command)
Runs a command.
Parameters: command (list) – the command to run.
If there is a problem, a ProgramError is raised.
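A minimal sketch of running a command and converting failures into a ProgramError. It assumes the command is a list of arguments (as the parameter type suggests) and that both a missing executable and a non-zero exit status count as "a problem". run_command and the local ProgramError are illustrative names.

```python
import subprocess


class ProgramError(Exception):
    """Illustrative stand-in for the module's ProgramError."""


def run_command(command):
    """Run `command` (a list of arguments); raise ProgramError on failure (sketch)."""
    try:
        # check_output raises CalledProcessError on a non-zero exit status.
        subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError) as error:
        raise ProgramError("command failed: {}: {}".format(" ".join(command), error))
```

Passing a list rather than a single string avoids going through a shell, so file names containing spaces are passed to the program unchanged.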
pyGenClean.PlinkUtils.subset_data.safe_main()
A safe version of the main function (that catches ProgramError).
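A sketch of the "safe main" pattern, assuming that catching ProgramError means printing its message to sys.stderr and exiting with code 1, as described for checkArgs(). Unlike the documented safe_main(), which takes no argument, this hypothetical version receives the main function as a parameter so that it can be exercised on its own.

```python
import sys


class ProgramError(Exception):
    """Illustrative stand-in for the module's ProgramError."""


def safe_main(main):
    """Call main(); on ProgramError, print the message and exit with code 1 (sketch)."""
    try:
        main()
    except ProgramError as error:
        sys.stderr.write("{}\n".format(error))
        sys.exit(1)
```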
pyGenClean.PlinkUtils.subset_data.subset_data(options)
Subsets the data.
Parameters: options (argparse.Namespace) – the options.
Subsets the data using either --exclude or --extract for markers, or --remove or --keep for samples.
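The subsetting can be sketched as building a Plink command line whose output type mirrors the input type, as stated in the note for main(). This is a hypothetical sketch. It assumes the binary is called plink and uses the Plink 1.07 flags --bfile, --tfile, --file, --exclude, --extract, --remove, --keep, --make-bed, --recode, --transpose and --out. The exact command built by pyGenClean may differ.

```python
def build_subset_command(options):
    """Build a Plink command line for subsetting (hypothetical sketch)."""
    command = ["plink"]  # assumes the binary is on PATH as "plink"
    # The input type decides both the input flag and the matching output flags.
    if options.is_bfile:
        command += ["--bfile", options.ifile]
        output_flags = ["--make-bed"]
    elif options.is_tfile:
        command += ["--tfile", options.ifile]
        output_flags = ["--recode", "--transpose"]
    else:
        command += ["--file", options.ifile]
        output_flags = ["--recode"]
    # At most one marker and one sample operation are expected (see checkArgs()).
    for flag, value in (("--exclude", options.exclude), ("--extract", options.extract),
                        ("--remove", options.remove), ("--keep", options.keep)):
        if value is not None:
            command += [flag, value]
    return command + output_flags + ["--out", options.out]
```

The resulting list is in the form that runCommand() expects, a list of arguments.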