pyGenClean.SexCheck package
For more information about how to use this module, refer to the Sex Check Module.
Module contents
Submodules
pyGenClean.SexCheck.baf_lrr_plot module
exception pyGenClean.SexCheck.baf_lrr_plot.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.SexCheck.baf_lrr_plot.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
pyGenClean.SexCheck.baf_lrr_plot.check_file_names(samples, raw_dir, options)
Checks that all files are present.
Parameters:
- samples (list of tuples) – a list of tuples with the family ID as first element (str) and sample ID as last element (str).
- raw_dir (str) – the directory containing the raw files.
- options (argparse.Namespace) – the options.
Returns: a dict with the samples as keys (a tuple with the family ID as first element and sample ID as last element) and the name of the raw file as values.
pyGenClean.SexCheck.baf_lrr_plot.encode_chromosome(chromosome)
Encodes chromosomes.
Parameters: chromosome (str) – the chromosome to encode.
Returns: the encoded chromosome.
Encodes the sex chromosomes, from 23 and 24 to X and Y, respectively.
Note: Only the sex chromosomes are encoded.
>>> encode_chromosome("23")
'X'
>>> encode_chromosome("24")
'Y'
>>> encode_chromosome("This is not a chromosome")
'This is not a chromosome'
pyGenClean.SexCheck.baf_lrr_plot.main(argString=None)
The main function of this module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Reads the problematic samples (read_problematic_samples()).
- Finds and checks the raw files for each of the problematic samples (check_file_names()).
- Plots the BAF and LRR (plot_baf_lrr()).
pyGenClean.SexCheck.baf_lrr_plot.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: an argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options                 Type    Description
--problematic-samples   string  The list of samples with sex problems to plot.
--use-full-ids          bool    Use full sample IDs (famID and indID).
--full-ids-delimiter    string  The delimiter between famID and indID.
--raw-dir               string  Directory containing information about every sample (BAF and LRR).
--format                string  The output file format (png, ps, pdf, or X11).
--out                   string  The prefix of the output files.
Note: No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
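To illustrate how an argString list maps onto an argparse.Namespace, here is a minimal sketch. It is not the module's own parser: build_parser and parse_args_sketch are hypothetical names, only some of the options above are declared, and the absence of defaults or required flags is an assumption.

```python
import argparse


def build_parser():
    """Declare a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="BAF/LRR plot options (sketch)")
    parser.add_argument("--problematic-samples", type=str)
    parser.add_argument("--use-full-ids", action="store_true")
    parser.add_argument("--raw-dir", type=str)
    # The four formats are the ones listed in the options table.
    parser.add_argument("--format", type=str, choices=["png", "ps", "pdf", "X11"])
    parser.add_argument("--out", type=str)
    return parser


def parse_args_sketch(arg_string=None):
    # When arg_string is None, argparse falls back to sys.argv[1:].
    return build_parser().parse_args(arg_string)


args = parse_args_sketch([
    "--problematic-samples", "problems.txt",
    "--raw-dir", "raw",
    "--format", "png",
    "--out", "baf_lrr",
])
print(args.problematic_samples, args.use_full_ids, args.format)
```

Dashes in option names become underscores in the attribute names of the returned namespace, so --problematic-samples is read back as args.problematic_samples.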
pyGenClean.SexCheck.baf_lrr_plot.plot_baf_lrr(file_names, options)
Plots BAF and LRR for a list of files.
Parameters:
- file_names (dict) – contains the name of the input file for each sample.
- options (argparse.Namespace) – the options.
Plots the BAF (B Allele Frequency) and LRR (Log R Ratio) of each sample. Only the sex chromosomes are shown.
pyGenClean.SexCheck.baf_lrr_plot.read_problematic_samples(file_name)
Reads a file with sample IDs.
Parameters: file_name (str) – the name of the file containing problematic samples after sex check.
Returns: a set of problematic samples (tuple containing the family ID as first element and the sample ID as last element).
Reads a file containing problematic samples after sex check. The file is provided by the module pyGenClean.SexCheck.sex_check. This file contains two columns, the first one being the family ID and the second one, the sample ID.
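The two-column layout described above can be read along these lines. This is a sketch rather than the module's code: read_sample_ids is a hypothetical name, and whitespace-delimited columns without a header row are assumptions.

```python
import io


def read_sample_ids(handle):
    """Collect (family ID, sample ID) tuples from a two-column file (sketch)."""
    samples = set()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: family ID in the first column, sample ID in the second.
        samples.add((fields[0], fields[1]))
    return samples


# An in-memory stand-in for the file written by the sex check module.
example = io.StringIO("fam1\tsampleA\nfam2\tsampleB\n")
problematic = read_sample_ids(example)
print(sorted(problematic))
```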
pyGenClean.SexCheck.gender_plot module
exception pyGenClean.SexCheck.gender_plot.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.SexCheck.gender_plot.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
pyGenClean.SexCheck.gender_plot.encode_chr(chromosome)
Encodes chromosomes.
Parameters: chromosome (str) – the chromosome to encode.
Returns: the encoded chromosome as int.
It changes X, Y, XY and MT to 23, 24, 25 and 26, respectively. It converts everything else to int.
If ValueError is raised, then ProgramError is also raised. If a chromosome has a value below 1 or above 26, a ProgramError is raised.
>>> [encode_chr(str(i)) for i in range(0, 11)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [encode_chr(str(i)) for i in range(11, 21)]
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> [encode_chr(str(i)) for i in range(21, 27)]
[21, 22, 23, 24, 25, 26]
>>> [encode_chr(i) for i in ["X", "Y", "XY", "MT"]]
[23, 24, 25, 26]
>>> encode_chr("27")
Traceback (most recent call last):
    ...
ProgramError: 27: invalid chromosome
>>> encode_chr("XX")
Traceback (most recent call last):
    ...
ProgramError: XX: invalid chromosome
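The encoding rules above can be sketched as follows. This is an illustration, not the module's source: encode_chr_sketch and the local ProgramError class are stand-ins, and the lower bound is an assumption (the sketch accepts 0, as the doctest does, and rejects negative values and values above 26).

```python
class ProgramError(Exception):
    """Stand-in for the module's ProgramError (illustrative)."""


# Letter codes and their numeric equivalents, as listed in the description.
SPECIAL_CODES = {"X": 23, "Y": 24, "XY": 25, "MT": 26}


def encode_chr_sketch(chromosome):
    """Encode a chromosome name as an int (sketch)."""
    if chromosome in SPECIAL_CODES:
        return SPECIAL_CODES[chromosome]
    try:
        value = int(chromosome)
    except ValueError:
        # A non-numeric name that is not a known letter code is invalid.
        raise ProgramError("{}: invalid chromosome".format(chromosome))
    # Assumption: 0 is accepted, as in the doctest; other out-of-range values fail.
    if value < 0 or value > 26:
        raise ProgramError("{}: invalid chromosome".format(chromosome))
    return value


print([encode_chr_sketch(c) for c in ["1", "22", "X", "MT"]])
```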
pyGenClean.SexCheck.gender_plot.encode_gender(gender)
Encodes the gender.
Parameters: gender (str) – the gender to encode.
Returns: the encoded gender.
It changes 1 and 2 to Male and Female, respectively. It encodes everything else to Unknown.
>>> encode_gender("1")
'Male'
>>> encode_gender("2")
'Female'
>>> encode_gender("0")
'Unknown'
>>> encode_gender("This is not a gender code")
'Unknown'
pyGenClean.SexCheck.gender_plot.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
1. Prints the options.
2. If there are summarized_intensities provided, reads the files (read_summarized_intensities()) and skips to step 7.
3. Reads the bim file to get markers on the sex chromosomes (read_bim()).
4. Reads the fam file to get gender (read_fam()).
5. Reads the file containing samples with sex problems (read_sex_problems()).
6. Reads the intensities and summarizes them (read_intensities()).
7. Plots the summarized intensities (plot_gender()).
pyGenClean.SexCheck.gender_plot.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: an argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options                    Type    Description
--bfile                    string  The plink binary file containing information about markers and individuals.
--intensities              string  A file containing allele intensities for each of the markers located on the X and Y chromosomes.
--summarized-intensities   string  The prefix of six files containing summarized chr23 and chr24 intensities.
--sex-problems             string  The file containing individuals with sex problems.
--format                   string  The output file format (png, ps, pdf, or X11).
--xlabel                   string  The label of the X axis.
--ylabel                   string  The label of the Y axis.
--out                      string  The prefix of the output files.
Note: No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
pyGenClean.SexCheck.gender_plot.plot_gender(data, options)
Plots the gender.
Parameters:
- data (numpy.recarray) – the data to plot.
- options (argparse.Namespace) – the options.
Plots the summarized intensities of the markers on the Y chromosome as a function of those on the X chromosome, with problematic samples shown in different colors.
Also uses print_data_to_file() to save the data, so that it is faster to rerun the analysis.
pyGenClean.SexCheck.gender_plot.print_data_to_file(data, file_name)
Prints data to file.
Parameters:
- data (numpy.recarray) – the data to print.
- file_name (str) – the name of the output file.
pyGenClean.SexCheck.gender_plot.read_bim(file_name)
Reads the BIM file to gather marker names.
Parameters: file_name (str) – the name of the bim file.
Returns: a dict containing the chromosomal location of each marker on the sex chromosomes.
It uses encode_chr() to encode the chromosomes from X and Y to 23 and 24, respectively.
pyGenClean.SexCheck.gender_plot.read_fam(file_name)
Reads the FAM file to gather sample names.
Parameters: file_name (str) – the fam file to read.
Returns: a dict containing the gender of each sample.
It uses encode_gender() to encode the gender from 1 and 2 to Male and Female, respectively.
pyGenClean.SexCheck.gender_plot.read_intensities(file_name, needed_markers_chr, needed_samples_gender, sex_problems)
Reads the intensities from a file.
Returns: a numpy.recarray containing the following columns (for each of the samples): sampleID, chr23, chr24, gender and status.
Reads the normalized intensities from a final report. The file must contain the following columns: SNP Name, Sample ID, X, Y and Chr. It then keeps only the required markers (those on sex chromosomes 23 or 24), encoding NaN intensities to zero.
The final data set contains the following information for each sample:
- sampleID: the sample ID.
- chr23: the summarized intensities for chromosome 23.
- chr24: the summarized intensities for chromosome 24.
- gender: the gender of the sample (Male or Female).
- status: the status of the sample (OK or Problem).
The summarized intensity for a chromosome (\(S_{chr}\)) is computed using this formula (where \(I_{chr}\) is the set of all marker intensities on chromosome \(chr\)):
\[S_{chr} = \frac{\sum{I_{chr}}}{||I_{chr}||}\]
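As a worked instance of the formula, the summarized intensity is the sum of the intensities divided by how many there are. The sketch below uses a hypothetical summarize helper and made-up values; keeping the zeroed NaN entries in the denominator is an assumption drawn from the statement that NaN intensities are encoded to zero.

```python
import math


def summarize(intensities):
    """Mean intensity over a chromosome, with NaN counted as zero (sketch)."""
    # NaN values are encoded to zero before averaging.
    cleaned = [0.0 if math.isnan(value) else value for value in intensities]
    # S_chr = sum(I_chr) / ||I_chr||
    return sum(cleaned) / len(cleaned)


# Made-up intensities for four markers on chromosome 23.
chr23_intensities = [1.0, 0.5, float("nan"), 0.5]
print(summarize(chr23_intensities))
```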
pyGenClean.SexCheck.gender_plot.read_sex_problems(file_name)
Reads the sex problem file.
Parameters: file_name (str) – the name of the file containing sex problems.
Returns: a frozenset containing samples with sex problems.
If there is no file_name (i.e. it is None), then an empty frozenset is returned.
pyGenClean.SexCheck.gender_plot.read_summarized_intensities(prefix)
Reads the summarized intensities from six files.
Parameters: prefix (str) – the prefix of the six files.
Returns: a numpy.recarray containing the following columns (for each of the samples): sampleID, chr23, chr24, gender and status.
Instead of reading a final report (like read_intensities()), this function reads six files previously created by this module to gather sample information. Here is the content of the six files:
- prefix.ok_females.txt: information about females without sex problem.
- prefix.ok_males.txt: information about males without sex problem.
- prefix.ok_unknowns.txt: information about unknown gender without sex problem.
- prefix.problematic_females.txt: information about females with sex problem.
- prefix.problematic_males.txt: information about males with sex problem.
- prefix.problematic_unknowns.txt: information about unknown gender with sex problem.
Each file contains the following columns: sampleID, chr23, chr24, gender and status.
The final data set contains the following information for each sample:
- sampleID: the sample ID.
- chr23: the summarized intensities for chromosome 23.
- chr24: the summarized intensities for chromosome 24.
- gender: the gender of the sample (Male or Female).
- status: the status of the sample (OK or Problem).
The summarized intensity for a chromosome (\(S_{chr}\)) is computed using this formula (where \(I_{chr}\) is the set of all marker intensities on chromosome \(chr\)):
\[S_{chr} = \frac{\sum{I_{chr}}}{||I_{chr}||}\]
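The six file names read by read_summarized_intensities() follow a regular pattern (status, then gender), which a short sketch can enumerate. summarized_file_names is a hypothetical helper, not part of the module; only the name pattern comes from the list of files given for that function.

```python
def summarized_file_names(prefix):
    """Build the six summarized-intensity file names for a prefix (sketch)."""
    statuses = ("ok", "problematic")
    genders = ("females", "males", "unknowns")
    return [
        "{}.{}_{}.txt".format(prefix, status, gender)
        for status in statuses
        for gender in genders
    ]


for name in summarized_file_names("sexcheck"):
    print(name)
```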
pyGenClean.SexCheck.sex_check module
exception pyGenClean.SexCheck.sex_check.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
pyGenClean.SexCheck.sex_check.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr and the program exits with code 1.
pyGenClean.SexCheck.sex_check.checkBim(fileName, minNumber, chromosome)
Checks the BIM file for chrN markers.
Returns: True if there are at least minNumber markers on chromosome chromosome, False otherwise.
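A check of this kind amounts to counting BIM rows whose first column matches the chromosome. The sketch below is not the module's implementation: has_enough_markers is a hypothetical name, and comparing the chromosome as a string (for example "23") is an assumption. The first column of a Plink BIM file holds the chromosome code.

```python
import io


def has_enough_markers(handle, min_number, chromosome):
    """Return True if at least min_number rows are on the chromosome (sketch)."""
    count = 0
    for line in handle:
        fields = line.split()
        # The chromosome code is the first whitespace-delimited field.
        if fields and fields[0] == chromosome:
            count += 1
    return count >= min_number


# A made-up three-marker BIM file: chromosome, marker name, genetic
# distance, position, allele 1, allele 2.
bim = io.StringIO(
    "22\trs1\t0\t100\tA\tG\n"
    "23\trs2\t0\t200\tC\tT\n"
    "23\trs3\t0\t300\tA\tC\n"
)
print(has_enough_markers(bim, 2, "23"))
```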
pyGenClean.SexCheck.sex_check.computeHeteroPercentage(fileName)
Computes the heterozygosity percentage.
Parameters: fileName (str) – the name of the input file.
Reads the ped file created by Plink using the recodeA option (see createPedChr23UsingPlink()) and computes the heterozygosity percentage on chromosome 23.
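With additive coding, each genotype is the count of one allele (0, 1 or 2), so a heterozygous call appears as 1. A heterozygosity percentage could therefore be computed along these lines. This is a sketch under assumptions: hetero_percentage is a hypothetical name, the input is taken to have Plink's recodeA layout (six leading columns FID, IID, PAT, MAT, SEX, PHENOTYPE, then one column per marker, with NA for no calls), and dividing by the number of called genotypes is a choice made here, not something this page specifies.

```python
import io


def hetero_percentage(handle):
    """Percentage of heterozygous calls per sample (sketch)."""
    next(handle)  # skip the header line
    results = {}
    for line in handle:
        fields = line.split()
        genotypes = fields[6:]  # marker columns follow the six ID columns
        called = [g for g in genotypes if g != "NA"]
        if called:
            percentage = 100.0 * called.count("1") / len(called)
        else:
            percentage = float("nan")  # no called genotype for this sample
        results[(fields[0], fields[1])] = percentage
    return results


raw = io.StringIO(
    "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G rs3_T rs4_C\n"
    "F1 S1 0 0 1 -9 0 1 1 0\n"
    "F2 S2 0 0 2 -9 NA 0 2 0\n"
)
print(hetero_percentage(raw))
```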
pyGenClean.SexCheck.sex_check.computeNoCall(fileName)
Computes the number of no calls.
Parameters: fileName (str) – the name of the file.
Reads the ped file created by Plink using the recodeA option (see createPedChr24UsingPlink()) and computes the number and percentage of no calls on chromosome 24.
pyGenClean.SexCheck.sex_check.createGenderPlot(bfile, intensities, problematic_samples, format, out_prefix)
Creates the gender plot.
Creates the gender plot of the samples using the pyGenClean.SexCheck.gender_plot module.
pyGenClean.SexCheck.sex_check.createLrrBafPlot(raw_dir, problematic_samples, format, dpi, out_prefix)
Creates the LRR and BAF plot.
Creates the LRR (Log R Ratio) and BAF (B Allele Frequency) plot of the problematic samples using the pyGenClean.SexCheck.baf_lrr_plot module.
pyGenClean.SexCheck.sex_check.createPedChr23UsingPlink(options)
Runs Plink to create a ped format.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to create a ped file of markers on chromosome 23. It uses the recodeA option to use additive coding. It also subsets the data to keep only samples with sex problems.
pyGenClean.SexCheck.sex_check.createPedChr24UsingPlink(options)
Runs Plink to create a ped format.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to create a ped file of markers on chromosome 24. It uses the recodeA option to use additive coding. It also subsets the data to keep only samples with sex problems.
pyGenClean.SexCheck.sex_check.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Checks if there are enough markers on chromosome 23 (checkBim()). If not, quits here.
- Runs the sex check analysis using Plink (runPlinkSexCheck()).
- If there are no sex problems, then quits (readCheckSexFile()).
- Creates the recoded file for chromosome 23 (createPedChr23UsingPlink()).
- Computes the heterozygosity percentage on chromosome 23 (computeHeteroPercentage()).
- If there are enough markers on chromosome 24 (at least 1), creates the recoded file for this chromosome (createPedChr24UsingPlink()).
- Computes the number of no calls on chromosome 24 (computeNoCall()).
- If required, plots the gender plot (createGenderPlot()).
- If required, plots the BAF and LRR plot (createLrrBafPlot()).
pyGenClean.SexCheck.sex_check.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: an argparse.Namespace object created by the argparse module. It contains the values of the different options.
Options                  Type    Description
--bfile                  string  The input file prefix (Plink binary).
--femaleF                float   The female F threshold.
--maleF                  float   The male F threshold.
--nbChr23                int     The minimum number of markers on chromosome 23 before computing Plink's sex check.
--gender-plot            bool    Create the gender plot.
--sex-chr-intensities    string  A file containing allele intensities for each of the markers located on the X and Y chromosomes.
--gender-plot-format     string  The output file format for the gender plot.
--lrr-baf                bool    Create the LRR and BAF plot.
--lrr-baf-raw-dir        string  Directory containing information about every sample (BAF and LRR).
--lrr-baf-format         string  The output file format.
--lrr-baf-dpi            int     The pixel density of the figure(s) (DPI).
--out                    string  The prefix of the output files.
Note: No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
pyGenClean.SexCheck.sex_check.readCheckSexFile(fileName, allProblemsFileName, idsFileName, femaleF, maleF)
Reads the Plink check-sex output file.
Parameters:
- fileName (str) – the name of the input file.
- allProblemsFileName (str) – the name of the output file that will contain all the problems.
- idsFileName (str) – the name of the output file that will contain samples with sex problems.
- femaleF (float) – the F threshold for females.
- maleF (float) – the F threshold for males.
Returns: True if there are sex problems, False otherwise.
Reads the sex check file provided by runPlinkSexCheck() (Plink) and extracts the samples that have sex problems.
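Extracting problematic samples from a Plink sex check report can be sketched as below. This is not the module's implementation: find_sex_problems is a hypothetical name, the column names (FID, IID, PEDSEX, SNPSEX, STATUS, F) are those of a Plink .sexcheck file, and the sketch leaves femaleF and maleF out because how the real function uses them is not described here.

```python
import io


def find_sex_problems(handle):
    """Return (FID, IID) tuples whose STATUS is PROBLEM (sketch)."""
    header = next(handle).split()
    index = {name: position for position, name in enumerate(header)}
    problems = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[index["STATUS"]] == "PROBLEM":
            problems.append((fields[index["FID"]], fields[index["IID"]]))
    return problems


# A made-up report with one consistent and one inconsistent sample.
report = io.StringIO(
    "FID IID PEDSEX SNPSEX STATUS F\n"
    "F1 S1 1 1 OK 0.99\n"
    "F2 S2 2 1 PROBLEM 0.98\n"
)
problems = find_sex_problems(report)
print(problems)
```

A boolean in the spirit of the documented return value is then simply whether the list is non-empty.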
pyGenClean.SexCheck.sex_check.runCommand(command)
Runs a command.
Parameters: command (list) – the command to run.
Tries to run a command. If it fails, raises a ProgramError. This function uses the subprocess module.
Warning: The variable command should be a list of strings (no other type).
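The pattern described for runCommand (run a list of strings through subprocess, convert failures into ProgramError) might look like the following. It is a sketch, not the module's code: run_command_sketch and the local ProgramError class are stand-ins, and the choice of subprocess.check_output is an assumption.

```python
import subprocess
import sys


class ProgramError(Exception):
    """Stand-in for the module's ProgramError (illustrative)."""


def run_command_sketch(command):
    """Run a command given as a list of strings; raise ProgramError on failure."""
    try:
        # Capture stdout and stderr together; a non-zero exit status raises
        # CalledProcessError, a missing executable raises OSError.
        return subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError) as error:
        raise ProgramError(
            "could not run {}: {}".format(" ".join(command), error)
        )


# Use the current Python interpreter as a command that is known to exist.
output = run_command_sketch([sys.executable, "-c", "print('done')"])
print(output.decode().strip())
```

Passing the command as a list, as the warning requires, avoids shell interpretation of its arguments.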
pyGenClean.SexCheck.sex_check.runPlinkSexCheck(options)
Runs Plink to perform a sex check analysis.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to perform a sex check analysis.