pyGenClean.SexCheck package
For more information about how to use this module, refer to the Sex Check Module.
Module contents
Submodules
pyGenClean.SexCheck.baf_lrr_plot module
-
exception pyGenClean.SexCheck.baf_lrr_plot.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.SexCheck.baf_lrr_plot.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
-
pyGenClean.SexCheck.baf_lrr_plot.check_file_names(samples, raw_dir, options)
Checks if all files are present.
Parameters:
- samples (list of tuples) – a list of tuples with the family ID as first element (str) and sample ID as last element (str).
- raw_dir (str) – the directory containing the raw files.
- options (argparse.Namespace) – the options.
Returns: a dict with samples as keys (a tuple with the family ID as first element and sample ID as last element) and the name of the raw file as values.
-
pyGenClean.SexCheck.baf_lrr_plot.encode_chromosome(chromosome)
Encodes chromosomes.
Parameters: chromosome (str) – the chromosome to encode.
Returns: the encoded chromosome.
Encodes the sexual chromosomes, from 23 and 24 to X and Y, respectively.
Note
Only the sexual chromosomes are encoded.
>>> encode_chromosome("23")
'X'
>>> encode_chromosome("24")
'Y'
>>> encode_chromosome("This is not a chromosome")
'This is not a chromosome'
-
pyGenClean.SexCheck.baf_lrr_plot.main(argString=None)
The main function of this module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Reads the problematic samples (read_problematic_samples()).
- Finds and checks the raw files for each of the problematic samples (check_file_names()).
- Plots the BAF and LRR (plot_baf_lrr()).
-
pyGenClean.SexCheck.baf_lrr_plot.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options                 Type    Description
--problematic-samples   string  The list of samples with sex problems to plot.
--use-full-ids          bool    Use full sample IDs (famID and indID).
--full-ids-delimiter    string  The delimiter between famID and indID.
--raw-dir               string  Directory containing information about every sample (BAF and LRR).
--format                string  The output file format (png, ps, pdf, or X11).
--out                   string  The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
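As an illustration only, the options listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the module's actual parser: the function name parse_args, the metavar values and the absence of defaults or required flags are assumptions, since the source does not state them.

```python
import argparse


def parse_args(arg_string=None):
    """Build a parser exposing the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Plot BAF and LRR (sketch).")
    parser.add_argument("--problematic-samples", type=str, metavar="FILE")
    parser.add_argument("--use-full-ids", action="store_true")
    parser.add_argument("--full-ids-delimiter", type=str, metavar="CHAR")
    parser.add_argument("--raw-dir", type=str, metavar="DIR")
    parser.add_argument("--format", type=str,
                        choices=["png", "ps", "pdf", "X11"])
    parser.add_argument("--out", type=str, metavar="PREFIX")
    # When arg_string is None, argparse falls back to sys.argv[1:].
    return parser.parse_args(arg_string)
```

Passing a list such as ["--raw-dir", "raw", "--format", "png"] yields a Namespace whose attributes use underscores (raw_dir, use_full_ids, and so on), which is what a separate checking step like checkArgs() would then validate.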
-
pyGenClean.SexCheck.baf_lrr_plot.plot_baf_lrr(file_names, options)
Plots BAF and LRR for a list of files.
Parameters:
- file_names (dict) – contains the name of the input file for each sample.
- options (argparse.Namespace) – the options.
Plots the BAF (B Allele Frequency) and LRR (Log R Ratio) of each sample. Only the sexual chromosomes are shown.
-
pyGenClean.SexCheck.baf_lrr_plot.read_problematic_samples(file_name)
Reads a file with sample IDs.
Parameters: file_name (str) – the name of the file containing problematic samples after sex check.
Returns: a set of problematic samples (tuple containing the family ID as first element and the sample ID as last element).
Reads a file containing problematic samples after sex check. The file is provided by the module pyGenClean.SexCheck.sex_check. This file contains two columns, the first one being the family ID and the second one, the sample ID.
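A minimal sketch of reading such a two-column file into a set of tuples is shown below. It assumes whitespace-separated columns and silently skips lines with fewer than two fields; neither detail is stated in the source, and the real function may behave differently.

```python
def read_problematic_samples(file_name):
    """Return a set of (family ID, sample ID) tuples from a two-column file."""
    samples = set()
    with open(file_name) as input_file:
        for line in input_file:
            fields = line.split()
            if len(fields) < 2:
                # Assumption: blank or incomplete lines are ignored.
                continue
            samples.add((fields[0], fields[1]))
    return samples
```

Because the result is a set, a sample listed twice in the file appears only once.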
pyGenClean.SexCheck.gender_plot module
-
exception pyGenClean.SexCheck.gender_plot.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.SexCheck.gender_plot.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
-
pyGenClean.SexCheck.gender_plot.encode_chr(chromosome)
Encodes chromosomes.
Parameters: chromosome (str) – the chromosome to encode.
Returns: the encoded chromosome as int.
It changes X, Y, XY and MT to 23, 24, 25 and 26, respectively. It converts everything else to int.
If ValueError is raised, then ProgramError is also raised. If a chromosome has a value below 1 or above 26, a ProgramError is raised.
>>> [encode_chr(str(i)) for i in range(0, 11)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [encode_chr(str(i)) for i in range(11, 21)]
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> [encode_chr(str(i)) for i in range(21, 27)]
[21, 22, 23, 24, 25, 26]
>>> [encode_chr(i) for i in ["X", "Y", "XY", "MT"]]
[23, 24, 25, 26]
>>> encode_chr("27")
Traceback (most recent call last):
...
ProgramError: 27: invalid chromosome
>>> encode_chr("XX")
Traceback (most recent call last):
...
ProgramError: XX: invalid chromosome
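The encoding rules can be sketched as below. This is not the module's source: ProgramError is redefined locally so the block runs on its own, and the accepted range (0 to 26) is an assumption taken from the doctest, which shows "0" being accepted and "27" rejected.

```python
class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (sketch only)."""


def encode_chr(chromosome):
    """Encode a chromosome name as an int (illustrative sketch)."""
    special = {"X": 23, "Y": 24, "XY": 25, "MT": 26}
    if chromosome in special:
        return special[chromosome]
    try:
        value = int(chromosome)
    except ValueError:
        # Non-numeric names other than X, Y, XY and MT are invalid.
        raise ProgramError("{}: invalid chromosome".format(chromosome))
    # Assumption: bounds follow the doctest (0 accepted, 27 rejected).
    if value < 0 or value > 26:
        raise ProgramError("{}: invalid chromosome".format(chromosome))
    return value
```

Converting the ValueError into a ProgramError keeps a single exception type for callers to catch, which matches the error handling described for checkArgs().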
-
pyGenClean.SexCheck.gender_plot.encode_gender(gender)
Encodes the gender.
Parameters: gender (str) – the gender to encode.
Returns: the encoded gender.
It changes 1 and 2 to Male and Female, respectively. It encodes everything else to Unknown.
>>> encode_gender("1")
'Male'
>>> encode_gender("2")
'Female'
>>> encode_gender("0")
'Unknown'
>>> encode_gender("This is not a gender code")
'Unknown'
-
pyGenClean.SexCheck.gender_plot.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- If summarized_intensities are provided, reads the files (read_summarized_intensities()) and skips to step 7 (the plotting step).
- Reads the bim file to get markers on the sexual chromosomes (read_bim()).
- Reads the fam file to get gender (read_fam()).
- Reads the file containing samples with sex problems (read_sex_problems()).
- Reads the intensities and summarizes them (read_intensities()).
- Plots the summarized intensities (plot_gender()).
-
pyGenClean.SexCheck.gender_plot.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options                    Type    Description
--bfile                    string  The Plink binary file containing information about markers and individuals.
--intensities              string  A file containing allele intensities for each of the markers located on the X and Y chromosomes.
--summarized-intensities   string  The prefix of six files containing summarized chr23 and chr24 intensities.
--sex-problems             string  The file containing individuals with sex problems.
--format                   string  The output file format (png, ps, pdf, or X11).
--xlabel                   string  The label of the X axis.
--ylabel                   string  The label of the Y axis.
--out                      string  The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
-
pyGenClean.SexCheck.gender_plot.plot_gender(data, options)
Plots the gender.
Parameters:
- data (numpy.recarray) – the data to plot.
- options (argparse.Namespace) – the options.
Plots the summarized intensities of the markers on the Y chromosome as a function of those on the X chromosome, with problematic samples shown in different colors.
Also uses print_data_to_file() to save the data, so that it is faster to rerun the analysis.
-
pyGenClean.SexCheck.gender_plot.print_data_to_file(data, file_name)
Prints data to file.
Parameters:
- data (numpy.recarray) – the data to print.
- file_name (str) – the name of the output file.
-
pyGenClean.SexCheck.gender_plot.read_bim(file_name)
Reads the BIM file to gather marker names.
Parameters: file_name (str) – the name of the bim file.
Returns: a dict containing the chromosomal location of each marker on the sexual chromosomes.
It uses encode_chr() to encode the chromosomes from X and Y to 23 and 24, respectively.
-
pyGenClean.SexCheck.gender_plot.read_fam(file_name)
Reads the FAM file to gather sample names.
Parameters: file_name (str) – the fam file to read.
Returns: a dict containing the gender of each sample.
It uses encode_gender() to encode the gender from 1 and 2 to Male and Female, respectively.
-
pyGenClean.SexCheck.gender_plot.read_intensities(file_name, needed_markers_chr, needed_samples_gender, sex_problems)
Reads the intensities from a file.
Parameters:
Returns: a numpy.recarray containing the following columns (for each of the samples): sampleID, chr23, chr24, gender and status.
Reads the normalized intensities from a final report. The file must contain the following columns: SNP Name, Sample ID, X, Y and Chr. It then keeps only the required markers (those that are on sexual chromosomes 23 or 24), encoding NaN intensities to zero.
The final data set contains the following information for each sample:
- sampleID: the sample ID.
- chr23: the summarized intensities for chromosome 23.
- chr24: the summarized intensities for chromosome 24.
- gender: the gender of the sample (Male or Female).
- status: the status of the sample (OK or Problem).
The summarized intensity for a chromosome (\(S_{chr}\)) is computed using this formula (where \(I_{chr}\) is the set of all marker intensities on chromosome \(chr\)):
\[S_{chr} = \frac{\sum{I_{chr}}}{||I_{chr}||}\]
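The formula can be illustrated with a short sketch. It assumes that \(||I_{chr}||\) denotes the number of marker intensities in the set, so that \(S_{chr}\) is their arithmetic mean, and that NaN values are replaced by zero before averaging (as described for read_intensities()). The function name summarize_intensities is hypothetical.

```python
import math


def summarize_intensities(intensities):
    """Return sum(I) / |I| after replacing NaN intensities with zero."""
    cleaned = [0.0 if math.isnan(value) else value for value in intensities]
    if not cleaned:
        raise ValueError("no marker intensities to summarize")
    # Assumption: zeroed NaN values still count in the denominator.
    return sum(cleaned) / len(cleaned)
```

For example, the intensities 1.0, NaN and 2.0 become 1.0, 0.0 and 2.0, giving a summarized intensity of 3.0 / 3 = 1.0 under these assumptions.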
-
pyGenClean.SexCheck.gender_plot.read_sex_problems(file_name)
Reads the sex problem file.
Parameters: file_name (str) – the name of the file containing sex problems.
Returns: a frozenset containing samples with sex problems.
If there is no file_name (i.e. it is None), then an empty frozenset is returned.
-
pyGenClean.SexCheck.gender_plot.read_summarized_intensities(prefix)
Reads the summarized intensities from 6 files.
Parameters: prefix (str) – the prefix of the six files.
Returns: a numpy.recarray containing the following columns (for each of the samples): sampleID, chr23, chr24, gender and status.
Instead of reading a final report (like read_intensities()), this function reads six files previously created by this module to gather sample information. Here is the content of the six files:
- prefix.ok_females.txt: information about females without sex problem.
- prefix.ok_males.txt: information about males without sex problem.
- prefix.ok_unknowns.txt: information about unknown gender without sex problem.
- prefix.problematic_females.txt: information about females with sex problem.
- prefix.problematic_males.txt: information about males with sex problem.
- prefix.problematic_unknowns.txt: information about unknown gender with sex problem.
Each file contains the following columns: sampleID, chr23, chr24, gender and status.
The final data set contains the following information for each sample:
- sampleID: the sample ID.
- chr23: the summarized intensities for chromosome 23.
- chr24: the summarized intensities for chromosome 24.
- gender: the gender of the sample (Male or Female).
- status: the status of the sample (OK or Problem).
The summarized intensity for a chromosome (\(S_{chr}\)) is computed using this formula (where \(I_{chr}\) is the set of all marker intensities on chromosome \(chr\)):
\[S_{chr} = \frac{\sum{I_{chr}}}{||I_{chr}||}\]
pyGenClean.SexCheck.sex_check module
-
exception pyGenClean.SexCheck.sex_check.ProgramError(msg)
Bases: exceptions.Exception
An Exception raised in case of a problem.
Parameters: msg (str) – the message to print to the user before exiting.
-
pyGenClean.SexCheck.sex_check.checkArgs(args)
Checks the arguments and options.
Parameters: args (argparse.Namespace) – an object containing the options of the program.
Returns: True if everything was OK.
If there is a problem with an option, an exception is raised using the ProgramError class, a message is printed to sys.stderr, and the program exits with code 1.
-
pyGenClean.SexCheck.sex_check.checkBim(fileName, minNumber, chromosome)
Checks the BIM file for chrN markers.
Parameters:
Returns: True if there are at least minNumber markers on chromosome chromosome, False otherwise.
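A minimal sketch of such a check follows. It relies on the Plink BIM layout, in which each whitespace-separated row starts with the chromosome code; the function name check_bim and the string comparison of chromosome codes are assumptions, not the module's implementation.

```python
def check_bim(file_name, min_number, chromosome):
    """Return True if at least min_number markers lie on chromosome."""
    nb_markers = 0
    with open(file_name) as bim_file:
        for line in bim_file:
            fields = line.split()
            # The first BIM column holds the chromosome code.
            if fields and fields[0] == str(chromosome):
                nb_markers += 1
    return nb_markers >= min_number
```

In the pipeline described for main(), a check of this kind guards the sex check itself (chromosome 23) and the optional chromosome 24 step.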
-
pyGenClean.SexCheck.sex_check.computeHeteroPercentage(fileName)
Computes the heterozygosity percentage.
Parameters: fileName (str) – the name of the input file.
Reads the ped file created by Plink using the recodeA option (see createPedChr23UsingPlink()) and computes the heterozygosity percentage on chromosome 23.
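The computation can be sketched as follows, under several assumptions not confirmed by the source: the additive-coded file has a header line and six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE), heterozygous genotypes are coded 1, missing genotypes are coded NA, and the percentage is taken over non-missing genotypes only. The function name and the returned dict are hypothetical.

```python
def compute_hetero_percentage(file_name):
    """Map (family ID, sample ID) to the heterozygosity percentage (sketch)."""
    percentages = {}
    with open(file_name) as raw_file:
        next(raw_file, None)  # skip the header line, if any
        for line in raw_file:
            fields = line.split()
            if len(fields) < 6:
                continue
            # Additive coding: 0, 1 or 2 copies of an allele; NA is a no call.
            genotypes = [g for g in fields[6:] if g != "NA"]
            if genotypes:
                value = 100.0 * genotypes.count("1") / len(genotypes)
            else:
                value = float("nan")  # assumption: no called genotype
            percentages[(fields[0], fields[1])] = value
    return percentages
```

A high heterozygosity on chromosome 23 is unexpected for a sample with a single X chromosome, which is why this value is useful when reviewing samples flagged by the sex check.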
-
pyGenClean.SexCheck.sex_check.computeNoCall(fileName)
Computes the number of no calls.
Parameters: fileName (str) – the name of the file.
Reads the ped file created by Plink using the recodeA option (see createPedChr24UsingPlink()) and computes the number and percentage of no calls on chromosome 24.
-
pyGenClean.SexCheck.sex_check.createGenderPlot(bfile, intensities, problematic_samples, format, out_prefix)
Creates the gender plot.
Parameters:
Creates the gender plot of the samples using the pyGenClean.SexCheck.gender_plot module.
-
pyGenClean.SexCheck.sex_check.createLrrBafPlot(raw_dir, problematic_samples, format, dpi, out_prefix)
Creates the LRR and BAF plot.
Parameters:
Creates the LRR (Log R Ratio) and BAF (B Allele Frequency) plot of the problematic samples using the pyGenClean.SexCheck.baf_lrr_plot module.
-
pyGenClean.SexCheck.sex_check.createPedChr23UsingPlink(options)
Runs Plink to create a ped format.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to create a ped file of markers on chromosome 23. It uses the recodeA option to use additive coding. It also subsets the data to keep only samples with sex problems.
-
pyGenClean.SexCheck.sex_check.createPedChr24UsingPlink(options)
Runs Plink to create a ped format.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to create a ped file of markers on chromosome 24. It uses the recodeA option to use additive coding. It also subsets the data to keep only samples with sex problems.
-
pyGenClean.SexCheck.sex_check.main(argString=None)
The main function of the module.
Parameters: argString (list) – the options.
These are the steps:
- Prints the options.
- Checks if there are enough markers on chromosome 23 (checkBim()). If not, quits here.
- Runs the sex check analysis using Plink (runPlinkSexCheck()).
- If there are no sex problems, then quits (readCheckSexFile()).
- Creates the recoded file for chromosome 23 (createPedChr23UsingPlink()).
- Computes the heterozygosity percentage on chromosome 23 (computeHeteroPercentage()).
- If there are enough markers on chromosome 24 (at least 1), creates the recoded file for this chromosome (createPedChr24UsingPlink()).
- Computes the number of no calls on chromosome 24 (computeNoCall()).
- If required, plots the gender plot (createGenderPlot()).
- If required, plots the BAF and LRR plot (createLrrBafPlot()).
-
pyGenClean.SexCheck.sex_check.parseArgs(argString=None)
Parses the command line options and arguments.
Parameters: argString (list) – the options.
Returns: An argparse.Namespace object created by the argparse module. It contains the values of the different options.

Options                 Type    Description
--bfile                 string  The input file prefix (Plink binary).
--femaleF               float   The female F threshold.
--maleF                 float   The male F threshold.
--nbChr23               int     The minimum number of markers on chromosome 23 before computing Plink’s sex check.
--gender-plot           bool    Create the gender plot.
--sex-chr-intensities   string  A file containing allele intensities for each of the markers located on the X and Y chromosomes.
--gender-plot-format    string  The output file format for the gender plot.
--lrr-baf               bool    Create the LRR and BAF plot.
--lrr-baf-raw-dir       string  Directory containing information about every sample (BAF and LRR).
--lrr-baf-format        string  The output file format.
--lrr-baf-dpi           int     The pixel density of the figure(s) (DPI).
--out                   string  The prefix of the output files.

Note
No option check is done here (except for the one automatically done by argparse). Those need to be done elsewhere (see checkArgs()).
-
pyGenClean.SexCheck.sex_check.readCheckSexFile(fileName, allProblemsFileName, idsFileName, femaleF, maleF)
Reads the Plink check-sex output file.
Parameters:
- fileName (str) – the name of the input file.
- allProblemsFileName (str) – the name of the output file that will contain all the problems.
- idsFileName (str) – the name of the output file that will contain samples with sex problems.
- femaleF (float) – the F threshold for females.
- maleF (float) – the F threshold for males.
Returns: True if there are sex problems, False otherwise.
Reads the sex check file provided by runPlinkSexCheck() (Plink) and extracts the samples that have sex problems.
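The extraction step might look like the sketch below. It assumes the usual Plink sex check layout, a header naming at least the FID, IID and STATUS columns, with problematic samples marked PROBLEM. How the module uses femaleF and maleF is not described here, so both are left out of the sketch; the function name and the tab-separated output are also assumptions.

```python
def read_check_sex_file(file_name, all_problems_file_name, ids_file_name):
    """Write problematic rows and their IDs; return True if any (sketch)."""
    problems = []
    with open(file_name) as input_file:
        header = input_file.readline().split()
        columns = {name: index for index, name in enumerate(header)}
        for line in input_file:
            fields = line.split()
            if fields and fields[columns["STATUS"]] == "PROBLEM":
                problems.append(fields)

    # First output: every problematic row, with the original header.
    with open(all_problems_file_name, "w") as output_file:
        output_file.write("\t".join(header) + "\n")
        for fields in problems:
            output_file.write("\t".join(fields) + "\n")

    # Second output: family ID and sample ID only, one sample per line.
    with open(ids_file_name, "w") as output_file:
        for fields in problems:
            ids = [fields[columns["FID"]], fields[columns["IID"]]]
            output_file.write("\t".join(ids) + "\n")

    return len(problems) > 0
```

The two-column IDs file has the same shape as the input expected by pyGenClean.SexCheck.baf_lrr_plot.read_problematic_samples().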
-
pyGenClean.SexCheck.sex_check.runCommand(command)
Runs a command.
Parameters: command (list) – the command to run.
Tries to run a command. If it fails, raises a ProgramError. This function uses the subprocess module.
Warning
The variable command should be a list of strings (no other type).
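One way to implement this pattern with the subprocess module is sketched below. The local ProgramError class, the function name run_command and the error message are placeholders; which subprocess call the module actually uses is not stated in the source.

```python
import subprocess


class ProgramError(Exception):
    """Local stand-in for the module's ProgramError (sketch only)."""


def run_command(command):
    """Run a command given as a list of strings; return its output."""
    try:
        # Passing a list (not a single string) avoids shell interpretation.
        return subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit status or missing executable.
        raise ProgramError("couldn't run command: " + " ".join(command))
```

Because no shell is involved, each argument is passed to the program as-is, which is why every element of command needs to be a string.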
-
pyGenClean.SexCheck.sex_check.runPlinkSexCheck(options)
Runs Plink to perform a sex check analysis.
Parameters: options (argparse.Namespace) – the options.
Uses Plink to perform a sex check analysis.
