pybgen’s API

Main PyBGEN class

class pybgen.PyBGEN(fn, mode='r', prob_t=0.9, _skip_index=False, probs_only=False)

Reads and stores a BGEN file.

Parameters

fn (str) – The name of the BGEN file.
mode (str) – The open mode for the BGEN file.
prob_t (float) – The probability threshold (optional).
probs_only (bool) – If True, returns the genotype probabilities instead of the dosage.

Reads or writes BGEN files.

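from pybgen import PyBGEN

# Reading a BGEN file ("bgen_file_name" is a placeholder for your file's path)
with PyBGEN("bgen_file_name") as bgen:
    # Each item is a (variant information, dosage) tuple
    for info, dosage in bgen.iter_variants():
        print(info, dosage)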
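The interplay between prob_t and probs_only can be illustrated with a small, standard-library-only sketch. It is not taken from pybgen's source and rests on stated assumptions: each sample has three genotype probabilities (AA, AB, BB), the dosage is the expected number of copies of the second allele, P(AB) + 2·P(BB), and a sample whose largest probability is below prob_t receives a missing value (NaN). The helper name probs_to_dosage is hypothetical.

```python
import math


def probs_to_dosage(probs, prob_t=0.9, probs_only=False):
    """Hypothetical helper: convert per-sample genotype probabilities to dosages.

    probs is a list of (p_AA, p_AB, p_BB) tuples, one per sample. The
    thresholding rule below is an assumption made for illustration.
    """
    if probs_only:
        # Assumed behaviour: hand back the probabilities untouched
        return probs
    dosages = []
    for p_aa, p_ab, p_bb in probs:
        if max(p_aa, p_ab, p_bb) < prob_t:
            # No genotype is probable enough: mark the sample as missing
            dosages.append(math.nan)
        else:
            # Expected number of copies of the second allele
            dosages.append(p_ab + 2 * p_bb)
    return dosages


probs = [(0.95, 0.05, 0.0), (0.1, 0.2, 0.7), (0.0, 0.02, 0.98)]
dosages = probs_to_dosage(probs)
print(dosages)  # the second sample is NaN because 0.7 < 0.9
```

Lowering prob_t to 0 in this sketch keeps every sample, which makes the trade-off explicit: a higher threshold discards uncertain genotypes at the cost of more missing values.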

close()

Closes the BGEN object.

get_specific_variant(chrom, pos, ref, alt)

Gets a specific variant, using its alleles for the lookup.

Parameters

chrom (str) – The name of the chromosome.
pos (int) – The position of the variant.
ref (str) – The reference allele.
alt (str) – The alternative allele.

Returns

A list containing all the values for the given variant. The list has more than one item if the variant is duplicated.

Return type

list
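Why an allele lookup is needed can be shown with a toy, in-memory sketch: two distinct variants can share a chromosome and position, and only their alleles tell them apart. The Variant field names and the function below are hypothetical stand-ins, not pybgen internals, and this sketch assumes an exact match (ref is the first allele, alt the second); whether pybgen also accepts swapped alleles is not stated here.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for this sketch
Variant = namedtuple("Variant", ["name", "chrom", "pos", "a1", "a2"])


def get_specific_variant(records, chrom, pos, ref, alt):
    """Return every (variant, dosage) pair at chrom:pos whose alleles match."""
    return [
        (info, dosage)
        for info, dosage in records
        if info.chrom == chrom
        and info.pos == pos
        and info.a1 == ref
        and info.a2 == alt
    ]


records = [
    (Variant("rs1", "1", 1000, "A", "G"), [0.0, 1.0, 2.0]),
    (Variant("rs2", "1", 1000, "A", "T"), [1.0, 1.0, 0.0]),  # same position, other alleles
    (Variant("rs3", "2", 500, "C", "T"), [2.0, 0.0, 1.0]),
]
matches = get_specific_variant(records, "1", 1000, "A", "T")
print([info.name for info, _ in matches])
```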

get_variant(name)

Gets the values for a given variant.

Parameters: name (str) – The name of the variant.
Returns: A list containing all the values for the given variant. The list has more than one item if the variant is duplicated.
Return type: list
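Returning a list rather than a single value is what allows duplicated variant names to be reported. The sketch below illustrates the idea with a name index that maps each name to every matching entry. The helpers build_name_index and get_variant are hypothetical, and raising ValueError for an unknown name is an assumption of this sketch.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout; the field names are assumptions for this sketch
Variant = namedtuple("Variant", ["name", "chrom", "pos", "a1", "a2"])


def build_name_index(records):
    """Map each variant name to the list of (variant, dosage) pairs carrying it."""
    index = defaultdict(list)
    for info, dosage in records:
        index[info.name].append((info, dosage))
    return index


def get_variant(index, name):
    """Return all entries for name (raising on a missing name is assumed)."""
    if name not in index:
        raise ValueError(f"{name}: name not found")
    return list(index[name])


records = [
    (Variant("rs1", "1", 1000, "A", "G"), [0.0, 1.0]),
    (Variant("rs1", "1", 1000, "A", "G"), [0.0, 2.0]),  # duplicated variant
    (Variant("rs2", "1", 2000, "C", "T"), [1.0, 1.0]),
]
index = build_name_index(records)
duplicates = get_variant(index, "rs1")
print(len(duplicates))  # two entries share the name rs1
```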

iter_variant_info()

Iterates over marker information.

iter_variants()

Iterates over variants from the beginning of the BGEN file.

Returns: A variant and the dosage.
Return type: tuple
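Iterating "from the beginning of the file" means the reader first seeks back to the first variant, so every call restarts regardless of where a previous read stopped. The sketch below shows that pattern with the standard struct module on a toy binary layout. The layout (a 4-byte header followed by fixed-size records) is invented for illustration and is not the BGEN format; the function signature is hypothetical.

```python
import io
import struct


# Toy layout, NOT the real BGEN format: a 4-byte header, then fixed-size
# records made of chromosome (uint8), position (uint32) and one float32
# dosage per sample, all little-endian.
def iter_variants(fh, nb_variants, nb_samples, header_size=4):
    """Yield (variant, dosage) tuples, always starting from the first record."""
    fh.seek(header_size)  # restart at the first variant, wherever we were
    record = struct.Struct("<BI" + "f" * nb_samples)
    for _ in range(nb_variants):
        chrom, pos, *dosage = record.unpack(fh.read(record.size))
        yield (str(chrom), pos), dosage


# Build a two-variant, two-sample toy file in memory
buf = io.BytesIO()
buf.write(b"TOY!")
buf.write(struct.pack("<BIff", 1, 1000, 0.0, 2.0))
buf.write(struct.pack("<BIff", 1, 2000, 1.0, 0.5))

first_pass = list(iter_variants(buf, nb_variants=2, nb_samples=2))
second_pass = list(iter_variants(buf, nb_variants=2, nb_samples=2))
print(first_pass == second_pass)  # both passes start from the first variant
```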

iter_variants_by_names(names)

Iterates over variants using a list of names.

Parameters: names (list) – A list of names to extract specific variants.

iter_variants_in_region(chrom, start, end)

Iterates over variants in a specific region.

Parameters

chrom (str) – The name of the chromosome.
start (int) – The starting position of the region.
end (int) – The ending position of the region.
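The region filter reduces to a chromosome match plus a position range check. The sketch below treats both start and end as inclusive; that is an assumption made here for illustration, and the function and record layout are hypothetical.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for this sketch
Variant = namedtuple("Variant", ["name", "chrom", "pos", "a1", "a2"])


def iter_variants_in_region(records, chrom, start, end):
    """Yield (variant, dosage) pairs on chrom with start <= pos <= end."""
    for info, dosage in records:
        # Both bounds are treated as inclusive here (an assumption)
        if info.chrom == chrom and start <= info.pos <= end:
            yield info, dosage


records = [
    (Variant("rs1", "1", 100, "A", "G"), [0.0]),
    (Variant("rs2", "1", 200, "C", "T"), [1.0]),
    (Variant("rs3", "1", 300, "G", "A"), [2.0]),
    (Variant("rs4", "2", 150, "T", "C"), [1.0]),
]
in_region = [info.name for info, _ in iter_variants_in_region(records, "1", 100, 200)]
print(in_region)
```

Note that rs4 is excluded even though its position falls inside the range, because it lies on another chromosome.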

property nb_samples

Returns the number of samples.

Returns: The number of samples in the dataset.
Return type: int

property nb_variants

Returns the number of markers.

Returns: The number of markers in the dataset.
Return type: int

next()

Returns the next variant.

Returns: The variant’s information and its genotypes (dosage) as numpy.ndarray.
Return type: tuple
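A next() method fits naturally into Python's iterator protocol: the built-in next() calls __next__, and raising StopIteration ends a for loop. The hypothetical ToyReader class below sketches that protocol over in-memory records; it is an illustration, not pybgen's implementation.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for this sketch
Variant = namedtuple("Variant", ["name", "chrom", "pos", "a1", "a2"])


class ToyReader:
    """Hypothetical reader showing how next() fits the iterator protocol."""

    def __init__(self, records):
        self._records = records
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= len(self._records):
            raise StopIteration  # no more variants
        info, dosage = self._records[self._i]
        self._i += 1
        return info, dosage

    # Explicit next() method, as documented above; it delegates to __next__
    next = __next__


reader = ToyReader([
    (Variant("rs1", "1", 100, "A", "G"), [0.0, 1.0]),
    (Variant("rs2", "1", 200, "C", "T"), [2.0, 1.0]),
])
first = reader.next()     # explicit method call
second = next(reader)     # the built-in next() goes through __next__
remaining = list(reader)  # nothing is left
```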

property samples

Returns the samples.

Returns: The samples.
Return type: tuple

A module to read BGEN files.

pybgen.test(verbosity=1)

Executes all the tests for pybgen.

Parameters: verbosity (int) – The verbosity level for unittest.

Set verbosity to an integer greater than 1 to get more detailed information about the tests.
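The effect of the verbosity level is that of unittest's own runner: at level 1 each passing test prints a dot, while at level 2 each test is listed by name. The sketch below shows this with a hypothetical run_tests helper and a toy test case; it mimics what a test(verbosity) function built on unittest might do and is not pybgen's code.

```python
import io
import unittest


class ToyTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_tests(verbosity=1, stream=None):
    """Hypothetical stand-in for a test() helper built on unittest."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyTest)
    runner = unittest.TextTestRunner(stream=stream, verbosity=verbosity)
    return runner.run(suite)


quiet = io.StringIO()
run_tests(verbosity=1, stream=quiet)    # prints a single "." for the test
verbose = io.StringIO()
result = run_tests(verbosity=2, stream=verbose)  # lists "test_addition ... ok"
```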

Parallel PyBGEN class

We provide a wrapper class, ParallelPyBGEN, which implements two methods for iterating over variants in parallel. This is useful for very large datasets, such as the UK Biobank imputation files.

class pybgen.ParallelPyBGEN(fn, prob_t=0.9, cpus=2, probs_only=False, max_variants=1000)

Reads BGEN files in parallel.

Parameters

fn (str) – The name of the BGEN file.
prob_t (float) – The probability threshold (optional).
cpus (int) – The number of CPUs (default is 2).
probs_only (bool) – If True, returns the genotype probabilities instead of the dosage.
max_variants (int) – The maximum number of variants held in the queue.

Reads a BGEN file using multiple processes.

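from pybgen import ParallelPyBGEN as PyBGEN

# The main-module guard is needed on platforms that start processes with "spawn"
if __name__ == "__main__":
    # Reading a BGEN file ("bgen_file_name" is a placeholder for your file's path)
    with PyBGEN("bgen_file_name") as bgen:
        for info, dosage in bgen.iter_variants():
            print(info, dosage)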
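The roles of cpus and max_variants can be pictured as a bounded producer/consumer pipeline: one reader feeds raw variants into a queue holding at most max_variants items, and cpus workers decode them. The sketch below is hypothetical and deliberately swaps processes for threads (queue.Queue and threading) to stay short and portable; ParallelPyBGEN uses multiple processes, which sidesteps CPython's global interpreter lock for CPU-bound decoding. The function parallel_map_variants and its decode argument are illustrative names, and the sketch has no error handling.

```python
import queue
import threading


def parallel_map_variants(raw_variants, decode, cpus=2, max_variants=1000):
    """Decode variants with `cpus` worker threads fed through a bounded queue."""
    tasks = queue.Queue(maxsize=max_variants)  # the producer blocks when full
    results = queue.Queue()
    sentinel = object()

    def worker():
        while True:
            item = tasks.get()
            if item is sentinel:
                break  # stop signal
            results.put(decode(item))

    workers = [threading.Thread(target=worker) for _ in range(cpus)]
    for w in workers:
        w.start()
    for item in raw_variants:
        tasks.put(item)  # waits while max_variants items are pending
    for _ in workers:
        tasks.put(sentinel)  # one stop signal per worker
    for w in workers:
        w.join()

    # All workers have finished, so draining the result queue is safe
    out = []
    while not results.empty():
        out.append(results.get())
    return out


decoded = parallel_map_variants(
    [("rs1", 0.5), ("rs2", 1.0), ("rs3", 0.25)],
    decode=lambda v: (v[0], 2 * v[1]),
    cpus=2,
    max_variants=2,
)
print(sorted(decoded))  # results may arrive in any order, hence sorted()
```

The bounded queue keeps memory use flat on huge files: the reader can never run more than max_variants items ahead of the workers.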

iter_variants()

Iterates over all variants using multiple processes.

iter_variants_by_names(names)

Iterates over variants using a list of names.

Parameters: names (list) – A list of names to extract specific variants.
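One plausible way to spread a name-based lookup across several workers is to split the names into cpus contiguous chunks of near-equal size, one chunk per worker. The split_names helper below is hypothetical and only illustrates that balancing step; how ParallelPyBGEN actually distributes names is not documented here.

```python
def split_names(names, cpus):
    """Split names into `cpus` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(names), cpus)
    chunks, start = [], 0
    for i in range(cpus):
        # The first `extra` chunks take one additional name each
        end = start + size + (1 if i < extra else 0)
        chunks.append(names[start:end])
        start = end
    return chunks


names = ["rs1", "rs2", "rs3", "rs4", "rs5"]
chunks = split_names(names, cpus=2)
print(chunks)
```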
