pybgen’s API

Main PyBGEN class

class pybgen.PyBGEN(fn, mode='r', prob_t=0.9, _skip_index=False, probs_only=False)[source]

Reads and stores a set of BGEN files.

Parameters
  • fn (str) – The name of the BGEN file.

  • mode (str) – The open mode for the BGEN file.

  • prob_t (float) – The probability threshold (optional).

  • probs_only (boolean) – Return only the probabilities instead of dosage.

Reads or writes BGEN files.

from pybgen import PyBGEN

# Reading a BGEN file
with PyBGEN("bgen_file_name") as bgen:
    pass

close()[source]

Closes the BGEN object.

get_specific_variant(chrom, pos, ref, alt)[source]

Gets a specific variant with allele lookup.

Parameters
  • chrom (str) – The name of the chromosome.

  • pos (int) – The position of the variant.

  • ref (str) – The reference allele.

  • alt (str) – The alternative allele.

Returns

A list containing all the values for a given variant. The list has more than one item if there are duplicated variants.

Return type

list

get_variant(name)[source]

Gets the values for a given variant.

Parameters

name (str) – The name of the variant.

Returns

A list containing all the values for a given variant. The list has more than one item if there are duplicated variants.

Return type

list
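Because get_variant returns a list that may hold several items, callers that expect a single variant usually need to check its length. The following is a minimal sketch of such a check; get_unique_variant is a hypothetical helper (not part of pybgen), and it only relies on the list-returning behaviour described above.

```python
def get_unique_variant(bgen, name):
    """Return the only item get_variant() yields for `name`.

    `bgen` is assumed to be an open PyBGEN reader. This helper is
    hypothetical and is not provided by pybgen itself.
    """
    results = bgen.get_variant(name)
    if len(results) != 1:
        # Zero items or duplicated variants: let the caller decide what to do.
        raise ValueError(
            "expected exactly one variant named {!r}, found {}".format(
                name, len(results)
            )
        )
    return results[0]


# Usage, assuming pybgen is installed and the file exists:
#     from pybgen import PyBGEN
#     with PyBGEN("bgen_file_name") as bgen:
#         variant = get_unique_variant(bgen, "rs12345")
```

Raising on duplicates keeps the ambiguity visible instead of silently picking the first match.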

iter_variant_info()[source]

Iterates over marker information.
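Marker information can be summarised without touching the genotype data. The sketch below counts variants per chromosome; it assumes that each item yielded by iter_variant_info exposes a chrom attribute, which is not stated in this reference, and variants_per_chromosome is a hypothetical helper name.

```python
from collections import Counter


def variants_per_chromosome(bgen):
    """Count the variants on each chromosome.

    Assumption: every item from iter_variant_info() has a `chrom`
    attribute. `bgen` is an open PyBGEN reader.
    """
    return Counter(info.chrom for info in bgen.iter_variant_info())


# Usage, assuming pybgen is installed and the file exists:
#     from pybgen import PyBGEN
#     with PyBGEN("bgen_file_name") as bgen:
#         print(variants_per_chromosome(bgen))
```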

iter_variants()[source]

Iterates over variants from the beginning of the BGEN file.

Returns

A variant and the dosage.

Return type

tuple
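As an illustration of consuming the (variant, dosage) tuples, the sketch below computes the mean dosage of each variant. It assumes that the variant object has a name attribute and that missing dosage values, if any, are encoded as NaN; neither detail is confirmed by this reference. mean_dosages is a hypothetical helper.

```python
import math


def mean_dosages(bgen):
    """Map each variant name to its mean dosage, skipping NaN values.

    Assumptions: the variant object has a `name` attribute and missing
    dosages are NaN. Duplicated names overwrite earlier entries.
    """
    means = {}
    for info, dosage in bgen.iter_variants():
        values = [float(d) for d in dosage if not math.isnan(d)]
        means[info.name] = sum(values) / len(values) if values else float("nan")
    return means


# Usage, assuming pybgen is installed and the file exists:
#     from pybgen import PyBGEN
#     with PyBGEN("bgen_file_name") as bgen:
#         means = mean_dosages(bgen)
```

The loop uses plain Python arithmetic so that it works on any sequence of floats; with NumPy arrays, numpy.nanmean would be the more idiomatic choice.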

iter_variants_by_names(names)[source]

Iterates over variants using a list of names.

Parameters

names (list) – A list of names to extract specific variants.

iter_variants_in_region(chrom, start, end)[source]

Iterates over variants in a specific region.

Parameters
  • chrom (str) – The name of the chromosome.

  • start (int) – The starting position of the region.

  • end (int) – The ending position of the region.
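Region queries can be repeated over several intervals. The sketch below counts the variants found in each region; count_variants_by_region is a hypothetical helper, and whether start and end are inclusive is left to the library, since this reference does not say.

```python
def count_variants_by_region(bgen, regions):
    """Count the variants yielded for each (chrom, start, end) region.

    `bgen` is assumed to be an open PyBGEN reader and `regions` an
    iterable of (chrom, start, end) tuples.
    """
    counts = {}
    for chrom, start, end in regions:
        counts[(chrom, start, end)] = sum(
            1 for _ in bgen.iter_variants_in_region(chrom, start, end)
        )
    return counts


# Usage, assuming pybgen is installed and the file exists:
#     from pybgen import PyBGEN
#     with PyBGEN("bgen_file_name") as bgen:
#         counts = count_variants_by_region(bgen, [("1", 1000, 2000)])
```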

property nb_samples

Returns the number of samples.

Returns

The number of samples in the dataset.

Return type

int

property nb_variants

Returns the number of markers.

Returns

The number of markers in the dataset.

Return type

int

next()[source]

Returns the next variant.

Returns

The variant’s information and its genotypes (dosage) as numpy.ndarray.

Return type

tuple

property samples

Returns the samples.

Returns

The samples.

Return type

tuple
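The samples tuple can be paired with a dosage vector to find which samples lack a value. This sketch assumes that dosage values follow the same order as samples and that missing values are NaN; both are assumptions, not statements from this reference. missing_samples is a hypothetical helper and only applies to dosage output, not to probs_only=True.

```python
import math


def missing_samples(bgen, dosage):
    """Return the IDs of samples whose dosage is NaN.

    Assumptions: `dosage` is one-dimensional, ordered like
    `bgen.samples`, and uses NaN for missing values.
    """
    samples = bgen.samples
    if len(samples) != len(dosage):
        raise ValueError("dosage length does not match the number of samples")
    return [s for s, d in zip(samples, dosage) if math.isnan(d)]


# Usage, assuming pybgen is installed and the file exists:
#     from pybgen import PyBGEN
#     with PyBGEN("bgen_file_name") as bgen:
#         for info, dosage in bgen.iter_variants():
#             print(info, missing_samples(bgen, dosage))
```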

A module to read BGEN files.

pybgen.test(verbosity=1)[source]

Executes all the tests for pybgen.

Parameters

verbosity (int) – The verbosity level for unittest.

Set verbosity to an integer higher than 1 to get more information about the tests.

Parallel PyBGEN class

We provide a wrapper class called ParallelPyBGEN which implements two functions to iterate over variants in parallel. This is useful for huge datasets such as the UK Biobank imputation files.

class pybgen.ParallelPyBGEN(fn, prob_t=0.9, cpus=2, probs_only=False, max_variants=1000)[source]

Reads BGEN files in parallel.

Parameters
  • fn (str) – The name of the BGEN file.

  • prob_t (float) – The probability threshold (optional).

  • cpus (int) – The number of CPUs (default is 2).

  • probs_only (boolean) – Return only the probabilities instead of dosage.

  • max_variants (int) – The maximum number of variants in the queue.

Reads a BGEN file using multiple processes.

from pybgen import ParallelPyBGEN as PyBGEN

# Reading a BGEN file
with PyBGEN("bgen_file_name") as bgen:
    pass

iter_variants()[source]

Iterates over all variants using multiple processes.

iter_variants_by_names(names)[source]

Iterates over variants using a list of names.

Parameters

names (list) – A list of names to extract specific variants.
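Since ParallelPyBGEN and PyBGEN share the fn, prob_t and probs_only parameters, a script can choose between them from a CPU count. The sketch below shows one way to do that; open_bgen is a hypothetical helper, not part of pybgen, and it only passes parameters listed in the two signatures documented here.

```python
def open_bgen(fn, cpus=1, prob_t=0.9, probs_only=False):
    """Return a ParallelPyBGEN reader when cpus > 1, else a PyBGEN reader.

    Hypothetical convenience wrapper; the defaults mirror the
    documented signatures.
    """
    # Imported lazily so the helper can be defined without pybgen installed.
    from pybgen import PyBGEN, ParallelPyBGEN

    if cpus > 1:
        return ParallelPyBGEN(fn, prob_t=prob_t, cpus=cpus, probs_only=probs_only)
    return PyBGEN(fn, prob_t=prob_t, probs_only=probs_only)


# Usage, assuming pybgen is installed and the file exists:
#     with open_bgen("bgen_file_name", cpus=4) as bgen:
#         for info, dosage in bgen.iter_variants():
#             pass
```

Only iter_variants and iter_variants_by_names are documented as parallel, so code that relies on other methods gains nothing from cpus > 1.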