Genotype

class hail.representation.Genotype(gt, ad=None, dp=None, gq=None, pl=None)

An object that represents an individual’s genotype at a genomic locus.

Parameters:
  • gt (int or None) – Genotype hard call
  • ad (list of int or None) – allelic depth (1 element per allele including reference)
  • dp (int or None) – total depth
  • gq (int or None) – genotype quality
  • pl (list of int or None) – phred-scaled posterior genotype likelihoods (1 element per possible genotype)

Attributes

ad Returns the allelic depth.
dp Returns the total depth.
gp Returns the linear-scaled genotype probabilities.
gq Returns the phred-scaled genotype quality.
gt Returns the hard genotype call.
pl Returns the phred-scaled genotype posterior likelihoods.

Methods

__init__ Initialize a Genotype object.
dosage Returns the expected value of the genotype based on genotype probabilities, \(\mathrm{P}(\mathrm{Het}) + 2 \mathrm{P}(\mathrm{HomVar})\).
fraction_reads_ref Returns the fraction of reads that are reference reads.
is_called True if the genotype call is non-missing.
is_called_non_ref True if the genotype call contains any non-reference alleles.
is_het True if the genotype call contains two different alleles.
is_het_non_ref True if the genotype call contains two different alternate alleles.
is_het_ref True if the genotype call contains one reference and one alternate allele.
is_hom_ref True if the genotype call is homozygous reference (0/0).
is_hom_var True if the genotype call contains two identical alternate alleles.
is_not_called True if the genotype call is missing.
num_alt_alleles Returns the count of non-reference alleles.
od Returns the difference between the total depth and the allelic depth sum.
one_hot_alleles Returns a list containing the one-hot encoded representation of the called alleles.
one_hot_genotype Returns a list containing the one-hot encoded representation of the genotype call.
p_ab Returns the p-value associated with finding the given allele depth ratio.
ad

Returns the allelic depth.

Return type:list of int or None
dosage()

Returns the expected value of the genotype based on genotype probabilities, \(\mathrm{P}(\mathrm{Het}) + 2 \mathrm{P}(\mathrm{HomVar})\). The variant must be biallelic.

Return type:float
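A minimal pure-Python sketch of the dosage formula, assuming the probabilities arrive as a list ordered [P(HomRef), P(Het), P(HomVar)] for a biallelic variant. The helper name `dosage_from_probs` is hypothetical and is not part of the Hail API; returning None for missing probabilities is a choice of this sketch.

```python
def dosage_from_probs(gp):
    """Expected alternate-allele count, P(Het) + 2 * P(HomVar).

    Hypothetical helper: `gp` is assumed to be [P(HomRef), P(Het), P(HomVar)]
    for a biallelic variant. Returns None if `gp` is missing (an assumption).
    """
    if gp is None:
        return None
    if len(gp) != 3:
        raise ValueError("dosage is only defined for biallelic variants")
    p_hom_ref, p_het, p_hom_var = gp
    # P(HomRef) contributes 0 alternate alleles, so it drops out of the sum.
    return p_het + 2 * p_hom_var


print(dosage_from_probs([0.25, 0.5, 0.25]))  # 0.5 + 2 * 0.25
```

The result ranges from 0 (certain hom-ref) to 2 (certain hom-var), which is why dosage is often used as a continuous stand-in for the hard call.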
dp

Returns the total depth.

Return type:int or None
fraction_reads_ref()

Returns the fraction of reads that are reference reads.

Equivalent to:

float(g.ad[0]) / sum(g.ad)
Return type:float or None
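A small sketch of the same computation with the missing-value cases made explicit. It assumes `ad` is a list of per-allele read depths with the reference depth first; the helper name is hypothetical, and returning None when `ad` is missing or sums to zero is an assumption of this sketch rather than documented Hail behavior. The explicit `float` avoids integer division on Python 2.

```python
def fraction_reads_ref(ad):
    """Fraction of reads that support the reference allele.

    Hypothetical helper: `ad` holds per-allele depths, reference first.
    Returns None for a missing or all-zero `ad` (an assumption).
    """
    if ad is None:
        return None
    total = sum(ad)
    if total == 0:
        # Avoid dividing by zero when no reads were attributed to any allele.
        return None
    return float(ad[0]) / total


print(fraction_reads_ref([7, 3]))  # 7 reference reads out of 10
```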
gp

Returns the linear-scaled genotype probabilities.

Return type:list of float or None
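This page does not say how gp is derived for every input format. One common convention for turning phred-scaled likelihoods into linear-scaled probabilities is to compute 10^(-PL/10) for each genotype and normalize so the values sum to 1. The hypothetical `pl_to_gp` helper below illustrates that convention only; it is not taken from Hail's implementation.

```python
def pl_to_gp(pl):
    """Convert phred-scaled likelihoods to normalized linear probabilities.

    Hypothetical helper illustrating a common convention; not Hail's code.
    """
    if pl is None:
        return None
    # Undo the phred scaling: PL = -10 * log10(likelihood).
    linear = [10 ** (-p / 10.0) for p in pl]
    total = sum(linear)
    # Normalize so the probabilities sum to 1.
    return [x / total for x in linear]


gp = pl_to_gp([0, 10, 20])
# Before normalization the linear values are 1, 0.1 and 0.01.
print([round(p, 4) for p in gp])
```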
gq

Returns the phred-scaled genotype quality.

Return type:int or None
gt

Returns the hard genotype call.

Return type:int or None
is_called()

True if the genotype call is non-missing.

Return type:bool
is_called_non_ref()

True if the genotype call contains any non-reference alleles.

Return type:bool
is_het()

True if the genotype call contains two different alleles.

Return type:bool
is_het_non_ref()

True if the genotype call contains two different alternate alleles.

Return type:bool
is_het_ref()

True if the genotype call contains one reference and one alternate allele.

Return type:bool
is_hom_ref()

True if the genotype call is homozygous reference (0/0).

Return type:bool
is_hom_var()

True if the genotype call contains two identical alternate alleles.

Return type:bool
is_not_called()

True if the genotype call is missing.

Return type:bool
num_alt_alleles()

Returns the count of non-reference alleles.

This function returns None if the genotype call is missing.

Return type:int or None
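The call predicates and num_alt_alleles all depend on which pair of alleles the integer gt stands for. The sketch below assumes gt indexes diploid genotypes in the VCF ordering, where the unordered allele pair (j, k) with j <= k has index k*(k+1)/2 + j (so 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...). That ordering agrees with the biallelic examples on this page (Genotype(0), Genotype(1) and Genotype(2) are hom-ref, het and hom-var), but every function name here is hypothetical and operates on a plain int, not on a Hail Genotype.

```python
def gt_to_alleles(gt):
    """Decode a diploid genotype index into an allele pair (j, k), j <= k.

    Assumes the VCF ordering, in which (j, k) has index k*(k+1)//2 + j.
    Returns None for a missing call.
    """
    if gt is None:
        return None
    k = 0
    # Find the largest k whose triangular number k*(k+1)//2 is <= gt.
    while (k + 1) * (k + 2) // 2 <= gt:
        k += 1
    j = gt - k * (k + 1) // 2
    return j, k


def is_called(gt):
    return gt is not None


def is_hom_ref(gt):
    return gt_to_alleles(gt) == (0, 0)


def is_het(gt):
    pair = gt_to_alleles(gt)
    return pair is not None and pair[0] != pair[1]


def is_het_ref(gt):
    # One reference allele (0) and one alternate allele.
    pair = gt_to_alleles(gt)
    return pair is not None and pair[0] == 0 and pair[1] > 0


def is_het_non_ref(gt):
    # Two different alternate alleles.
    pair = gt_to_alleles(gt)
    return pair is not None and 0 < pair[0] < pair[1]


def is_hom_var(gt):
    # Two copies of the same alternate allele.
    pair = gt_to_alleles(gt)
    return pair is not None and pair[0] > 0 and pair[0] == pair[1]


def is_called_non_ref(gt):
    # Since j <= k, any non-reference allele shows up in k.
    pair = gt_to_alleles(gt)
    return pair is not None and pair[1] > 0


def num_alt_alleles(gt):
    pair = gt_to_alleles(gt)
    if pair is None:
        return None
    return int(pair[0] > 0) + int(pair[1] > 0)


for gt in range(6):
    print(gt, gt_to_alleles(gt), num_alt_alleles(gt))
```

For example, gt = 4 decodes to (1, 2): a het-non-ref call carrying two alternate alleles.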
od()

Returns the difference between the total depth and the allelic depth sum.

Equivalent to:

g.dp - sum(g.ad)
Return type:int or None
one_hot_alleles(num_alleles)

Returns a list containing the one-hot encoded representation of the called alleles.

This one-hot representation is the positional sum of the one-hot encoding for each called allele. For a biallelic variant, the one-hot encoding for a reference allele is [1, 0] and the one-hot encoding for an alternate allele is [0, 1]. Thus, with the following variables:

num_alleles = 2
hom_ref = Genotype(0)
het = Genotype(1)
hom_var = Genotype(2)

All the below statements are true:

hom_ref.one_hot_alleles(num_alleles) == [2, 0]
het.one_hot_alleles(num_alleles) == [1, 1]
hom_var.one_hot_alleles(num_alleles) == [0, 2]

This function returns None if the genotype call is missing.

Parameters:num_alleles (int) – number of possible alleles, including the reference allele
Return type:list of int or None
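The positional-sum definition extends to multiallelic variants. The hypothetical sketch below assumes gt indexes genotypes in the VCF ordering (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...) and that num_alleles counts the reference allele, which matches the biallelic example above; raising an error for an out-of-range gt is a choice of this sketch.

```python
def one_hot_alleles(gt, num_alleles):
    """Per-allele counts for a diploid call, as a list of length num_alleles.

    Hypothetical helper: `gt` is assumed to follow the VCF genotype ordering.
    Returns None for a missing call.
    """
    if gt is None:
        return None
    index = 0
    # Walk genotypes in VCF order: (0,0), (0,1), (1,1), (0,2), (1,2), ...
    for k in range(num_alleles):
        for j in range(k + 1):
            if index == gt:
                counts = [0] * num_alleles
                # Add the one-hot vector of each called allele.
                counts[j] += 1
                counts[k] += 1
                return counts
            index += 1
    raise ValueError("gt is out of range for num_alleles")


print(one_hot_alleles(4, 3))  # genotype 1/2 at a triallelic site
```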
one_hot_genotype(num_genotypes)

Returns a list containing the one-hot encoded representation of the genotype call.

A one-hot encoding is a vector with a single 1 and 0 in every other position, like [0, 0, 1, 0] or [1, 0, 0, 0]. This function transforms the genotype call (gt) into such a vector, with the 1 at index gt. With the following variables:

num_genotypes = 3
hom_ref = Genotype(0)
het = Genotype(1)
hom_var = Genotype(2)

All the below statements are true:

hom_ref.one_hot_genotype(num_genotypes) == [1, 0, 0]
het.one_hot_genotype(num_genotypes) == [0, 1, 0]
hom_var.one_hot_genotype(num_genotypes) == [0, 0, 1]

This function returns None if the genotype call is missing.

Parameters:num_genotypes (int) – number of possible genotypes
Return type:list of int or None
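A minimal sketch of the same encoding on a plain integer call. For a diploid call with n alleles (reference included) there are n(n+1)/2 possible genotypes, which gives a natural value for num_genotypes. Both helper names are hypothetical, and raising an error for an out-of-range gt is a choice of this sketch.

```python
def num_diploid_genotypes(num_alleles):
    """Number of unordered allele pairs: n * (n + 1) / 2."""
    return num_alleles * (num_alleles + 1) // 2


def one_hot_genotype(gt, num_genotypes):
    """One-hot vector of length num_genotypes with a 1 at index gt.

    Hypothetical helper; returns None for a missing call.
    """
    if gt is None:
        return None
    if not 0 <= gt < num_genotypes:
        raise ValueError("gt is out of range for num_genotypes")
    encoding = [0] * num_genotypes
    encoding[gt] = 1
    return encoding


print(one_hot_genotype(1, num_diploid_genotypes(2)))  # het call, biallelic site
```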
p_ab(theta=0.5)

Returns the p-value associated with finding the given allele depth ratio.

This function uses a one-tailed binomial test.

This function returns None if the allelic depth (ad) is missing.

Parameters:theta (float) – probability that a read supports the reference allele under the null binomial model
Return type:float or None
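A minimal standard-library sketch of a one-tailed binomial test on biallelic allelic depths. It assumes ad = [ref_depth, alt_depth] and tests the lower tail, P(X <= ref_depth) with X ~ Binomial(ref_depth + alt_depth, theta), i.e. evidence of too few reference reads. Which tail Hail tests and how it handles edge cases are not stated on this page, so the helper `p_ab_sketch` is illustrative and hypothetical, not Hail's implementation.

```python
from math import comb  # requires Python 3.8+


def p_ab_sketch(ad, theta=0.5):
    """Lower-tail binomial p-value for the reference read count.

    Hypothetical helper: `ad` is assumed to be [ref_depth, alt_depth] and
    `theta` the null probability that a read supports the reference allele.
    """
    if ad is None:
        return None
    ref_depth, alt_depth = ad
    n = ref_depth + alt_depth
    # Sum P(X = i) for i = 0 .. ref_depth.
    return sum(
        comb(n, i) * theta ** i * (1 - theta) ** (n - i)
        for i in range(ref_depth + 1)
    )


print(p_ab_sketch([0, 3]))  # 0 reference reads out of 3
```

With theta = 0.5 and no reference reads out of 3, the p-value is 0.5^3; a balanced 5/5 split gives a large p-value, as expected for a well-behaved heterozygote.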
pl

Returns the phred-scaled genotype posterior likelihoods.

Return type:list of int or None