# Functions¶

• binomTest(x: Int, n: Int, p: Double, alternative: String): Double

Returns the p-value from the exact binomial test of the null hypothesis that success has probability p, given x successes in n trials.

Examples

Test each variant for allele balance across all heterozygous genotypes, under the null hypothesis that the two alleles are sampled with equal probability.

>>> (vds.split_multi()
...   .annotate_variants_expr(
...   'va.ab_binom_test = let all_samples_ad = gs.filter(g => g.isHet).map(g => g.ad).sum() in '


Arguments

• x (Int) – Number of successes
• n (Int) – Number of trials
• p (Double) – Probability of success under the null hypothesis
• alternative (String) – Alternative hypothesis, must be “two.sided”, “greater” or “less”.
• chisq(c1: Int, c2: Int, c3: Int, c4: Int): Struct{pValue:Double,oddsRatio:Double}

• pValue (Double) – p-value
• oddsRatio (Double) – odds ratio

Calculates p-value (Chi-square approximation) and odds ratio for 2x2 table

Arguments

• c1 (Int) – value for cell 1
• c2 (Int) – value for cell 2
• c3 (Int) – value for cell 3
• c4 (Int) – value for cell 4
• combineVariants(left: Variant, right: Variant): Struct{variant:Variant,laIndices:Dict[Int,Int],raIndices:Dict[Int,Int]}

• variant (Variant) – Resulting combined variant.
• laIndices (Dict[Int, Int]) – Mapping from new to old allele index for the left variant.
• raIndices (Dict[Int, Int]) – Mapping from new to old allele index for the right variant.

Combines the alleles of two variants at the same locus, making sure that ref and alt alleles are represented uniformely. In addition to the resulting variant containing all alleles, this function also returns the mapping from the old to the new allele indices. Note that this mapping counts the reference allele always contains the reference allele mapping 0 -> 0.

let left = Variant("1:1000:AT:A,CT") and right = Variant("1:1000:A:C,AGG") in combineVariants(left,right)
result: Struct{'variant': Variant(contig=1, start=1000, ref=AT, alts=[AltAllele(ref=AT, alt=A), AltAllele(ref=AT, alt=CT), AltAllele(ref=AT, alt=AGGT)]), 'laIndices': {0: 0, 1: 1, 2: 2}, 'raIndices': {0:0, 2: 1, 3: 2}}


Arguments

• left (Variant) – Left variant to combine.
• right (Variant) – Right variant to combine.
• ctt(c1: Int, c2: Int, c3: Int, c4: Int, minCellCount: Int): Struct{pValue:Double,oddsRatio:Double}

• pValue (Double) – p-value
• oddsRatio (Double) – odds ratio

Calculates p-value and odds ratio for 2x2 table. If any cell is lower than minCellCount Fishers exact test is used, otherwise faster chi-squared approximation is used.

Arguments

• c1 (Int) – value for cell 1
• c2 (Int) – value for cell 2
• c3 (Int) – value for cell 3
• c4 (Int) – value for cell 4
• minCellCount (Int) – Minimum cell count for using chi-squared approximation
• Dict(keys: Array[T], values: Array[U]): Dict[T, U]

Construct a Dict from an array of keys and an array of values.

Arguments

• keys (Array[T]) – Keys of Dict.
• values (Array[U]) – Values of Dict.
• dpois(x: Double, lambda: Double, logP: Boolean): Double

Returns Prob($$X$$ = x) from a Poisson distribution with rate parameter lambda.

Arguments

• x (Double) – Non-negative number at which to compute the probability density.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• logP (Boolean) – If true, probabilities are returned as log(p).
• dpois(x: Double, lambda: Double): Double

Returns Prob($$X$$ = x) from a Poisson distribution with rate parameter lambda.

Arguments

• x (Double) – Non-negative number at which to compute the probability density.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• drop(s: Struct, identifiers: String*): Struct

Return a new Struct with the a subset of fields not matching identifiers.

let s = {gene: "ACBD", function: "LOF", nHet: 12} in drop(s, gene, function)
result: {nHet: 12}


Arguments

• s (Struct) – Struct to drop fields from.
• identifiers (String*) – Field names to drop from s. Multiple arguments allowed.
• exp(x: Double): Double

Returns Euler’s number e raised to the power of the given value x.

Arguments

• x (Double) – the exponent to raise e to.
• fet(a: Int, b: Int, c: Int, d: Int): Struct{pValue:Double,oddsRatio:Double,ci95Lower:Double,ci95Upper:Double}

• pValue (Double) – p-value
• oddsRatio (Double) – odds ratio
• ci95Lower (Double) – lower bound for 95% confidence interval
• ci95Upper (Double) – upper bound for 95% confidence interval

Calculates the p-value, odds ratio, and 95% confidence interval with Fisher’s exact test (FET) for 2x2 tables.

Examples

Annotate each variant with Fisher’s exact test association results (assumes minor/major allele count variant annotations have been computed):

>>> (vds.annotate_variants_expr(
...   'va.fet = let macCase = gs.filter(g => sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and '
...   'macControl = gs.filter(g => !sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and '
...   'majCase = gs.filter(g => sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() and '
...   'majControl = gs.filter(g => !sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() in '
...   'fet(macCase, macControl, majCase, majControl)'))


Notes

fet is identical to the version implemented in R with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1).

Arguments

• a (Int) – value for cell 1
• b (Int) – value for cell 2
• c (Int) – value for cell 3
• d (Int) – value for cell 4
• filtering_allele_frequency(ac: Int, an: Int, ci: Double): Double

Computes a filtering allele frequency (described below) for ac and an with confidence ci.

The filtering allele frequency is the highest true population allele frequency for which the upper bound of the ci (confidence interval) of allele count under a Poisson distribution is still less than the variant’s observed ac (allele count) in the reference sample, given an an (allele number).

This function defines a “filtering AF” that represents the threshold disease-specific “maximum credible AF” at or below which the disease could not plausibly be caused by that variant. A variant with a filtering AF >= the maximum credible AF for the disease under consideration should be filtered, while a variant with a filtering AF below the maximum credible remains a candidate. This filtering AF is not disease-specific: it can be applied to any disease of interest by comparing with a user-defined disease-specific maximum credible AF.

For more details, see: Whiffin et al., 2017

Arguments

• ac (Int) – Allele count
• an (Int) – Allele number
• ci (Double) – Confidence interval. Should be between 0.0 and 1.0.
• gamma(x: Double): Double

Returns the value of the gamma function at x.

Arguments

• x (Double) – the input to gamma.
• Genotype(v: Variant, c: Int, ad: Array[Int], dp: Int, gq: Int, pl: Array[Int]): Genotype

Construct a Genotype object by specifying the variant, call, allelic depths, depth, genotype quality, and phred-scaled likelihoods.

let v = Variant("7:76324539:A:G") and call = Call(0) and
ad = [10, 0] and dp = 10 and gq = 20 and pl = [0, 10, 100] and
g = Genotype(v, call, ad, dp, gq, pl) in g.isHomRef()
result: true


Arguments

• v (Variant) – Variant
• c (Int) – Call
• ad (Array[Int]) – Allelic depths
• dp (Int) – Depth
• gq (Int) – Genotype quality
• pl (Array[Int]) – Phred-scaled likelihoods
• Genotype(v: Variant, gt: Int, prob: Array[Double]): Genotype

Construct a Genotype object from a variant, a genotype call, and an array of genotype probabilities.

let v = Variant("7:76324539:A:G") and gt = 0 and prob = [0.8, 0.1, 0.1] and
g = Genotype(v, gt, prob) in g.isHomRef()
result: true


Arguments

• v (Variant) – Variant
• gt (Int) – Genotype call integer
• prob (Array[Double]) – Genotype probabilities
• Genotype(v: Variant, prob: Array[Double]): Genotype

Construct a Genotype object from a variant and an array of genotype probabilities.

let v = Variant("7:76324539:A:G") and prob = [0.2, 0.7, 0.1] and
g = Genotype(v, prob) in g.isHet()
result: true


Arguments

• v (Variant) – Variant
• prob (Array[Double]) – Genotype probabilities
• gtIndex(j: Int, k: Int): Int

Convert from j/k pair to genotype index (triangular numbers).

Arguments

• j (Int) – j in j/k pairs.
• k (Int) – k in j/k pairs.
• gtj(i: Int): Int

Convert from genotype index (triangular numbers) to j/k pairs. Returns j.

Arguments

• i (Int) – Genotype index.
• gtk(k: Int): Int

Convert from genotype index (triangular numbers) to j/k pairs. Returns k.

Arguments

• k (Int) – Genotype index.
• hwe(nHomRef: Int, nHet: Int, nHomVar: Int): Struct{rExpectedHetFrequency:Double,pHWE:Double}

• rExpectedHetFrequency (Double) – Expected rHeterozygosity based on Hardy Weinberg Equilibrium
• pHWE (Double) – P-value

Compute Hardy Weinberg Equilbrium (HWE) p-value.

Examples

Compute HWE p-value per variant:

>>> (vds.annotate_variants_expr('va.hwe = '
...     'let nHomRef = gs.filter(g => g.isHomRef()).count().toInt() and '
...     'nHet = gs.filter(g => g.isHet()).count().toInt() and '
...     'nHomVar = gs.filter(g => g.isHomVar()).count().toInt() in '
...     'hwe(nHomRef, nHet, nHomVar)'))


Notes

Hail computes the exact p-value with mid-p-value correction, i.e. the probability of a less-likely outcome plus one-half the probability of an equally-likely outcome. See this document for details on the Levene-Haldane distribution and references.

Arguments

• nHomRef (Int) – Number of samples that are homozygous for the reference allele.
• nHet (Int) – Number of samples that are heterozygotes.
• nHomVar (Int) – Number of samples that are homozygous for the alternate allele.
• index(structs: Array[Struct], identifier: String): Dict[String, Struct]

Returns a Dict keyed by the string field identifier of each Struct in the Array and values that are structs with the remaining fields.

let a = [{PLI: 0.998, genename: "gene1", hits_in_exac: 1},
{PLI: 0.0015, genename: "gene2", hits_in_exac: 10},
{PLI: 0.9045, genename: "gene3", hits_in_exac: 2}] and
d = index(a, genename) in global.gene_dict["gene1"]

result: {PLI: 0.998, hits_in_exac: 1}

• Interval(chr: String, start: Int, end: Int): Interval

Constructs an interval from a given chromosome, start, and end.

Arguments

• chr (String) – Chromosome.
• start (Int) – Starting position.
• end (Int) – Ending position (exclusive).
• Interval(s: String): Interval

Returns an interval parsed in the same way as parse()

Arguments

• s (String) – The string to parse.
• Interval(startLocus: Locus, endLocus: Locus): Interval

Construct a Interval object. Intervals are left inclusive, right exclusive. This means that [chr1:1, chr1:3) contains chr1:1 and chr1:2.

Arguments

• startLocus (Locus) – Start position of interval
• endLocus (Locus) – End position of interval
• isDefined(a: T): Boolean – Returns true if item is non-missing. Otherwise, false.

• isMissing(a: T): Boolean – Returns true if item is missing. Otherwise, false.

• isnan(a: Double): Boolean – Returns true if the argument is NaN (not a number), false if the argument is defined but not NaN. Returns missing if the argument is missing.

• json(x: T): String – Returns the JSON representation of a data type.

• Locus(contig: String, pos: Int): Locus

Construct a Locus object.

let l = Locus("1", 10040532) in l.position
result: 10040532


Arguments

• contig (String) – String representation of contig.
• pos (Int) – SNP position or start of an indel.
• Locus(s: String): Locus

Construct a Locus object.

let l = Locus("1:10040532") in l.position
result: 10040532


Arguments

• s (String) – String of the form CHR:POS
• log(x: Double, b: Double): Double

Returns the base b logarithm of the given value x.

Arguments

• x (Double) – the number to take the base b logarithm of.
• b (Double) – the base.
• log(x: Double): Double

Returns the natural logarithm of the given value x.

Arguments

• x (Double) – the number to take the natural logarithm of.
• log10(x: Double): Double

Returns the base 10 logarithm of the given value x.

Arguments

• x (Double) – the number to take the base 10 logarithm of.
• merge(s1: Struct, s2: Struct): Struct

Create a new Struct with all fields in s1 and s2.

let s1 = {gene: "ACBD", function: "LOF"} and s2 = {a: 20, b: "hello"} in merge(s1, s2)
result: {gene: "ACBD", function: "LOF", a: 20, b: "hello"}

• orElse(a: T, b: T): T

If a is not missing, returns a. Otherwise, returns b.

Examples

Replace missing phenotype values with the mean value:

>>> [mean_height] = vds.query_samples(['samples.map(s => sa.pheno.height).stats()'])['mean']
>>> vds.annotate_samples_expr('sa.pheno.heightImputed = orElse(sa.pheno.height, %d)' % mean_height)

• orMissing(a: Boolean, b: T): T – If predicate evaluates to true, returns value. Otherwise, returns NA.

• pchisqtail(x: Double, df: Double): Double

Returns right-tail probability p for which p = Prob($$Z^2$$ > x) with $$Z^2$$ a chi-squared random variable with degrees of freedom specified by df. x must be positive.

Arguments

• x (Double) – Number at which to compute the probability.
• df (Double) – Degrees of freedom.
• pcoin(p: Double): Boolean

Returns true with probability p. This function is non-deterministic.

Arguments

• p (Double) – Probability. Should be between 0.0 and 1.0.
• pnorm(x: Double): Double

Returns left-tail probability p for which p = Prob($$Z$$ < x) with $$Z$$ a standard normal random variable.

Arguments

• x (Double) – Number at which to compute the probability.
• pow(b: Double, x: Double): Double

Returns b raised to the power of x.

Arguments

• b (Double) – the base.
• x (Double) – the exponent.
• ppois(x: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Double

If lowerTail equals true, returns Prob($$X \leq$$ x) where $$X$$ is a Poisson random variable with rate parameter lambda. If lowerTail equals false, returns Prob($$X$$ > x).

Arguments

• x (Double) – Non-negative number at which to compute the probability density.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• lowerTail (Boolean) – If false, returns the exclusive right-tail probability $$P(X > x)$$.
• logP (Boolean) – If true, probabilities are returned as log(p).
• ppois(x: Double, lambda: Double): Double

Returns the left-tail Prob($$X \leq$$ x) where $$X$$ is a Poisson random variable with rate parameter lambda.

Arguments

• x (Double) – Non-negative bound for the left-tail cumulative probability.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• qchisqtail(p: Double, df: Double): Double

Returns right-quantile x for which p = Prob($$Z^2$$ > x) with $$Z^2$$ a chi-squared random variable with degrees of freedom specified by df. p must satisfy 0 < p <= 1. Inverse of pchisq1tail.

Arguments

• p (Double) – Probability
• df (Double) – Degrees of freedom.
• qnorm(p: Double): Double

Returns left-quantile x for which p = Prob($$Z$$ < x) with $$Z$$ a standard normal random variable. p must satisfy 0 < p < 1. Inverse of pnorm.

Arguments

• p (Double) – Probability
• qpois(p: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Int

If lowerTail equals true, returns the smallest integer $$x$$ such that Prob($$X \leq x$$) $$\geq$$ p where $$X$$ is a Poisson random variable with rate parameter lambda. If lowerTail equals false, returns the largest integer $$x$$ such that Prob($$X > x$$) $$\geq$$ p. Inverts ppois.

Arguments

• p (Double) – Quantile to compute. Must satisfy $$0 \leq p \leq 1$$.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• lowerTail (Boolean) – If false, returns the right-tail inverse cumulative density function.
• logP (Boolean) – If true, input quantiles are given as log(p).
• qpois(p: Double, lambda: Double): Int

Returns the smallest integer $$x$$ such that Prob($$X \leq x$$) $$\geq$$ p where $$X$$ is a Poisson random variable with rate parameter lambda. Inverts ppois.

Arguments

• p (Double) – Quantile to compute. Must satisfy $$0 \leq p \leq 1$$.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• range(start: Int, stop: Int, step: Int): Array[Int]

Generate an Array with values in the interval [start, stop) in increments of step.

let r = range(0, 5, 2) in r
result: [0, 2, 4]


Arguments

• start (Int) – Starting number of the sequence.
• stop (Int) – Generate numbers up to, but not including this number.
• step (Int) – Difference between each number in the sequence.
• range(start: Int, stop: Int): Array[Int]

Generate an Array with values in the interval [start, stop).

let r = range(5, 8) in r
result: [5, 6, 7]


Arguments

• start (Int) – Starting number of the sequence.
• stop (Int) – Generate numbers up to, but not including this number.
• range(stop: Int): Array[Int]

Generate an Array with values in the interval [0, stop).

let r = range(3) in r
result: [0, 1, 2]


Arguments

• stop (Int) – Number of integers (whole numbers) to generate, starting from zero.
• rnorm(mean: Double, sd: Double): Double

Returns a random draw from a normal distribution with mean mean and standard deviation sd. sd should be non-negative. This function is non-deterministic.

Arguments

• mean (Double) – Mean value of normal distribution.
• sd (Double) – Standard deviation of normal distribution.
• rpois(n: Int, lambda: Double): Array[Double]

Returns n random draws from a Poisson distribution with rate parameter lambda. This function is non-deterministic.

Arguments

• n (Int) – Number of random draws to make.
• lambda (Double) – Poisson rate parameter. Must be non-negative.
• rpois(lambda: Double): Double

Returns a random draw from a Poisson distribution with rate parameter lambda. This function is non-deterministic.

Arguments

• lambda (Double) – Poisson rate parameter. Must be non-negative.
• runif(min: Double, max: Double): Double

Returns a random draw from a uniform distribution on [min, max). min should be less than or equal to max. This function is non-deterministic.

Arguments

• min (Double) – Minimum value of interval.
• max (Double) – Maximum value of interval, non-inclusive.
• select(s: Struct, identifiers: String*): Struct

Return a new Struct with a subset of fields.

let s = {gene: "ACBD", function: "LOF", nHet: 12} in select(s, gene, function)
result: {gene: "ACBD", function: "LOF"}


Arguments

• s (Struct) – Struct to select fields from.
• identifiers (String*) – Field names to select from s. Multiple arguments allowed.
• sqrt(x: Double): Double

Returns the square root of the given value x.

Arguments

• x (Double) – the number to take the square root of.
• str(x: T): String

Returns the string representation of a data type.

let v = Variant("1", 278653, "A", "T") in str(v)
result: "1:278653:A:T"

• Variant(contig: String, pos: Int, ref: String, alts: Array[String]): Variant

Construct a Variant object.

let v = Variant("1", 25782743, "A", Array("T", "TA")) in v.ref
result: "A"


Arguments

• contig (String) – String representation of contig.
• pos (Int) – SNP position or start of an indel.
• ref (String) – Reference allele sequence.
• alts (Array[String]) – Array of alternate allele sequences.
• Variant(contig: String, pos: Int, ref: String, alt: String): Variant

Construct a Variant object.

let v = Variant("2", 13427, "T", "G") in v.ref
result: "T"


Arguments

• contig (String) – String representation of contig.
• pos (Int) – SNP position or start of an indel.
• ref (String) – Reference allele sequence.
• alt (String) – Alternate allele sequence.
• Variant(s: String): Variant

Construct a Variant object.

let v = Variant("7:76324539:A:G") in v.contig
result: "7"


Arguments

• s (String) – String of the form CHR:POS:REF:ALT or CHR:POS:REF:ALT1,ALT2...ALTN specifying the contig, position, reference and alternate alleles.