Functions¶
binomTest(x: Int, n: Int, p: Double, alternative: String): Double
Returns the p-value from the exact binomial test of the null hypothesis that success has probability p, given x successes in n trials.
Examples
Test each variant for allele balance across all heterozygous genotypes, under the null hypothesis that the two alleles are sampled with equal probability.
>>> (vds.split_multi() ... .annotate_variants_expr( ... 'va.ab_binom_test = let all_samples_ad = gs.filter(g => g.isHet).map(g => g.ad).sum() in ' ... 'binomTest(all_samples_ad[1], all_samples_ad.sum(), 0.5, "two.sided")'))Arguments
- x (Int) – Number of successes
- n (Int) – Number of trials
- p (Double) – Probability of success under the null hypothesis
- alternative (String) – Alternative hypothesis, must be “two.sided”, “greater” or “less”.
chisq(c1: Int, c2: Int, c3: Int, c4: Int): Struct{pValue:Double,oddsRatio:Double}
- pValue (Double) – p-value
- oddsRatio (Double) – odds ratio
Calculates p-value (Chi-square approximation) and odds ratio for 2x2 table
Arguments
- c1 (Int) – value for cell 1
- c2 (Int) – value for cell 2
- c3 (Int) – value for cell 3
- c4 (Int) – value for cell 4
combineVariants(left: Variant, right: Variant): Struct{variant:Variant,laIndices:Dict[Int,Int],raIndices:Dict[Int,Int]}
- variant (Variant) – Resulting combined variant.
- laIndices (Dict[Int, Int]) – Mapping from new to old allele index for the left variant.
- raIndices (Dict[Int, Int]) – Mapping from new to old allele index for the right variant.
Combines the alleles of two variants at the same locus, making sure that ref and alt alleles are represented uniformely. In addition to the resulting variant containing all alleles, this function also returns the mapping from the old to the new allele indices. Note that this mapping counts the reference allele always contains the reference allele mapping 0 -> 0.
let left = Variant("1:1000:AT:A,CT") and right = Variant("1:1000:A:C,AGG") in combineVariants(left,right) result: Struct{'variant': Variant(contig=1, start=1000, ref=AT, alts=[AltAllele(ref=AT, alt=A), AltAllele(ref=AT, alt=CT), AltAllele(ref=AT, alt=AGGT)]), 'laIndices': {0: 0, 1: 1, 2: 2}, 'raIndices': {0:0, 2: 1, 3: 2}}
Arguments
- left (Variant) – Left variant to combine.
- right (Variant) – Right variant to combine.
ctt(c1: Int, c2: Int, c3: Int, c4: Int, minCellCount: Int): Struct{pValue:Double,oddsRatio:Double}
- pValue (Double) – p-value
- oddsRatio (Double) – odds ratio
Calculates p-value and odds ratio for 2x2 table. If any cell is lower than minCellCount Fishers exact test is used, otherwise faster chi-squared approximation is used.
Arguments
- c1 (Int) – value for cell 1
- c2 (Int) – value for cell 2
- c3 (Int) – value for cell 3
- c4 (Int) – value for cell 4
- minCellCount (Int) – Minimum cell count for using chi-squared approximation
Dict(keys: Array[T], values: Array[U]): Dict[T, U]
Construct a Dict from an array of keys and an array of values.
Arguments
- keys (Array[T]) – Keys of Dict.
- values (Array[U]) – Values of Dict.
dpois(x: Double, lambda: Double, logP: Boolean): Double
Returns Prob(\(X\) =
x
) from a Poisson distribution with rate parameterlambda
.Arguments
- x (Double) – Non-negative number at which to compute the probability density.
- lambda (Double) – Poisson rate parameter. Must be non-negative.
- logP (Boolean) – If true, probabilities are returned as log(p).
dpois(x: Double, lambda: Double): Double
Returns Prob(\(X\) =
x
) from a Poisson distribution with rate parameterlambda
.Arguments
- x (Double) – Non-negative number at which to compute the probability density.
- lambda (Double) – Poisson rate parameter. Must be non-negative.
drop(s: Struct, identifiers: String*): Struct
Return a new
Struct
with the a subset of fields not matchingidentifiers
.let s = {gene: "ACBD", function: "LOF", nHet: 12} in drop(s, gene, function) result: {nHet: 12}
Arguments
- s (Struct) – Struct to drop fields from.
- identifiers (String*) – Field names to drop from
s
. Multiple arguments allowed.exp(x: Double): Double
Returns Euler’s number
e
raised to the power of the given valuex
.Arguments
- x (Double) – the exponent to raise
e
to.fet(a: Int, b: Int, c: Int, d: Int): Struct{pValue:Double,oddsRatio:Double,ci95Lower:Double,ci95Upper:Double}
- pValue (Double) – p-value
- oddsRatio (Double) – odds ratio
- ci95Lower (Double) – lower bound for 95% confidence interval
- ci95Upper (Double) – upper bound for 95% confidence interval
Calculates the p-value, odds ratio, and 95% confidence interval with Fisher’s exact test (FET) for 2x2 tables.
Examples
Annotate each variant with Fisher’s exact test association results (assumes minor/major allele count variant annotations have been computed):
>>> (vds.annotate_variants_expr( ... 'va.fet = let macCase = gs.filter(g => sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and ' ... 'macControl = gs.filter(g => !sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and ' ... 'majCase = gs.filter(g => sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() and ' ... 'majControl = gs.filter(g => !sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() in ' ... 'fet(macCase, macControl, majCase, majControl)'))Notes
fet
is identical to the version implemented in R with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1).Arguments
- a (Int) – value for cell 1
- b (Int) – value for cell 2
- c (Int) – value for cell 3
- d (Int) – value for cell 4
filtering_allele_frequency(ac: Int, an: Int, ci: Double): Double
Computes a filtering allele frequency (described below) for ac and an with confidence ci.
The filtering allele frequency is the highest true population allele frequency for which the upper bound of the ci (confidence interval) of allele count under a Poisson distribution is still less than the variant’s observed ac (allele count) in the reference sample, given an an (allele number).
This function defines a “filtering AF” that represents the threshold disease-specific “maximum credible AF” at or below which the disease could not plausibly be caused by that variant. A variant with a filtering AF >= the maximum credible AF for the disease under consideration should be filtered, while a variant with a filtering AF below the maximum credible remains a candidate. This filtering AF is not disease-specific: it can be applied to any disease of interest by comparing with a user-defined disease-specific maximum credible AF.
For more details, see: Whiffin et al., 2017
Arguments
- ac (Int) – Allele count
- an (Int) – Allele number
- ci (Double) – Confidence interval. Should be between 0.0 and 1.0.
gamma(x: Double): Double
Genotype(v: Variant, c: Int, ad: Array[Int], dp: Int, gq: Int, pl: Array[Int]): Genotype
Construct a Genotype object by specifying the variant, call, allelic depths, depth, genotype quality, and phred-scaled likelihoods.
let v = Variant("7:76324539:A:G") and call = Call(0) and ad = [10, 0] and dp = 10 and gq = 20 and pl = [0, 10, 100] and g = Genotype(v, call, ad, dp, gq, pl) in g.isHomRef() result: true
Arguments
- v (Variant) – Variant
- c (Int) – Call
- ad (Array[Int]) – Allelic depths
- dp (Int) – Depth
- gq (Int) – Genotype quality
- pl (Array[Int]) – Phred-scaled likelihoods
Genotype(v: Variant, gt: Int, prob: Array[Double]): Genotype
Construct a Genotype object from a variant, a genotype call, and an array of genotype probabilities.
let v = Variant("7:76324539:A:G") and gt = 0 and prob = [0.8, 0.1, 0.1] and g = Genotype(v, gt, prob) in g.isHomRef() result: true
Arguments
- v (Variant) – Variant
- gt (Int) – Genotype call integer
- prob (Array[Double]) – Genotype probabilities
Genotype(v: Variant, prob: Array[Double]): Genotype
Construct a Genotype object from a variant and an array of genotype probabilities.
let v = Variant("7:76324539:A:G") and prob = [0.2, 0.7, 0.1] and g = Genotype(v, prob) in g.isHet() result: true
Arguments
- v (Variant) – Variant
- prob (Array[Double]) – Genotype probabilities
gtIndex(j: Int, k: Int): Int
Convert from
j/k
pair to genotype index (triangular numbers).Arguments
- j (Int) – j in
j/k
pairs.- k (Int) – k in
j/k
pairs.gtj(i: Int): Int
Convert from genotype index (triangular numbers) to
j/k
pairs. Returnsj
.Arguments
- i (Int) – Genotype index.
gtk(k: Int): Int
Convert from genotype index (triangular numbers) to
j/k
pairs. Returnsk
.Arguments
- k (Int) – Genotype index.
hwe(nHomRef: Int, nHet: Int, nHomVar: Int): Struct{rExpectedHetFrequency:Double,pHWE:Double}
- rExpectedHetFrequency (Double) – Expected rHeterozygosity based on Hardy Weinberg Equilibrium
- pHWE (Double) – P-value
Compute Hardy Weinberg Equilbrium (HWE) p-value.
Examples
Compute HWE p-value per variant:
>>> (vds.annotate_variants_expr('va.hwe = ' ... 'let nHomRef = gs.filter(g => g.isHomRef()).count().toInt() and ' ... 'nHet = gs.filter(g => g.isHet()).count().toInt() and ' ... 'nHomVar = gs.filter(g => g.isHomVar()).count().toInt() in ' ... 'hwe(nHomRef, nHet, nHomVar)'))Notes
Hail computes the exact p-value with mid-p-value correction, i.e. the probability of a less-likely outcome plus one-half the probability of an equally-likely outcome. See this document for details on the Levene-Haldane distribution and references.
Arguments
- nHomRef (Int) – Number of samples that are homozygous for the reference allele.
- nHet (Int) – Number of samples that are heterozygotes.
- nHomVar (Int) – Number of samples that are homozygous for the alternate allele.
index(structs: Array[Struct], identifier: String): Dict[String, Struct]
Returns a Dict keyed by the string field
identifier
of eachStruct
in theArray
and values that are structs with the remaining fields.let a = [{PLI: 0.998, genename: "gene1", hits_in_exac: 1}, {PLI: 0.0015, genename: "gene2", hits_in_exac: 10}, {PLI: 0.9045, genename: "gene3", hits_in_exac: 2}] and d = index(a, genename) in global.gene_dict["gene1"] result: {PLI: 0.998, hits_in_exac: 1}
Interval(chr: String, start: Int, end: Int): Interval
Constructs an interval from a given chromosome, start, and end.
Arguments
- chr (String) – Chromosome.
- start (Int) – Starting position.
- end (Int) – Ending position (exclusive).
Interval(s: String): Interval
Interval(startLocus: Locus, endLocus: Locus): Interval
Construct a Interval object. Intervals are left inclusive, right exclusive. This means that
[chr1:1, chr1:3)
containschr1:1
andchr1:2
.Arguments
- startLocus (Locus) – Start position of interval
- endLocus (Locus) – End position of interval
isDefined(a: T): Boolean – Returns true if item is non-missing. Otherwise, false.
isMissing(a: T): Boolean – Returns true if item is missing. Otherwise, false.
isnan(a: Double): Boolean – Returns true if the argument is NaN (not a number), false if the argument is defined but not NaN. Returns missing if the argument is missing.
json(x: T): String – Returns the JSON representation of a data type.
Locus(contig: String, pos: Int): Locus
Construct a Locus object.
let l = Locus("1", 10040532) in l.position result: 10040532
Arguments
- contig (String) – String representation of contig.
- pos (Int) – SNP position or start of an indel.
Locus(s: String): Locus
Construct a Locus object.
let l = Locus("1:10040532") in l.position result: 10040532
Arguments
- s (String) – String of the form
CHR:POS
log(x: Double, b: Double): Double
Returns the base
b
logarithm of the given valuex
.Arguments
- x (Double) – the number to take the base
b
logarithm of.- b (Double) – the base.
log(x: Double): Double
Returns the natural logarithm of the given value
x
.Arguments
- x (Double) – the number to take the natural logarithm of.
log10(x: Double): Double
Returns the base 10 logarithm of the given value
x
.Arguments
- x (Double) – the number to take the base 10 logarithm of.
merge(s1: Struct, s2: Struct): Struct
Create a new
Struct
with all fields ins1
ands2
.let s1 = {gene: "ACBD", function: "LOF"} and s2 = {a: 20, b: "hello"} in merge(s1, s2) result: {gene: "ACBD", function: "LOF", a: 20, b: "hello"}
orElse(a: T, b: T): T
If
a
is not missing, returnsa
. Otherwise, returnsb
.Examples
Replace missing phenotype values with the mean value:
>>> [mean_height] = vds.query_samples(['samples.map(s => sa.pheno.height).stats()'])['mean'] >>> vds.annotate_samples_expr('sa.pheno.heightImputed = orElse(sa.pheno.height, %d)' % mean_height)orMissing(a: Boolean, b: T): T – If
predicate
evaluates to true, returnsvalue
. Otherwise, returns NA.pchisqtail(x: Double, df: Double): Double
Returns right-tail probability p for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by
df
.x
must be positive.Arguments
- x (Double) – Number at which to compute the probability.
- df (Double) – Degrees of freedom.
pcoin(p: Double): Boolean
Returns true with probability
p
. This function is non-deterministic.Arguments
- p (Double) – Probability. Should be between 0.0 and 1.0.
pnorm(x: Double): Double
Returns left-tail probability p for which p = Prob(\(Z\) <
x
) with \(Z\) a standard normal random variable.Arguments
- x (Double) – Number at which to compute the probability.
pow(b: Double, x: Double): Double
Returns
b
raised to the power ofx
.Arguments
- b (Double) – the base.
- x (Double) – the exponent.
ppois(x: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Double
If
lowerTail
equals true, returns Prob(\(X \leq\)x
) where \(X\) is a Poisson random variable with rate parameterlambda
. IflowerTail
equals false, returns Prob(\(X\) >x
).Arguments
- x (Double) – Non-negative number at which to compute the probability density.
- lambda (Double) – Poisson rate parameter. Must be non-negative.
- lowerTail (Boolean) – If false, returns the exclusive right-tail probability \(P(X > x)\).
- logP (Boolean) – If true, probabilities are returned as log(p).
ppois(x: Double, lambda: Double): Double
Returns the left-tail Prob(\(X \leq\)
x
) where \(X\) is a Poisson random variable with rate parameterlambda
.Arguments
- x (Double) – Non-negative bound for the left-tail cumulative probability.
- lambda (Double) – Poisson rate parameter. Must be non-negative.
qchisqtail(p: Double, df: Double): Double
Returns right-quantile x for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by
df
.p
must satisfy 0 < p <= 1. Inverse of pchisq1tail.Arguments
- p (Double) – Probability
- df (Double) – Degrees of freedom.
qnorm(p: Double): Double
Returns left-quantile x for which p = Prob(\(Z\) < x) with \(Z\) a standard normal random variable.
p
must satisfy 0 <p
< 1. Inverse of pnorm.Arguments
- p (Double) – Probability
qpois(p: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Int
If
lowerTail
equals true, returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\)p
where \(X\) is a Poisson random variable with rate parameterlambda
. IflowerTail
equals false, returns the largest integer \(x\) such that Prob(\(X > x\)) \(\geq\)p
. Inverts ppois.Arguments
- p (Double) – Quantile to compute. Must satisfy \(0 \leq p \leq 1\).
- lambda (Double) – Poisson rate parameter. Must be non-negative.
- lowerTail (Boolean) – If false, returns the right-tail inverse cumulative density function.
- logP (Boolean) – If true, input quantiles are given as log(p).
qpois(p: Double, lambda: Double): Int
Returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\)
p
where \(X\) is a Poisson random variable with rate parameterlambda
. Inverts ppois.Arguments
- p (Double) – Quantile to compute. Must satisfy \(0 \leq p \leq 1\).
- lambda (Double) – Poisson rate parameter. Must be non-negative.
range(start: Int, stop: Int, step: Int): Array[Int]
Generate an
Array
with values in the interval[start, stop)
in increments of step.let r = range(0, 5, 2) in r result: [0, 2, 4]
Arguments
- start (Int) – Starting number of the sequence.
- stop (Int) – Generate numbers up to, but not including this number.
- step (Int) – Difference between each number in the sequence.
range(start: Int, stop: Int): Array[Int]
Generate an
Array
with values in the interval[start, stop)
.let r = range(5, 8) in r result: [5, 6, 7]
Arguments
- start (Int) – Starting number of the sequence.
- stop (Int) – Generate numbers up to, but not including this number.
range(stop: Int): Array[Int]
Generate an
Array
with values in the interval[0, stop)
.let r = range(3) in r result: [0, 1, 2]
Arguments
- stop (Int) – Number of integers (whole numbers) to generate, starting from zero.
rnorm(mean: Double, sd: Double): Double
Returns a random draw from a normal distribution with mean
mean
and standard deviationsd
.sd
should be non-negative. This function is non-deterministic.Arguments
- mean (Double) – Mean value of normal distribution.
- sd (Double) – Standard deviation of normal distribution.
rpois(n: Int, lambda: Double): Array[Double]
Returns
n
random draws from a Poisson distribution with rate parameterlambda
. This function is non-deterministic.Arguments
- n (Int) – Number of random draws to make.
- lambda (Double) – Poisson rate parameter. Must be non-negative.
rpois(lambda: Double): Double
Returns a random draw from a Poisson distribution with rate parameter
lambda
. This function is non-deterministic.Arguments
- lambda (Double) – Poisson rate parameter. Must be non-negative.
runif(min: Double, max: Double): Double
Returns a random draw from a uniform distribution on [
min
,max
).min
should be less than or equal tomax
. This function is non-deterministic.Arguments
- min (Double) – Minimum value of interval.
- max (Double) – Maximum value of interval, non-inclusive.
select(s: Struct, identifiers: String*): Struct
Return a new
Struct
with a subset of fields.let s = {gene: "ACBD", function: "LOF", nHet: 12} in select(s, gene, function) result: {gene: "ACBD", function: "LOF"}
Arguments
- s (Struct) – Struct to select fields from.
- identifiers (String*) – Field names to select from
s
. Multiple arguments allowed.sqrt(x: Double): Double
Returns the square root of the given value
x
.Arguments
- x (Double) – the number to take the square root of.
str(x: T): String
Returns the string representation of a data type.
let v = Variant("1", 278653, "A", "T") in str(v) result: "1:278653:A:T"
Variant(contig: String, pos: Int, ref: String, alts: Array[String]): Variant
Construct a Variant object.
let v = Variant("1", 25782743, "A", Array("T", "TA")) in v.ref result: "A"
Arguments
- contig (String) – String representation of contig.
- pos (Int) – SNP position or start of an indel.
- ref (String) – Reference allele sequence.
- alts (Array[String]) – Array of alternate allele sequences.
Variant(contig: String, pos: Int, ref: String, alt: String): Variant
Construct a Variant object.
let v = Variant("2", 13427, "T", "G") in v.ref result: "T"
Arguments
- contig (String) – String representation of contig.
- pos (Int) – SNP position or start of an indel.
- ref (String) – Reference allele sequence.
- alt (String) – Alternate allele sequence.
Variant(s: String): Variant
Construct a Variant object.
let v = Variant("7:76324539:A:G") in v.contig result: "7"
Arguments
- s (String) – String of the form
CHR:POS:REF:ALT
orCHR:POS:REF:ALT1,ALT2...ALTN
specifying the contig, position, reference and alternate alleles.