.. sec-functions: .. testsetup:: vds = hc.read("data/example.vds").annotate_variants_expr('va.genes = ["ACBD", "DCBA"]') ========= Functions ========= - **binomTest(x: Int, n: Int, p: Double, alternative: String)**: *Double* Returns the p-value from the `exact binomial test `__ of the null hypothesis that success has probability `p`, given `x` successes in `n` trials. **Examples** Test each variant for allele balance across all heterozygous genotypes, under the null hypothesis that the two alleles are sampled with equal probability. >>> (vds.split_multi() ... .annotate_variants_expr( ... 'va.ab_binom_test = let all_samples_ad = gs.filter(g => g.isHet).map(g => g.ad).sum() in ' ... 'binomTest(all_samples_ad[1], all_samples_ad.sum(), 0.5, "two.sided")')) **Arguments** - **x** (*Int*) -- Number of successes - **n** (*Int*) -- Number of trials - **p** (*Double*) -- Probability of success under the null hypothesis - **alternative** (*String*) -- Alternative hypothesis, must be "two.sided", "greater" or "less". - **chisq(c1: Int, c2: Int, c3: Int, c4: Int)**: *Struct{pValue:Double,oddsRatio:Double}* .. container:: annotation - **pValue** (*Double*) -- p-value - **oddsRatio** (*Double*) -- odds ratio Calculates p-value (Chi-square approximation) and odds ratio for 2x2 table **Arguments** - **c1** (*Int*) -- value for cell 1 - **c2** (*Int*) -- value for cell 2 - **c3** (*Int*) -- value for cell 3 - **c4** (*Int*) -- value for cell 4 - **combineVariants(left: Variant, right: Variant)**: *Struct{variant:Variant,laIndices:Dict[Int,Int],raIndices:Dict[Int,Int]}* .. container:: annotation - **variant** (*Variant*) -- Resulting combined variant. - **laIndices** (*Dict[Int, Int]*) -- Mapping from new to old allele index for the left variant. - **raIndices** (*Dict[Int, Int]*) -- Mapping from new to old allele index for the right variant. Combines the alleles of two variants at the same locus, making sure that ref and alt alleles are represented uniformely. In addition to the resulting variant containing all alleles, this function also returns the mapping from the old to the new allele indices. Note that this mapping counts the reference allele always contains the reference allele mapping 0 -> 0. .. code-block:: text :emphasize-lines: 2 let left = Variant("1:1000:AT:A,CT") and right = Variant("1:1000:A:C,AGG") in combineVariants(left,right) result: Struct{'variant': Variant(contig=1, start=1000, ref=AT, alts=[AltAllele(ref=AT, alt=A), AltAllele(ref=AT, alt=CT), AltAllele(ref=AT, alt=AGGT)]), 'laIndices': {0: 0, 1: 1, 2: 2}, 'raIndices': {0:0, 2: 1, 3: 2}} **Arguments** - **left** (*Variant*) -- Left variant to combine. - **right** (*Variant*) -- Right variant to combine. - **ctt(c1: Int, c2: Int, c3: Int, c4: Int, minCellCount: Int)**: *Struct{pValue:Double,oddsRatio:Double}* .. container:: annotation - **pValue** (*Double*) -- p-value - **oddsRatio** (*Double*) -- odds ratio Calculates p-value and odds ratio for 2x2 table. If any cell is lower than `minCellCount` Fishers exact test is used, otherwise faster chi-squared approximation is used. **Arguments** - **c1** (*Int*) -- value for cell 1 - **c2** (*Int*) -- value for cell 2 - **c3** (*Int*) -- value for cell 3 - **c4** (*Int*) -- value for cell 4 - **minCellCount** (*Int*) -- Minimum cell count for using chi-squared approximation - **Dict(keys: Array[T], values: Array[U])**: *Dict[T, U]* Construct a Dict from an array of keys and an array of values. **Arguments** - **keys** (*Array[T]*) -- Keys of Dict. - **values** (*Array[U]*) -- Values of Dict. - **dpois(x: Double, lambda: Double, logP: Boolean)**: *Double* Returns Prob(:math:`X` = ``x``) from a Poisson distribution with rate parameter ``lambda``. **Arguments** - **x** (*Double*) -- Non-negative number at which to compute the probability density. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **logP** (*Boolean*) -- If true, probabilities are returned as log(p). - **dpois(x: Double, lambda: Double)**: *Double* Returns Prob(:math:`X` = ``x``) from a Poisson distribution with rate parameter ``lambda``. **Arguments** - **x** (*Double*) -- Non-negative number at which to compute the probability density. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **drop(s: Struct, identifiers: String*)**: *Struct* Return a new ``Struct`` with the a subset of fields not matching ``identifiers``. .. code-block:: text :emphasize-lines: 2 let s = {gene: "ACBD", function: "LOF", nHet: 12} in drop(s, gene, function) result: {nHet: 12} **Arguments** - **s** (*Struct*) -- Struct to drop fields from. - **identifiers** (*String\**) -- Field names to drop from ``s``. Multiple arguments allowed. - **exp(x: Double)**: *Double* Returns Euler's number ``e`` raised to the power of the given value ``x``. **Arguments** - **x** (*Double*) -- the exponent to raise ``e`` to. - **fet(a: Int, b: Int, c: Int, d: Int)**: *Struct{pValue:Double,oddsRatio:Double,ci95Lower:Double,ci95Upper:Double}* .. container:: annotation - **pValue** (*Double*) -- p-value - **oddsRatio** (*Double*) -- odds ratio - **ci95Lower** (*Double*) -- lower bound for 95% confidence interval - **ci95Upper** (*Double*) -- upper bound for 95% confidence interval Calculates the p-value, odds ratio, and 95% confidence interval with Fisher's exact test (FET) for 2x2 tables. **Examples** Annotate each variant with Fisher's exact test association results (assumes minor/major allele count variant annotations have been computed): >>> (vds.annotate_variants_expr( ... 'va.fet = let macCase = gs.filter(g => sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and ' ... 'macControl = gs.filter(g => !sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and ' ... 'majCase = gs.filter(g => sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() and ' ... 'majControl = gs.filter(g => !sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() in ' ... 'fet(macCase, macControl, majCase, majControl)')) **Notes** ``fet`` is identical to the version implemented in `R `_ with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1). **Arguments** - **a** (*Int*) -- value for cell 1 - **b** (*Int*) -- value for cell 2 - **c** (*Int*) -- value for cell 3 - **d** (*Int*) -- value for cell 4 - **filtering_allele_frequency(ac: Int, an: Int, ci: Double)**: *Double* Computes a filtering allele frequency (described below) for `ac` and `an` with confidence `ci`. The filtering allele frequency is the highest true population allele frequency for which the upper bound of the `ci` (confidence interval) of allele count under a Poisson distribution is still less than the variant’s observed `ac` (allele count) in the reference sample, given an `an` (allele number). This function defines a "filtering AF" that represents the threshold disease-specific "maximum credible AF" at or below which the disease could not plausibly be caused by that variant. A variant with a filtering AF >= the maximum credible AF for the disease under consideration should be filtered, while a variant with a filtering AF below the maximum credible remains a candidate. This filtering AF is not disease-specific: it can be applied to any disease of interest by comparing with a user-defined disease-specific maximum credible AF. For more details, see: `Whiffin et al., 2017 `__ **Arguments** - **ac** (*Int*) -- Allele count - **an** (*Int*) -- Allele number - **ci** (*Double*) -- Confidence interval. Should be between 0.0 and 1.0. - **gamma(x: Double)**: *Double* Returns the value of the `gamma function `__ at ``x``. **Arguments** - **x** (*Double*) -- the input to gamma. - **Genotype(v: Variant, c: Int, ad: Array[Int], dp: Int, gq: Int, pl: Array[Int])**: *Genotype* Construct a :ref:`genotype` object by specifying the variant, call, allelic depths, depth, genotype quality, and phred-scaled likelihoods. .. code-block:: text :emphasize-lines: 4 let v = Variant("7:76324539:A:G") and call = Call(0) and ad = [10, 0] and dp = 10 and gq = 20 and pl = [0, 10, 100] and g = Genotype(v, call, ad, dp, gq, pl) in g.isHomRef() result: true **Arguments** - **v** (*Variant*) -- Variant - **c** (*Int*) -- Call - **ad** (*Array[Int]*) -- Allelic depths - **dp** (*Int*) -- Depth - **gq** (*Int*) -- Genotype quality - **pl** (*Array[Int]*) -- Phred-scaled likelihoods - **Genotype(v: Variant, gt: Int, prob: Array[Double])**: *Genotype* Construct a :ref:`genotype` object from a variant, a genotype call, and an array of genotype probabilities. .. code-block:: text :emphasize-lines: 3 let v = Variant("7:76324539:A:G") and gt = 0 and prob = [0.8, 0.1, 0.1] and g = Genotype(v, gt, prob) in g.isHomRef() result: true **Arguments** - **v** (*Variant*) -- Variant - **gt** (*Int*) -- Genotype call integer - **prob** (*Array[Double]*) -- Genotype probabilities - **Genotype(v: Variant, prob: Array[Double])**: *Genotype* Construct a :ref:`genotype` object from a variant and an array of genotype probabilities. .. code-block:: text :emphasize-lines: 3 let v = Variant("7:76324539:A:G") and prob = [0.2, 0.7, 0.1] and g = Genotype(v, prob) in g.isHet() result: true **Arguments** - **v** (*Variant*) -- Variant - **prob** (*Array[Double]*) -- Genotype probabilities - **gtIndex(j: Int, k: Int)**: *Int* Convert from ``j/k`` pair to genotype index (triangular numbers). **Arguments** - **j** (*Int*) -- j in ``j/k`` pairs. - **k** (*Int*) -- k in ``j/k`` pairs. - **gtj(i: Int)**: *Int* Convert from genotype index (triangular numbers) to ``j/k`` pairs. Returns ``j``. **Arguments** - **i** (*Int*) -- Genotype index. - **gtk(k: Int)**: *Int* Convert from genotype index (triangular numbers) to ``j/k`` pairs. Returns ``k``. **Arguments** - **k** (*Int*) -- Genotype index. - **hwe(nHomRef: Int, nHet: Int, nHomVar: Int)**: *Struct{rExpectedHetFrequency:Double,pHWE:Double}* .. container:: annotation - **rExpectedHetFrequency** (*Double*) -- Expected rHeterozygosity based on Hardy Weinberg Equilibrium - **pHWE** (*Double*) -- P-value Compute Hardy Weinberg Equilbrium (HWE) p-value. **Examples** Compute HWE p-value per variant: >>> (vds.annotate_variants_expr('va.hwe = ' ... 'let nHomRef = gs.filter(g => g.isHomRef()).count().toInt() and ' ... 'nHet = gs.filter(g => g.isHet()).count().toInt() and ' ... 'nHomVar = gs.filter(g => g.isHomVar()).count().toInt() in ' ... 'hwe(nHomRef, nHet, nHomVar)')) **Notes** Hail computes the exact p-value with mid-p-value correction, i.e. the probability of a less-likely outcome plus one-half the probability of an equally-likely outcome. See this `document `__ for details on the Levene-Haldane distribution and references. **Arguments** - **nHomRef** (*Int*) -- Number of samples that are homozygous for the reference allele. - **nHet** (*Int*) -- Number of samples that are heterozygotes. - **nHomVar** (*Int*) -- Number of samples that are homozygous for the alternate allele. - **index(structs: Array[Struct], identifier: String)**: *Dict[String, Struct]* Returns a Dict keyed by the string field ``identifier`` of each ``Struct`` in the ``Array`` and values that are structs with the remaining fields. .. code-block:: text :emphasize-lines: 6 let a = [{PLI: 0.998, genename: "gene1", hits_in_exac: 1}, {PLI: 0.0015, genename: "gene2", hits_in_exac: 10}, {PLI: 0.9045, genename: "gene3", hits_in_exac: 2}] and d = index(a, genename) in global.gene_dict["gene1"] result: {PLI: 0.998, hits_in_exac: 1} - **Interval(chr: String, start: Int, end: Int)**: *Interval* Constructs an interval from a given chromosome, start, and end. **Arguments** - **chr** (*String*) -- Chromosome. - **start** (*Int*) -- Starting position. - **end** (*Int*) -- Ending position (exclusive). - **Interval(s: String)**: *Interval* Returns an interval parsed in the same way as :py:meth:`~hail.representation.Interval.parse` **Arguments** - **s** (*String*) -- The string to parse. - **Interval(startLocus: Locus, endLocus: Locus)**: *Interval* Construct a :ref:`interval` object. Intervals are **left inclusive, right exclusive**. This means that ``[chr1:1, chr1:3)`` contains ``chr1:1`` and ``chr1:2``. **Arguments** - **startLocus** (*Locus*) -- Start position of interval - **endLocus** (*Locus*) -- End position of interval - **isDefined(a: T)**: *Boolean* -- Returns true if item is non-missing. Otherwise, false. - **isMissing(a: T)**: *Boolean* -- Returns true if item is missing. Otherwise, false. - **isnan(a: Double)**: *Boolean* -- Returns true if the argument is NaN (not a number), false if the argument is defined but not NaN. Returns missing if the argument is missing. - **json(x: T)**: *String* -- Returns the JSON representation of a data type. - **Locus(contig: String, pos: Int)**: *Locus* Construct a :ref:`locus` object. .. code-block:: text :emphasize-lines: 2 let l = Locus("1", 10040532) in l.position result: 10040532 **Arguments** - **contig** (*String*) -- String representation of contig. - **pos** (*Int*) -- SNP position or start of an indel. - **Locus(s: String)**: *Locus* Construct a :ref:`locus` object. .. code-block:: text :emphasize-lines: 2 let l = Locus("1:10040532") in l.position result: 10040532 **Arguments** - **s** (*String*) -- String of the form ``CHR:POS`` - **log(x: Double, b: Double)**: *Double* Returns the base ``b`` logarithm of the given value ``x``. **Arguments** - **x** (*Double*) -- the number to take the base ``b`` logarithm of. - **b** (*Double*) -- the base. - **log(x: Double)**: *Double* Returns the natural logarithm of the given value ``x``. **Arguments** - **x** (*Double*) -- the number to take the natural logarithm of. - **log10(x: Double)**: *Double* Returns the base 10 logarithm of the given value ``x``. **Arguments** - **x** (*Double*) -- the number to take the base 10 logarithm of. - **merge(s1: Struct, s2: Struct)**: *Struct* Create a new ``Struct`` with all fields in ``s1`` and ``s2``. .. code-block:: text :emphasize-lines: 2 let s1 = {gene: "ACBD", function: "LOF"} and s2 = {a: 20, b: "hello"} in merge(s1, s2) result: {gene: "ACBD", function: "LOF", a: 20, b: "hello"} - **orElse(a: T, b: T)**: *T* If ``a`` is not missing, returns ``a``. Otherwise, returns ``b``. **Examples** Replace missing phenotype values with the mean value: :: >>> [mean_height] = vds.query_samples(['samples.map(s => sa.pheno.height).stats()'])['mean'] >>> vds.annotate_samples_expr('sa.pheno.heightImputed = orElse(sa.pheno.height, %d)' % mean_height) - **orMissing(a: Boolean, b: T)**: *T* -- If ``predicate`` evaluates to true, returns ``value``. Otherwise, returns NA. - **pchisqtail(x: Double, df: Double)**: *Double* Returns right-tail probability p for which p = Prob(:math:`Z^2` > x) with :math:`Z^2` a chi-squared random variable with degrees of freedom specified by ``df``. ``x`` must be positive. **Arguments** - **x** (*Double*) -- Number at which to compute the probability. - **df** (*Double*) -- Degrees of freedom. - **pcoin(p: Double)**: *Boolean* Returns true with probability ``p``. This function is non-deterministic. **Arguments** - **p** (*Double*) -- Probability. Should be between 0.0 and 1.0. - **pnorm(x: Double)**: *Double* Returns left-tail probability p for which p = Prob(:math:`Z` < ``x``) with :math:`Z` a standard normal random variable. **Arguments** - **x** (*Double*) -- Number at which to compute the probability. - **pow(b: Double, x: Double)**: *Double* Returns ``b`` raised to the power of ``x``. **Arguments** - **b** (*Double*) -- the base. - **x** (*Double*) -- the exponent. - **ppois(x: Double, lambda: Double, lowerTail: Boolean, logP: Boolean)**: *Double* If ``lowerTail`` equals true, returns Prob(:math:`X \leq` ``x``) where :math:`X` is a Poisson random variable with rate parameter ``lambda``. If ``lowerTail`` equals false, returns Prob(:math:`X` > ``x``). **Arguments** - **x** (*Double*) -- Non-negative number at which to compute the probability density. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **lowerTail** (*Boolean*) -- If false, returns the exclusive right-tail probability :math:`P(X > x)`. - **logP** (*Boolean*) -- If true, probabilities are returned as log(p). - **ppois(x: Double, lambda: Double)**: *Double* Returns the left-tail Prob(:math:`X \leq` ``x``) where :math:`X` is a Poisson random variable with rate parameter ``lambda``. **Arguments** - **x** (*Double*) -- Non-negative bound for the left-tail cumulative probability. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **qchisqtail(p: Double, df: Double)**: *Double* Returns right-quantile x for which p = Prob(:math:`Z^2` > x) with :math:`Z^2` a chi-squared random variable with degrees of freedom specified by ``df``. ``p`` must satisfy 0 < p <= 1. Inverse of pchisq1tail. **Arguments** - **p** (*Double*) -- Probability - **df** (*Double*) -- Degrees of freedom. - **qnorm(p: Double)**: *Double* Returns left-quantile x for which p = Prob(:math:`Z` < x) with :math:`Z` a standard normal random variable. ``p`` must satisfy 0 < ``p`` < 1. Inverse of pnorm. **Arguments** - **p** (*Double*) -- Probability - **qpois(p: Double, lambda: Double, lowerTail: Boolean, logP: Boolean)**: *Int* If ``lowerTail`` equals true, returns the smallest integer :math:`x` such that Prob(:math:`X \leq x`) :math:`\geq` ``p`` where :math:`X` is a Poisson random variable with rate parameter ``lambda``. If ``lowerTail`` equals false, returns the largest integer :math:`x` such that Prob(:math:`X > x`) :math:`\geq` ``p``. Inverts ppois. **Arguments** - **p** (*Double*) -- Quantile to compute. Must satisfy :math:`0 \leq p \leq 1`. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **lowerTail** (*Boolean*) -- If false, returns the right-tail inverse cumulative density function. - **logP** (*Boolean*) -- If true, input quantiles are given as log(p). - **qpois(p: Double, lambda: Double)**: *Int* Returns the smallest integer :math:`x` such that Prob(:math:`X \leq x`) :math:`\geq` ``p`` where :math:`X` is a Poisson random variable with rate parameter ``lambda``. Inverts ppois. **Arguments** - **p** (*Double*) -- Quantile to compute. Must satisfy :math:`0 \leq p \leq 1`. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **range(start: Int, stop: Int, step: Int)**: *Array[Int]* Generate an ``Array`` with values in the interval ``[start, stop)`` in increments of step. .. code-block:: text :emphasize-lines: 2 let r = range(0, 5, 2) in r result: [0, 2, 4] **Arguments** - **start** (*Int*) -- Starting number of the sequence. - **stop** (*Int*) -- Generate numbers up to, but not including this number. - **step** (*Int*) -- Difference between each number in the sequence. - **range(start: Int, stop: Int)**: *Array[Int]* Generate an ``Array`` with values in the interval ``[start, stop)``. .. code-block:: text :emphasize-lines: 2 let r = range(5, 8) in r result: [5, 6, 7] **Arguments** - **start** (*Int*) -- Starting number of the sequence. - **stop** (*Int*) -- Generate numbers up to, but not including this number. - **range(stop: Int)**: *Array[Int]* Generate an ``Array`` with values in the interval ``[0, stop)``. .. code-block:: text :emphasize-lines: 2 let r = range(3) in r result: [0, 1, 2] **Arguments** - **stop** (*Int*) -- Number of integers (whole numbers) to generate, starting from zero. - **rnorm(mean: Double, sd: Double)**: *Double* Returns a random draw from a normal distribution with mean ``mean`` and standard deviation ``sd``. ``sd`` should be non-negative. This function is non-deterministic. **Arguments** - **mean** (*Double*) -- Mean value of normal distribution. - **sd** (*Double*) -- Standard deviation of normal distribution. - **rpois(n: Int, lambda: Double)**: *Array[Double]* Returns ``n`` random draws from a Poisson distribution with rate parameter ``lambda``. This function is non-deterministic. **Arguments** - **n** (*Int*) -- Number of random draws to make. - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **rpois(lambda: Double)**: *Double* Returns a random draw from a Poisson distribution with rate parameter ``lambda``. This function is non-deterministic. **Arguments** - **lambda** (*Double*) -- Poisson rate parameter. Must be non-negative. - **runif(min: Double, max: Double)**: *Double* Returns a random draw from a uniform distribution on [``min``, ``max``). ``min`` should be less than or equal to ``max``. This function is non-deterministic. **Arguments** - **min** (*Double*) -- Minimum value of interval. - **max** (*Double*) -- Maximum value of interval, non-inclusive. - **select(s: Struct, identifiers: String*)**: *Struct* Return a new ``Struct`` with a subset of fields. .. code-block:: text :emphasize-lines: 2 let s = {gene: "ACBD", function: "LOF", nHet: 12} in select(s, gene, function) result: {gene: "ACBD", function: "LOF"} **Arguments** - **s** (*Struct*) -- Struct to select fields from. - **identifiers** (*String\**) -- Field names to select from ``s``. Multiple arguments allowed. - **sqrt(x: Double)**: *Double* Returns the square root of the given value ``x``. **Arguments** - **x** (*Double*) -- the number to take the square root of. - **str(x: T)**: *String* Returns the string representation of a data type. .. code-block:: text :emphasize-lines: 2 let v = Variant("1", 278653, "A", "T") in str(v) result: "1:278653:A:T" - **Variant(contig: String, pos: Int, ref: String, alts: Array[String])**: *Variant* Construct a :ref:`variant` object. .. code-block:: text :emphasize-lines: 2 let v = Variant("1", 25782743, "A", Array("T", "TA")) in v.ref result: "A" **Arguments** - **contig** (*String*) -- String representation of contig. - **pos** (*Int*) -- SNP position or start of an indel. - **ref** (*String*) -- Reference allele sequence. - **alts** (*Array[String]*) -- Array of alternate allele sequences. - **Variant(contig: String, pos: Int, ref: String, alt: String)**: *Variant* Construct a :ref:`variant` object. .. code-block:: text :emphasize-lines: 2 let v = Variant("2", 13427, "T", "G") in v.ref result: "T" **Arguments** - **contig** (*String*) -- String representation of contig. - **pos** (*Int*) -- SNP position or start of an indel. - **ref** (*String*) -- Reference allele sequence. - **alt** (*String*) -- Alternate allele sequence. - **Variant(s: String)**: *Variant* Construct a :ref:`variant` object. .. code-block:: text :emphasize-lines: 2 let v = Variant("7:76324539:A:G") in v.contig result: "7" **Arguments** - **s** (*String*) -- String of the form ``CHR:POS:REF:ALT`` or ``CHR:POS:REF:ALT1,ALT2...ALTN`` specifying the contig, position, reference and alternate alleles.