Statistical functions

chi_squared_test(c1, c2, c3, c4)

Performs chi-squared test of independence on a 2x2 contingency table.

fisher_exact_test(c1, c2, c3, c4)

Calculates the p-value, odds ratio, and 95% confidence interval using Fisher’s exact test for a 2x2 table.

contingency_table_test(c1, c2, c3, c4, …)

Performs chi-squared or Fisher’s exact test of independence on a 2x2 contingency table.

dbeta(x, a, b)

Returns the probability density at x of a beta distribution with parameters a (alpha) and b (beta).

dpois(x, lamb[, log_p])

Computes the (log) probability density at x of a Poisson distribution with rate parameter lamb.

hardy_weinberg_test(n_hom_ref, n_het, n_hom_var)

Performs a test of Hardy-Weinberg equilibrium.

binom_test(x, n, p, alternative)

Performs a binomial test on p given x successes in n trials.

pchisqtail(x, df[, ncp])

Returns the probability under the right-tail starting at x for a chi-squared distribution with df degrees of freedom.

pnorm(x)

The cumulative probability function of a standard normal distribution.

pT(x, n[, lower_tail, log_p])

The cumulative probability function of a t-distribution with n degrees of freedom.

pF(x, df1, df2[, lower_tail, log_p])

The cumulative probability function of an F-distribution with parameters df1 and df2.

ppois(x, lamb[, lower_tail, log_p])

The cumulative probability function of a Poisson distribution.

qchisqtail(p, df)

Inverts pchisqtail().

qnorm(p)

Inverts pnorm().

qpois(p, lamb[, lower_tail, log_p])

Inverts ppois().

hail.expr.functions.chi_squared_test(c1, c2, c3, c4)

Performs chi-squared test of independence on a 2x2 contingency table.

Examples

>>> hl.eval(hl.chi_squared_test(10, 10, 10, 10))
Struct(p_value=1.0, odds_ratio=1.0)
>>> hl.eval(hl.chi_squared_test(51, 43, 22, 92))
Struct(p_value=1.4626257805267089e-07, odds_ratio=4.959830866807611)

Notes

The odds ratio is given by (c1 / c2) / (c3 / c4).

Returned fields may be nan or inf.

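Parameters
  • c1 (int or Expression of type tint32) – Count in cell 1 of the 2x2 table.

  • c2 (int or Expression of type tint32) – Count in cell 2 of the 2x2 table.

  • c3 (int or Expression of type tint32) – Count in cell 3 of the 2x2 table.

  • c4 (int or Expression of type tint32) – Count in cell 4 of the 2x2 table.

Returns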

StructExpression – A tstruct expression with two fields, p_value (tfloat64) and odds_ratio (tfloat64).
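As a cross-check outside Hail, the statistic and odds ratio can be computed with the Python standard library. The sketch below assumes Pearson's chi-squared statistic without a continuity correction on the table [[c1, c2], [c3, c4]]; that is our assumption, though it is consistent with both examples above. chi_squared_2x2 is a hypothetical helper, not a Hail function.

```python
import math


def chi_squared_2x2(c1: int, c2: int, c3: int, c4: int) -> dict:
    """Hypothetical helper: Pearson chi-squared test (no continuity correction)
    on the 2x2 table [[c1, c2], [c3, c4]], plus the sample odds ratio."""
    n = c1 + c2 + c3 + c4
    margins = (c1 + c2) * (c3 + c4) * (c1 + c3) * (c2 + c4)
    # Chi-squared statistic with 1 degree of freedom; nan if any margin is zero.
    stat = n * (c1 * c4 - c2 * c3) ** 2 / margins if margins else math.nan
    # Right tail of a chi-squared(1) variable: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    # (c1 / c2) / (c3 / c4) equals (c1 * c4) / (c2 * c3); return inf or nan
    # instead of raising when a cell is zero.
    num, den = c1 * c4, c2 * c3
    odds_ratio = num / den if den else (math.inf if num else math.nan)
    return {"p_value": p_value, "odds_ratio": odds_ratio}
```

Calling chi_squared_2x2(51, 43, 22, 92) should land very close to the second example; small floating-point differences from Hail's own implementation are expected.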

hail.expr.functions.fisher_exact_test(c1, c2, c3, c4)

Calculates the p-value, odds ratio, and 95% confidence interval using Fisher’s exact test for a 2x2 table.

Examples

>>> hl.eval(hl.fisher_exact_test(10, 10, 10, 10))
Struct(p_value=1.0000000000000002, odds_ratio=1.0,
       ci_95_lower=0.24385796914260355, ci_95_upper=4.100747675033819)
>>> hl.eval(hl.fisher_exact_test(51, 43, 22, 92))
Struct(p_value=2.1564999740157304e-07, odds_ratio=4.918058171469967,
       ci_95_lower=2.5659373368248444, ci_95_upper=9.677929632035475)

Notes

This method is identical to the version implemented in R with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1).

Returned fields may be nan or inf.

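Parameters
  • c1 (int or Expression of type tint32) – Count in cell 1 of the 2x2 table.

  • c2 (int or Expression of type tint32) – Count in cell 2 of the 2x2 table.

  • c3 (int or Expression of type tint32) – Count in cell 3 of the 2x2 table.

  • c4 (int or Expression of type tint32) – Count in cell 4 of the 2x2 table.

Returns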

StructExpression – A tstruct expression with four fields, p_value (tfloat64), odds_ratio (tfloat64), ci_95_lower (tfloat64), and ci_95_upper (tfloat64).
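The two-sided p-value can be sketched in pure Python by enumerating every table with the observed margins. This assumes R's convention (sum the hypergeometric probabilities of all tables no more likely than the observed one, with a relative tolerance of 1e-7), which the note above says Hail matches. fisher_two_sided_p is a hypothetical helper; it does not compute the conditional maximum-likelihood odds ratio or its confidence interval.

```python
from math import comb


def fisher_two_sided_p(c1: int, c2: int, c3: int, c4: int) -> float:
    """Hypothetical helper: two-sided Fisher exact p-value for [[c1, c2], [c3, c4]]."""
    row1, col1, n = c1 + c2, c1 + c3, c1 + c2 + c3 + c4
    # Feasible values of the top-left cell once all margins are fixed.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalized hypergeometric weights for each feasible top-left count k.
    weights = [comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, hi + 1)]
    observed = weights[c1 - lo]
    # Keep tables with weight <= observed * (1 + 1e-7); integer arithmetic keeps this exact.
    extreme = sum(w for w in weights if w * 10**7 <= observed * (10**7 + 1))
    return extreme / sum(weights)
```

Working with exact integer weights sidesteps underflow in the individual probabilities; only the final ratio is converted to a float.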

hail.expr.functions.contingency_table_test(c1, c2, c3, c4, min_cell_count)

Performs chi-squared or Fisher’s exact test of independence on a 2x2 contingency table.

Examples

>>> hl.eval(hl.contingency_table_test(51, 43, 22, 92, min_cell_count=22))
Struct(p_value=1.4626257805267089e-07, odds_ratio=4.959830866807611)
>>> hl.eval(hl.contingency_table_test(51, 43, 22, 92, min_cell_count=23))
Struct(p_value=2.1564999740157304e-07, odds_ratio=4.918058171469967)

Notes

If all cell counts are at least min_cell_count, the chi-squared test is used. Otherwise, Fisher’s exact test is used.

Returned fields may be nan or inf.
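The selection rule in the notes above amounts to a single comparison against the smallest cell. A minimal sketch; choose_contingency_test is a hypothetical name used only for illustration, and it only decides which test applies rather than running either one.

```python
def choose_contingency_test(c1: int, c2: int, c3: int, c4: int, min_cell_count: int) -> str:
    """Hypothetical helper: name of the test contingency_table_test would apply."""
    # The chi-squared test is used only when every cell reaches min_cell_count.
    if min(c1, c2, c3, c4) >= min_cell_count:
        return "chi_squared_test"
    return "fisher_exact_test"
```

For the table in the examples the smallest cell is 22, so min_cell_count=22 selects the chi-squared test and min_cell_count=23 falls back to Fisher's exact test.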

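Parameters
  • c1 (int or Expression of type tint32) – Count in cell 1 of the 2x2 table.

  • c2 (int or Expression of type tint32) – Count in cell 2 of the 2x2 table.

  • c3 (int or Expression of type tint32) – Count in cell 3 of the 2x2 table.

  • c4 (int or Expression of type tint32) – Count in cell 4 of the 2x2 table.

  • min_cell_count (int or Expression of type tint32) – Minimum count every cell must reach for the chi-squared test to be used.

Returns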

StructExpression – A tstruct expression with two fields, p_value (tfloat64) and odds_ratio (tfloat64).

hail.expr.functions.dbeta(x, a, b)

Returns the probability density at x of a beta distribution with parameters a (alpha) and b (beta).

Examples

>>> hl.eval(hl.dbeta(.2, 5, 20))
4.900377563180943

Parameters
  • x (float or Expression of type tfloat64) – Point in [0,1] at which to evaluate the density. If a < 1 then x must be positive. If b < 1 then x must be less than 1.

  • a (float or Expression of type tfloat64) – The alpha parameter in the beta distribution. The result is undefined for non-positive a.

  • b (float or Expression of type tfloat64) – The beta parameter in the beta distribution. The result is undefined for non-positive b.

Returns

Float64Expression
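The density has the closed form \(\Gamma(a+b) / (\Gamma(a)\Gamma(b)) \, x^{a-1} (1-x)^{b-1}\). A minimal standard-library sketch that evaluates it on the log scale; beta_density is a hypothetical helper that only handles 0 < x < 1 (the endpoint cases described under x are left out).

```python
import math


def beta_density(x: float, a: float, b: float) -> float:
    """Hypothetical helper: density of Beta(a, b) at x, for a > 0, b > 0, 0 < x < 1."""
    # log B(a, b) via log-gamma avoids overflow of Gamma(a + b) for large a and b.
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    # log1p(-x) computes log(1 - x) accurately when x is small.
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta)
```

With a = b = 1 this reduces to the uniform density on (0, 1), which makes a quick sanity check.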

hail.expr.functions.dpois(x, lamb, log_p=False)

Computes the (log) probability density at x of a Poisson distribution with rate parameter lamb. Because the Poisson distribution is discrete, this is the probability mass Prob(\(X = x\)).

Examples

>>> hl.eval(hl.dpois(5, 3))
0.10081881344492458
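
Parameters
  • x (float or Expression of type tfloat64) – Non-negative number at which to compute the probability density.

  • lamb (float or Expression of type tfloat64) – Poisson rate parameter. Must be non-negative.

  • log_p (bool or BooleanExpression) – If true, the natural logarithm of the probability density is returned.

Returns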

Expression of type tfloat64 – The (log) probability density.
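The Poisson probability mass is \(\lambda^x e^{-\lambda} / x!\) with \(\lambda\) = lamb. The sketch below (poisson_pmf is a hypothetical helper) works on the log scale, a common way to avoid overflow in lamb ** x and x!, and exponentiates only when log_p is false. It assumes integer x >= 0 and lamb > 0.

```python
import math


def poisson_pmf(x: int, lamb: float, log_p: bool = False) -> float:
    """Hypothetical helper: P(X = x), or its natural log, for X ~ Poisson(lamb)."""
    # log P(X = x) = x * log(lamb) - lamb - log(x!), with log(x!) = lgamma(x + 1).
    log_pmf = x * math.log(lamb) - lamb - math.lgamma(x + 1)
    return log_pmf if log_p else math.exp(log_pmf)
```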

hail.expr.functions.hardy_weinberg_test(n_hom_ref, n_het, n_hom_var)

Performs a test of Hardy-Weinberg equilibrium.

Examples

>>> hl.eval(hl.hardy_weinberg_test(250, 500, 250))
Struct(het_freq_hwe=0.5002501250625313, p_value=0.9747844394217698)
>>> hl.eval(hl.hardy_weinberg_test(37, 200, 85))
Struct(het_freq_hwe=0.48964964307448583, p_value=1.1337210383168987e-06)

Notes

This method performs a two-sided exact test with mid-p-value correction of Hardy-Weinberg equilibrium via an efficient implementation of the Levene-Haldane distribution, which models the number of heterozygous individuals under equilibrium.

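The mean of this distribution is (n_ref * n_var) / (2n - 1), where n_ref = 2 * n_hom_ref + n_het is the number of reference alleles, n_var = 2 * n_hom_var + n_het is the number of variant alleles, and n = n_hom_ref + n_het + n_hom_var is the number of individuals. The expected frequency of heterozygotes under equilibrium, het_freq_hwe, is this mean divided by n.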
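The expected heterozygote frequency can be computed from allele counts alone: the Levene-Haldane mean is n_ref * n_var / (2n - 1) with n_ref = 2 * n_hom_ref + n_het and n_var = 2 * n_hom_var + n_het, divided by the number of individuals n. A small sketch; het_freq_hwe is a hypothetical helper, and it does not compute the mid-p exact p-value.

```python
def het_freq_hwe(n_hom_ref: int, n_het: int, n_hom_var: int) -> float:
    """Hypothetical helper: expected heterozygote frequency under Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_var        # number of individuals
    n_ref = 2 * n_hom_ref + n_het            # reference alleles
    n_var = 2 * n_hom_var + n_het            # variant alleles
    mean_het = n_ref * n_var / (2 * n - 1)   # mean of the Levene-Haldane distribution
    return mean_het / n
```

This agrees with the het_freq_hwe values in both examples above.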

Parameters
  • n_hom_ref (int or Expression of type tint32) – Number of homozygous reference genotypes.

  • n_het (int or Expression of type tint32) – Number of heterozygous genotypes.

  • n_hom_var (int or Expression of type tint32) – Number of homozygous variant genotypes.

Returns

StructExpression – A struct expression with two fields, het_freq_hwe (tfloat64) and p_value (tfloat64).

hail.expr.functions.binom_test(x, n, p, alternative)

Performs a binomial test on p given x successes in n trials.

Returns the p-value from the exact binomial test of the null hypothesis that success has probability p, given x successes in n trials.

The alternatives are interpreted as follows:

  • 'less': a one-tailed test of the significance of x or fewer successes.

  • 'greater': a one-tailed test of the significance of x or more successes.

  • 'two-sided': a two-tailed test of the significance of x or any equivalent or more unlikely outcome.

Examples

All the examples below use a fair coin as the null hypothesis. Zero is interpreted as tails and one as heads.

Test if a coin is biased towards heads or tails after observing two heads out of ten flips:

>>> hl.eval(hl.binom_test(2, 10, 0.5, 'two-sided'))
0.10937499999999994

Test if a coin is biased towards tails after observing four heads out of ten flips:

>>> hl.eval(hl.binom_test(4, 10, 0.5, 'less'))
0.3769531250000001

Test if a coin is biased towards heads after observing thirty-two heads out of fifty flips:

>>> hl.eval(hl.binom_test(32, 50, 0.5, 'greater'))
0.03245432353613613

Parameters
  • x (int or Expression of type tint32) – Number of successes.

  • n (int or Expression of type tint32) – Number of trials.

  • p (float or Expression of type tfloat64) – Probability of success, between 0 and 1.

  • alternative (str) – One of “two-sided”, “greater”, or “less” (“two.sided” is also accepted but deprecated).

Returns

Expression of type tfloat64 – p-value.
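The three alternatives can be sketched directly from the binomial probability mass function. This is a standard-library sketch, not Hail's implementation: binom_test_sketch is a hypothetical helper, and for the two-sided case it sums every outcome at most as likely as the observed one with a 1e-7 relative tolerance, similar in spirit to R's binom.test. Hail may resolve near-ties differently.

```python
from math import comb


def binom_test_sketch(x: int, n: int, p: float, alternative: str = "two-sided") -> float:
    """Hypothetical helper: exact binomial test p-value for x successes in n trials."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    if alternative == "less":          # P(X <= x)
        return sum(pmf[: x + 1])
    if alternative == "greater":       # P(X >= x)
        return sum(pmf[x:])
    if alternative == "two-sided":
        # Outcomes no more likely than the observed one, allowing for rounding.
        cutoff = pmf[x] * (1 + 1e-7)
        return min(1.0, sum(q for q in pmf if q <= cutoff))
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
```

For a fair coin the pmf is symmetric, so the two-sided p-value for two heads in ten flips is twice Prob(X ≤ 2) = 2 * 56/1024.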

hail.expr.functions.pchisqtail(x, df, ncp=None)

Returns the probability under the right-tail starting at x for a chi-squared distribution with df degrees of freedom.

Examples

>>> hl.eval(hl.pchisqtail(5, 1))
0.025347318677468304
>>> hl.eval(hl.pchisqtail(3, 1, 2))
0.3761310507217904
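
Parameters
  • x (float or Expression of type tfloat64)

  • df (float or Expression of type tfloat64) – Degrees of freedom.

  • ncp (float or Expression of type tfloat64, optional) – Noncentrality parameter. If unspecified, the central chi-squared distribution is used.

Returns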

Expression of type tfloat64
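For integer degrees of freedom the central right tail has closed forms: erfc(sqrt(x / 2)) for df = 1, exp(-x / 2) for df = 2, and each additional two degrees of freedom add one term. A standard-library sketch; chisq_right_tail is a hypothetical helper that handles only integer df and does not cover the noncentral case (ncp).

```python
import math


def chisq_right_tail(x: float, df: int) -> float:
    """Hypothetical helper: P(X > x) for a central chi-squared X with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms for df = 1 and df = 2 ...
    if df % 2:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # ... then step up two degrees of freedom at a time:
    # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1).
    while nu < df:
        q += math.exp((nu / 2) * math.log(half) - half - math.lgamma(nu / 2 + 1))
        nu += 2
    return q
```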

hail.expr.functions.pnorm(x)

The cumulative probability function of a standard normal distribution.

Examples

>>> hl.eval(hl.pnorm(0))
0.5
>>> hl.eval(hl.pnorm(1))
0.8413447460685429
>>> hl.eval(hl.pnorm(2))
0.9772498680518208

Notes

Returns the left-tail probability p = Prob(\(Z < x\)) with \(Z\) a standard normal random variable.

Parameters

x (float or Expression of type tfloat64)

Returns

Expression of type tfloat64
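The standard normal CDF can be written with the complementary error function as \(\Phi(x) = \tfrac{1}{2}\operatorname{erfc}(-x/\sqrt{2})\). A one-line standard-library sketch; standard_normal_cdf is a hypothetical helper, not part of Hail.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Hypothetical helper: left-tail probability P(Z < x) for a standard normal Z."""
    # Using erfc instead of 1 + erf(x / sqrt(2)) avoids cancellation far in
    # the left tail, where the result is tiny.
    return 0.5 * math.erfc(-x / math.sqrt(2))
```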

hail.expr.functions.pT(x, n, lower_tail=True, log_p=False)

The cumulative probability function of a t-distribution with n degrees of freedom.

Examples

>>> hl.eval(hl.pT(0, 10))
0.5
>>> hl.eval(hl.pT(1, 10))
0.82955343384897
>>> hl.eval(hl.pT(1, 10, lower_tail=False))
0.17044656615103004
>>> hl.eval(hl.pT(1, 10, log_p=True))
-0.186867754489647

Notes

If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a t-distributed random variable with n degrees of freedom. If lower_tail is false, returns Prob(\(X\) > x).

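Parameters
  • x (float or Expression of type tfloat64)

  • n (float or Expression of type tfloat64) – Degrees of freedom of the t-distribution.

  • lower_tail (bool or BooleanExpression) – If true, compute the probability of an outcome at or below x; otherwise, above x.

  • log_p (bool or BooleanExpression) – If true, return the natural logarithm of the probability.

Returns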

Expression of type tfloat64
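For integer degrees of freedom the t CDF reduces to a finite trigonometric series in \(\theta = \arctan(x/\sqrt{n})\), which also makes the effect of lower_tail and log_p easy to see. A standard-library sketch; t_cdf is a hypothetical helper restricted to integer n >= 1, whereas pT also accepts non-integer n.

```python
import math


def t_cdf(x: float, n: int, lower_tail: bool = True, log_p: bool = False) -> float:
    """Hypothetical helper: CDF of Student's t with integer n >= 1 degrees of freedom."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    theta = math.atan(x / math.sqrt(n))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if n % 2 == 0:
        # Even n: A = sin(theta) * [1 + (1/2)c2 + (1*3)/(2*4)c2^2 + ...], n/2 terms.
        term = total = 1.0
        for k in range(1, n // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        a = s * total
    else:
        # Odd n: A = (2/pi) * (theta + sin*cos*[1 + (2/3)c2 + (2*4)/(3*5)c2^2 + ...]),
        # with (n - 1)/2 bracketed terms; for n = 1 only theta remains (Cauchy).
        series = 0.0
        if n > 1:
            term = series = 1.0
            for k in range(1, (n - 1) // 2):
                term *= c2 * (2 * k) / (2 * k + 1)
                series += term
            series *= s * c
        a = 2 / math.pi * (theta + series)
    # A is P(|T| < |x|) carrying the sign of x, so P(T <= x) = (1 + A) / 2.
    p = (1 + a) / 2 if lower_tail else (1 - a) / 2
    return math.log(p) if log_p else p
```

With n = 1 the t-distribution is the Cauchy distribution, so t_cdf(1, 1) should be 0.75 up to rounding.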

hail.expr.functions.pF(x, df1, df2, lower_tail=True, log_p=False)

The cumulative probability function of an F-distribution with parameters df1 and df2.

Examples

>>> hl.eval(hl.pF(0, 3, 10))
0.0
>>> hl.eval(hl.pF(1, 3, 10))
0.5676627969783028
>>> hl.eval(hl.pF(1, 3, 10, lower_tail=False))
0.4323372030216972
>>> hl.eval(hl.pF(1, 3, 10, log_p=True))
-0.566227703842908

Notes

If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a random variable with distribution \(F(\text{df1}, \text{df2})\). If lower_tail is false, returns Prob(\(X\) > x).

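Parameters
  • x (float or Expression of type tfloat64)

  • df1 (float or Expression of type tfloat64) – First parameter (numerator degrees of freedom) of the F-distribution.

  • df2 (float or Expression of type tfloat64) – Second parameter (denominator degrees of freedom) of the F-distribution.

  • lower_tail (bool or BooleanExpression) – If true, compute the probability of an outcome at or below x; otherwise, above x.

  • log_p (bool or BooleanExpression) – If true, return the natural logarithm of the probability.

Returns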

Expression of type tfloat64
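The F CDF equals the regularized incomplete beta function \(I_z(\text{df1}/2, \text{df2}/2)\) with \(z = \text{df1}\,x / (\text{df1}\,x + \text{df2})\), and when either df is an even integer that function collapses to a finite sum. A standard-library sketch under that restriction; f_cdf and _rising_series are hypothetical helpers, whereas pF accepts arbitrary df1 and df2.

```python
import math


def _rising_series(a: float, m: int, w: float) -> float:
    """sum_{j=0}^{m-1} (a)_j / j! * w**j, where (a)_j is the rising factorial."""
    term = total = 1.0
    for j in range(1, m):
        term *= (a + j - 1) / j * w
        total += term
    return total


def f_cdf(x: float, df1: int, df2: int, lower_tail: bool = True, log_p: bool = False) -> float:
    """Hypothetical helper: P(X <= x) for X ~ F(df1, df2), with df1 or df2 an even integer."""
    if x <= 0:
        p = 0.0
    else:
        # P(X <= x) = I_z(df1/2, df2/2), the regularized incomplete beta function.
        z = df1 * x / (df1 * x + df2)
        if df2 % 2 == 0:
            p = z ** (df1 / 2) * _rising_series(df1 / 2, df2 // 2, 1 - z)
        elif df1 % 2 == 0:
            p = 1 - (1 - z) ** (df2 / 2) * _rising_series(df2 / 2, df1 // 2, z)
        else:
            raise ValueError("this sketch needs df1 or df2 to be an even integer")
    if not lower_tail:
        p = 1 - p
    if log_p:
        return math.log(p) if p > 0 else -math.inf
    return p
```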

hail.expr.functions.ppois(x, lamb, lower_tail=True, log_p=False)

The cumulative probability function of a Poisson distribution.

Examples

>>> hl.eval(hl.ppois(2, 1))
0.9196986029286058

Notes

If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a Poisson random variable with rate parameter lamb. If lower_tail is false, returns Prob(\(X\) > x).

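Parameters
  • x (float or Expression of type tfloat64)

  • lamb (float or Expression of type tfloat64) – Rate parameter of the Poisson distribution.

  • lower_tail (bool or BooleanExpression) – If true, compute the probability of an outcome at or below x; otherwise, above x.

  • log_p (bool or BooleanExpression) – If true, return the natural logarithm of the probability.

Returns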

Expression of type tfloat64
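Since the Poisson distribution lives on the non-negative integers, Prob(\(X \leq x\)) is a finite sum of probability masses up to floor(x), each obtained from the previous one by multiplying by lamb / k. A standard-library sketch for moderate lamb; poisson_cdf is a hypothetical helper, and its upper tail is computed as 1 minus the lower tail, which loses precision far in the right tail.

```python
import math


def poisson_cdf(x: float, lamb: float, lower_tail: bool = True, log_p: bool = False) -> float:
    """Hypothetical helper: P(X <= x), or P(X > x) if lower_tail is False, for X ~ Poisson(lamb)."""
    total = 0.0
    k_max = math.floor(x)
    if k_max >= 0:
        term = math.exp(-lamb)       # P(X = 0); underflows to 0 for very large lamb
        total = term
        for k in range(1, k_max + 1):
            term *= lamb / k         # P(X = k) = P(X = k - 1) * lamb / k
            total += term
    total = min(total, 1.0)          # guard against rounding slightly above 1
    p = total if lower_tail else 1.0 - total
    if log_p:
        return math.log(p) if p > 0 else -math.inf
    return p
```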

hail.expr.functions.qchisqtail(p, df)

Inverts pchisqtail().

Examples

>>> hl.eval(hl.qchisqtail(0.01, 1))
6.634896601021213

Notes

Returns the right-quantile x for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by df. p must satisfy 0 < p <= 1.

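Parameters
  • p (float or Expression of type tfloat64) – Probability.

  • df (float or Expression of type tfloat64) – Degrees of freedom.

Returns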

Expression of type tfloat64
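For df = 1 the inversion has a closed form through the normal quantile, because a chi-squared variable with one degree of freedom is the square of a standard normal variable. A standard-library sketch; chisq1_right_quantile is a hypothetical helper, and other df values would need a numerical root-finder on the tail probability.

```python
from statistics import NormalDist


def chisq1_right_quantile(p: float) -> float:
    """Hypothetical helper: x such that P(X > x) = p for X ~ chi-squared with 1 df."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    # X = Z**2 for standard normal Z, so P(X > x) = 2 * P(Z < -sqrt(x)).
    # Hence -sqrt(x) is the standard normal quantile at p / 2.
    z = NormalDist().inv_cdf(p / 2)
    return z * z
```

Taking the quantile at p / 2 rather than 1 - p / 2 keeps precision for small p.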

hail.expr.functions.qnorm(p)

Inverts pnorm().

Examples

>>> hl.eval(hl.qnorm(0.90))
1.2815515655446008

Notes

Returns the left-quantile x for which p = Prob(\(Z\) < x) with \(Z\) a standard normal random variable. p must satisfy 0 < p < 1.

Parameters

p (float or Expression of type tfloat64) – Probability.

Returns

Expression of type tfloat64
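Python's standard library offers the same pair of functions on statistics.NormalDist, which makes the inverse relationship with pnorm easy to check outside Hail. The sketch below is illustrative only; std_normal, x, and p_back are hypothetical names.

```python
from statistics import NormalDist

std_normal = NormalDist()        # mean 0, standard deviation 1

x = std_normal.inv_cdf(0.90)     # left-quantile: P(Z < x) = 0.90, the qnorm direction
p_back = std_normal.cdf(x)       # the pnorm direction, recovering the probability
```

Up to floating-point error, x matches the qnorm(0.90) example and p_back returns to 0.90.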

hail.expr.functions.qpois(p, lamb, lower_tail=True, log_p=False)

Inverts ppois().

Examples

>>> hl.eval(hl.qpois(0.99, 1))
4

Notes

Returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\) p where \(X\) is a Poisson random variable with rate parameter lamb.

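Parameters
  • p (float or Expression of type tfloat64) – Probability.

  • lamb (float or Expression of type tfloat64) – Rate parameter of the Poisson distribution.

  • lower_tail (bool or BooleanExpression) – Corresponds to the lower_tail parameter of ppois().

  • log_p (bool or BooleanExpression) – If true, p is interpreted as the natural logarithm of a probability.

Returns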

Expression of type tint32
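Because the Poisson CDF increases in steps, the smallest integer x with Prob(\(X \leq x\)) \(\geq\) p can be found by accumulating probabilities from x = 0 upward. A standard-library sketch for the default lower_tail=True, log_p=False case only, assuming moderate lamb; poisson_quantile is a hypothetical helper, and it returns a plain int, in line with the example output 4 above.

```python
import math


def poisson_quantile(p: float, lamb: float) -> int:
    """Hypothetical helper: smallest integer x with P(X <= x) >= p for X ~ Poisson(lamb)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    k = 0
    term = math.exp(-lamb)   # P(X = 0); assumes this does not underflow
    cdf = term
    while cdf < p:
        k += 1
        term *= lamb / k     # P(X = k)
        cdf += term
        if term == 0.0:      # remaining mass lost to rounding; stop instead of looping forever
            break
    return k
```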