Statistical functions
| chi_squared_test | Performs chi-squared test of independence on a 2x2 contingency table. |
| fisher_exact_test | Calculates the p-value, odds ratio, and 95% confidence interval using Fisher’s exact test for a 2x2 table. |
| contingency_table_test | Performs chi-squared or Fisher’s exact test of independence on a 2x2 contingency table. |
| dbeta | Returns the probability density at x of a beta distribution with parameters a (alpha) and b (beta). |
| dchisq | Compute the probability density at x of a chi-squared distribution with df degrees of freedom. |
| dnorm | Compute the probability density at x of a normal distribution with mean mu and standard deviation sigma. |
| dpois | Compute the (log) probability density at x of a Poisson distribution with rate parameter lamb. |
| hardy_weinberg_test | Performs test of Hardy-Weinberg equilibrium. |
| binom_test | Performs a binomial test on p given x successes in n trials. |
| pchisqtail | Returns the probability under the right-tail starting at x for a chi-squared distribution with df degrees of freedom. |
| pnorm | The cumulative probability function of a normal distribution with mean mu and standard deviation sigma. |
| pT | The cumulative probability function of a t-distribution with n degrees of freedom. |
| pF | The cumulative probability function of an F-distribution with parameters df1 and df2. |
| ppois | The cumulative probability function of a Poisson distribution. |
| qchisqtail | The quantile function of a chi-squared distribution with df degrees of freedom, inverts pchisqtail(). |
| qnorm | The quantile function of a normal distribution with mean mu and standard deviation sigma, inverts pnorm(). |
| qpois | The quantile function of a Poisson distribution with rate parameter lamb, inverts ppois(). |
-
hail.expr.functions.chi_squared_test(c1, c2, c3, c4)
Performs chi-squared test of independence on a 2x2 contingency table.
Examples
>>> hl.eval(hl.chi_squared_test(10, 10, 10, 10))
Struct(p_value=1.0, odds_ratio=1.0)
>>> hl.eval(hl.chi_squared_test(51, 43, 22, 92))
Struct(p_value=1.4626257805267089e-07, odds_ratio=4.959830866807611)
Notes
The odds ratio is given by (c1 / c2) / (c3 / c4).
Returned fields may be nan or inf.
- Parameters
  c1 (int or Expression of type tint32) – Value for cell 1.
  c2 (int or Expression of type tint32) – Value for cell 2.
  c3 (int or Expression of type tint32) – Value for cell 3.
  c4 (int or Expression of type tint32) – Value for cell 4.
- Returns
  StructExpression – A tstruct expression with two fields, p_value (tfloat64) and odds_ratio (tfloat64).
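The odds-ratio formula in the Notes, combined with an uncorrected Pearson chi-squared statistic, appears to reproduce the second example above. The following pure-Python sketch is not Hail's implementation: the helper name `chi_squared_2x2` is hypothetical, and the absence of a continuity correction is an assumption inferred from the documented output.

```python
import math

def chi_squared_2x2(c1, c2, c3, c4):
    """Pearson chi-squared test (assumed: no continuity correction) on [[c1, c2], [c3, c4]]."""
    n = c1 + c2 + c3 + c4
    row1, row2 = c1 + c2, c3 + c4
    col1, col2 = c1 + c3, c2 + c4
    # Shortcut form of the Pearson statistic for a 2x2 table.
    chi2 = n * (c1 * c4 - c2 * c3) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the right-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds ratio exactly as written in the Notes.
    odds_ratio = (c1 / c2) / (c3 / c4)
    return p_value, odds_ratio

p_value, odds_ratio = chi_squared_2x2(51, 43, 22, 92)
print(p_value, odds_ratio)
```

Unlike the Hail function, this sketch raises ZeroDivisionError for empty cells or margins instead of returning nan or inf.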
-
hail.expr.functions.fisher_exact_test(c1, c2, c3, c4)
Calculates the p-value, odds ratio, and 95% confidence interval using Fisher’s exact test for a 2x2 table.
Examples
>>> hl.eval(hl.fisher_exact_test(10, 10, 10, 10))
Struct(p_value=1.0000000000000002, odds_ratio=1.0, ci_95_lower=0.24385796914260355, ci_95_upper=4.100747675033819)
>>> hl.eval(hl.fisher_exact_test(51, 43, 22, 92))
Struct(p_value=2.1564999740157304e-07, odds_ratio=4.918058171469967, ci_95_lower=2.5659373368248444, ci_95_upper=9.677929632035475)
Notes
This method is identical to the version implemented in R with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1).
Returned fields may be nan or inf.
- Parameters
  c1 (int or Expression of type tint32) – Value for cell 1.
  c2 (int or Expression of type tint32) – Value for cell 2.
  c3 (int or Expression of type tint32) – Value for cell 3.
  c4 (int or Expression of type tint32) – Value for cell 4.
- Returns
  StructExpression – A tstruct expression with four fields, p_value (tfloat64), odds_ratio (tfloat64), ci_95_lower (tfloat64), and ci_95_upper (tfloat64).
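As a rough illustration of the two-sided p-value only (not the odds ratio or confidence interval), the sketch below sums hypergeometric probabilities of all tables that are no more likely than the observed one. It is not Hail's implementation: the name `fisher_exact_p_value` is hypothetical, and the small relative tolerance used for ties is an assumption.

```python
from math import comb

def fisher_exact_p_value(c1, c2, c3, c4):
    """Two-sided Fisher exact p-value for the table [[c1, c2], [c3, c4]]."""
    row1, row2 = c1 + c2, c3 + c4
    col1 = c1 + c3
    n = row1 + row2
    total = comb(n, col1)

    def prob(a):
        # Hypergeometric probability that cell 1 equals a with all margins fixed.
        return comb(row1, a) * comb(row2, col1 - a) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(c1)
    # Sum every table at most as likely as the observed one; the relative
    # tolerance (an assumption) guards against floating-point ties.
    return sum(q for q in map(prob, range(lowest, highest + 1))
               if q <= observed * (1 + 1e-7))

print(fisher_exact_p_value(10, 10, 10, 10))
print(fisher_exact_p_value(51, 43, 22, 92))
```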
-
hail.expr.functions.contingency_table_test(c1, c2, c3, c4, min_cell_count)
Performs chi-squared or Fisher’s exact test of independence on a 2x2 contingency table.
Examples
>>> hl.eval(hl.contingency_table_test(51, 43, 22, 92, min_cell_count=22))
Struct(p_value=1.4626257805267089e-07, odds_ratio=4.959830866807611)
>>> hl.eval(hl.contingency_table_test(51, 43, 22, 92, min_cell_count=23))
Struct(p_value=2.1564999740157304e-07, odds_ratio=4.918058171469967)
Notes
If all cell counts are at least min_cell_count, the chi-squared test is used. Otherwise, Fisher’s exact test is used.
Returned fields may be nan or inf.
- Parameters
  c1 (int or Expression of type tint32) – Value for cell 1.
  c2 (int or Expression of type tint32) – Value for cell 2.
  c3 (int or Expression of type tint32) – Value for cell 3.
  c4 (int or Expression of type tint32) – Value for cell 4.
  min_cell_count (int or Expression of type tint32) – Minimum count in every cell to use the chi-squared test.
- Returns
  StructExpression – A tstruct expression with two fields, p_value (tfloat64) and odds_ratio (tfloat64).
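The selection rule in the Notes can be sketched in plain Python. The helper `choose_test` and its return labels are hypothetical and only illustrate which test the rule picks for the two examples above.

```python
def choose_test(c1, c2, c3, c4, min_cell_count):
    """Return a label for the test that the rule in the Notes would select."""
    if min(c1, c2, c3, c4) >= min_cell_count:
        return "chi_squared"
    return "fisher_exact"

# The smallest cell is 22, so a threshold of 22 still allows the chi-squared test.
print(choose_test(51, 43, 22, 92, min_cell_count=22))
# Raising the threshold to 23 switches to Fisher's exact test.
print(choose_test(51, 43, 22, 92, min_cell_count=23))
```

This matches the documented outputs: the first example returns the chi-squared p-value and the second returns the Fisher p-value.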
-
hail.expr.functions.dbeta(x, a, b)
Returns the probability density at x of a beta distribution with parameters a (alpha) and b (beta).
Examples
>>> hl.eval(hl.dbeta(.2, 5, 20))
4.900377563180943
- Parameters
  x (float or Expression of type tfloat64) – Point in [0,1] at which to evaluate the density. If a < 1 then x must be positive. If b < 1 then x must be less than 1.
  a (float or Expression of type tfloat64) – The alpha parameter in the beta distribution. The result is undefined for non-positive a.
  b (float or Expression of type tfloat64) – The beta parameter in the beta distribution. The result is undefined for non-positive b.
- Returns
  Expression of type tfloat64 – The probability density.
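For reference, the beta density is x^(a-1) (1-x)^(b-1) / B(a, b). The pure-Python sketch below evaluates it with the standard library; the name `beta_density` is hypothetical and this is not Hail's implementation.

```python
import math

def beta_density(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1 and a, b > 0."""
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / math.exp(log_beta)

print(beta_density(0.2, 5, 20))
```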
-
hail.expr.functions.dchisq(x, df, ncp=None, log_p=False)
Compute the probability density at x of a chi-squared distribution with df degrees of freedom.
Examples
>>> hl.eval(hl.dchisq(1, 2))
0.3032653298563167
>>> hl.eval(hl.dchisq(1, 2, ncp=2))
0.17472016746112667
>>> hl.eval(hl.dchisq(1, 2, log_p=True))
-1.1931471805599454
- Parameters
  x (float or Expression of type tfloat64) – Non-negative number at which to compute the probability density.
  df (float or Expression of type tfloat64) – Degrees of freedom.
  ncp (float or Expression of type tfloat64) – Noncentrality parameter, defaults to 0 if unspecified.
  log_p (bool or BooleanExpression) – If True, the natural logarithm of the probability density is returned.
- Returns
  Expression of type tfloat64 – The probability density.
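A minimal sketch of the central case (ncp unspecified) may help show what log_p changes. The name `chisq_density` is hypothetical, the noncentral case is deliberately not covered, and this is not Hail's implementation.

```python
import math

def chisq_density(x, df, log_p=False):
    """Density of a central chi-squared distribution at x > 0 (no ncp support)."""
    half_df = df / 2
    # log f(x) = (df/2 - 1) log x - x/2 - (df/2) log 2 - log Gamma(df/2)
    log_density = ((half_df - 1) * math.log(x) - x / 2
                   - half_df * math.log(2) - math.lgamma(half_df))
    return log_density if log_p else math.exp(log_density)

print(chisq_density(1, 2))
print(chisq_density(1, 2, log_p=True))
```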
-
hail.expr.functions.dnorm(x, mu=0, sigma=1, log_p=False)
Compute the probability density at x of a normal distribution with mean mu and standard deviation sigma. Returns density of standard normal distribution by default.
Examples
>>> hl.eval(hl.dnorm(1))
0.24197072451914337
>>> hl.eval(hl.dnorm(1, mu=1, sigma=2))
0.19947114020071635
>>> hl.eval(hl.dnorm(1, log_p=True))
-1.4189385332046727
- Parameters
  x (float or Expression of type tfloat64) – Real number at which to compute the probability density.
  mu (float or Expression of type tfloat64) – Mean (default = 0).
  sigma (float or Expression of type tfloat64) – Standard deviation (default = 1).
  log_p (bool or BooleanExpression) – If True, the natural logarithm of the probability density is returned.
- Returns
  Expression of type tfloat64 – The probability density.
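The three examples above can be checked against the closed-form normal density. This is a pure-Python sketch with a hypothetical name, `normal_density`, and is not Hail's implementation.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0, log_p=False):
    """Density of a Normal(mu, sigma) distribution at x, optionally on the log scale."""
    z = (x - mu) / sigma
    # log f(x) = -z^2 / 2 - log(sigma) - log(sqrt(2 * pi))
    log_density = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return log_density if log_p else math.exp(log_density)

print(normal_density(1))
print(normal_density(1, mu=1, sigma=2))
print(normal_density(1, log_p=True))
```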
-
hail.expr.functions.dpois(x, lamb, log_p=False)
Compute the (log) probability density at x of a Poisson distribution with rate parameter lamb.
Examples
>>> hl.eval(hl.dpois(5, 3))
0.10081881344492458
- Parameters
  x (float or Expression of type tfloat64) – Non-negative number at which to compute the probability density.
  lamb (float or Expression of type tfloat64) – Poisson rate parameter. Must be non-negative.
  log_p (bool or BooleanExpression) – If True, the natural logarithm of the probability density is returned.
- Returns
  Expression of type tfloat64 – The (log) probability density.
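For an integer count x, the value above is the Poisson probability exp(-lamb) * lamb^x / x!. The sketch below computes it on the log scale first; the name `poisson_pmf` is hypothetical, it assumes lamb > 0, and it is not Hail's implementation.

```python
import math

def poisson_pmf(x, lamb, log_p=False):
    """Poisson probability of the count x (integer x >= 0, lamb > 0)."""
    # log P(X = x) = x log(lamb) - lamb - log(x!)
    log_pmf = x * math.log(lamb) - lamb - math.lgamma(x + 1)
    return log_pmf if log_p else math.exp(log_pmf)

print(poisson_pmf(5, 3))
```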
-
hail.expr.functions.hardy_weinberg_test(n_hom_ref, n_het, n_hom_var, one_sided=False)
Performs test of Hardy-Weinberg equilibrium.
Examples
>>> hl.eval(hl.hardy_weinberg_test(250, 500, 250))
Struct(het_freq_hwe=0.5002501250625313, p_value=0.9747844394217698)
>>> hl.eval(hl.hardy_weinberg_test(37, 200, 85))
Struct(het_freq_hwe=0.48964964307448583, p_value=1.1337210383168987e-06)
Notes
By default, this method performs a two-sided exact test with mid-p-value correction of Hardy-Weinberg equilibrium via an efficient implementation of the Levene-Haldane distribution, which models the number of heterozygous individuals under equilibrium.
The mean of this distribution is (n_ref * n_var) / (2n - 1), where n_ref = 2*n_hom_ref + n_het is the number of reference alleles, n_var = 2*n_hom_var + n_het is the number of variant alleles, and n = n_hom_ref + n_het + n_hom_var is the number of individuals. So the expected frequency of heterozygotes under equilibrium, het_freq_hwe, is this mean divided by n.
To perform a one-sided exact test of excess heterozygosity with mid-p-value correction instead, set one_sided=True and the p-value returned will be from the one-sided exact test.
- Parameters
  n_hom_ref (int or Expression of type tint32) – Number of homozygous reference genotypes.
  n_het (int or Expression of type tint32) – Number of heterozygous genotypes.
  n_hom_var (int or Expression of type tint32) – Number of homozygous variant genotypes.
  one_sided (bool) – False by default. When True, perform one-sided test for excess heterozygosity.
- Returns
  StructExpression – A struct expression with two fields, het_freq_hwe (tfloat64) and p_value (tfloat64).
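The het_freq_hwe field can be reproduced directly from the formula in the Notes. The following sketch covers only that field, not the exact test or its p-value; the function name `het_freq_hwe` is borrowed from the field name for illustration.

```python
def het_freq_hwe(n_hom_ref, n_het, n_hom_var):
    """Expected heterozygote frequency under equilibrium, per the Notes."""
    n_ref = 2 * n_hom_ref + n_het          # number of reference alleles
    n_var = 2 * n_hom_var + n_het          # number of variant alleles
    n = n_hom_ref + n_het + n_hom_var      # number of individuals
    mean_het = (n_ref * n_var) / (2 * n - 1)
    return mean_het / n

print(het_freq_hwe(250, 500, 250))
print(het_freq_hwe(37, 200, 85))
```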
-
hail.expr.functions.binom_test(x, n, p, alternative)
Performs a binomial test on p given x successes in n trials.
Returns the p-value from the exact binomial test of the null hypothesis that success has probability p, given x successes in n trials.
The alternatives are interpreted as follows:
- 'less': a one-tailed test of the significance of x or fewer successes,
- 'greater': a one-tailed test of the significance of x or more successes, and
- 'two-sided': a two-tailed test of the significance of x or any equivalent or more unlikely outcome.
Examples
All the examples below use a fair coin as the null hypothesis. Zero is interpreted as tails and one as heads.
Test if a coin is biased towards heads or tails after observing two heads out of ten flips:
>>> hl.eval(hl.binom_test(2, 10, 0.5, 'two-sided'))
0.10937499999999994
Test if a coin is biased towards tails after observing four heads out of ten flips:
>>> hl.eval(hl.binom_test(4, 10, 0.5, 'less'))
0.3769531250000001
Test if a coin is biased towards heads after observing thirty-two heads out of fifty flips:
>>> hl.eval(hl.binom_test(32, 50, 0.5, 'greater'))
0.03245432353613613
- Parameters
  x (int or Expression of type tint32) – Number of successes.
  n (int or Expression of type tint32) – Number of trials.
  p (float or Expression of type tfloat64) – Probability of success, between 0 and 1.
  alternative – One of “two-sided”, “greater”, “less” (deprecated: “two.sided”).
- Returns
  Expression of type tfloat64 – p-value.
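The three alternatives can be illustrated with a small pure-Python exact test. This is a sketch rather than Hail's implementation: the name `exact_binom_test` is hypothetical, and the relative tolerance used to decide which outcomes count as "equivalent or more unlikely" is an assumption.

```python
from math import comb

def exact_binom_test(x, n, p, alternative):
    """Exact binomial p-value for x successes in n trials under success probability p."""
    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    if alternative == "less":
        # Probability of x or fewer successes.
        return sum(pmf(k) for k in range(0, x + 1))
    if alternative == "greater":
        # Probability of x or more successes.
        return sum(pmf(k) for k in range(x, n + 1))
    if alternative == "two-sided":
        # Sum all outcomes no more likely than the observed one.
        threshold = pmf(x) * (1 + 1e-7)
        return min(1.0, sum(q for q in map(pmf, range(n + 1)) if q <= threshold))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

print(exact_binom_test(2, 10, 0.5, "two-sided"))
print(exact_binom_test(4, 10, 0.5, "less"))
print(exact_binom_test(32, 50, 0.5, "greater"))
```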
-
hail.expr.functions.pchisqtail(x, df, ncp=None, lower_tail=False, log_p=False)
Returns the probability under the right-tail starting at x for a chi-squared distribution with df degrees of freedom.
Examples
>>> hl.eval(hl.pchisqtail(5, 1))
0.025347318677468304
>>> hl.eval(hl.pchisqtail(5, 1, ncp=2))
0.20571085634347097
>>> hl.eval(hl.pchisqtail(5, 1, lower_tail=True))
0.9746526813225317
>>> hl.eval(hl.pchisqtail(5, 1, log_p=True))
-3.6750823266311876
- Parameters
  x (float or Expression of type tfloat64)
  df (float or Expression of type tfloat64) – Degrees of freedom.
  ncp (float or Expression of type tfloat64) – Noncentrality parameter, defaults to 0 if unspecified.
  lower_tail (bool or BooleanExpression) – If True, compute the probability of an outcome at or below x, otherwise greater than x.
  log_p (bool or BooleanExpression) – Return the natural logarithm of the probability.
- Returns
  Expression of type tfloat64
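For one degree of freedom and no noncentrality, the right tail has the closed form erfc(sqrt(x / 2)), which makes the effect of lower_tail and log_p easy to see. The sketch below is restricted to that special case; the name `chisq_tail_df1` is hypothetical and this is not Hail's implementation.

```python
import math

def chisq_tail_df1(x, lower_tail=False, log_p=False):
    """Tail probability of a central chi-squared distribution with df = 1 only."""
    upper = math.erfc(math.sqrt(x / 2))    # Prob(X > x)
    prob = 1 - upper if lower_tail else upper
    return math.log(prob) if log_p else prob

print(chisq_tail_df1(5))
print(chisq_tail_df1(5, lower_tail=True))
print(chisq_tail_df1(5, log_p=True))
```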
-
hail.expr.functions.pnorm(x, mu=0, sigma=1, lower_tail=True, log_p=False)
The cumulative probability function of a normal distribution with mean mu and standard deviation sigma. Returns cumulative probability of standard normal distribution by default.
Examples
>>> hl.eval(hl.pnorm(0))
0.5
>>> hl.eval(hl.pnorm(1, mu=2, sigma=2))
0.30853753872598694
>>> hl.eval(hl.pnorm(2, lower_tail=False))
0.022750131948179212
>>> hl.eval(hl.pnorm(2, log_p=True))
-0.023012909328963493
Notes
Returns the left-tail probability p = Prob(\(Z < x\)) with \(Z\) a normal random variable. Defaults to a standard normal random variable.
- Parameters
  x (float or Expression of type tfloat64)
  mu (float or Expression of type tfloat64) – Mean (default = 0).
  sigma (float or Expression of type tfloat64) – Standard deviation (default = 1).
  lower_tail (bool or BooleanExpression) – If True, compute the probability of an outcome at or below x, otherwise greater than x.
  log_p (bool or BooleanExpression) – Return the natural logarithm of the probability.
- Returns
  Expression of type tfloat64
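The examples above can be cross-checked with the complementary error function from the standard library. This is a pure-Python sketch with a hypothetical name, `normal_cdf`, and is not Hail's implementation.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0, lower_tail=True, log_p=False):
    """Cumulative probability of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / (sigma * math.sqrt(2))
    # Using erfc for both tails avoids computing 1 - (something close to 1).
    prob = 0.5 * math.erfc(-z) if lower_tail else 0.5 * math.erfc(z)
    return math.log(prob) if log_p else prob

print(normal_cdf(0))
print(normal_cdf(1, mu=2, sigma=2))
print(normal_cdf(2, lower_tail=False))
print(normal_cdf(2, log_p=True))
```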
-
hail.expr.functions.pT(x, n, lower_tail=True, log_p=False)
The cumulative probability function of a t-distribution with n degrees of freedom.
Examples
>>> hl.eval(hl.pT(0, 10))
0.5
>>> hl.eval(hl.pT(1, 10))
0.82955343384897
>>> hl.eval(hl.pT(1, 10, lower_tail=False))
0.17044656615103004
>>> hl.eval(hl.pT(1, 10, log_p=True))
-0.186867754489647
Notes
If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a t-distributed random variable with n degrees of freedom. If lower_tail is false, returns Prob(\(X\) > x).
- Parameters
  x (float or Expression of type tfloat64)
  n (float or Expression of type tfloat64) – Degrees of freedom of the t-distribution.
  lower_tail (bool or BooleanExpression) – If True, compute the probability of an outcome at or below x, otherwise greater than x.
  log_p (bool or BooleanExpression) – Return the natural logarithm of the probability.
- Returns
  Expression of type tfloat64
-
hail.expr.functions.pF(x, df1, df2, lower_tail=True, log_p=False)
The cumulative probability function of an F-distribution with parameters df1 and df2.
Examples
>>> hl.eval(hl.pF(0, 3, 10))
0.0
>>> hl.eval(hl.pF(1, 3, 10))
0.5676627969783028
>>> hl.eval(hl.pF(1, 3, 10, lower_tail=False))
0.4323372030216972
>>> hl.eval(hl.pF(1, 3, 10, log_p=True))
-0.566227703842908
Notes
If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a random variable with an \(F\)-distribution with parameters df1 and df2. If lower_tail is false, returns Prob(\(X\) > x).
- Parameters
  x (float or Expression of type tfloat64)
  df1 (float or Expression of type tfloat64) – Parameter of the F-distribution.
  df2 (float or Expression of type tfloat64) – Parameter of the F-distribution.
  lower_tail (bool or BooleanExpression) – If True, compute the probability of an outcome at or below x, otherwise greater than x.
  log_p (bool or BooleanExpression) – Return the natural logarithm of the probability.
- Returns
  Expression of type tfloat64
-
hail.expr.functions.ppois(x, lamb, lower_tail=True, log_p=False)
The cumulative probability function of a Poisson distribution.
Examples
>>> hl.eval(hl.ppois(2, 1))
0.9196986029286058
Notes
If lower_tail is true, returns Prob(\(X \leq\) x) where \(X\) is a Poisson random variable with rate parameter lamb. If lower_tail is false, returns Prob(\(X\) > x).
- Parameters
  x (float or Expression of type tfloat64)
  lamb (float or Expression of type tfloat64) – Rate parameter of Poisson distribution.
  lower_tail (bool or BooleanExpression) – If True, compute the probability of an outcome at or below x, otherwise greater than x.
  log_p (bool or BooleanExpression) – Return the natural logarithm of the probability.
- Returns
  Expression of type tfloat64
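The definition in the Notes amounts to summing Poisson probabilities up to x. The sketch below does this naively, which is adequate for small x and lamb; the name `poisson_cdf` is hypothetical and this is not Hail's implementation.

```python
import math

def poisson_cdf(x, lamb, lower_tail=True):
    """Prob(X <= x) for a Poisson(lamb) variable, or Prob(X > x) if lower_tail is False."""
    cumulative = sum(math.exp(-lamb) * lamb ** k / math.factorial(k)
                     for k in range(math.floor(x) + 1))
    return cumulative if lower_tail else 1 - cumulative

print(poisson_cdf(2, 1))
print(poisson_cdf(2, 1, lower_tail=False))
```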
-
hail.expr.functions.qchisqtail(p, df, ncp=None, lower_tail=False, log_p=False)
The quantile function of a chi-squared distribution with df degrees of freedom, inverts pchisqtail().
Examples
>>> hl.eval(hl.qchisqtail(0.05, 2))
5.991464547107979
>>> hl.eval(hl.qchisqtail(0.05, 2, ncp=2))
10.838131614372958
>>> hl.eval(hl.qchisqtail(0.05, 2, lower_tail=True))
0.10258658877510107
>>> hl.eval(hl.qchisqtail(hl.log(0.05), 2, log_p=True))
5.991464547107979
Notes
Returns right-quantile x for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by df. The probability p must satisfy 0 < p < 1.
- Parameters
  p (float or Expression of type tfloat64) – Probability.
  df (float or Expression of type tfloat64) – Degrees of freedom.
  ncp (float or Expression of type tfloat64) – Corresponds to the ncp parameter in pchisqtail().
  lower_tail (bool or BooleanExpression) – Corresponds to the lower_tail parameter in pchisqtail().
  log_p (bool or BooleanExpression) – Exponentiate p; corresponds to the log_p parameter in pchisqtail().
- Returns
  Expression of type tfloat64
-
hail.expr.functions.qnorm(p, mu=0, sigma=1, lower_tail=True, log_p=False)
The quantile function of a normal distribution with mean mu and standard deviation sigma, inverts pnorm(). Returns quantile of standard normal distribution by default.
Examples
>>> hl.eval(hl.qnorm(0.90))
1.2815515655446008
>>> hl.eval(hl.qnorm(0.90, mu=1, sigma=2))
3.5631031310892016
>>> hl.eval(hl.qnorm(0.90, lower_tail=False))
-1.2815515655446008
>>> hl.eval(hl.qnorm(hl.log(0.90), log_p=True))
1.2815515655446008
Notes
Returns left-quantile x for which p = Prob(\(Z\) < x) with \(Z\) a normal random variable with mean mu and standard deviation sigma. Defaults to a standard normal random variable, and the probability p must satisfy 0 < p < 1.
- Parameters
  p (float or Expression of type tfloat64) – Probability.
  mu (float or Expression of type tfloat64) – Mean (default = 0).
  sigma (float or Expression of type tfloat64) – Standard deviation (default = 1).
  lower_tail (bool or BooleanExpression) – Corresponds to the lower_tail parameter in pnorm().
  log_p (bool or BooleanExpression) – Exponentiate p; corresponds to the log_p parameter in pnorm().
- Returns
  Expression of type tfloat64
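How lower_tail and log_p transform p before inversion can be sketched with Python's `statistics.NormalDist`. The wrapper name `normal_quantile` is hypothetical and this is not Hail's implementation.

```python
import math
from statistics import NormalDist

def normal_quantile(p, mu=0.0, sigma=1.0, lower_tail=True, log_p=False):
    """Quantile of a Normal(mu, sigma) distribution for a probability 0 < p < 1."""
    if log_p:
        p = math.exp(p)        # p was supplied on the log scale
    if not lower_tail:
        p = 1 - p              # convert a right-tail probability to a left-tail one
    return NormalDist(mu, sigma).inv_cdf(p)

print(normal_quantile(0.90))
print(normal_quantile(0.90, mu=1, sigma=2))
print(normal_quantile(0.90, lower_tail=False))
print(normal_quantile(math.log(0.90), log_p=True))
```

Feeding the result back into `NormalDist(mu, sigma).cdf` recovers the left-tail probability, which is the sense in which the quantile function inverts the cumulative probability function.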
-
hail.expr.functions.qpois(p, lamb, lower_tail=True, log_p=False)
The quantile function of a Poisson distribution with rate parameter lamb, inverts ppois().
Examples
>>> hl.eval(hl.qpois(0.99, 1))
4
Notes
Returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\) p where \(X\) is a Poisson random variable with rate parameter lamb.
- Parameters
  p (float or Expression of type tfloat64)
  lamb (float or Expression of type tfloat64) – Rate parameter of Poisson distribution.
  lower_tail (bool or BooleanExpression) – Corresponds to the lower_tail parameter in ppois().
  log_p (bool or BooleanExpression) – Exponentiate p before testing.
- Returns
  Expression of type tfloat64
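The definition in the Notes translates into a simple search over cumulative probabilities. The sketch below covers only the default lower_tail=True, log_p=False case; the name `poisson_quantile` is hypothetical and this is not Hail's implementation.

```python
import math

def poisson_quantile(p, lamb):
    """Smallest integer x with Prob(X <= x) >= p for a Poisson(lamb) variable."""
    if not 0 <= p < 1:
        raise ValueError("this sketch requires 0 <= p < 1")
    x = 0
    term = math.exp(-lamb)     # Prob(X = 0)
    cumulative = term
    while cumulative < p:
        x += 1
        term *= lamb / x       # Prob(X = x) from Prob(X = x - 1)
        cumulative += term
    return x

print(poisson_quantile(0.99, 1))
```

For rate 1, the cumulative probability first reaches 0.99 at x = 4, in agreement with the example above.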