Random functions

Hail has several functions that generate random values. Each call is seeded when the function is called, so binding the result of a random Hail function to a variable and using that variable several times in the same expression yields the same value each time.

Evaluating the same expression yields the same value every time, but separate calls to the same function yield different results. For example, let x be a random number generated with the function rand_unif():

>>> x = hl.rand_unif(0, 1)

The value of x will not change, although other calls to rand_unif() will generate different values:

>>> hl.eval(x)
0.9828239225846387
>>> hl.eval(x)
0.9828239225846387
>>> hl.eval(hl.rand_unif(0, 1))
0.49094525115847415
>>> hl.eval(hl.rand_unif(0, 1))
0.3972543766997359
>>> hl.eval(hl.array([x, x, x]))
[0.9828239225846387, 0.9828239225846387, 0.9828239225846387]

To get three distinct values in the last expression, make three separate calls to rand_unif():

>>> a = hl.rand_unif(0, 1)
>>> b = hl.rand_unif(0, 1)
>>> c = hl.rand_unif(0, 1)
>>> hl.eval(hl.array([a, b, c]))
[0.992090957001768, 0.9564448098124774, 0.3905029525642664]

Within the rows of a Table, the same expression will yield a consistent value within each row, but different (random) values across rows:

>>> table = hl.utils.range_table(5, 1)
>>> table = table.annotate(x1=x, x2=x, rand=hl.rand_unif(0, 1))
>>> table.show()
+-------+----------+----------+----------+
|   idx |       x1 |       x2 |     rand |
+-------+----------+----------+----------+
| int32 |  float64 |  float64 |  float64 |
+-------+----------+----------+----------+
|     0 | 4.68e-01 | 4.68e-01 | 6.36e-01 |
|     1 | 8.24e-01 | 8.24e-01 | 9.72e-01 |
|     2 | 7.33e-01 | 7.33e-01 | 1.43e-01 |
|     3 | 8.99e-01 | 8.99e-01 | 5.52e-01 |
|     4 | 4.03e-01 | 4.03e-01 | 3.50e-01 |
+-------+----------+----------+----------+

The same is true of the rows, columns, and entries of a MatrixTable.

Setting a seed

All random functions can take a specified seed as an argument. This guarantees that multiple invocations of the same function within the same context will return the same result, e.g.

>>> hl.eval(hl.rand_unif(0, 1, seed=0))
0.2664972565962568
>>> hl.eval(hl.rand_unif(0, 1, seed=0))
0.2664972565962568
>>> table = hl.utils.range_table(5, 1).annotate(x=hl.rand_unif(0, 1, seed=0))
>>> table.x.collect()
[0.5820244750020055,
 0.33150686392731943,
 0.20526631289173847,
 0.6964416913998893,
 0.6092952493383876]
>>> table = hl.utils.range_table(5, 5).annotate(x=hl.rand_unif(0, 1, seed=0))
>>> table.x.collect()
[0.5820244750020055,
 0.33150686392731943,
 0.20526631289173847,
 0.6964416913998893,
 0.6092952493383876]

However, moving the same seeded call to a sufficiently different context will produce different results:

>>> table = hl.utils.range_table(7, 1)
>>> table = table.filter(table.idx >= 2).annotate(x=hl.rand_unif(0, 1, seed=0))
>>> table.x.collect()
[0.20526631289173847,
 0.6964416913998893,
 0.6092952493383876,
 0.6404026938964441,
 0.5550464170615771]

In fact, in this case we are getting the tail of the values produced on the unfiltered seven-row table:

>>> table = hl.utils.range_table(7, 1).annotate(x=hl.rand_unif(0, 1, seed=0))
>>> table.x.collect()
[0.5820244750020055,
 0.33150686392731943,
 0.20526631289173847,
 0.6964416913998893,
 0.6092952493383876,
 0.6404026938964441,
 0.5550464170615771]

Reproducibility across sessions

The values of a random function are fully determined by three things:

  • The seed set on the function itself. If not specified, seeds are generated sequentially.

  • Some data uniquely identifying the current position within a larger context, e.g. Table, MatrixTable, or array. For instance, in a range_table(), this data is simply the row id, as suggested by the previous examples.

  • The global seed. This is fixed for the entire session, and can only be set using the global_seed argument to init().
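The three inputs listed above can be pictured with a toy model in plain Python. This is only an illustration of the idea, not Hail's generator: the function toy_rand_unif, the constant GLOBAL_SEED, and the use of SHA-256 as the mixing step are all assumptions made for this sketch.

```python
import hashlib
import struct

def toy_rand_unif(global_seed: int, fn_seed: int, row_id: int) -> float:
    """Map (global seed, function seed, row position) to a float in [0, 1)."""
    key = struct.pack("<qqq", global_seed, fn_seed, row_id)
    digest = hashlib.sha256(key).digest()
    # Keep 53 bits so the result is exactly representable as a float64.
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)

GLOBAL_SEED = 0  # hypothetical stand-in for init(global_seed=0)

# Seven rows, one value per row id, all sharing function seed 0.
full = [toy_rand_unif(GLOBAL_SEED, 0, i) for i in range(7)]

# Filtering keeps the original row ids, so surviving rows keep their values.
filtered = [toy_rand_unif(GLOBAL_SEED, 0, i) for i in range(7) if i >= 2]

# The grouping of rows into partitions never enters the key.
partitions = [[0, 1, 2], [3, 4], [5, 6]]
partitioned = [toy_rand_unif(GLOBAL_SEED, 0, i) for part in partitions for i in part]
```

In this model the filtered values are exactly the tail of the full list, and repartitioning changes nothing, which matches the behavior of the range_table() examples earlier on this page.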

To ensure reproducibility within a single Hail session, it suffices either to set the seed manually on every random function call, or to call reset_global_randomness() at the start of a pipeline, which resets the counter used to generate seeds.

>>> hl.reset_global_randomness()
>>> hl.eval(hl.array([hl.rand_unif(0, 1), hl.rand_unif(0, 1)]))
[0.9828239225846387, 0.49094525115847415]
>>> hl.reset_global_randomness()
>>> hl.eval(hl.array([hl.rand_unif(0, 1), hl.rand_unif(0, 1)]))
[0.9828239225846387, 0.49094525115847415]
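The role of the seed counter can be sketched in plain Python. The names SeedCounter, SEEDS, and ToyRandUnif are hypothetical, and Python's random module stands in for Hail's generator, so this only models the behavior described above rather than reproducing Hail's internals.

```python
import random

class SeedCounter:
    """Hands out sequential seeds; a stand-in for the session's seed counter."""

    def __init__(self):
        self.value = 0

    def next(self) -> int:
        seed = self.value
        self.value += 1
        return seed

    def reset(self) -> None:
        # Plays the role of reset_global_randomness() in this model.
        self.value = 0

SEEDS = SeedCounter()

class ToyRandUnif:
    """A random expression whose seed is fixed when it is constructed."""

    def __init__(self, seed=None):
        # Without an explicit seed, take the next sequential one.
        self.seed = SEEDS.next() if seed is None else seed

    def eval(self) -> float:
        # Evaluation depends only on the stored seed, so it is repeatable.
        return random.Random(self.seed).random()

SEEDS.reset()
x = ToyRandUnif()
first_run = [x.eval(), x.eval(), ToyRandUnif().eval()]

SEEDS.reset()
y = ToyRandUnif()
second_run = [y.eval(), y.eval(), ToyRandUnif().eval()]
```

Reusing x gives the same value twice, a fresh ToyRandUnif() gives a different one, and resetting the counter makes the second run repeat the first.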

To ensure reproducibility across sessions, one must in addition specify the global_seed in init(). If not specified, the global seed is chosen randomly. All documentation examples were computed using global_seed=0.

>>> hl.stop()
>>> hl.init(global_seed=0)
>>> hl.eval(hl.array([hl.rand_unif(0, 1), hl.rand_unif(0, 1)]))
[0.9828239225846387, 0.49094525115847415]

rand_bool(p[, seed])

Returns True with probability p.

rand_beta(a, b[, lower, upper, seed])

Samples from a beta distribution with parameters a (alpha) and b (beta).

rand_cat(prob[, seed])

Samples from a categorical distribution.

rand_dirichlet(a[, seed])

Samples from a Dirichlet distribution.

rand_gamma(shape, scale[, seed])

Samples from a gamma distribution with parameters shape and scale.

rand_norm([mean, sd, seed, size])

Samples from a normal distribution with mean mean and standard deviation sd.

rand_pois(lamb[, seed])

Samples from a Poisson distribution with rate parameter lamb.

rand_unif([lower, upper, seed, size])

Samples from a uniform distribution within the interval [lower, upper].

rand_int32(a[, b, seed])

Samples from a uniform distribution of 32-bit integers.

rand_int64([a, b, seed])

Samples from a uniform distribution of 64-bit integers.

shuffle(a[, seed])

Randomly permutes an array.

hail.expr.functions.rand_bool(p, seed=None)

Returns True with probability p.

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_bool(0.5))
False
>>> hl.eval(hl.rand_bool(0.5))
True
Parameters:
Returns:

BooleanExpression

hail.expr.functions.rand_beta(a, b, lower=None, upper=None, seed=None)

Samples from a beta distribution with parameters a (alpha) and b (beta).

Notes

The optional parameters lower and upper represent a truncated beta distribution with parameters a and b and support [lower, upper]. Draws are made via rejection sampling, i.e. returning the first draw from Beta(a,b) that falls in range [lower, upper]. This procedure may be slow if the probability mass of Beta(a,b) over [lower, upper] is small.
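The rejection-sampling procedure described above can be sketched with Python's standard library. The helper truncated_beta and its max_tries guard are assumptions of this sketch, not part of Hail's API; random.betavariate supplies the untruncated draws.

```python
import random

def truncated_beta(a, b, lower=0.0, upper=1.0, rng=None, max_tries=100_000):
    """Draw from Beta(a, b) conditioned on lower <= x <= upper by rejection."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        draw = rng.betavariate(a, b)
        if lower <= draw <= upper:
            return draw  # first draw that lands inside the support
    # Guard added for this sketch: give up if almost no mass lies in range.
    raise RuntimeError("no accepted draw within max_tries")

rng = random.Random(0)
samples = [truncated_beta(2, 5, lower=0.1, upper=0.4, rng=rng) for _ in range(1000)]
```

The expected number of draws per sample is the reciprocal of the probability mass of Beta(a,b) over [lower, upper], which is why a narrow range in a low-density region is slow.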

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_beta(0.5, 0.5))
0.30607924177641355
>>> hl.eval(hl.rand_beta(2, 5))
0.1103872607301062
Parameters:
Returns:

Float64Expression

hail.expr.functions.rand_cat(prob, seed=None)

Samples from a categorical distribution.

Notes

The categories correspond to the indices of prob, an unnormalized probability mass function. The probability of drawing index i is prob[i]/sum(prob).
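As an illustration of sampling from an unnormalized probability mass function, the sketch below draws index i with probability prob[i]/sum(prob) by scanning cumulative weights. The helper toy_rand_cat is hypothetical and written in plain Python; it is not necessarily how Hail implements rand_cat.

```python
import random

def toy_rand_cat(prob, rng):
    """Return index i with probability prob[i] / sum(prob)."""
    total = sum(prob)
    threshold = rng.random() * total  # uniform in [0, total)
    running = 0.0
    for i, weight in enumerate(prob):
        running += weight
        if threshold < running:
            return i
    # Round-off fallback: return the last index with positive weight.
    return max(i for i, w in enumerate(prob) if w > 0)

rng = random.Random(0)
weights = [0, 1.7, 2]
draws = [toy_rand_cat(weights, rng) for _ in range(10_000)]
freq = [draws.count(i) / len(draws) for i in range(len(weights))]
```

With these weights, index 0 is never drawn, and the empirical frequencies of indices 1 and 2 approach 1.7/3.7 and 2/3.7. The linear scan in this sketch costs time proportional to the number of categories per draw.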

Warning

This function may be slow when the number of categories is large.

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_cat([0, 1.7, 2]))
2
>>> hl.eval(hl.rand_cat([0, 1.7, 2]))
2
Parameters:
Returns:

Int32Expression

hail.expr.functions.rand_dirichlet(a, seed=None)

Samples from a Dirichlet distribution.
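To show what a Dirichlet draw looks like, here is a plain-Python sketch that uses the standard construction of normalizing independent gamma draws. The helper toy_rand_dirichlet is hypothetical, and the construction is a textbook one chosen for illustration; it is not taken from Hail's source.

```python
import random

def toy_rand_dirichlet(a, rng):
    """Draw one Dirichlet(a) sample by normalizing Gamma(a_i, 1) draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in a]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
sample = toy_rand_dirichlet([1, 1, 1], rng)
```

Each sample has one non-negative entry per element of a, and the entries sum to 1.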

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_dirichlet([1, 1, 1]))
[0.6987619676833735, 0.287566556865261, 0.013671475451365567]
>>> hl.eval(hl.rand_dirichlet([1, 1, 1]))
[0.16299928555608242, 0.04393664153526524, 0.7930640729086523]
Parameters:
Returns:

ArrayExpression

hail.expr.functions.rand_gamma(shape, scale, seed=None)

Samples from a gamma distribution with parameters shape and scale.

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_gamma(1, 1))
3.115449479063202
>>> hl.eval(hl.rand_gamma(1, 1))
3.077698059931638
Parameters:
Returns:

Float64Expression

hail.expr.functions.rand_norm(mean=0, sd=1, seed=None, size=None)

Samples from a normal distribution with mean mean and standard deviation sd.

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_norm())
0.347110923255205
>>> hl.eval(hl.rand_norm())
-0.9281375348070483
Parameters:
Returns:

Float64Expression

hail.expr.functions.rand_pois(lamb, seed=None)

Samples from a Poisson distribution with rate parameter lamb.

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_pois(1))
4.0
>>> hl.eval(hl.rand_pois(1))
4.0
Parameters:
Returns:

Float64Expression

hail.expr.functions.rand_unif(lower=0.0, upper=1.0, seed=None, size=None)

Samples from a uniform distribution within the interval [lower, upper].

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_unif())
0.9828239225846387
>>> hl.eval(hl.rand_unif(0, 1))
0.49094525115847415
>>> hl.eval(hl.rand_unif(0, 1))
0.3972543766997359
Parameters:
Returns:

Float64Expression

hail.expr.functions.rand_int32(a, b=None, *, seed=None)

Samples from a uniform distribution of 32-bit integers.

If b is None, samples from the uniform distribution over [0, a). Otherwise, samples from the uniform distribution over [a, b).

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_int32(10))
9
>>> hl.eval(hl.rand_int32(10, 15))
14
>>> hl.eval(hl.rand_int32(10, 15))
12
Parameters:
  • a (int or Int32Expression) – If b is None, the right boundary of the range; otherwise, the left boundary of the range.

  • b (int or Int32Expression) – If specified, the right boundary of the range.

  • seed (int, optional) – Random seed.

Returns:

Int32Expression

hail.expr.functions.rand_int64(a=None, b=None, *, seed=None)

Samples from a uniform distribution of 64-bit integers.

If a and b are both specified, samples from the uniform distribution over [a, b). If b is None, samples from the uniform distribution over [0, a). If both a and b are None, samples from the uniform distribution over all 64-bit integers.
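The three argument cases can be mirrored in plain Python to make the half-open ranges concrete. The helper toy_rand_int64 and its rng parameter are hypothetical, and the error raised when only b is given is a choice made for this sketch, since that case is not described above.

```python
import random

INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1

def toy_rand_int64(a=None, b=None, *, rng):
    """Mirror the documented argument convention using half-open ranges."""
    if a is None and b is None:
        return rng.randint(INT64_MIN, INT64_MAX)  # any 64-bit integer
    if a is None:
        # Assumption of this sketch: reject b without a.
        raise ValueError("b given without a")
    if b is None:
        return rng.randrange(0, a)  # [0, a): a itself is never returned
    return rng.randrange(a, b)  # [a, b): b itself is never returned

rng = random.Random(0)
small = [toy_rand_int64(10, rng=rng) for _ in range(1000)]
wide = [toy_rand_int64(1 << 33, 1 << 35, rng=rng) for _ in range(1000)]
anything = toy_rand_int64(rng=rng)
```

The right boundary is always excluded, so a call such as toy_rand_int64(10, rng=rng) can return 0 through 9 but never 10.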

Examples

>>> hl.reset_global_randomness()
>>> hl.eval(hl.rand_int64(10))
9
>>> hl.eval(hl.rand_int64(1 << 33, 1 << 35))
33089740109
>>> hl.eval(hl.rand_int64(1 << 33, 1 << 35))
18195458570
Parameters:
  • a (int or Int64Expression) – If b is None, the right boundary of the range; otherwise, the left boundary of the range.

  • b (int or Int64Expression) – If specified, the right boundary of the range.

  • seed (int, optional) – Random seed.

Returns:

Int64Expression

hail.expr.functions.shuffle(a, seed=None)

Randomly permutes an array.
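For readers who want the semantics spelled out, the sketch below shows a seeded permutation in plain Python. The helper toy_shuffle is hypothetical and uses random.Random rather than Hail's generator, so its output will not match hl.shuffle for the same seed.

```python
import random

def toy_shuffle(values, seed=None):
    """Return a new list holding a random permutation of values."""
    out = list(values)  # copy so the input is left untouched
    random.Random(seed).shuffle(out)
    return out

original = list(range(5))
shuffled = toy_shuffle(original, seed=0)
```

The result contains the same elements as the input in a different order, and a fixed seed gives the same permutation on every call.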

Example

>>> hl.reset_global_randomness()
>>> hl.eval(hl.shuffle(hl.range(5)))
[4, 0, 2, 1, 3]
Parameters:
Returns:

ArrayExpression