linalg/utils

array_windows(a, radius)

Returns start and stop indices for window around each array value.

locus_windows(locus_expr, radius[, ...])

Returns start and stop indices for window around each locus.

hail.linalg.utils.array_windows(a, radius)

Returns start and stop indices for window around each array value.

Examples

>>> hl.linalg.utils.array_windows(np.array([1, 2, 4, 4, 6, 8]), 2)
(array([0, 0, 1, 1, 2, 4]), array([2, 4, 5, 5, 6, 6]))
>>> hl.linalg.utils.array_windows(np.array([-10.0, -2.5, 0.0, 0.0, 1.2, 2.3, 3.0]), 2.5)
(array([0, 1, 1, 1, 2, 2, 4]), array([1, 4, 6, 6, 7, 7, 7]))

Notes

For an array a in ascending order, the resulting starts and stops arrays have the same length as a and the property that, for all indices i, [starts[i], stops[i]) is the maximal range of indices j such that a[i] - radius <= a[j] <= a[i] + radius.
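The property above can be illustrated with a short pure-Python sketch built on the standard-library bisect module. This is not Hail's implementation; the name array_windows_sketch is hypothetical, and the sketch returns plain lists rather than NumPy arrays.

```python
from bisect import bisect_left, bisect_right


def array_windows_sketch(a, radius):
    """Illustrate the window property with binary search (hypothetical helper)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    values = list(a)
    # Binary search is only valid on sorted input, which mirrors the
    # documented requirement that a is non-decreasing.
    if any(x > y for x, y in zip(values, values[1:])):
        raise ValueError("a must be non-decreasing")
    # First index j with a[j] >= a[i] - radius (start-inclusive).
    starts = [bisect_left(values, v - radius) for v in values]
    # One past the last index j with a[j] <= a[i] + radius (stop-exclusive).
    stops = [bisect_right(values, v + radius) for v in values]
    return starts, stops


starts, stops = array_windows_sketch([1, 2, 4, 4, 6, 8], 2)
print(starts)
print(stops)
```

On the two example inputs above, the sketch yields the same start and stop indices as the documented output.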

Index ranges are start-inclusive and stop-exclusive. This function is especially useful in conjunction with BlockMatrix.sparsify_row_intervals().
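To make the start-inclusive, stop-exclusive convention concrete, the following plain-Python sketch expands each range into the explicit indices it covers. It does not call Hail; the helper name window_indices is hypothetical, and the inputs are the starts and stops from the first example above.

```python
def window_indices(starts, stops):
    """Expand each half-open range [starts[i], stops[i]) into a list of indices."""
    return [list(range(start, stop)) for start, stop in zip(starts, stops)]


# Starts and stops from array_windows(np.array([1, 2, 4, 4, 6, 8]), 2).
windows = window_indices([0, 0, 1, 1, 2, 4], [2, 4, 5, 5, 6, 6])
for i, js in enumerate(windows):
    # Every window contains its own index i, and never the index stops[i].
    print(i, js)
```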

Parameters:
  • a (numpy.ndarray of signed integer or float values) – 1-dimensional array of values, non-decreasing with respect to index.

  • radius (float) – Non-negative radius of window for values.

Returns:

(numpy.ndarray of int, numpy.ndarray of int) – Tuple of start indices array and stop indices array.

hail.linalg.utils.locus_windows(locus_expr, radius, coord_expr=None, _localize=True)

Returns start and stop indices for window around each locus.

Examples

Windows with 2bp radius for one contig with positions 1, 2, 3, 4, 5:

>>> starts, stops = hl.linalg.utils.locus_windows(
...     hl.balding_nichols_model(1, 5, 5).locus,
...     radius=2)
>>> starts, stops
(array([0, 0, 0, 1, 2]), array([3, 4, 5, 5, 5]))

The following examples involve three contigs.

>>> loci = [{'locus': hl.Locus('1', 1), 'cm': 1.0},
...         {'locus': hl.Locus('1', 2), 'cm': 3.0},
...         {'locus': hl.Locus('1', 4), 'cm': 4.0},
...         {'locus': hl.Locus('2', 1), 'cm': 2.0},
...         {'locus': hl.Locus('2', 1), 'cm': 2.0},
...         {'locus': hl.Locus('3', 3), 'cm': 5.0}]
>>> ht = hl.Table.parallelize(
...         loci,
...         hl.tstruct(locus=hl.tlocus('GRCh37'), cm=hl.tfloat64),
...         key=['locus'])

Windows with 1bp radius:

>>> hl.linalg.utils.locus_windows(ht.locus, 1)
(array([0, 0, 2, 3, 3, 5]), array([2, 2, 3, 5, 5, 6]))

Windows with 1 cM radius:

>>> hl.linalg.utils.locus_windows(ht.locus, 1.0, coord_expr=ht.cm)
(array([0, 1, 1, 3, 3, 5]), array([1, 3, 3, 5, 5, 6]))

Notes

This function returns two 1-dimensional ndarrays of integers, starts and stops, each of size equal to the number of rows.

By default, for all indices i, [starts[i], stops[i]) is the maximal range of row indices j such that contig[i] == contig[j] and position[i] - radius <= position[j] <= position[i] + radius.

This function fails if the global_position() of locus_expr is not in ascending order. Ascending order should hold for a matrix table keyed by locus or variant (and the associated row table), or for a table that has been ordered by locus_expr.

Set coord_expr to use a value other than position to define the windows. This row-indexed numeric expression must be non-missing, non-NaN, on the same source as locus_expr, and ascending with respect to locus position for each contig; otherwise the function will fail.

The last example above uses centimorgan coordinates, so [starts[i], stops[i]) is the maximal range of row indices j such that contig[i] == contig[j] and cm[i] - radius <= cm[j] <= cm[i] + radius.
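The per-contig rule can be sketched in plain Python by applying the same binary search within each run of rows that share a contig. This is an illustration only, not Hail's implementation: the name locus_windows_sketch and its list-based arguments (contigs, coords) are hypothetical, and rows are assumed to be already sorted by contig and then by coordinate.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby


def locus_windows_sketch(contigs, coords, radius):
    """Per-contig windows over sorted rows (hypothetical helper, plain lists)."""
    starts, stops = [], []
    offset = 0  # row index of the first row in the current contig
    for _, group in groupby(zip(contigs, coords), key=lambda pair: pair[0]):
        block = [coord for _, coord in group]
        for c in block:
            # Windows never cross a contig boundary: search only this block,
            # then shift the local index by the block's starting row.
            starts.append(offset + bisect_left(block, c - radius))
            stops.append(offset + bisect_right(block, c + radius))
        offset += len(block)
    return starts, stops


contigs = ['1', '1', '1', '2', '2', '3']
positions = [1, 2, 4, 1, 1, 3]
cm = [1.0, 3.0, 4.0, 2.0, 2.0, 5.0]

print(locus_windows_sketch(contigs, positions, 1))   # base-pair windows
print(locus_windows_sketch(contigs, cm, 1.0))        # centimorgan windows
```

With the three-contig data from the examples above, the sketch gives the same indices as the documented output for both the position-based and the centimorgan-based windows.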

Index ranges are start-inclusive and stop-exclusive. This function is especially useful in conjunction with BlockMatrix.sparsify_row_intervals().

Parameters:
  • locus_expr (LocusExpression) – Row-indexed locus expression on a table or matrix table.

  • radius (int) – Radius of window for row values, in base pairs by default or in the units of coord_expr when it is set.

  • coord_expr (Float64Expression, optional) – Row-indexed numeric expression for the row value. Must be on the same table or matrix table as locus_expr. By default, the row value is given by the locus position.

Returns:

(numpy.ndarray of int, numpy.ndarray of int) – Tuple of start indices array and stop indices array.